100+ datasets found
  1. Bahasa Open Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Bahasa Open Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/bahasa-open-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The Bahasa Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the Bahasa language, advancing the field of artificial intelligence.

    Dataset Content:

    This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in Bahasa. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Bahasa people, and references were taken from diverse sources like books, news articles, websites, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains questions with constraints and persona restrictions, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. The answers include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Bahasa Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
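
    As a minimal illustration, the JSON export can be filtered on these annotation fields with a few lines of Python. The file name below is hypothetical, and only the field names listed above (and the complexity value "hard") are assumed.

    ```python
    import json

    # Hypothetical file name; the actual FutureBeeAI delivery may be packaged differently.
    with open("bahasa_open_ended_qa.json", encoding="utf-8") as f:
        records = json.load(f)

    # Keep only hard questions that include rich text (field names from the description above).
    hard_rich_text = [r for r in records if r.get("complexity") == "hard" and r.get("rich_text")]
    print(len(hard_rich_text), "hard questions with rich text")
    ```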

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    Both the questions and answers in Bahasa are grammatically accurate, without any spelling or grammatical errors. No copyrighted, toxic, or harmful content was used while building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Bahasa Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative ai models, improve response generation, and explore new approaches to NLP question-answering tasks.

  2. Share of questions answered by AI models in SimpleQA benchmark 2025

    • statista.com
    Updated May 30, 2025
    Cite
    Statista (2025). Share of questions answered by AI models in SimpleQA benchmark 2025 [Dataset]. https://www.statista.com/statistics/1612496/ai-simpleqa-share-of-questions-answered/
    Explore at:
    Dataset updated
    May 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    OpenAI's o1 had the highest share of questions answered when attempted in the SimpleQA benchmark in 2025. Claude-3 had the highest share of questions it simply did not attempt, though whether this is due to a lack of data or other reasons is unknown.

  3. Dataset of books called 101 toughest interview questions : -and answers that...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called 101 toughest interview questions : -and answers that win the job! [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=101+toughest+interview+questions+%3A+-and+answers+that+win+the+job%21
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered to the book "101 toughest interview questions : -and answers that win the job!". It features 7 columns, including author, publication date, language, and book publisher.

  4. Mathematical Problems Dataset: Various

    • kaggle.com
    Updated Dec 2, 2023
    Cite
    The Devastator (2023). Mathematical Problems Dataset: Various [Dataset]. https://www.kaggle.com/datasets/thedevastator/mathematical-problems-dataset-various-mathematic
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 2, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Mathematical Problems Dataset: Various Mathematical Problems and Solutions

    Mathematical Problems Dataset: Questions and Answers

    By math_dataset (From Huggingface) [source]

    About this dataset

    This dataset comprises a collection of mathematical problems and their solutions designed for training and testing purposes. Each problem is presented in the form of a question, followed by its corresponding answer. The dataset covers various mathematical topics such as arithmetic, polynomials, and prime numbers. For instance, the arithmetic_nearest_integer_root_test.csv file focuses on problems involving finding the nearest integer root of a given number. Similarly, the polynomials_simplify_power_test.csv file deals with problems related to simplifying polynomials with powers. Additionally, the dataset includes the numbers_is_prime_train.csv file containing math problems that require determining whether a specific number is prime or not. The questions and answers are provided in text format to facilitate analysis and experimentation with mathematical problem-solving algorithms or models.

    How to use the dataset

    • Introduction: The Mathematical Problems Dataset contains a collection of various mathematical problems and their corresponding solutions or answers. This guide will provide you with all the necessary information on how to utilize this dataset effectively.

    • Understanding the columns: The dataset consists of several columns, each representing a different aspect of the mathematical problem and its solution. The key columns are:

      • question: This column contains the text representation of the mathematical problem or equation.
      • answer: This column contains the text representation of the solution or answer to the corresponding problem.
    • Exploring specific problem categories: To focus on specific types of mathematical problems, you can filter or search within the dataset using relevant keywords or terms related to your area of interest. For example, if you are interested in prime numbers, you can search for "prime" in the question column (see the sketch at the end of this section).

    • Applying machine learning techniques: This dataset can be used for training machine learning models related to natural language understanding and mathematics. You can explore various techniques such as text classification, sentiment analysis, or even sequence-to-sequence models for solving mathematical problems based on their textual representations.

    • Generating new questions and solutions: By analyzing patterns in this dataset, you can generate new questions and solutions programmatically using techniques like data augmentation or rule-based methods.

    • Validation and evaluation: As with any other machine learning task, it is essential to properly validate your models on separate validation sets not included in this dataset. You can also evaluate model performance by comparing predictions against the known answers provided in this dataset's answer column.

    • Sharing insights and findings: After working with this dataset, researchers and educators are encouraged to share their insights and the approaches taken during analysis and modelling as Kaggle notebooks, discussions, blogs, or tutorials, so that others can benefit from these shared resources too.

    Note: Please note that the dataset does not include dates.

    By following these guidelines, you can effectively explore and utilize the Mathematical Problems Dataset for various mathematical problem-solving tasks. Happy exploring!
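
    A minimal sketch of the keyword-filtering step, assuming the CSV files keep the question and answer columns described above (numbers_is_prime_train.csv is one of the files named in the dataset description):

    ```python
    import pandas as pd

    # One of the files named in the dataset description; column names follow the guide above.
    df = pd.read_csv("numbers_is_prime_train.csv")

    # Filter rows whose question text mentions "prime".
    prime_questions = df[df["question"].str.contains("prime", case=False, na=False)]
    print(prime_questions[["question", "answer"]].head())
    ```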

    Research Ideas

    • Developing machine learning algorithms for solving mathematical problems: This dataset can be used to train and test models that can accurately predict the solution or answer to different mathematical problems.
    • Creating educational resources: The dataset can be used to create a wide variety of educational materials such as problem sets, worksheets, and quizzes for students studying mathematics.
    • Research in mathematical problem-solving strategies: Researchers and educators can analyze the dataset to identify common patterns or strategies employed in solving different types of mathematical problems. This analysis can help improve teaching methodologies and develop effective problem-solving techniques.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purpos...

  5. Statistics (ST), Question Paper, Graduate Aptitude Test in Engineering,...

    • paper.erudition.co.in
    html
    Updated Sep 23, 2025
    Cite
    Einetic (2025). Statistics (ST), Question Paper, Graduate Aptitude Test in Engineering, Competitive Exams | Erudition Paper [Dataset]. https://paper.erudition.co.in/competitive-exams/gate/question-paper/statistics
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 23, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of Statistics (ST), Question Paper, Graduate Aptitude Test in Engineering, Competitive Exams

  6. Filipino Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Filipino Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/filipino-closed-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The Filipino Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Filipino language, advancing the field of artificial intelligence.

    Dataset Content:

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Filipino. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Filipino people, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. The answers include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Filipino Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
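
    As a rough sketch, the CSV export could be turned into context-grounded training examples. The file name and exact column names below are assumptions based on the annotation details listed above.

    ```python
    import pandas as pd

    # Hypothetical file and column names, based on the annotation details listed above.
    df = pd.read_csv("filipino_closed_ended_qa.csv")

    # Build SQuAD-style examples: each question must be answered from its context paragraph.
    examples = [
        {"context": row["context_paragraph"], "question": row["question"], "answer": row["answer"]}
        for _, row in df.iterrows()
    ]
    print(len(examples), "context-grounded QA examples")
    ```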

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The Filipino version is grammatically accurate, without any spelling or grammatical errors. No toxic or harmful content was used while building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Filipino Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  7. Amazon Question and Answer Data

    • cseweb.ucsd.edu
    json
    Cite
    UCSD CSE Research Project, Amazon Question and Answer Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain 1.48 million question and answer pairs about products from Amazon.

    Metadata includes

    • question and answer text

    • is the question binary (yes/no), and if so does it have a yes/no answer?

    • timestamps

    • product ID (to reference the review dataset)

    Basic Statistics:

    • Questions: 1.48 million

    • Answers: 4,019,744

    • Labeled yes/no questions: 309,419

    • Number of unique products with questions: 191,185
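
    A minimal parsing sketch, assuming one Python-style dict per line per category file (a common layout for UCSD/McAuley datasets); the file name and field names below are assumptions, not taken from the listing above.

    ```python
    import ast

    def parse(path):
        # Assumes one Python-style dict literal per line, as in other UCSD/McAuley datasets.
        with open(path, encoding="utf-8") as f:
            for line in f:
                yield ast.literal_eval(line)

    # Hypothetical per-category file and field names ("asin", "question", "answer").
    for qa in parse("qa_Electronics.json"):
        print(qa.get("asin"), "|", qa.get("question"), "->", qa.get("answer"))
        break
    ```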

  8. Bengali Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Bengali Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/bengali-closed-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The Bengali Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Bengali language, advancing the field of artificial intelligence.

    Dataset Content:

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Bengali. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Bengali people, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. The answers include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Bengali Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The Bengali version is grammatically accurate, without any spelling or grammatical errors. No toxic or harmful content was used while building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Bengali Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  9. 2021

    • paper.erudition.co.in
    html
    Updated Sep 23, 2025
    + more versions
    Cite
    Einetic (2025). 2021 [Dataset]. https://paper.erudition.co.in/competitive-exams/gate/question-paper/statistics
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 23, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of year 2021 of Statistics, Question Paper, Graduate Aptitude Test in Engineering

  10. DSA-Questions-Dataset

    • kaggle.com
    Updated Sep 11, 2023
    Cite
    inductive_anks (2023). DSA-Questions-Dataset [Dataset]. https://www.kaggle.com/datasets/inductiveanks/dsa-questions-dataset
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    inductive_anks
    Description

    Unlock the power of data-driven problem-solving with our comprehensive dataset featuring Data Structures and Algorithms (DSA) questions sourced from the renowned Geeks for Geeks platform. This dataset serves as a valuable resource for data enthusiasts, coding enthusiasts, educators, and researchers, offering a rich collection of questions that span various levels of complexity.

    Key Attributes:

    • Question Name: Explore a diverse range of DSA questions, each carefully documented with a unique name for easy identification and reference.
    • Difficulty Level: Gain insights into the complexity of each question, categorized into difficulty levels to match your proficiency and learning goals.
    • Total Submissions: Understand the popularity of questions by examining the total number of submissions they've received, reflecting their challenge and appeal.
    • Accuracy: Evaluate the effectiveness of your problem-solving skills or algorithms by considering the accuracy rates associated with each question.
    • Company Tags: Discover the real-world applicability of DSA concepts with company tags, indicating the relevance of questions to specific tech industry employers.

    Whether you're a data scientist, competitive programmer, or educator seeking to enhance algorithmic thinking, this dataset empowers you to:

    • Benchmark your skills against a vast array of DSA questions.
    • Analyze question difficulty trends.
    • Investigate the accuracy of solutions across different problems.
    • Explore the intersection of DSA concepts with specific industry demands.
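
    A minimal sketch of such an analysis, assuming a hypothetical CSV export whose columns follow the attribute names above (numeric parsing of the accuracy column may be needed for the real files):

    ```python
    import pandas as pd

    # Hypothetical file name; column names follow the attributes listed above.
    df = pd.read_csv("dsa_questions.csv")

    # Mean accuracy and total submissions per difficulty level.
    summary = df.groupby("Difficulty Level").agg(
        mean_accuracy=("Accuracy", "mean"),
        total_submissions=("Total Submissions", "sum"),
    )
    print(summary)
    ```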

    Unleash your coding potential, conduct in-depth analyses, and improve your problem-solving abilities with the Geeks for Geeks DSA Questions dataset. Dive into the world of algorithms, sharpen your skills, and embark on a journey of continuous learning and growth.

  11. Pre-bid Question and Answer Packet

    • data.texas.gov
    • +1more
    application/rdfxml +5
    Updated Sep 11, 2025
    Cite
    TxDOT (2025). Pre-bid Question and Answer Packet [Dataset]. https://data.texas.gov/dataset/Pre-bid-Question-and-Answer-Packet/8ywq-yvw9
    Explore at:
    Available download formats: csv, json, tsv, application/rdfxml, xml, application/rssxml
    Dataset updated
    Sep 11, 2025
    Dataset authored and provided by
    TxDOT
    Description

    The Pre-bid Q&A Packet dataset displays detailed questions and answers submitted before letting, including pre-bid questions, submission date and time, responder details, and other project information.

    Its contents are refreshed daily with data from TxDOT’s transportation program management system, TXDOTCONNECT. The Project Information dataset includes all questions submitted by vendors and the responses provided by TxDOT.

  12. Share of customers by search engine usage to answer questions U.S.&...

    • statista.com
    Updated Jul 11, 2025
    Cite
    Statista (2025). Share of customers by search engine usage to answer questions U.S.& worldwide 2017 [Dataset]. https://www.statista.com/statistics/810420/customer-service-search-engine-usage-to-answer-questions/
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2017
    Area covered
    Worldwide, United States
    Description

    This survey shows the share of customers in the U.S. and worldwide who had ever used a search engine to try to find an answer to a customer service question, as of 2017. During the survey, ** percent of respondents from the United States stated that they had used a search engine to try to find an answer to a customer service question.

  13. 2019

    • paper.erudition.co.in
    html
    Updated Sep 23, 2025
    + more versions
    Cite
    Einetic (2025). 2019 [Dataset]. https://paper.erudition.co.in/competitive-exams/gate/question-paper/statistics
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 23, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of year 2019 of Statistics, Question Paper, Graduate Aptitude Test in Engineering

  14. Take the Test Sample Questions from OECD's PISA Assessments

    • catalog.data.gov
    • gimi9.com
    • +1more
    Updated Mar 30, 2021
    Cite
    U.S. Department of State (2021). Take the Test Sample Questions from OECD's PISA Assessments [Dataset]. https://catalog.data.gov/dataset/take-the-test-sample-questions-from-oecds-pisa-assessments
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    United States Department of State (http://state.gov/)
    Description

    What does PISA actually assess? This book presents all the publicly available questions from the PISA surveys. Some of these questions were used in the PISA 2000, 2003 and 2006 surveys and others were used in developing and trying out the assessment. After a brief introduction to the PISA assessment, the book presents three chapters, including PISA questions for the reading, mathematics and science tests, respectively. Each chapter presents an overview of what exactly the questions assess. The second section of each chapter presents questions which were used in the PISA 2000, 2003 and 2006 surveys, that is, the actual PISA tests for which results were published. The third section presents questions used in trying out the assessment. Although these questions were not used in the PISA 2000, 2003 and 2006 surveys, they are nevertheless illustrative of the kind of question PISA uses. The final section shows all the answers, along with brief comments on each question.

  15. Toloka Visual Question Answering Dataset

    • data.niaid.nih.gov
    Updated Oct 10, 2023
    Cite
    Ustalov, Dmitry (2023). Toloka Visual Question Answering Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7057740
    Explore at:
    Dataset updated
    Oct 10, 2023
    Dataset authored and provided by
    Ustalov, Dmitry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our dataset consists of the images associated with textual questions. One entry (instance) in our dataset is a question-image pair labeled with the ground truth coordinates of a bounding box containing the visual answer to the given question. The images were obtained from a CC BY-licensed subset of the Microsoft Common Objects in Context dataset, MS COCO. All data labeling was performed on the Toloka crowdsourcing platform, https://toloka.ai/.

    Our dataset has 45,199 instances split among three subsets: train (38,990 instances), public test (1,705 instances), and private test (4,504 instances). The entire train dataset was available to everyone from the start of the challenge. The public test dataset was available from the evaluation phase of the competition onward, but without any ground truth labels. After the end of the competition, the public and private sets were released.

    The datasets will be provided as files in the comma-separated values (CSV) format containing the following columns.

    • image (string): URL of an image on a public content delivery network
    • width (integer): image width
    • height (integer): image height
    • left (integer): bounding box coordinate: left
    • top (integer): bounding box coordinate: top
    • right (integer): bounding box coordinate: right
    • bottom (integer): bounding box coordinate: bottom
    • question (string): question in English

    This upload also contains a ZIP file with the images from MS COCO.
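
    A minimal loading sketch, assuming the train split is a file named train.csv (the file name is an assumption; the column names come from the list above):

    ```python
    import pandas as pd

    # Hypothetical file name; column names come from the column list above.
    df = pd.read_csv("train.csv")

    # Bounding-box size in pixels, derived from the left/top/right/bottom coordinates.
    df["box_width"] = df["right"] - df["left"]
    df["box_height"] = df["bottom"] - df["top"]
    print(df[["image", "question", "box_width", "box_height"]].head())
    ```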

  16. Computer Science Theory QA Dataset

    • kaggle.com
    Updated Apr 6, 2023
    Cite
    Mujtaba Mateen (2023). Computer Science Theory QA Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/5333319
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mujtaba Mateen
    Description

    This comprehensive dataset contains a wide range of theoretical questions related to computer science, covering various domains such as operating systems, machine learning, software engineering, computer architecture and design, data structures, and algorithms. The questions are carefully curated to encompass a diverse set of topics, including hardware and software concepts, and are designed to challenge and enhance the knowledge of individuals interested in the computer science field.

    The dataset is specifically tailored for training a chatbot or a question-answering system, with a focus on providing accurate and informative answers to technical questions. The questions cover a broad spectrum of complexity, ranging from basic to advanced, and are aimed at assisting users in gaining a deeper understanding of computer science concepts. Whether it's preparing for technical interviews or exams, or simply seeking guidance in the computer science field, this dataset can be a valuable resource for users looking to improve their knowledge and expertise.

  17. Tamil Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Tamil Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/tamil-closed-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The Tamil Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Tamil language, advancing the field of artificial intelligence.

    Dataset Content:

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Tamil. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Tamil people, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. The answers include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Tamil Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The Tamil version is grammatically accurate, without any spelling or grammatical errors. No toxic or harmful content was used while building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Tamil Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  18. Introduction to Statistics & Probability

    • paper.erudition.co.in
    html
    Updated Feb 6, 2022
    + more versions
    Cite
    Einetic (2022). Introduction to Statistics & Probability [Dataset]. https://paper.erudition.co.in/makaut/master-of-computer-applications-2-years/2/numerical-and-statistical-analysis
    Explore at:
    Available download formats: html
    Dataset updated
    Feb 6, 2022
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of chapter Introduction to Statistics & Probability of Numerical and Statistical Analysis, 2nd Semester, Master of Computer Applications (2 Years)

  19. Data Indicating that Trivia Crack answers has Bias

    • figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Antony Williams (2023). Data Indicating that Trivia Crack answers has Bias [Dataset]. http://doi.org/10.6084/m9.figshare.5902033.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Antony Williams
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The online game and mobile app "Trivia Crack" has a built-in bias: "All answers are not created equal". Multiple data sets were gathered, totaling 768 questions collected over a period of 3 weeks. The data indicate that when the user has no idea what the answer is, the best choice is to select position 1 (top) in the answer set, as the answers are biased. This has been shown with the analysis described here: http://www.chemconnector.com/?p=3482

  20. Question-Answer Dataset

    • kaggle.com
    Updated Sep 28, 2017
    Cite
    Rachael Tatman (2017). Question-Answer Dataset [Dataset]. https://www.kaggle.com/datasets/rtatman/questionanswer-dataset/code
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 28, 2017
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Rachael Tatman
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Context:

    Being able to automatically answer questions accurately remains a difficult problem in natural language processing. This dataset has everything you need to try your own hand at this task. Can you correctly generate the answer to questions given the Wikipedia article text the question was originally generated from?

    Content:

    There are three question files, one for each year of students: S08, S09, and S10, as well as 690,000 words worth of cleaned text from Wikipedia that was used to generate the questions.

    The "question_answer_pairs.txt" files contain both the questions and answers. The columns in this file are as follows:

    • ArticleTitle is the name of the Wikipedia article from which questions and answers initially came.
    • Question is the question.
    • Answer is the answer.
    • DifficultyFromQuestioner is the prescribed difficulty rating for the question as given to the question-writer.
    • DifficultyFromAnswerer is a difficulty rating assigned by the individual who evaluated and answered the question, which may differ from the difficulty in field 4.
    • ArticleFile is the name of the file with the relevant article

    Questions that were judged to be poor were discarded from this data set.

    There are frequently multiple lines with the same question, which appear if those questions were answered by multiple individuals.
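
    A minimal loading sketch; the tab delimiter, Latin-1 encoding, and per-year directory layout are assumptions, while the column names come from the list above.

    ```python
    import pandas as pd

    # Delimiter, encoding, and path layout are assumptions; column names follow the list above.
    qa = pd.read_csv("S08/question_answer_pairs.txt", sep="\t", encoding="latin-1")

    # Collapse repeated questions that were answered by multiple people into one row each.
    unique_qa = qa.drop_duplicates(subset=["ArticleTitle", "Question"])
    print(unique_qa[["Question", "Answer", "DifficultyFromQuestioner"]].head())
    ```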

    Acknowledgements:

    These data were collected by Noah Smith, Michael Heilman, Rebecca Hwa, Shay Cohen, Kevin Gimpel, and many students at Carnegie Mellon University and the University of Pittsburgh between 2008 and 2010. They are released here under CC BY-SA 3.0. Please cite this paper if you write any papers involving the use of the data above:

    Smith, N. A., Heilman, M., & Hwa, R. (2008, September). Question generation as a competitive undergraduate course project. In Proceedings of the NSF Workshop on the Question Generation Shared Task and Evaluation Challenge.
