https://www.futurebeeai.com/policies/ai-data-license-agreement
The Bahasa Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the Bahasa language, advancing the field of artificial intelligence.
Dataset Content: This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in Bahasa. No context paragraph is provided to choose an answer from; each question is answered without any predefined context. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Bahasa people, and references were taken from diverse sources like books, news articles, websites, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity: To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains questions with constraints and persona restrictions, which makes it even more useful for LLM training.
Answer Formats: To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. Answers include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details: This fully labeled Bahasa Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
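For illustration, a minimal Python sketch of how the JSON export could be inspected; the file name and the assumption that the export is a flat list of records are hypothetical, and only the field names above come from the dataset description.

import json
from collections import Counter

# Hypothetical file name; the real export name may differ.
with open("bahasa_open_ended_qa.json", encoding="utf-8") as f:
    records = json.load(f)  # assumed: a list of flat records with the documented fields

# Tally records by the documented 'complexity' and 'domain' annotation fields.
complexity_counts = Counter(r.get("complexity") for r in records)
domain_counts = Counter(r.get("domain") for r in records)

print("records:", len(records))
print("by complexity:", dict(complexity_counts))
print("top domains:", domain_counts.most_common(5))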
Quality and Accuracy: The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
Both the questions and answers in Bahasa are grammatically accurate, without any spelling or grammatical errors. No copyrighted, toxic, or harmful content was used while building this dataset.
Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License: The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Bahasa Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Filipino Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Filipino language, advancing the field of artificial intelligence.
Dataset Content: This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Filipino. A context paragraph is provided for each question, from which the answer is drawn. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Filipino people, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity: To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.
Answer Formats: To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. Answers include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details: This fully labeled Filipino Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
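As a starting point, a minimal pandas sketch for a first look at the CSV export; the file name is hypothetical, and since the exact column names are not specified above, the script only lists the columns present and previews one record.

import pandas as pd

# Hypothetical file name; the real export name may differ.
df = pd.read_csv("filipino_closed_ended_qa.csv")

print("rows:", len(df))
print("columns:", list(df.columns))  # expected to cover id, context, question, answer, etc.
print(df.head(1).T)                  # one record, transposed for readability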
Quality and Accuracy: The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
The Filipino text is grammatically accurate, without any spelling or grammatical errors. No toxic or harmful content was used while building this dataset.
Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License: The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Filipino Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
HotpotQA is a question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts to enable more explainable question answering systems. It is built from Wikipedia.
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Bahasa Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Bahasa language, advancing the field of artificial intelligence.
Dataset Content: This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Bahasa. A context paragraph is provided for each question, from which the answer is drawn. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Bahasa people, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity: To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.
Answer Formats: To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. Answers include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details: This fully labeled Bahasa Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
Quality and Accuracy: The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
The Bahasa text is grammatically accurate, without any spelling or grammatical errors. No toxic or harmful content was used while building this dataset.
Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License: The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Bahasa Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is 101 toughest interview questions: and answers that win the job!. It features 7 columns including author, publication date, language, and book publisher.
This is a dataset containing 10,000 posts from Kaggle and 60,000 comments related to those posts in the question-answer topic.
Data Fields
kaggle_post
'pseudo': the question author's username.
'title': title of the post.
'question': the question's body.
'vote': number of votes; voting on Kaggle is similar to liking.
'medal': the Kaggle medal associated with the post. The Kaggle medal system is described at https://www.kaggle.com/progression. The system awards medals to users based on… See the full description on the dataset page: https://huggingface.co/datasets/Raaxx/Kaggle-post-and-comments-question-answer-topic.
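For reference, a minimal sketch of pulling the dataset from the Hugging Face Hub with the datasets library; the repository id comes from the link above, while the split names and exact column layout are assumptions to verify on the returned object.

from datasets import load_dataset

ds = load_dataset("Raaxx/Kaggle-post-and-comments-question-answer-topic")

print(ds)                     # shows the available splits and their columns
first_split = next(iter(ds))  # take whichever split is listed first
print(ds[first_split][0])     # inspect one record (fields such as pseudo, title, question are expected)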
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Tamil Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Tamil language, advancing the field of artificial intelligence.
Dataset Content: This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Tamil. A context paragraph is provided for each question, from which the answer is drawn. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Tamil people, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity: To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.
Answer Formats: To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. Answers include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details: This fully labeled Tamil Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
Quality and Accuracy: The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
The Tamil text is grammatically accurate, without any spelling or grammatical errors. No toxic or harmful content was used while building this dataset.
Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License: The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Tamil Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
The use of artificial intelligence in public action is often identified as an opportunity to query documentary texts and build automatic question-answering (Q&A) tools for users. Querying the labour code in natural language, providing a conversational agent for a given public service, developing efficient search engines, improving knowledge management: all of these require a body of quality training data in order to develop Q&A algorithms. The PIAF dataset is a public and open French-language training dataset that makes it possible to train such algorithms.
Inspired by SQuAD, the well-known English QA dataset, our ambition was to build a similar dataset that would be open to all. The protocol we followed is very similar to that of the first version of SQuAD (SQuAD v1.1). However, some changes had to be made to adapt to the characteristics of the French Wikipedia. Another big difference is that we did not employ micro-workers via crowdsourcing platforms.
After several months of annotation, we have a robust and free annotation platform, a sufficient amount of annotations, and a well-founded and innovative approach to community animation and collaborative participation within the French administration.
In March 2018, France launched its national strategy for artificial intelligence. Piloted within the Interministerial Digital Directorate, this strategy has three components: research, the economy, and public transformation.
Given that data policy is a major focus of the development of artificial intelligence, the Etalab mission is piloting the establishment of an interministerial “Lab IA”, whose mission is to accelerate the deployment of AI in administrations via three main activities:
The PIAF project is one of the shared tools of the Lab IA.
The dataset follows the SQuAD v1.1 format. PIAF v1.2 contains 9,225 Q&A pairs in a single JSON file. A text file illustrating the schema is included below. This file can be used to train and evaluate question-answering models, for example by following these instructions.
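To make the schema concrete, a minimal sketch of walking a SQuAD-v1.1-style file such as PIAF; the file name is hypothetical, and the nesting (data → paragraphs → qas → answers) follows the standard SQuAD v1.1 layout referenced above.

import json

# Hypothetical file name for the PIAF v1.2 JSON release.
with open("piaf-v1.2.json", encoding="utf-8") as f:
    squad = json.load(f)

n_pairs = 0
for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            n_pairs += 1
            # Each answer holds the answer text and its character offset in the context.
            for answer in qa["answers"]:
                _ = (answer["text"], answer["answer_start"])

print("QA pairs:", n_pairs)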
We deeply thank our contributors who have made this project live on a voluntary basis to this day.
Information on the protocol followed, the project news, the annotation platform and the related code are here:
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Question Answering in Context is a dataset for modeling, understanding, and participating in information seeking dialog. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
RUS
Last updated: 13/09/2023. The dataset is intended for developing Russian-language dialogue systems (chatbots, question-answering systems, etc.) about autism spectrum disorders. Text source: https://aspergers.ru. The project is carried out by a winner of the “Practices of Personal Philanthropy and Altruism” competition of the Vladimir Potanin Charitable Foundation. 75% of the data was collected using the Toloka platform.
Dataset composition:
1. original.json: the original version of the dataset
2. multiple.json: a version of the dataset with several answer options
3. short.json: a version of the dataset with shortened answers
4. half_sized.json: a version containing 50% of the collected data
5. no_impossible.json: a version containing only relevant questions
6. age_dataset.tsv: a dataset for determining the user's age (can be used for model customization)
ENG
A dataset for question answering used for building an informational Russian-language chatbot for the inclusion of people with autism spectrum disorder, and Asperger syndrome in particular, based on data from the following website: https://aspergers.ru.
The detailed dataset statistics:
• Number of QA pairs: 4,138
• Number of irrelevant questions: 352
• Average question length: 53 symbols / 8 words
• Average answer length: 141 symbols / 20 words
• Average reading paragraph length: 453 symbols / 63 words
• Max question length: 226 symbols / 32 words
• Max answer length: 555 symbols / 85 words
• Max reading paragraph length: 551 symbols / 94 words
• Min question length: 9 symbols / 2 words
• Min answer length: 5 symbols / 1 word
• Min reading paragraph length: 144 symbols / 17 words
The dataset has several versions:
1. Original version
2. Half-sized version (50% of the original data)
3. No-impossible version (a version without irrelevant/impossible questions)
4. Short version (a version with shortened answers)
5. Multiple version (a version with several answers; all the other versions contain only one answer to each question)
CourseKata is a platform that creates and publishes a series of e-books for introductory statistics and data science classes that utilize demonstrated learning strategies to help students learn statistics and data science. The developers of CourseKata, Jim Stigler (UCLA) and Ji Son (Cal State Los Angeles) and their team, are cognitive psychologists interested in improving statistics learning by examining students' interactions with online interactive textbooks. Traditionally, much of the research in how students learn is done in a 1-hour lab or through small-scale interviews with students. CourseKata offers the opportunity to peek into the actions, responses, and choices of thousands of students as they are engaged in learning the interrelated concepts and skills of statistics and coding in R over many weeks or months in real classes.
Questions are grouped into items (item_id). An item can be one of three item_type's: code, learnosity, or learnosity-activity (the distinction between learnosity and learnosity-activity is not important). Code items are a single question and ask for R code as a response. (Responses can be seen in responses.csv.) Learnosity-activities and learnosity items are collections of one or more questions that can be of a variety of lrn_type's:
● association
● choicematrix
● clozeassociation
● formulaV2
● imageclozeassociation
● mcq
● plaintext
● shorttext
● sortlist
Examples of these question types are provided at the end of this document.
The level of detail made available to you in the responses file depends on the lrn_type. For example, for multiple choice questions (mcq), you can find the options in the responses file in the columns labeled lrn_option_0 through lrn_option_11, and you can see the chosen option in the results variable.
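As an example, a minimal pandas sketch of pulling the answer options for multiple-choice items out of responses.csv; the column names (lrn_type, lrn_option_0 through lrn_option_11, results) come from the description above, while how results encodes the chosen option is an assumption to check against the data.

import pandas as pd

responses = pd.read_csv("responses.csv")
mcq = responses[responses["lrn_type"] == "mcq"]  # keep only multiple-choice questions

option_cols = [f"lrn_option_{i}" for i in range(12)]  # lrn_option_0 .. lrn_option_11
for _, row in mcq.head(3).iterrows():
    options = [row[c] for c in option_cols if c in row.index and pd.notna(row[c])]
    print("options:", options)
    print("chosen:", row["results"])  # assumed to hold the selected option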
Assessment Types: In general, assessments, such as the items and questions included in CourseKata, can be used for two purposes. Formative assessments are meant to provide feedback to the student (and instructor), or to serve as a learning aid that prompts students to strengthen memory and deepen their understanding. Summative assessments are meant to provide a summary of a student's understanding, often for use in assigning a grade. For example, most midterms and final exams that you've taken are summative assessments.
The vast majority of items in CourseKata should be treated as formative assessments. The exceptions are the end-of-chapter Review questions, which can be thought of as summative. The mean number of correct answers for end-of-chapter review questions is provided within the checkpoints file. You might see that some pages have the word "Quiz" or "Exam" or "Midterm" in them. Results from these items and responses to them are not provided to us in this data set.
https://paper.erudition.co.in/terms
Get Exam Question Paper Solutions of Statistics and many more.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
SAT History Questions and Answers 🏛️ - Text Classification Dataset
This dataset contains a collection of questions and answers for the SAT Subject Test in World History and US History. Each question is accompanied by its answer options and the correct response. The dataset includes questions from various topics, time periods, and regions on both World History and US History.
💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/sat-questions-and-answers-for-llm.
https://paper.erudition.co.in/terms
Question Paper Solutions of Statistics (ST), Question Paper, Graduate Aptitude Test in Engineering, Competitive Exams
The Clinical Questions Collection is a repository of questions collected between 1991 and 2003 from healthcare providers in clinical settings across the country. The questions have been submitted by investigators who wish to share their data with other researchers. This dataset is no longer updated with new content. The collection is used in developing approaches to clinical and consumer-health question answering, as well as in researching the information needs of clinicians and the language they use to express those needs. All files are formatted in XML.
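Since the collection is distributed as XML, here is a minimal sketch of inspecting one file with the standard library; the file name is hypothetical and no element names are assumed, so the script simply prints the top-level structure for orientation.

import xml.etree.ElementTree as ET

# Hypothetical file name; inspect the actual files from the collection.
tree = ET.parse("clinical_questions.xml")
root = tree.getroot()

print("root tag:", root.tag)
for child in list(root)[:5]:
    # Show the first few child tags and a snippet of their text content.
    print(" ", child.tag, (child.text or "").strip()[:60])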
RUSSIAN LANGUAGE
This dataset was collected from real questions and answers on the Otvet.Mail.ru service: https://otvet.mail.ru/
Category: Love
We have 82,628 questions and 530,511 answers!
{
    "question": "string containing the question",
    "comment": "string; sometimes a question has a comment that better reveals the essence of the issue",
    "sub_category": "string with the sub-category",
    "author": "string containing the author's nickname",
    "author_rating": {
        "category": "string - type of author rating",
        "value": "string with the rating"
    },
    "answers": [{
        "text": "string - answer text",
        "author_rating": "same structure as author_rating above"
    }],
    "poll": [{
        "text": "string - poll option text",
        "score": "string - how many points this option scored"
    }]
}
The answers or the poll can be empty. Sometimes the user chooses to create a poll and sets the options themselves. The first answer is the best answer.
This data can be helpful for creating a chit-chat dialogue system. You can use a ranking-style architecture such as DSSM.
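For instance, a minimal sketch of turning the records above into (question, best answer) pairs for such a ranking model; the file name and the assumption that the dump is a single JSON array are hypothetical, while the field names and the "first answer is the best answer" rule come from the description.

import json

# Hypothetical file name for the dump of the 'Love' category.
with open("otvet_love.json", encoding="utf-8") as f:
    records = json.load(f)  # assumed: a JSON array of question records

pairs = []
for rec in records:
    answers = rec.get("answers") or []
    if answers:  # answers may be empty when the user created a poll instead
        pairs.append((rec["question"], answers[0]["text"]))  # first answer is the best answer

print("usable pairs:", len(pairs))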
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The online game and mobile app "Trivia Crack" has built-in bias - "All answers are not created equal". Multiple data sets were gathered, totaling 768 questions collected over a period of 3 weeks. The data indicates that when the user has no idea what the answer is, the best choice is to select position 1 (top) in the answer set, as the answers are biased. This has been proven with analysis as described here: http://www.chemconnector.com/?p=3482
Clotho-AQA is an audio question-answering dataset consisting of 1991 audio samples taken from Clotho dataset [1]. Each audio sample has 6 associated questions collected through crowdsourcing. For each question, the answers are provided by three different annotators making a total of 35,838 question-answer pairs. For each audio sample, 4 questions are designed to be answered with 'yes' or 'no', while the remaining two questions are designed to be answered in a single word. More details about the data collection process and data splitting process can be found in our following paper.
S. Lipping, P. Sudarsanam, K. Drossos, T. Virtanen, "Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering." The paper is available online at https://arxiv.org/abs/2204.09634.
If you use the Clotho-AQA dataset, please cite the paper mentioned above. A sample baseline model using the Clotho-AQA dataset can be found at https://github.com/partha2409/AquaNet.
To use the dataset,
• Download and extract ‘audio_files.zip’. This contains all the 1991 audio samples in the dataset.
• Download ‘clotho_aqa_train.csv’, ‘clotho_aqa_val.csv’, and ‘clotho_aqa_test.csv’. These files contain the train, validation, and test splits, respectively. They contain the audio file name, questions, answers, and confidence scores provided by the annotators.
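For a quick start, a minimal pandas sketch of loading one of the splits; the file names come from the list above, but the exact column names are not specified here, so the script prints them before grouping the rows by audio file.

import pandas as pd

train = pd.read_csv("clotho_aqa_train.csv")
print("columns:", list(train.columns))  # expected: audio file name, question, answer, confidence score

# Group the question-answer rows by audio file; the identifying column is assumed
# to be the first one, so adjust after checking the printed column names.
audio_col = train.columns[0]
print(train.groupby(audio_col).size().describe())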
License:
The audio files in the archive ‘audio_files.zip’ are under the corresponding licenses (mostly Creative Commons with attribution) of the Freesound [2] platform, mentioned explicitly in the CSV file ’clotho_aqa_metadata.csv’ for each of the audio files. That is, each audio file in the archive is listed in the CSV file with its metadata. The metadata for each file are:
• File name
• Keywords
• URL for the original audio file
• Start and ending samples for the excerpt that is used in the Clotho dataset
• Uploader/user in the Freesound platform (manufacturer)
• Link to the license of the file.
The questions and answers in the files:
• clotho_aqa_train.csv
• clotho_aqa_val.csv
• clotho_aqa_test.csv
are under the MIT license, described in the LICENSE file.
References:
[1] K. Drossos, S. Lipping and T. Virtanen, "Clotho: An Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736- 740, doi: 10.1109/ICASSP40776.2020.9052990.
[2] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245
https://paper.erudition.co.in/terms
Question Paper Solutions of year 2021 of Statistics, Question Paper, Graduate Aptitude Test in Engineering
Round 8 Holdout Dataset: This is the training data used to construct and evaluate trojan detection software solutions. This data, generated at NIST, consists of natural language processing (NLP) AIs trained to perform extractive question answering on English text. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 360 QA AI models using a small set of model architectures. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.