SPIQA Dataset Card Dataset Details Dataset Name: SPIQA (Scientific Paper Image Question Answering)
Paper: SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Github: SPIQA eval and metrics code repo
Dataset Summary: SPIQA is a large-scale and challenging QA dataset focused on figures, tables, and text paragraphs from scientific research papers in various computer science domains. The figures cover a wide variety of plots, charts, schematic diagrams, result visualizations, etc. The dataset is the result of a meticulous curation process that leverages the breadth of expertise of multimodal large language models (MLLMs) and their ability to understand figures. We employ both automatic and manual curation to ensure the highest level of quality and reliability. SPIQA consists of more than 270K questions divided into training, validation, and three different evaluation splits. The purpose of the dataset is to evaluate the ability of large multimodal models to comprehend complex figures and tables together with the textual paragraphs of scientific papers.
This Data Card describes the structure of the SPIQA dataset, divided into training, validation, and three different evaluation splits. The test-B and test-C splits are filtered from the QASA and QASPER datasets and contain human-written QAs. We collect all scientific papers published at top computer science conferences between 2018 and 2023 from arXiv.
If you have any comments or questions, reach out to Shraman Pramanick or Subhashini Venugopalan.
Supported Tasks: - Direct QA with figures and tables - Direct QA with full paper - CoT QA (retrieval of helpful figures, tables; then answering)
Language: English
Release Date: SPIQA is released in June 2024.
Data Splits The statistics of the different splits of SPIQA are shown below.
| Split | Papers | Questions | Schematics | Plots & Charts | Visualizations | Other figures | Tables |
|---|---|---|---|---|---|---|---|
| Train | 25,459 | 262,524 | 44,008 | 70,041 | 27,297 | 6,450 | 114,728 |
| Val | 200 | 2,085 | 360 | 582 | 173 | 55 | 915 |
| test-A | 118 | 666 | 154 | 301 | 131 | 95 | 434 |
| test-B | 65 | 228 | 147 | 156 | 133 | 17 | 341 |
| test-C | 314 | 493 | 415 | 404 | 26 | 66 | 1,332 |
Dataset Structure The contents of this dataset card are structured as follows:
```
SPIQA
├── SPIQA_train_val_test-A_extracted_paragraphs.zip
│   └── Extracted textual paragraphs from the papers in SPIQA train, val and test-A splits
├── SPIQA_train_val_test-A_raw_tex.zip
│   └── The raw tex files from the papers in SPIQA train, val and test-A splits. These files are not required to reproduce our results; we open-source them for future research.
├── train_val
│   ├── SPIQA_train_val_Images.zip
│   │   └── Full resolution figures and tables from the papers in SPIQA train, val splits
│   ├── SPIQA_train.json
│   │   └── SPIQA train metadata
│   └── SPIQA_val.json
│       └── SPIQA val metadata
├── test-A
│   ├── SPIQA_testA_Images.zip
│   │   └── Full resolution figures and tables from the papers in SPIQA test-A split
│   ├── SPIQA_testA_Images_224px.zip
│   │   └── 224px figures and tables from the papers in SPIQA test-A split
│   └── SPIQA_testA.json
│       └── SPIQA test-A metadata
├── test-B
│   ├── SPIQA_testB_Images.zip
│   │   └── Full resolution figures and tables from the papers in SPIQA test-B split
│   ├── SPIQA_testB_Images_224px.zip
│   │   └── 224px figures and tables from the papers in SPIQA test-B split
│   └── SPIQA_testB.json
│       └── SPIQA test-B metadata
└── test-C
    ├── SPIQA_testC_Images.zip
    │   └── Full resolution figures and tables from the papers in SPIQA test-C split
    ├── SPIQA_testC_Images_224px.zip
    │   └── 224px figures and tables from the papers in SPIQA test-C split
    └── SPIQA_testC.json
        └── SPIQA test-C metadata
```
The testA_data_viewer.json file is provided only for viewing a portion of the data in the HuggingFace dataset viewer, to get a quick sense of the metadata.
Metadata Structure The metadata for every split is provided as a dictionary whose keys are the arXiv IDs of the papers. The primary contents of each dictionary item are:
- arXiv ID
- Semantic Scholar ID (for test-B)
- Figures and tables
  - Name of the PNG file
  - Caption
  - Content type (figure or table)
  - Figure type (schematic, plot, photo (visualization), others)
- QAs
  - Question, answer and rationale
  - Reference figures and tables
  - Textual evidence (for test-B and test-C)
- Abstract and full paper text (for test-B and test-C; the full paper text for the other splits is provided as a zip)
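As a quick sanity check of this structure, here is a minimal sketch using the test-A split. The top-level 'qa' key matches the test-A snippet further below, but the per-QA field names ('question', 'answer') used here are assumptions and should be verified against the downloaded JSON.

```python
import json

# Minimal sketch of walking the metadata for one paper in test-A.
# 'qa' is the top-level key used in the test-A snippet below; the per-QA
# fields ('question', 'answer') are assumptions -- check the actual JSON.
testA_metadata = json.load(open('test-A/SPIQA_testA.json', 'r'))

paper_id = next(iter(testA_metadata))   # any arXiv ID in the split
for qa in testA_metadata[paper_id].get('qa', []):
    print(qa.get('question'), '->', qa.get('answer'))
```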
Dataset Use and Starter Snippets Downloading the Dataset Locally We recommend that users download the metadata and images to their local machine.
Download the whole dataset (all splits).

```python
from huggingface_hub import snapshot_download

# Mention the local directory path in local_dir
snapshot_download(repo_id="google/spiqa", repo_type="dataset", local_dir='.')
```
Download a specific file.

```python
from huggingface_hub import hf_hub_download

# Mention the local directory path in local_dir
hf_hub_download(repo_id="google/spiqa", filename="test-A/SPIQA_testA.json", repo_type="dataset", local_dir='.')
```
Questions and Answers from a Specific Paper in test-A

```python
import json

testA_metadata = json.load(open('test-A/SPIQA_testA.json', 'r'))
paper_id = '1702.03584v3'
print(testA_metadata[paper_id]['qa'])
```
Questions and Answers from a Specific Paper in test-B

```python
import json

testB_metadata = json.load(open('test-B/SPIQA_testB.json', 'r'))
paper_id = '1707.07012'
print(testB_metadata[paper_id]['question'])     # Questions
print(testB_metadata[paper_id]['composition'])  # Answers
```
Questions and Answers from a Specific Paper in test-C

```python
import json

testC_metadata = json.load(open('test-C/SPIQA_testC.json', 'r'))
paper_id = '1808.08780'
print(testC_metadata[paper_id]['question'])  # Questions
print(testC_metadata[paper_id]['answer'])    # Answers
```
Annotation Overview Questions and answers for the SPIQA train, validation, and test-A sets were machine-generated. Additionally, the SPIQA test-A set was manually filtered and curated. Questions in the SPIQA test-B set are collected from the QASA dataset, while those in the SPIQA test-C set are from the QASPER dataset. Answering the questions in all splits requires a holistic understanding of figures and tables together with related text from the scientific papers.
Personal and Sensitive Information We are not aware of any personal or sensitive information in the dataset.
Licensing Information CC BY 4.0
Citation Information

```bibtex
@article{pramanick2024spiqa,
  title={SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers},
  author={Pramanick, Shraman and Chellappa, Rama and Venugopalan, Subhashini},
  journal={NeurIPS},
  year={2024}
}
```
In 2001, the World Bank, in co-operation with the Republika Srpska Institute of Statistics (RSIS), the Federal Institute of Statistics (FOS) and the Agency for Statistics of BiH (BHAS), carried out a Living Standards Measurement Survey (LSMS). In addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, the LSMS has three basic objectives, as follows:
To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population's living conditions, as well as on available resources for satisfying basic needs.
To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim of improving the population's living standard. The survey will enable the analysis of the relations among different aspects of living standards (housing, consumption, education, health, labor) at a given time, as well as within a household.
To provide key contributions to the development of the government's Poverty Reduction Strategy Paper, based on the analyzed data.
The Department for International Development, UK (DFID) contributed funding to the LSMS and provided funding for a further two years of data collection for a panel survey, known as the Household Survey Panel Series (HSPS). Birks Sinclair & Associates Ltd. were responsible for the management of the HSPS with technical advice and support provided by the Institute for Social and Economic Research (ISER), University of Essex, UK. The panel survey provides longitudinal data through re-interviewing approximately half the LSMS respondents for two years following the LSMS, in the autumn of 2002 and 2003. The LSMS constitutes Wave 1 of the panel survey so there are three years of panel data available for analysis. For the purposes of this documentation we are using the following convention to describe the different rounds of the panel survey:
- Wave 1: the LSMS conducted in 2001, forming the baseline survey for the panel
- Wave 2: second interview of 50% of LSMS respondents in Autumn/Winter 2002
- Wave 3: third interview with the sub-sample respondents in Autumn/Winter 2003
The panel data allow the analysis of key transitions and events over this period, such as labour market or geographical mobility, and of the consequent outcomes for the well-being of individuals and households in the survey. The panel data provide information on income and labour market dynamics within FBiH and RS. A key policy area is developing strategies for the reduction of poverty within FBiH and RS. The panel will provide information on the extent to which continuous poverty is experienced by different types of households and individuals over the three-year period. Most importantly, the covariates associated with moves into and out of poverty, and the relative risks of poverty for different people, can be assessed. As such, the panel aims to provide data which will inform the policy debates within FBiH and RS at a time of social reform and rapid change.
National coverage. Domains: Urban/rural/mixed; Federation; Republic
Sample survey data [ssd]
The Wave 3 sample consisted of 2,878 households that had been interviewed at Wave 2, together with a further 73 households that were interviewed at Wave 1 but were non-contacts at Wave 2. A total of 2,951 households (1,301 in the RS and 1,650 in FBiH) were issued for Wave 3. As at Wave 2, the sample could not be replaced with any other households.
Panel design
Eligibility for inclusion
The household and household membership definitions are the same standard definitions as at Wave 2. The sample membership status and eligibility for interview are as follows: i) All members of households interviewed at Wave 2 have been designated as original sample members (OSMs). OSMs include children within households even if they are too young for interview. ii) Any new members joining a household containing at least one OSM are eligible for inclusion and are designated as new sample members (NSMs). iii) At each wave, all OSMs and NSMs are eligible for inclusion, apart from those who move out-of-scope (see discussion below). iv) All household members aged 15 or over are eligible for interview, including OSMs and NSMs.
Following rules
The panel design means that sample members who move from their previous wave address must be traced and followed to their new address for interview. In some cases the whole household will move together but in others an individual member may move away from their previous wave household and form a new split-off household of their own. All sample members, OSMs and NSMs, are followed at each wave and an interview attempted. This method has the benefit of maintaining the maximum number of respondents within the panel and being relatively straightforward to implement in the field.
Definition of 'out-of-scope'
It is important to keep movers within the sample, both to maintain sample sizes and reduce attrition, and to support substantive research on patterns of geographical mobility and migration. The rules for determining when a respondent is 'out-of-scope' are as follows:
i. Movers out of the country altogether, i.e. outside FBiH and RS. This category of mover is clear: sample members moving to another country outside FBiH and RS are out-of-scope for that year of the survey and not eligible for interview.
ii. Movers between entities. Respondents moving between entities are followed for interview. The personal details of the respondent are passed between the statistical institutes and a new interviewer is assigned in that entity.
iii. Movers into institutions. Although institutional addresses were not included in the original LSMS sample, individuals who have subsequently moved into some institutions are followed at Wave 3. The definitions of which institutions are included are found in the Supervisor Instructions.
iv. Movers into the district of Brcko. Movers into Brcko are followed for interview. When coding entity, Brcko is treated as the entity from which the household that moved into Brcko originated.
Face-to-face [f2f]
Questionnaire design
Approximately 90% of the questionnaire (Annex B) is based on the Wave 2 questionnaire, carrying forward core measures that are needed to measure change over time. The questionnaire was widely circulated and changes were made as a result of comments received.
Pretesting
In order to undertake a longitudinal test, the Wave 2 pretest sample was used. The Control Forms and Advance Letters were generated from an Access database containing details of ten households in Sarajevo and fourteen in Banja Luka. The pretest was undertaken from March 24 to April 4 and resulted in 24 households (51 individuals) being successfully interviewed. One mover household was successfully traced and interviewed.
In order to test the questionnaire under the hardest circumstances, a briefing was not held. A list of the main questionnaire changes was given to experienced interviewers.
Issues arising from the pretest
Interviewers were asked to complete a Debriefing and Rating form. The debriefing form captured opinions on the following three issues:
General reaction to being re-interviewed. In some cases there was a wariness about being asked to participate again, with some individuals asking "Why me?". Interviewers did a good job of persuading people to take part; only one household refused, and another asked to be removed from the sample next year. Having the same interviewer return to the same households was considered an advantage. Most respondents asked what the benefit was to them of taking part in the survey. This aspect was re-emphasised in the Advance Letter, the Respondent Report, and the training of the Wave 3 interviewers.
Length of the questionnaire. The average interview took 30 minutes. No problems were mentioned in relation to the timing, though interviewers noted that some respondents, particularly the elderly, tended to wander off the point and that control was needed to bring them back to the questions in the questionnaire. One interviewer noted that the economic situation of many respondents seemed to have got worse since the previous year and that it was necessary to listen to respondents' "stories" during the interview.
Confidentiality. No problems were mentioned in relation to confidentiality, though interviewers suggested it might be worth mentioning the new Statistics Law in the Advance Letter. The Rating Form asked for details of specific questions that were unclear. These are described below along with the changes made.
Module 3. Q29-31 have been added to capture funds received for education, scholarships etc.
Module 4. Pretest respondents complained that the 6 questions on "Has your health limited you..." and the 16 on "In the last 7 days have you felt depressed..." were too many. These were reduced by half (Q38-Q48). The LSMS data were examined and the questions with the widest variability between answers were retained.
Module 5. The new employment questions (Q42-Q44) worked well and have been kept in the main questionnaire.
Module 7. There were no problems reported with adding the credit questions (Q28-Q36).
Module 9. SIG recommended that some of Questions 1-12 were relevant only to those aged over 18, so additional skips have been added. Some respondents complained that the questionnaire was boring. To try and overcome
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a small dataset created to evaluate the performance of question answering (QA) with large-scale language models in the field of civil engineering. The dataset targets bridge design and was created using the document "Survey on the common issues of bridge design (2018-2019)" (Technical Note of NILIM No. 1162), which is used in bridge design projects in Japan. The dataset consists of 50 QA pairs, where each pair consists of a question asking about the content of the document and an answer extracted from the document and associated with the question.
Each column of the CSV file shows the following data:
- Column 1: ID of the QA
- Column 2: Referenced page (page number of the full PDF)
- Column 3: Question
- Column 4: Answer
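A minimal reading sketch under the column layout above; the file name used here is a placeholder, since the actual CSV name is not stated in this card.

```python
import csv

# Sketch: print the 50 QA pairs. The file name is a placeholder, and a header
# row (if present) is skipped.
with open('bridge_design_qa.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
    for qa_id, page, question, answer in rows[1:]:   # rows[0] assumed to be the header
        print(f"[{qa_id}] p.{page}: {question} -> {answer}")
```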
The Yahoo! Answers topic classification dataset is constructed using the 10 largest main categories. Each class contains 140,000 training samples and 6,000 testing samples, so the dataset has 1,400,000 training samples and 60,000 testing samples in total. From all the answers and other meta-information, we only used the best answer content and the main category information.
The Natural Questions corpus is a question answering dataset containing 307,373 training examples, 7,830 development examples, and 7,842 test examples. Each example comprises a google.com query and a corresponding Wikipedia page. Each Wikipedia page has a passage (or long answer) annotated on the page that answers the question, and one or more short spans from the annotated passage containing the actual answer. The long and the short answer annotations can, however, be empty. If both are empty, there is no answer on the page at all. If the long answer annotation is non-empty but the short answer annotation is empty, the annotated passage answers the question but no explicit short answer could be found. Finally, 1% of the documents have a passage annotated with a short answer that is "yes" or "no", instead of a list of short spans.
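The four annotation cases described above can be made concrete with a short sketch. This assumes the simplified NQ JSONL release, where each example carries an `annotations` list with `long_answer`, `short_answers`, and `yes_no_answer` fields; the field names and the path should be checked against whichever release is downloaded.

```python
import json

def answer_category(example):
    """Classify one NQ example into the annotation cases described above."""
    ann = example['annotations'][0]
    has_long = ann['long_answer']['start_token'] != -1
    has_short = bool(ann['short_answers']) or ann['yes_no_answer'] != 'NONE'
    if not has_long:
        return 'no answer on the page'
    if not has_short:
        return 'long answer only'
    if ann['yes_no_answer'] != 'NONE':
        return 'yes/no short answer'
    return 'long answer with short span(s)'

# Placeholder path for the simplified dev set.
with open('v1.0-simplified_nq-dev-all.jsonl', encoding='utf-8') as f:
    for line in f:
        print(answer_category(json.loads(line)))
```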
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
QASPER: NLP Questions and Evidence. Discovering Answers with Expertise. By Huggingface Hub [source]
About this dataset QASPER is a collection of over 5,000 questions and answers on a wide range of Natural Language Processing (NLP) papers, all crowdsourced from experienced NLP practitioners. Each question in the dataset is written based only on the title and abstract of the corresponding paper, providing an insight into how the experts understood and parsed the material. The answers to each question have been enriched with evidence taken directly from the full text of each paper. QASPER also comes with carefully crafted fields containing relevant information, including 'qas' (questions and answers), 'evidence' (evidence provided for answering questions), title, abstract, figures_and_tables, and full_text. All this adds up to a valuable dataset for researchers looking to gain insight into how practitioners interpret NLP topics, while providing validation when looking for clear-cut solutions to problems encountered in the existing literature.
How to use the dataset This guide provides instructions on how to use the QASPER dataset of Natural Language Processing (NLP) questions and evidence. The QASPER dataset contains 5,049 questions over 1,585 papers, crowdsourced from NLP practitioners. To get the most out of this dataset, we show how to access the questions and evidence, and provide tips for getting started.
Step 1: Accessing the Dataset To access the data, you can download it from Kaggle's website or through a code version control system such as GitHub. Once downloaded, you will find five files: two test data sets (test.csv and validation.csv), two train data sets (train-v2-0_lessons_only_.csv and trainv2-0_unsplit.csv), and one figure data set (figures_and_tables_.json). Each .csv file contains a different dataset, with columns representing titles, abstracts, full texts, and Q&A fields with evidence for each paper mentioned in each row.
**Step 2: Analyzing Your Data Sets** Now is a good time to explore the datasets using basic descriptive statistics, or more advanced predictive analytics such as logistic regression or naive Bayes models, depending on the kind of analysis you would like to undertake. You can start simply by summarizing basic crosstabs between any two variables in your dataset (titles, abstracts, etc.). As an example, try correlating title lengths with the number of words in the corresponding abstracts, then check whether there is anything worth investigating further; a rough sketch follows below.
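A rough starting point for the exploration described in Step 2, assuming the column names roughly match the description above; the 'title' and 'abstract' column names are guesses to verify against the actual CSV headers.

```python
import pandas as pd

# Sketch: load one of the train CSVs listed in Step 1 and try the
# title-length vs. abstract-length correlation suggested above.
# Column names 'title' and 'abstract' are assumptions.
train = pd.read_csv('train-v2-0_lessons_only_.csv')
print(train.columns.tolist())                    # inspect the real headers first
if {'title', 'abstract'} <= set(train.columns):
    title_len = train['title'].str.split().str.len()
    abstract_len = train['abstract'].str.split().str.len()
    print(title_len.corr(abstract_len))
```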
**Step 3: Define Your Research Questions & Perform Further Analysis** Once satisfied with your initial exploration, it is time to dig deeper into the underlying question-answer relationships among the different variables comprising your main documents. One approach is to use text-mining technologies such as topic modeling or other machine learning techniques, or automated processes that may help summarize underlying patterns. Another approach could involve filtering terms that are relevant to a specific research hypothesis and then processing those terms via web crawlers, search engines, document similarity algorithms, etc.
Finally, once all relevant parameters have been defined, analyzed, and searched, it makes sense to draw preliminary conclusions linking them back together before conducting replicable tests to ensure reproducible results.
Research Ideas
- Developing AI models to automatically generate questions and answers from paper titles and abstracts.
- Enhancing machine learning algorithms by combining the answers with the evidence provided in the dataset to find relationships between papers.
- Creating online forums for NLP practitioners that use questions from this dataset to spark discussion within the community.
CC0
Original Data Source: QASPER: NLP Questions and Evidence
Multilingual Knowledge Questions and Answers (MKQA) is an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Answers are based on a language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to date for evaluating question answering.
Clotho-AQA is an audio question answering dataset consisting of 1,991 audio samples taken from the Clotho dataset [1]. Each audio sample has 6 associated questions collected through crowdsourcing. For each question, answers are provided by three different annotators, making a total of 35,838 question-answer pairs. For each audio sample, 4 questions are designed to be answered with 'yes' or 'no', while the remaining two questions are designed to be answered with a single word. More details about the data collection and data splitting processes can be found in the paper below.
S. Lipping, P. Sudarsanam, K. Drossos, and T. Virtanen, 'Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering.' The paper is available online at https://arxiv.org/abs/2204.09634.
If you use the Clotho-AQA dataset, please cite the paper mentioned above. A sample baseline model using the Clotho-AQA dataset can be found at partha2409/AquaNet (github.com).
To use the dataset,
• Download and extract ‘audio_files.zip’. This contains all the 1991 audio samples in the dataset.
• Download ‘clotho_aqa_train.csv’, ‘clotho_aqa_val.csv’, and ‘clotho_aqa_test.csv’. These files contain the train, validation, and test splits, respectively. They contain the audio file name, questions, answers, and confidence scores provided by the annotators.
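As a sketch of working with these splits, the snippet below groups the three annotator answers for each (audio file, question) pair; the column names are assumptions and should be checked against the CSV header row.

```python
import pandas as pd

# Sketch: group the three annotator answers per (audio, question) pair.
# Column names ('file_name', 'QuestionText', 'answer') are assumptions.
train = pd.read_csv('clotho_aqa_train.csv')
grouped = train.groupby(['file_name', 'QuestionText'])['answer'].apply(list)
print(grouped.head())   # each entry should contain three answers
```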
License:
The audio files in the archive 'audio_files.zip' are under the corresponding licenses (mostly Creative Commons with attribution) of the Freesound [2] platform, stated explicitly in the CSV file 'clotho_aqa_metadata.csv' for each audio file. That is, each audio file in the archive is listed in the CSV file together with its metadata. The metadata for each file are:
• File name
• Keywords
• URL for the original audio file
• Start and ending samples for the excerpt that is used in the Clotho dataset
• Uploader/user in the Freesound platform (manufacturer)
• Link to the license of the file.
The questions and answers in the files:
• clotho_aqa_train.csv
• clotho_aqa_val.csv
• clotho_aqa_test.csv
are under the MIT license, described in the LICENSE file.
References:
[1] K. Drossos, S. Lipping and T. Virtanen, "Clotho: An Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.
[2] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set represents contextualised population parameter definitions extracted and developed from past NZQA Level 3 Statistics exam questions and assessment schedules, namely those used for the achievement standards AS90642 and AS91584. The data set was developed by Haozhong Wei as part of his MSc dissertation project, under the supervision of Dr Anna Fergusson and Dr Anne Patel (University of Auckland | Waipapa Taumata Rau). An overview of the variables used in the dataset:
1. Year: the year of the exam.
2. Paper: the identifier of the paper, e.g., AS90642, indicating the specific exam to which the question belongs.
3. Type: the type of data, usually identifying whether the entry is a question or an answer.
4. Question part: the specific part number of the question, e.g., 1a, 1b, 2, etc.
5. Text: the full text of the question.
6. Population parameter: a description of the parameter for the entire text.
7. Parameter type: further detail on the type of population parameter, such as 'single mean', 'single proportion', or 'difference between two means'.
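A small usage sketch based on the variables listed above, e.g. tabulating parameter types by exam year; the file name is a placeholder and the column labels should be matched to the actual release.

```python
import pandas as pd

# Sketch: cross-tabulate 'Parameter type' against 'Year' using the variables
# listed above. The file name is a placeholder.
df = pd.read_csv('nzqa_population_parameters.csv')
print(pd.crosstab(df['Year'], df['Parameter type']))
```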
In this work, we create a question answering dataset over the DBLP scholarly knowledge graph (KG). DBLP is an online reference for bibliographic information on major computer science publications, indexing over 4.4 million publications by more than 2.2 million authors. Our dataset consists of 10,000 question-answer pairs with the corresponding SPARQL queries, which can be executed over the DBLP KG to fetch the correct answer. To the best of our knowledge, this is the first QA dataset for scholarly KGs.
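To illustrate what executing one of the paired SPARQL queries looks like, here is a hedged sketch using SPARQLWrapper; the endpoint URL and the example query are assumptions made for illustration, not taken from the dataset itself.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Sketch: run a SPARQL query over the DBLP KG, as the dataset's paired queries do.
# The endpoint URL and the query below are illustrative assumptions.
sparql = SPARQLWrapper("https://sparql.dblp.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dblp: <https://dblp.org/rdf/schema#>
    SELECT ?title WHERE { ?paper dblp:title ?title . } LIMIT 5
""")
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["title"]["value"])
```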
Sport is one of the most popular and revenue-generating forms of entertainment. Therefore, analyzing data related to this domain introduces several opportunities for Question Answering (QA) systems, such as supporting tactical decision-making. But to develop and evaluate QA systems, researchers and developers need datasets that contain questions and their corresponding answers. In this paper, we focus on this issue. We propose QASports, the first large sports question answering dataset for extractive question answering. QASports contains more than 1.5 million triples of questions, answers, and context about three popular sports: soccer, American football, and basketball. We describe the QASports data collection and question-and-answer generation processes, as well as the characteristics of the QASports data. Furthermore, we analyze the sources used to obtain the raw data and investigate the usability of QASports by issuing "wh-queries". Finally, we describe scenarios for using QASports, highlighting its importance for training and evaluating QA systems.
The Stanford Question Answering Dataset (SQuAD) is a collection of question-answer pairs derived from Wikipedia articles. In SQuAD, the correct answer to a question can be any sequence of tokens in the given text. Because the questions and answers are produced by humans through crowdsourcing, the dataset is more diverse than some other question answering datasets. SQuAD 1.1 contains 107,785 question-answer pairs on 536 articles. SQuAD 2.0, the latest version, combines the 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers in forms similar to the answerable ones.
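For quick experimentation, both versions can be pulled from the Hugging Face Hub via the `datasets` library (dataset IDs "squad" and "squad_v2"); a minimal sketch:

```python
from datasets import load_dataset

# Sketch: load SQuAD 1.1 and 2.0 from the Hugging Face Hub and compare sizes.
squad_v1 = load_dataset("squad")
squad_v2 = load_dataset("squad_v2")
print(len(squad_v1["train"]), len(squad_v2["train"]))

example = squad_v2["train"][0]
print(example["question"], example["answers"])   # empty 'answers' means unanswerable
```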
KorQuAD is a large-scale question-and-answer dataset constructed for Korean machine reading comprehension. We investigate the dataset to understand the distribution of answers and the types of reasoning required to answer the questions. The dataset follows the data-generation process of SQuAD in order to meet the same standard.
CANARD is a dataset for question-in-context rewriting that consists of questions, each given in a dialog context, together with a context-independent rewriting of the question. The context of each question is the dialog utterances that precede the question. CANARD can be used to evaluate question rewriting models that handle important linguistic phenomena such as coreference and ellipsis resolution.
CANARD is based on QuAC (Choi et al., 2018)---a conversational reading comprehension dataset in which answers are selected spans from a given section in a Wikipedia article. Some questions in QuAC are unanswerable with their given sections. We use the answer 'I don't know.' for such questions.
CANARD is constructed by crowdsourcing question rewritings using Amazon Mechanical Turk. We apply several automatic and manual quality controls to ensure the quality of the data collection process. The dataset consists of 40,527 questions with different context lengths. More details are available in our EMNLP 2019 paper. An example is provided below. The dataset is distributed under the CC BY-SA 4.0 license.