55 datasets found
  1. Dataset for: Same Question, Different Answers? An Empirical Comparison of...

    • demo-b2find.dkrz.de
    Updated Sep 22, 2025
    Cite
    (2025). Dataset for: Same Question, Different Answers? An Empirical Comparison of Web Data and Traditional Data - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/bc684dad-c657-5013-b2d4-cc35b4a2e7ee
    Dataset updated
    Sep 22, 2025
    Description

    Psychological scientists increasingly study web data, such as user ratings or social media postings. However, whether research relying on such web data leads to the same conclusions as research based on traditional data is largely unknown. To test this, we (re)analyzed three datasets, thereby comparing web data with lab and online survey data. We calculated correlations across these different datasets (Study 1) and investigated identical, illustrative research questions in each dataset (Studies 2 to 4). Our results suggest that web and traditional data are not fundamentally different and usually lead to similar conclusions, but also that it is important to consider differences between data types such as populations and research settings. Web data can be a valuable tool for psychologists when accounting for such differences, as it allows for testing established research findings in new contexts, complementing them with insights from novel data sources.

  2. Data from: Dataset of the study: "Chatbots put to the test in math and logic...

    • researchdata.bath.ac.uk
    Updated May 20, 2023
    Cite
    Vagelis Plevris; George Papazafeiropoulos; Alejandro Jimenez Rios (2023). Dataset of the study: "Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard" [Dataset]. http://doi.org/10.5281/zenodo.7940781
    Dataset updated
    May 20, 2023
    Dataset provided by
    Zenodo
    Authors
    Vagelis Plevris; George Papazafeiropoulos; Alejandro Jimenez Rios
    Dataset funded by
    Oslo Metropolitan University
    Description

    This dataset contains the 30 questions that were posed to the chatbots (i) ChatGPT-3.5; (ii) ChatGPT-4; and (iii) Google Bard, in May 2023 for the study “Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard”. These 30 questions describe mathematics and logic problems that have a unique correct answer. The questions are fully described with plain text only, without the need for any images or special formatting. The questions are divided into two sets of 15 questions each (Set A and Set B). The questions of Set A are 15 “Original” problems that cannot be found online, at least in their exact wording, while Set B contains 15 “Published” problems that one can find online by searching on the internet, usually with their solution. Each question is posed three times to each chatbot.

    This dataset contains the following: (i) the full set of the 30 questions, A01-A15 and B01-B15; (ii) the correct answer for each of them; (iii) an explanation of the solution, for the problems where such an explanation is needed; and (iv) the 30 (questions) × 3 (chatbots) × 3 (answers) = 270 detailed answers of the chatbots. For the published problems of Set B, we also provide a reference to the source from which each problem was taken.

  3. QADO: An RDF Representation of Question Answering Datasets and their...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated May 31, 2023
    Cite
    Andreas Both; Oliver Schmidtke; Aleksandr Perevalov (2023). QADO: An RDF Representation of Question Answering Datasets and their Analyses for Improving Reproducibility [Dataset]. http://doi.org/10.6084/m9.figshare.21750029.v3
    Available download formats
    zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andreas Both; Oliver Schmidtke; Aleksandr Perevalov
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Measuring the quality of Question Answering (QA) systems is a crucial task to validate the results of novel approaches. However, there are already indicators of a reproducibility crisis, as many published systems have used outdated datasets or subsets of QA benchmarks, making it hard to compare results. We identified the following core problems: there is no standard data format; instead, proprietary data representations are used by the different, partly inconsistent datasets; additionally, the characteristics of datasets are typically not reported by the dataset maintainers nor by the system publishers. To overcome these problems, we established an ontology, the Question Answering Dataset Ontology (QADO), for representing QA datasets in RDF. The following datasets were mapped into the ontology: the QALD series, LC-QuAD series, RuBQ series, ComplexWebQuestions, and Mintaka. Hence, the integrated data in QADO covers widely used datasets and multilinguality. Additionally, we performed intensive analyses of the datasets to identify their characteristics, making it easier for researchers to identify specific research questions and to select well-defined subsets. The provided resource will enable the research community to improve the quality of their research and support the reproducibility of experiments.

    Here, the mapping results of the QADO process, the SPARQL queries for data analytics, and the archived analytics results file are provided.

    Up-to-date statistics can be created automatically by the script provided at the corresponding QADO GitHub RDFizer repository.

  4. Stack Overflow Data Survey

    • kaggle.com
    zip
    Updated May 28, 2024
    Cite
    lgib88 (2024). Stack Overflow Data Survey [Dataset]. https://www.kaggle.com/datasets/lgib88/stack-overflow-data-survey/discussion
    Available download formats
    zip (76,250,931 bytes)
    Dataset updated
    May 28, 2024
    Authors
    lgib88
    Description

    I conducted an in-depth analysis comparing two data sources:

    Trends Derived from Tags: Extracted and analyzed tags from the Stack Exchange API to identify programming language trends.

    Annual User Survey Data: Examined data from Stack Overflow's annual user survey to understand user preferences and technology adoption.

    By comparing these two data sources, I validated trends and patterns, offering a comprehensive understanding of the current programming language and technology landscape.
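
    As an illustration of the first data source, a minimal sketch of pulling popular tag counts from the public Stack Exchange API (v2.3 /tags endpoint) is shown below. This is not the author's extraction pipeline, and it is not part of the dataset itself; it only shows how such tag counts could be retrieved.

```python
# Minimal sketch: fetch the most popular Stack Overflow tags from the public
# Stack Exchange API (v2.3). Illustrative only; not the dataset's own pipeline.
import requests

def fetch_top_tags(pages: int = 2, page_size: int = 100) -> list[dict]:
    """Return a list of {'name': ..., 'count': ...} dicts for popular tags."""
    tags = []
    for page in range(1, pages + 1):
        resp = requests.get(
            "https://api.stackexchange.com/2.3/tags",
            params={
                "site": "stackoverflow",
                "order": "desc",
                "sort": "popular",
                "page": page,
                "pagesize": page_size,
            },
            timeout=30,
        )
        resp.raise_for_status()
        payload = resp.json()
        tags.extend({"name": t["name"], "count": t["count"]} for t in payload["items"])
        if not payload.get("has_more"):
            break
    return tags

if __name__ == "__main__":
    for tag in fetch_top_tags(pages=1)[:10]:
        print(f"{tag['name']:20s} {tag['count']}")
```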

  5. Home Health and Hospice Compare Data Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Home Health and Hospice Compare Data Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/home-health-and-hospice-compare-data-data-package/
    Available download formats
    csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    This data package includes information about the List and Rating of Home Health Care Agencies, Health Care for Patient Survey data, and state data for several home health agency quality measures, as well as state averages for Home Health Agency (HHA) quality measures. It also provides datasets on Hospice General Information, Provider data, and CASPER or ASPEN information about hospice agencies.

  6. Comparisons of WebGPT and OpenAI Models

    • kaggle.com
    zip
    Updated Nov 30, 2023
    Cite
    The Devastator (2023). Comparisons of WebGPT and OpenAI Models [Dataset]. https://www.kaggle.com/datasets/thedevastator/comparisons-of-webgpt-and-openai-models/data
    Available download formats
    zip (89,181,579 bytes)
    Dataset updated
    Nov 30, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Comparisons of WebGPT and OpenAI Models

    A comparison between WebGPT and OpenAI models with metrics and answers provided

    By openai (From Huggingface) [source]

    About this dataset

    This dataset contains comparisons between WebGPT models and OpenAI models, along with various metrics used to evaluate their performance. The dataset includes several columns, such as 'question', which represents the question asked in the comparison, and 'quotes_0' and 'quotes_1', which correspond to the quotes or statements from the WebGPT model and the OpenAI model, respectively. The answers provided by both models are recorded in the columns 'answer_0' and 'answer_1'. Additionally, there are columns indicating the number of tokens used by each model ('tokens_0' and 'tokens_1'), as well as the score or confidence level of their respective answers ('score_0' and 'score_1').

    The purpose of this dataset is to provide training data for comparing different versions of WebGPT models with OpenAI models. By capturing various aspects such as question formulation, generated answers, token usage, and confidence scores, this dataset aims to enable a comprehensive analysis of the performance and capabilities of these models.

    Overall, this dataset offers researchers an opportunity to explore the similarities and differences between WebGPT models and OpenAI models based on real-world comparisons. It can serve as a valuable resource for training machine learning algorithms, conducting comparative analyses, understanding model behavior, or developing new techniques in natural language processing

    How to use the dataset

    Overview

    The dataset consists of several columns that contain valuable information for each comparison. Here is an overview of the columns present in this dataset:

    • question: The question asked in the comparison.
    • quotes_0: The quotes or statements from the WebGPT model.
    • answer_0: The answer provided by the WebGPT model.
    • tokens_0: The number of tokens used by the WebGPT model to generate the answer.
    • score_0: The score or confidence level of the answer provided by the WebGPT model.
    • quotes_1: The quotes or statements from the OpenAI model.
    • answer_1: The answer provided by the OpenAI model.
    • tokens_1: The number of tokens used by the OpenAI model to generate the answer.
    • score_1: The score or confidence level of the answer provided by the OpenAI model.
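
    A minimal loading sketch using the columns listed above is given below. The CSV file name inside the Kaggle zip is an assumption (comparisons.csv here); adjust it to the actual file after unzipping.

```python
# Minimal sketch: compare the two models' scores and token usage using the
# columns listed above. The file name is assumed; everything else follows the
# column overview.
import pandas as pd

df = pd.read_csv("comparisons.csv")

# Which model "wins" each comparison according to the provided scores?
# (Ties are counted for model_1 in this simple version.)
df["winner"] = df["score_0"].gt(df["score_1"]).map({True: "model_0", False: "model_1"})
print(df["winner"].value_counts(normalize=True))

# Token efficiency: average tokens used per answer by each model.
print(df[["tokens_0", "tokens_1"]].mean())

# Side-by-side inspection of questions with both answers and scores.
sample = df[["question", "answer_0", "score_0", "answer_1", "score_1"]].head(5)
print(sample.to_string(index=False))
```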

    Dataset Usage

    This dataset can be utilized in various ways for research, analysis, and improvement-related purposes related to comparing performance between different models.

    Here are a few examples:

    1) Model Comparison:

    You can compare and analyze how well both models (WebGPT and OpenAI) perform on specific questions based on their answers, scores/confidence levels, token usage, and supporting quotes/statements.

    2) Metric Evaluation:

    By examining both scores/confidence levels (score_0 & score_1), you can evaluate which model tends to provide more reliable answers overall.

    3) Token Efficiency:

    By analyzing token usage (tokens_0 & tokens_1), you can gain insights into which model is more efficient at generating answers within token limits.

    4) Model Improvements:

    The dataset can be used to identify areas of improvement for both the WebGPT and OpenAI models. By analyzing the answers, quotes, and scores, you may discover patterns or common pitfalls that can guide future model enhancements.

    Conclusion

    This dataset provides a valuable resource for comparing WebGPT and OpenAI models. With the information provided in each column, researchers can perform a wide range of analysis to better understand the strengths and weaknesses of each model. Whether it's

    Research Ideas

    • Model Evaluation: This dataset can be used to compare the performance of different models, specifically WebGPT models and OpenAI models. The scores, quotes, answers, and token counts provided by each model can be analyzed to determine which model performs better for a given task.
    • Feature Engineering: The dataset can be used to extract relevant features that indicate the quality or accuracy of an answer generated by a model. These features can then be used in building machine learning models to improve the performance of question answering systems.
    • Bias Analysis: By analyzing the quotes and answers provided by WebGPT and OpenAI models, this dataset can help identify any biases or patterns in their responses. This analysis can provide insights into potential biases present in AI-generated content and inform efforts towards making AI systems more fair and unbiased

    Acknowledgemen...

  7. Replication Data for: Typing or Speaking? Comparing text and voice answers...

    • dataone.org
    Updated Sep 25, 2024
    Cite
    Höhne, Jan Karem; Gavras, Konstantin; Claassen, Joshua (2024). Replication Data for: Typing or Speaking? Comparing text and voice answers to open questions on sensitive topics in smartphone surveys [Dataset]. http://doi.org/10.7910/DVN/3KCPNK
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Höhne, Jan Karem; Gavras, Konstantin; Claassen, Joshua
    Description

    The dataset allows replication of the results of the following article: Höhne, J. K., Gavras, K., & Claassen, J. (2024; accepted). Typing or Speaking? Comparing text and voice answers to open questions on sensitive topics in smartphone surveys. Social Science Computer Review.

  8. Bright II 2012-2013 - Burkina Faso

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Mar 27, 2019
    + more versions
    Cite
    Mathematica Policy Research (2019). Bright II 2012-2013 - Burkina Faso [Dataset]. https://microdata.worldbank.org/index.php/catalog/3430
    Dataset updated
    Mar 27, 2019
    Dataset authored and provided by
    Mathematica Policy Research
    Time period covered
    2012 - 2013
    Area covered
    Burkina Faso
    Description

    Abstract

    Millennium Challenge Corporation hired Mathematica Policy Research to conduct an independent evaluation of the BRIGHT II program. The three main research questions of interest are: • What was the impact of the program on school enrollment, attendance, and retention? • What was the impact of the program on test scores? • Are the impacts different for girls than for boys?

    Mathematica will compare data collected from the 132 communities served by BRIGHT II (the "treatment group") with that collected from the 161 communities that applied but were not selected for the program (the "comparison group"). Using a statistical technique called regression discontinuity, Mathematica will compare the outcomes of the treatment villages just above the cutoff point to the outcomes of the comparison villages just below the cutoff point. If the intervention had an impact, we will observe a "jump" in outcomes at the point of discontinuity.
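
    As a rough illustration of the design described above (not Mathematica's actual estimation code or variables), a generic sharp regression-discontinuity sketch on synthetic data might look as follows; every variable name, value, and the bandwidth here is invented for the example.

```python
# Generic sharp regression-discontinuity sketch with synthetic data, to
# illustrate the cutoff-based comparison described above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, cutoff = 2000, 50.0

score = rng.uniform(0, 100, n)              # village eligibility score (synthetic)
treated = (score >= cutoff).astype(int)     # assignment rule: above the cutoff = treated
# Outcome (e.g. an enrollment rate) with a true jump of 0.08 at the cutoff.
outcome = 0.3 + 0.002 * score + 0.08 * treated + rng.normal(0, 0.05, n)

df = pd.DataFrame({"outcome": outcome, "score_c": score - cutoff, "treated": treated})

# Local linear regression in a bandwidth around the cutoff, with separate
# slopes on each side; the coefficient on `treated` estimates the jump.
bandwidth = 15.0
local = df[df["score_c"].abs() <= bandwidth]
fit = smf.ols("outcome ~ treated + score_c + treated:score_c", data=local).fit()
print(fit.params["treated"], fit.bse["treated"])
```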

    Mathematica will perform additional analyses to estimate the overall merit of the BRIGHT investment. By conducting a cost-benefit analysis and a cost-effectiveness analysis and calculating the economic rate of return, Mathematica will be able to answer questions related to the sustainability of the program, and compare the program to interventions and social investments in other sectors. The household survey is designed to capture household-level data rather than community-level data; however, questions have been included to measure head-of-household expectations of educational attainment. These questions ask the head of household what grade level he hopes each child will attain; and what grade level he thinks the child will be capable of achieving in reality.

    Geographic coverage

    132 rural villages throughout the 10 provinces of Burkina Faso in which girls' enrollment rates were lowest

    Analysis unit

    Households

    Universe

    Households, students, and educators in the 287 villages surveyed

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The BRIGHT II program was implemented in the same 132 villages that received the BRIGHT I interventions. These 132 villages were originally selected using a scoring process, with eligibility scores based on the villages’ potential to improve girls’ educational outcomes. A total of 293 villages applied to receive a BRIGHT school; the Burkina Faso Ministry of Basic Education (MEBA) selected the 132 villages with scores that were above a certain cutoff point. Whenever possible, the survey will be conducted with the same children in the same households and schools surveyed during the BRIGHT I evaluation. By visiting the same households and schools, the evaluator will be able to better assess the longer-term impacts of the BRIGHT project.

    Research instrument

    Mathematica has developed two surveys, a household survey and a school survey, to collect relevant data from villages in both the treatment and comparison groups. The household survey was administered to a new cross-section of households compared to the BRIGHT I evaluation. Data will be collected on the attendance and educational attainment of school-age children in the household, attitudes towards girls' education, and parental assessment of the extent to which the complementary interventions influenced school enrollment decisions. It will also assess the performance of all household children on basic tests of French and math. The school survey, to be administered to all local schools in the 293 villages, gathers data on school characteristics, personnel, and physical structure, and collects enrollment and attendance records. Data will be gathered by a local data collection firm selected by MCA-Burkina Faso, with Mathematica providing technical assistance and oversight.

    Cleaning operations

    Following data collection, Mathematica will work with BERD to ensure that the data are correctly entered and are complete and clean. This will include a review of all frequencies for out-of-range responses, missing data, or other problems, as well as a comparison between the data and paper copies for a random selection of variables.

  9. Data from: A user-friendly guide to using distance measures to compare time...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Mar 7, 2024
    Cite
    Shawn Dove; Monika Böhm; Robin Freeman; Sean Jellesmark; David Murrell (2024). A user-friendly guide to using distance measures to compare time series in ecology [Dataset]. http://doi.org/10.5061/dryad.bzkh189g7
    Available download formats
    zip
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    University College London
    Zoological Society of London
    Indianapolis Zoo
    Authors
    Shawn Dove; Monika Böhm; Robin Freeman; Sean Jellesmark; David Murrell
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Time series are a critical component of ecological analysis, used to track changes in biotic and abiotic variables. Information can be extracted from the properties of time series for tasks such as classification (e.g. assigning species to individual bird calls); clustering (e.g. clustering similar responses in population dynamics to abrupt changes in the environment or management interventions); prediction (e.g. accuracy of model predictions to original time series data); and anomaly detection (e.g. detecting possible catastrophic events from population time series). These common tasks in ecological research rely on the notion of (dis-) similarity, which can be determined using distance measures. A plethora of distance measures have been described, predominantly in the computer and information sciences, but many have not been introduced to ecologists. Furthermore, little is known about how to select appropriate distance measures for time-series-related tasks. Therefore, many potential applications remain unexplored. Here we describe 16 properties of distance measures that are likely to be of importance to a variety of ecological questions involving time series. We then test 42 distance measures for each property and use the results to develop an objective method to select appropriate distance measures for any task and ecological dataset. We demonstrate our selection method by applying it to a set of real-world data on breeding bird populations in the UK and discuss other potential applications for distance measures, along with associated technical issues common in ecology. Our real-world population trends exhibit a common challenge for time series comparisons: a high level of stochasticity. We demonstrate two different ways of overcoming this challenge, first by selecting distance measures with properties that make them well-suited to comparing noisy time series, and second by applying a smoothing algorithm before selecting appropriate distance measures. In both cases, the distance measures chosen through our selection method are not only fit-for-purpose but are consistent in their rankings of the population trends. The results of our study should lead to an improved understanding of, and greater scope for, the use of distance measures for comparing ecological time series, and help us answer new ecological questions.

    Methods

    Distance measure test results were produced using R and can be replicated using scripts available on GitHub at https://github.com/shawndove/Trend_compare. Detailed information on wading bird trends can be found in Jellesmark et al. (2021), cited below: Jellesmark, S., Ausden, M., Blackburn, T. M., Gregory, R. D., Hoffmann, M., Massimino, D., McRae, L., & Visconti, P. (2021). A counterfactual approach to measure the impact of wet grassland conservation on U.K. breeding bird populations. Conservation Biology, 35(5), 1575–1585. https://doi.org/10.1111/cobi.13692

  10. Question-based Assessment: Human vs. ChatGPT

    • kaggle.com
    zip
    Updated Aug 28, 2023
    Cite
    Mujtaba Mateen (2023). Question-based Assessment: Human vs. ChatGPT [Dataset]. https://www.kaggle.com/datasets/mujtabamatin/question-based-assessment-human-vs-chatgpt
    Available download formats
    zip (9,873 bytes)
    Dataset updated
    Aug 28, 2023
    Authors
    Mujtaba Mateen
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    In this dataset, a variety of questions spanning different subjects and mediums are presented, and a comparison is made between the actual marks obtained by human respondents and the marks gained by the ChatGPT model. The dataset encompasses questions related to logical equivalences, programming concepts, and applications of various logical laws.

    Each entry in the dataset includes the following information:

    • Questions: The text of the questions asked.
    • Subject: The subject of the question (e.g., Data Structures).
    • Medium: The type of assessment (e.g., Exam, Quiz, Assignment).
    • Max Marks: The maximum possible marks for the question.
    • Marks Obtained: The actual marks obtained by human respondents.
    • Marks Obtained ChatGPT: The marks gained by the ChatGPT model.

    The dataset aims to provide insights into the performance of both human respondents and the ChatGPT model across different question types and assessment scenarios. It serves as a resource for evaluating the effectiveness of the model in predicting human-level performance on various question-based assessments, helping to understand the alignment between human reasoning and the model's responses.
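
    A minimal analysis sketch based on the field list above might look as follows. The CSV file name is an assumption, and the exact column headers in the Kaggle file may differ from the field names listed here.

```python
# Minimal sketch: compare human and ChatGPT marks per subject and per medium,
# using the fields listed above. File name and exact column headers assumed.
import pandas as pd

df = pd.read_csv("question_based_assessment.csv")

# Normalise marks by the maximum available for each question.
df["human_pct"] = df["Marks Obtained"] / df["Max Marks"]
df["chatgpt_pct"] = df["Marks Obtained ChatGPT"] / df["Max Marks"]

# Average normalised score by subject and by assessment medium.
print(df.groupby("Subject")[["human_pct", "chatgpt_pct"]].mean())
print(df.groupby("Medium")[["human_pct", "chatgpt_pct"]].mean())
```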

  11. csv_vs_viz

    • huggingface.co
    Cite
    Igor Mordatch, csv_vs_viz [Dataset]. https://huggingface.co/datasets/imordatch/csv_vs_viz
    Authors
    Igor Mordatch
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    CSVvsVizQA

    Dataset to compare question-answering ability from CSV data vs. data visualization images.
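
    A minimal sketch for loading the dataset from the Hugging Face Hub; since the splits and columns are not documented above, the sketch simply inspects them rather than assuming any.

```python
# Minimal sketch: load the dataset from the Hugging Face Hub and inspect its
# structure. Split and column names are not documented above, so they are
# printed rather than assumed.
from datasets import load_dataset

ds = load_dataset("imordatch/csv_vs_viz")
print(ds)                        # available splits and their sizes
split = next(iter(ds.values()))  # take the first split, whatever it is named
print(split.column_names)
print(split[0])                  # one example record
```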

  12. ELI5 Scorer Train Data Prototype 816,000 Examples

    • kaggle.com
    zip
    Updated Aug 18, 2020
    Cite
    Neuron Engineer (2020). ELI5 Scorer Train Data Prototype 816,000 Examples [Dataset]. https://www.kaggle.com/datasets/ratthachat/eli5-scorer-train-data-prototype-272x3
    Available download formats
    zip (248,994,043 bytes)
    Dataset updated
    Aug 18, 2020
    Authors
    Neuron Engineer
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Original ELI5 vs. this Scorer ELI5 datasets

    ELI5 means "Explain Like I'm 5". It is originally a "long and free form" question-answering dataset scraped from the reddit ELI5 subforum. The original ELI5 dataset (https://github.com/facebookresearch/ELI5) can be used to train a model for "long & free form" question answering, e.g. with encoder-decoder models like T5 or BART.

    Conventional performance evaluation: ROUGE scores

    Once we have a model, how can we estimate its performance (its ability to give high-quality answers)? The conventional methods are ROUGE-family metrics (see the ELI5 paper linked above).

    However, ROUGE scores are based on n-grams and need to compare a generated answer to a ground-truth answer. Unfortunately, n-gram scoring cannot evaluate high-quality paraphrased answers.

    Worse, needing a ground-truth answer to compare against goes against the "spirit" of free-form question answering, where there are many possible (non-paraphrase) valid and good answers.

    To summarize, "creative & high-quality" answers cannot be evaluated with ROUGE, which prevents us from building (and evaluating) creative models.

    This dataset: to create a better scorer

    This dataset, in contrast, is aimed at training a "scoring" (regression) model, which can predict an upvote score for each Q-A pair individually (not an A-A pair as with ROUGE).

    The data is simply a CSV file containing Q-A pairs and their scores. Each line contains the Q-A texts (in Roberta format) and their upvote score (a non-negative integer).

    It is intended to make it easy and direct to create a scoring model with Roberta (or other Transformer models, by changing the separation token).

    CSV file

    In the CSV file there are a qa column and an answer_score column. Each row in qa is written in Roberta paired-sentences format (the question text followed by the answer text).

    For answer_score we have the following principles:

    • A high-quality answer related to its question should get a high score (upvotes).
    • A low-quality answer related to its question should get a low score.
    • A well-written answer NOT related to its question should get a 0 score.

    Each positive Q-A pair comes from the original ELI5 dataset (its true upvote score). Each 0-score Q-A pair is constructed as detailed in the next subsection.
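
    A compact sketch of such a Roberta regression scorer is shown below, assuming the CSV layout described above (a qa text column and an integer answer_score column) and an assumed file name. This is illustrative, not the dataset author's actual training code; the baseline linked further down may have been trained differently.

```python
# Sketch of a regression-style scorer: Roberta with a single-output head
# trained to predict the upvote score from the paired Q-A text. Illustrative
# only; file name and hyperparameters are assumptions.
import pandas as pd
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# num_labels=1 with float labels makes the model use a regression (MSE) loss.
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=1)

df = pd.read_csv("eli5_scorer_train.csv")  # assumed file name inside the zip

def collate(rows):
    texts = [r["qa"] for r in rows]
    scores = torch.tensor([float(r["answer_score"]) for r in rows])
    enc = tokenizer(texts, truncation=True, max_length=256, padding=True, return_tensors="pt")
    enc["labels"] = scores
    return enc

loader = DataLoader(df.to_dict("records"), batch_size=16, shuffle=True, collate_fn=collate)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for batch in loader:          # one pass over the data, for brevity
    out = model(**batch)
    out.loss.backward()
    optim.step()
    optim.zero_grad()
```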

    0-score construction details via RetriBERT & FAISS

    The principle is contrastive training. We need reasonably hard 0-score pairs for the model to generalize: 0-score pairs that are too easy (e.g. a question paired with random answers) would teach the model nothing.

    Therefore, for each question we try to construct two answers (two 0-score pairs) where each answer is related to the topic of the question but does not answer it.

    This is achieved by vectorizing all questions using RetriBERT and storing the vectors with FAISS. We can then measure the distance between two question vectors using cosine distance.

    More precisely, for a question Q1 we choose the answers of two related (but non-identical) questions Q2 and Q3, i.e. answers A2 and A3, to construct the 0-score pairs Q1-A2 and Q1-A3. Combined with the positive-score pair Q1-A1, we have three pairs for Q1, and three pairs per question in total. Therefore, from the 272,000 examples of the original ELI5, this dataset contains three times as many examples: 816,000.

    Note that two question vectors that are very close may belong to the same (paraphrased) question, while two questions that are very far apart are totally different. Therefore, we need a threshold that selects not-too-close and not-too-far question pairs, so that we get non-identical but same-topic questions. In a simple experiment, a cosine distance of 10-11 between RetriBERT vectors seemed to work well, so we use this value as the threshold for constructing a 0-score Q-A pair.
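
    A sketch of this negative-pair construction is given below. It substitutes a generic sentence-embedding model (sentence-transformers, all-MiniLM-L6-v2) for RetriBERT, so the 10-11 distance threshold quoted above does not carry over; instead the sketch skips the nearest neighbour (a likely paraphrase) and takes the next-closest neighbours. The toy questions, placeholder answers, and the Roberta-style pair separator are all assumptions for illustration.

```python
# Sketch of the 0-score pair construction described above, with a generic
# sentence-embedding model standing in for RetriBERT. Illustrative only.
import faiss
from sentence_transformers import SentenceTransformer

questions = ["Why is the sky blue?", "How do planes stay in the air?",
             "Why do onions make you cry?", "How does WiFi work?",
             "Why are sunsets red?", "How do gliders fly without engines?"]
answers = [f"(answer to: {q})" for q in questions]   # placeholder answers

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(questions, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(vecs.shape[1])  # inner product = cosine on unit vectors
index.add(vecs)

_, nbrs = index.search(vecs, 6)           # rank 0 is always the question itself

zero_score_pairs = []
for i, q in enumerate(questions):
    # Skip self (rank 0) and the closest neighbour (rank 1, possible paraphrase);
    # take the next two as related-but-different questions.
    for j in nbrs[i][2:4]:
        # "</s></s>" is a Roberta-style pair separator (assumed layout of the qa column).
        zero_score_pairs.append({"qa": f"{q} </s></s> {answers[j]}", "answer_score": 0})

for pair in zero_score_pairs[:4]:
    print(pair)
```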

    Baseline Model

    A roberta-base baseline with MAE 3.91 on the validation set can be found here: https://www.kaggle.com/ratthachat/eli5-scorer-roberta-base-500k-mae391

    Acknowledgements

    Thanks to the Facebook AI team for creating the original ELI5 dataset, and to the Huggingface NLP library for making this dataset easy to access. - https://github.com/facebookresearch/ELI5 - https://huggingface.co/nlp/viewer/

    Inspiration

    My project on ELI5 is mainly inspired by this amazing work of Yacine Jernite: https://yjernite.github.io/lfqa.html

  13. Amazon Question and Answer Data

    • cseweb.ucsd.edu
    json
    Cite
    UCSD CSE Research Project, Amazon Question and Answer Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Available download formats
    json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain 1.48 million question and answer pairs about products from Amazon.

    Metadata includes

    • question and answer text

    • is the question binary (yes/no), and if so does it have a yes/no answer?

    • timestamps

    • product ID (to reference the review dataset)

    Basic Statistics:

    • Questions: 1.48 million

    • Answers: 4,019,744

    • Labeled yes/no questions: 309,419

    • Number of unique products with questions: 191,185
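
    A minimal loading sketch follows. As far as can be told from the source page, the files are distributed as gzipped text with one Python dict literal per line; that assumption, and the example category file name, are mine rather than part of the listing.

```python
# Minimal sketch: parse one gzipped QA file into a pandas DataFrame, assuming
# one Python dict literal per line (ast.literal_eval is used instead of eval).
import ast
import gzip
import pandas as pd

def parse_qa(path: str):
    """Yield one question/answer record per line of a gzipped QA file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield ast.literal_eval(line)

df = pd.DataFrame(list(parse_qa("qa_Appliances.json.gz")))  # example category file
print(df.columns.tolist())   # e.g. question/answer text, product ID, timestamps
print(df.head())
```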

  14. The comparison between MetaQA and recent table question answering tables.

    • plos.figshare.com
    xls
    Updated Nov 13, 2023
    Cite
    Diya Li; Zhe Zhang (2023). The comparison between MetaQA and recent table question answering tables. [Dataset]. http://doi.org/10.1371/journal.pone.0293034.t007
    Available download formats
    xls
    Dataset updated
    Nov 13, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Diya Li; Zhe Zhang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The comparison between MetaQA and recent table question answering tables.

  15. MAR Frequently Asked Questions

    • catalog.data.gov
    • opendata.dc.gov
    • +1more
    Updated Apr 8, 2025
    Cite
    City of Washington, DC (2025). MAR Frequently Asked Questions [Dataset]. https://catalog.data.gov/dataset/mar-frequently-asked-questions
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    City of Washington, DC
    Description

    The MAR allows the District Government to more easily compare information across databases and agencies. Learn more about the MAR with these frequently asked questions.

  16. Webis Comparative Questions 2022

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Oct 17, 2022
    Cite
    Alexander Bondarenko; Yamen Ajjour; Valentin Dittmar; Niklas Homann; Pavel Braslavski; Matthias Hagen (2022). Webis Comparative Questions 2022 [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7213396
    Dataset updated
    Oct 17, 2022
    Dataset provided by
    Ural Federal University
    Martin-Luther-Universität Halle-Wittenberg
    Authors
    Alexander Bondarenko; Yamen Ajjour; Valentin Dittmar; Niklas Homann; Pavel Braslavski; Matthias Hagen
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The Webis Comparative Questions 2022 dataset contains about 31,000 questions labeled as comparative or not.

    3,500 comparative questions are labeled on the token level with comparison objects, aspects, predicates, or none.

    For 950 questions, text passages that potentially answer the questions are labeled with the stance: pro first comparison object, pro second, neutral, or no stance.

  17. Readability scores for Chatgpt-4o, Gemini, and Perplexity responses to the...

    • plos.figshare.com
    xls
    Updated Jun 18, 2025
    Cite
    Mete Kara; Erkan Ozduran; Müge Mercan Kara; İlhan Celil Özbek; Volkan Hancı (2025). Readability scores for Chatgpt-4o, Gemini, and Perplexity responses to the most frequently asked Ankylosing spondylitis -related questions, and a statistical comparison of the text content to a 6th-grade reading level [Median, 95% Confidence Interval (CI) (Lower limit of confidence interval- Upper limit of confidence interval)]. [Dataset]. http://doi.org/10.1371/journal.pone.0326351.t003
    Available download formats
    xls
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Mete Kara; Erkan Ozduran; Müge Mercan Kara; İlhan Celil Özbek; Volkan Hancı
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Readability scores for Chatgpt-4o, Gemini, and Perplexity responses to the most frequently asked Ankylosing spondylitis -related questions, and a statistical comparison of the text content to a 6th-grade reading level [Median, 95% Confidence Interval (CI) (Lower limit of confidence interval- Upper limit of confidence interval)].

  18. Citizen Science sensor measurements to support frequently asked questions...

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Nov 12, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Citizen Science sensor measurements to support frequently asked questions (FAQ) [Dataset]. https://catalog.data.gov/dataset/citizen-science-sensor-measurements-to-support-frequently-asked-questions-faq
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This file has two sheets. Data are measurements by Citizen Science Air Monitors (CSAM) and Federal Monitors, which sampled particulate matter (PM), nitrogen dioxide (NO2), relative humidity (RH), and temperature (T). Variables for each sheet are described in more detail below.

    The sheet “Snorkel No-Snorkel Comparison” includes data from two CSAM units, CSAM-2 and CSAM-3. CSAM-2 used a snorkel tube to sample outdoor air, and CSAM-3 did not use a snorkel tube. CSAM-2 and CSAM-3 were not in the same sampling location, but did sample contemporaneous measurements. These data were used to perform a snorkel and no-snorkel comparison.

    The sheet “CSAM-1 and Federal Monitor” includes data from a CSAM unit (CSAM-1) and a Federal Monitor (which is used for regulatory measurements of air pollution). CSAM-1 and the Federal Monitor were installed in the same sampling location and recorded contemporaneous measurements. For CSAM-1, original recorded measurements are included, as well as measurements that were corrected (using regression equations) to better reflect the Federal Monitor values.

    This dataset is associated with the following publication: Barzyk, T., H. Huang, R. Williams, A. Kaufman, and J. Essoka. Advice and Frequently Asked Questions (FAQs) for Citizen-Science Environmental Health Assessments. International Journal of Environmental Research and Public Health. Molecular Diversity Preservation International, Basel, SWITZERLAND, 15(5): 960, (2018).

  19. Learning and fatigue during choice experiments: a comparison of online and...

    • resodate.org
    Updated Oct 2, 2025
    Cite
    Scott J. Savage (2025). Learning and fatigue during choice experiments: a comparison of online and mail survey modes (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9sZWFybmluZy1hbmQtZmF0aWd1ZS1kdXJpbmctY2hvaWNlLWV4cGVyaW1lbnRzLWEtY29tcGFyaXNvbi1vZi1vbmxpbmUtYW5kLW1haWwtc3VydmV5LW1vZGVz
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Journal of Applied Econometrics
    ZBW
    ZBW Journal Data Archive
    Authors
    Scott J. Savage
    Description

    This study investigates the effect of survey mode on respondent learning and fatigue during repeated choice experiments. Stated preference data are obtained from an experiment concerning high-speed Internet service conducted on samples of mail respondents and online respondents. We identify and estimate aspects of the error components for different subsets of the choice questions, for both mail and online respondents. Results show mail respondents answer questions consistently throughout a series of choice experiments, but the quality of the online respondents' answers declines. Therefore, while the online survey provides lower survey administration costs and reduced time between implementation and data analysis, such benefits come at the cost of less precise responses.

  20. Data from: Line Police Officer Knowledge of Search and Seizure Law: An...

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Mar 12, 2025
    Cite
    National Institute of Justice (2025). Line Police Officer Knowledge of Search and Seizure Law: An Exploratory Multi-city Test in the United States, 1986-1987 [Dataset]. https://catalog.data.gov/dataset/line-police-officer-knowledge-of-search-and-seizure-law-an-exploratory-multi-city-tes-1986-7efc4
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice (http://nij.ojp.gov/)
    Area covered
    United States
    Description

    This data collection was undertaken to gather information on the extent of police officers' knowledge of search and seizure law, an issue with important consequences for law enforcement. A specially-produced videotape depicting line duty situations that uniformed police officers frequently encounter was viewed by 478 line uniformed police officers from 52 randomly-selected cities in which search and seizure laws were determined to be no more restrictive than applicable United States Supreme Court decisions. Testing of the police officers occurred in all regions as established by the Federal Bureau of Investigation, except for the Pacific region (California, Oregon, and Washington), since search and seizure laws in these states are, in some instances, more restrictive than United States Supreme Court decisions. No testing occurred in cities with populations under 10,000 because of budget limitations. Fourteen questions to which the officers responded were presented in the videotape. Each police officer also completed a questionnaire that included questions on demographics, training, and work experience, covering their age, sex, race, shift worked, years of police experience, education, training on search and seizure law, effectiveness of various types of training instructors and methods, how easily they could obtain advice about search and seizure questions they encountered, and court outcomes of search and seizure cases in which they were involved. Police department representatives completed a separate questionnaire providing department characteristics and information on search and seizure training and procedures, such as the number of sworn officers, existence of general training and the number of hours required, existence of in-service search and seizure training and the number of hours and testing required, existence of policies and procedures on search and seizure, and means of advice available to officers about search and seizure questions. These data comprise Part 1.

    For purposes of comparison and interpretation of the police officer test scores, question responses were also obtained from other sources. Part 2 contains responses from 36 judges from states with search and seizure laws no more restrictive than the United States Supreme Court decisions, as well as responses from a demographic and work-experience questionnaire inquiring about their age, law school attendance, general judicial experience, and judicial experience and education specific to search and seizure laws. All geographic regions except New England and the Pacific were represented by the judges.

    Part 3, Comparison Data, contains answers to the 14 test questions only, from 15 elected district attorneys, 6 assistant district attorneys, the district attorney in another city and 11 of his assistant district attorneys, a police attorney with expertise in search and seizure law, 24 police academy trainees with no previous police work experience who were tested before search and seizure law training, a second group of 17 police academy trainees -- some with police work experience but no search and seizure law training, 55 law enforcement officer trainees from a third academy tested immediately after search and seizure training, 7 technical college students with no previous education or training on search and seizure law, and 27 university criminal justice course students, also with no search and seizure law education or training.
