100+ datasets found
  1. natural_reasoning

    • huggingface.co
    Updated Feb 19, 2025
    + more versions
    Cite
    AI at Meta (2025). natural_reasoning [Dataset]. https://huggingface.co/datasets/facebook/natural_reasoning
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 19, 2025
    Dataset authored and provided by
    AI at Meta
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    NaturalReasoning is a large-scale dataset for general reasoning tasks. It consists of high-quality, challenging reasoning questions backtranslated from the pretraining corpora DCLM and FineMath. The questions have been deduplicated and decontaminated against popular reasoning benchmarks, including MATH, GPQA, MMLU-Pro, and MMLU-STEM. For each question, we extract the reference final answer from the original pretraining document where possible. We also provide a model-generated response from… See the full description on the dataset page: https://huggingface.co/datasets/facebook/natural_reasoning.
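
    For a quick look at the data, the sketch below streams a few records with the Hugging Face datasets library; it is a minimal illustration only, and the split and field names should be checked on the dataset card.

        from datasets import load_dataset

        # Minimal sketch: stream a few NaturalReasoning records without
        # downloading the full dataset. A "train" split is assumed; field
        # names vary, so whole records are printed.
        ds = load_dataset("facebook/natural_reasoning", split="train", streaming=True)
        for i, example in enumerate(ds):
            print(example)
            if i == 2:
                break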

  2. natural-science-reasoning

    • huggingface.co
    Updated Mar 1, 2025
    Cite
    Daniel Vila (2025). natural-science-reasoning [Dataset]. https://huggingface.co/datasets/dvilasuero/natural-science-reasoning
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 1, 2025
    Authors
    Daniel Vila
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Natural Sciences Reasoning: the "smolest" reasoning dataset

    A smol-scale open dataset for reasoning tasks using Hugging Face Inference Endpoints. While intentionally limited in scale, this resource prioritizes:

    Reproducible pipeline for reasoning tasks using a variety of models (Deepseek V3, Deepseek-R1, Llama70B-Instruct, etc.)

    Knowledge sharing for domains other than Math and Code reasoning

    In this repo, you can find:

    The prompts and the pipeline (see the config file). The… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/natural-science-reasoning.

  3. facebook-natural-reasoning-curriculum

    • huggingface.co
    Updated May 29, 2025
    Cite
    The Kyle Stone (2025). facebook-natural-reasoning-curriculum [Dataset]. https://huggingface.co/datasets/essobi/facebook-natural-reasoning-curriculum
    Explore at:
    Dataset updated
    May 29, 2025
    Authors
    The Kyle Stone
    Description

    Facebook Natural Reasoning - Curriculum Learning

    This dataset is a curriculum learning version of the Facebook Natural Reasoning dataset, automatically split into four complexity levels:

      Splits
    

    simple_reasoning: 28898 examples - Basic arithmetic, one-step problems
    basic_reasoning: 713776 examples - Chained operations, single-variable algebra
    intermediate_reasoning: 397706 examples - Proportional reasoning, multi-step problems
    complex_reasoning: 5444 examples - … See the full description on the dataset page: https://huggingface.co/datasets/essobi/facebook-natural-reasoning-curriculum.
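
    The split names above can be loaded individually with the Hugging Face datasets library; the sketch below is illustrative only, and the split names should be verified on the dataset page.

        from datasets import load_dataset

        # Minimal sketch: load one complexity level of the curriculum dataset.
        # The split name "simple_reasoning" follows the listing above.
        simple = load_dataset(
            "essobi/facebook-natural-reasoning-curriculum",
            split="simple_reasoning",
        )
        print(len(simple))   # roughly 28898 examples if the split matches the listing
        print(simple[0])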

  4. DR.BENCH Dataset

    • paperswithcode.com
    Cite
    Yanjun Gao; Dmitriy Dligach; Timothy Miller; John Caskey; Brihat Sharma; Matthew M Churpek; Majid Afshar, DR.BENCH Dataset [Dataset]. https://paperswithcode.com/dataset/dr-bench
    Explore at:
    Authors
    Yanjun Gao; Dmitriy Dligach; Timothy Miller; John Caskey; Brihat Sharma; Matthew M Churpek; Majid Afshar
    Description

    DR.BENCH is a dataset for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation.

  5. Natural-Reasoning-STEM-25K

    • huggingface.co
    Updated Jul 9, 2025
    + more versions
    Cite
    qingyang zhang (2025). Natural-Reasoning-STEM-25K [Dataset]. https://huggingface.co/datasets/qingyangzhang/Natural-Reasoning-STEM-25K
    Explore at:
    Dataset updated
    Jul 9, 2025
    Authors
    qingyang zhang
    Description

    qingyangzhang/Natural-Reasoning-STEM-25K dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Source Code, Data and Additional Material for the Thesis: "Social...

    • heidata.uni-heidelberg.de
    bin, csv, json, pdf +6
    Updated Jul 4, 2023
    Cite
    Debjit Paul; Debjit Paul (2023). Source Code, Data and Additional Material for the Thesis: "Social Commonsense Reasoning with Structured Knowledge in Text" [Dataset]. http://doi.org/10.11588/DATA/C56QUV
    Explore at:
    text/x-python, bin, txt, csv, json, sh, pdf, png, zip, and text/markdown files (detailed per-file size listing omitted); available download formats
    Dataset updated
    Jul 4, 2023
    Dataset provided by
    heiDATA
    Authors
    Debjit Paul; Debjit Paul
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/C56QUV

    Description

    Understanding a social situation requires the ability to reason about the underlying emotions and behaviour of others. For example, when we read a personal story, we use our prior commonsense knowledge and social intelligence to infer the emotions and motives of the characters and to anticipate their actions. For machines to understand text related to personal stories and social conversations, they must be able to make commonsense inferences. While most people can reason deeply about the social implications of text, this is challenging for natural language processing systems because these implications are often subtle and implicit. This dissertation argues that NLP systems must learn to reason more explicitly about the underlying social knowledge in text to perform social commonsense reasoning. We divide this argument into two sub-problems: (i) understanding the underlying social knowledge and (ii) explicitly reasoning about such knowledge for social commonsense reasoning. To address these problems, we propose building NLP systems that integrate neural network based learning with structured knowledge representations.

    In the first part of this dissertation, we study the role of structured commonsense knowledge in understanding the social dynamics of characters and their actions in stories. Our motivation for enriching the model with structured commonsense knowledge is to bridge the gap between the surface meaning of texts and the underlying social implication of each event in the stories. We develop a novel model that incorporates commonsense knowledge into neural models and showcases the importance of commonsense knowledge in understanding the social dynamics of story characters. Further, we investigate the role of the temporal dynamics of story events in understanding social situations. We develop a model that can explicitly learn what social event follows another event from personal narrative stories, and we demonstrate that implicitly leveraging such temporal knowledge about story events can support social commonsense reasoning tasks.

    In the second part of this dissertation, we investigate methods to explicitly reason about knowledge related to the social dynamics of characters (behavior, mental states) and the cause/effect of social events. We propose a novel model, multi-head knowledge attention, that incorporates such social knowledge into state-of-the-art neural NLP models to address two complex commonsense inference tasks. We demonstrate that our method of incorporating knowledge can improve (i) the robustness and interpretability of the model and (ii) the overall performance of the model compared to other knowledge integration methods. We also investigate social commonsense reasoning as a natural language generation task. We design a story completion task that requires natural language generation models to perform both forward and backward reasoning, and we study the role of contextualized commonsense knowledge in natural language generation. We propose a model that jointly learns to generate contextualized inference rules as well as narrative stories, and we demonstrate that it can outperform state-of-the-art non-contextualized commonsense knowledge-based generation models. We hope that the research presented in this dissertation will open up interesting directions for future research involving social commonsense reasoning and related topics.

  7. lilGym: Natural Language Visual Reasoning with Reinforcement Learning, model...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2023
    Cite
    Kojima, Noriyuki (2023). lilGym: Natural Language Visual Reasoning with Reinforcement Learning, model files [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_8128779
    Explore at:
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Artzi, Yoav
    Brantley, Kianté
    Wu, Anne
    Kojima, Noriyuki
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Baseline models for the paper lilGym: Natural Language Visual Reasoning with Reinforcement Learning.

  8. NLVR Dataset

    • paperswithcode.com
    Updated Feb 25, 2021
    Cite
    Alane Suhr; Mike Lewis; James Yeh; Yoav Artzi (2021). NLVR Dataset [Dataset]. https://paperswithcode.com/dataset/nlvr
    Explore at:
    Dataset updated
    Feb 25, 2021
    Authors
    Alane Suhr; Mike Lewis; James Yeh; Yoav Artzi
    Description

    NLVR contains 92,244 pairs of human-written English sentences grounded in synthetic images. Because the images are synthetically generated, this dataset can be used for semantic parsing.

  9. Table_2_Why Can Only 24% Solve Bayesian Reasoning Problems in Natural...

    • frontiersin.figshare.com
    docx
    Updated Jun 3, 2023
    + more versions
    Cite
    Patrick Weber; Karin Binder; Stefan Krauss (2023). Table_2_Why Can Only 24% Solve Bayesian Reasoning Problems in Natural Frequencies: Frequency Phobia in Spite of Probability Blindness.docx [Dataset]. http://doi.org/10.3389/fpsyg.2018.01833.s002
    Explore at:
    docx; available download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Patrick Weber; Karin Binder; Stefan Krauss
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For more than 20 years, research has proven the beneficial effect of natural frequencies when it comes to solving Bayesian reasoning tasks (Gigerenzer and Hoffrage, 1995). In a recent meta-analysis, McDowell and Jacobs (2017) showed that presenting a task in natural frequency format increases performance rates to 24% compared to only 4% when the same task is presented in probability format. Nevertheless, on average three quarters of participants in their meta-analysis failed to obtain the correct solution for such a task in frequency format. In this paper, we present an empirical study on what participants typically do wrong when confronted with natural frequencies. We found that many of them did not actually use natural frequencies for their calculations, but translated them back into complicated probabilities instead. This switch from the intuitive presentation format to a less intuitive calculation format will be discussed within the framework of psychological theories (e.g., the Einstellung effect).
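
    To make the two presentation formats concrete, the sketch below computes the same positive predictive value once from probabilities (via Bayes' theorem) and once from natural frequencies, using the classic mammography numbers as a purely illustrative example (they are not taken from this dataset).

        # Probability format: base rate 1%, sensitivity 80%, false-positive rate 9.6%.
        p_disease = 0.01
        p_pos_given_disease = 0.80
        p_pos_given_healthy = 0.096
        ppv_prob = (p_disease * p_pos_given_disease) / (
            p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy
        )

        # Natural frequency format: of 1000 people, 10 have the disease and 8 of
        # them test positive; of the 990 healthy people, about 95 test positive.
        ppv_freq = 8 / (8 + 95)

        print(round(ppv_prob, 3), round(ppv_freq, 3))  # both are roughly 0.078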

  10. Data_Sheet_1_Frequency Formats: How Primary School Stochastics Profits From...

    • figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Christoph Till; Ute Sproesser (2023). Data_Sheet_1_Frequency Formats: How Primary School Stochastics Profits From Cognitive Psychology.XLSX [Dataset]. http://doi.org/10.3389/feduc.2020.00073.s001
    Explore at:
    xlsx; available download formats
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Christoph Till; Ute Sproesser
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cognitive psychology has shown that understanding numerical information is deeply related to the format in which this information is presented; percentages are difficult to grasp whereas frequency formats are intuitively accessible. This plays a vital role in the medical domain where difficult risk-related probability judgments have to be made both by professionals and their patients. In this article, we demonstrate that the idea of representing statistical information in terms of frequency formats is not only helpful for communicating risks, but can be applied to primary school stochastics when percentages and fractions are not available. For this purpose, we report on an intervention study conducted in grade 4 in primary school. The results show, on the one hand, that primary school students could already solve Bayesian reasoning tasks in the pretest when natural frequencies were used. On the other hand, the students profited from the intervention where they used different representations, namely colored tinker cubes and natural frequencies in order to describe and quantify frequencies and probabilities. These results go along with findings from cognitive psychology that activities with hands-on material as well as pointing out to the underlying nested-sets structure can foster Bayesian reasoning. The results are discussed in particular with regard to teaching stochastics in (primary) school.

  11. Table_1_Measuring people’s covariational reasoning in Bayesian...

    • frontiersin.figshare.com
    xlsx
    Updated Oct 16, 2023
    + more versions
    Cite
    Nicole Steib; Stefan Krauss; Karin Binder; Theresa Büchter; Katharina Böcherer-Linder; Andreas Eichler; Markus Vogel (2023). Table_1_Measuring people’s covariational reasoning in Bayesian situations.XLSX [Dataset]. http://doi.org/10.3389/fpsyg.2023.1184370.s004
    Explore at:
    xlsx; available download formats
    Dataset updated
    Oct 16, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicole Steib; Stefan Krauss; Karin Binder; Theresa Büchter; Katharina Böcherer-Linder; Andreas Eichler; Markus Vogel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Previous research on Bayesian reasoning has typically investigated people’s ability to assess a posterior probability (i.e., a positive predictive value) based on prior knowledge (i.e., base rate, true-positive rate, and false-positive rate). In this article, we systematically examine the extent to which people understand the effects of changes in the three input probabilities on the positive predictive value, that is, covariational reasoning. In this regard, two different operationalizations for measuring covariational reasoning (i.e., single-choice vs. slider format) are investigated in an empirical study with N = 229 university students. In addition, we aim to answer the question whether a skill in "conventional" Bayesian reasoning is a prerequisite for covariational reasoning.

  12. Cause-Effect-Context from Natural Questions (NQ-CE)

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Jun 7, 2021
    Cite
    Gaurav Dass; Oktie Hassanzadeh; Oktie Hassanzadeh; Gaurav Dass (2021). Cause-Effect-Context from Natural Questions (NQ-CE) [Dataset]. http://doi.org/10.5281/zenodo.4765390
    Explore at:
    bin; available download formats
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Gaurav Dass; Oktie Hassanzadeh; Oktie Hassanzadeh; Gaurav Dass
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset is derived from the Natural Questions (NQ) dataset which is a large benchmark for open question answering research (https://ai.google.com/research/NaturalQuestions).

    This dataset contains a collection of cause-effect pairs along with their context (the text describing the causal relation between the cause and the effect) as well as the original question in the NQ data set. It also contains a collection of "negative" pairs, phrases that are mentioned in the context but have no causal relation.

    This dataset is constructed by filtering questions in the NQ dataset that follow a certain pattern indicating that the question is causal. Either the cause or the effect is the (short) answer in the original NQ dataset, and the other side is manually derived from the context.

    The data is shared in JSONL format with every line being a processed NQ question with relevant fields described above. In versions 2 and 2.5, each JSON object has the following fields:

    • phrase1: the first phrase (text span)
    • phrase2: the second phrase (text span)
    • label: "causal" means phrase 1 causes phrase 2, "non_causal" means "phrase1" and "phrase2" do NOT have a causal relation between them
    • passage: the context that states that phrase1 causes phrase2 (for causal) or just the passage that has both phrase1 and phrase2 (for non_causal).
    • document_url: the Wikipedia URL from the Natural Questions data
    • question_text: the original question text from the Natural Questions data
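
    A minimal sketch for reading the JSONL file and separating causal from non-causal pairs is shown below; the file name is a placeholder, and the field names follow the list above.

        import json

        # Minimal sketch: read an NQ-CE JSONL file (placeholder name) and split
        # records by label. Fields follow the description: phrase1, phrase2,
        # label, passage, document_url, question_text.
        causal, non_causal = [], []
        with open("nq_ce_v2.jsonl", encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                pair = (record["phrase1"], record["phrase2"], record["passage"])
                if record["label"] == "causal":
                    causal.append(pair)
                else:
                    non_causal.append(pair)

        print(len(causal), "causal pairs;", len(non_causal), "non-causal pairs")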

    Versions 1.5 and 2.5 enforce exactness of causes and effects as they appear verbatim in the context passage. This differs from versions 1 and 2, where we are interested in how causes and effects are annotated according to human understanding and manual curation, i.e., how a human being would parse and understand causes and effects in text, and where simple grammatical consistency is enforced in the causes and effects. For example, given a passage such as "Above - average sea water temperatures caused by global warming is the leading cause of coral bleaching.", versions 1 and 2 could state the cause as "above average sea water temperatures", whereas versions 1.5 and 2.5 will always state "Above - average sea water temperatures", i.e. the cause and effect will be exactly as seen in the text. These versions make it easier to match the output of any causal relation extraction (CRE) algorithm or method against the annotated causes and effects, e.g. obtaining CRE results with some method X and comparing them against what is recorded as causes and effects. Hence these versions (1.5 and 2.5) are also called Evaluation versions.

    License: https://creativecommons.org/licenses/by-sa/3.0/

    Contacts:
    Gaurav Dass: dassg2 AT rpi.edu
    Oktie Hassanzadeh

  13. Data from: CLEVRER-Humans: Describing physical and causal events the human...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Nov 30, 2023
    Cite
    Jiayuan Mao; Xuelin Yang; Xikun Zhang; Noah Goodman; Jiajun Wu (2023). CLEVRER-Humans: Describing physical and causal events the human way [Dataset]. http://doi.org/10.5061/dryad.5tb2rbp7c
    Explore at:
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Jiayuan Mao; Xuelin Yang; Xikun Zhang; Noah Goodman; Jiajun Wu
    Time period covered
    Jan 1, 2022
    Description

    Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. … We use a three-stage annotation pipeline. The first stage focuses on collecting human-written event descriptions using event cloze tasks, but only for a small number of videos. In the second stage, we augment the data for all videos using neural event description generators trained on the data collected from the first stage. In the third stage, we condense CEGs by collecting binary causal relation labels for all pairs of events from humans.

  14. Data from: CLadder: Assessing Causal Reasoning in Language Models

    • edmond.mpg.de
    zip
    Updated Jan 7, 2024
    Cite
    Zhijing Jin; Zhijing Jin (2024). CLadder: Assessing Causal Reasoning in Language Models [Dataset]. http://doi.org/10.17617/3.NVRRA9
    Explore at:
    zip(79614276); available download formats
    Dataset updated
    Jan 7, 2024
    Dataset provided by
    Edmond
    Authors
    Zhijing Jin; Zhijing Jin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Paper: "CLadder: Assessing Causal Reasoning in Language Models" (NeurIPS 2023) by Zhijing Jin*, Yuen Chen*, Felix Leeb*, Luigi Gresele*, Ojasv Kamal, Zhiheng Lyu, Kevin Blin, Fernando Gonzalez, Max Kleiman-Weiner, Mrinmaya Sachan, Bernhard Schölkopf. (http://arxiv.org/abs/2312.04350)

    Abstract: The ability to perform causal reasoning is widely considered a core feature of intelligence. In this work, we investigate whether large language models (LLMs) can coherently reason about causality. Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, thus failing to assess whether a model can perform causal inference in accordance with a set of well-defined formal rules. To address this, we propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al. We compose a large dataset, CLadder, with 10K samples: based on a collection of causal graphs and queries (associational, interventional, and counterfactual), we obtain symbolic questions and ground-truth answers, through an oracle causal inference engine. These are then translated into natural language. We evaluate multiple LLMs on our dataset, and we introduce and evaluate a bespoke chain-of-thought prompting strategy, CausalCoT. We show that our task is highly challenging for LLMs, and we conduct an in-depth analysis to gain deeper insight into the causal reasoning abilities of LLMs. Our data is open-sourced at https://github.com/causalNLP/cladder, and our code can be found at https://huggingface.co/datasets/causalnlp/CLadder.

  15. Alpaca GPT-4

    • kaggle.com
    • opendatalab.com
    • +1more
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). Alpaca GPT-4 [Dataset]. https://www.kaggle.com/datasets/thedevastator/gpt-4-instruction-following-dataset/versions/2
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 24, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Alpaca GPT-4

    High-Performance NLP for Instruction-Following Reasoning

    By Huggingface Hub [source]

    About this dataset

    This dataset consists of 52K instruction-following examples generated by GPT-4 in English using the same prompts as in Alpaca. The data has been crafted specifically to help researchers break ground and explore new strategies for natural language processing, with a special focus on instruction-following reasoning.

    What makes this dataset unique and powerful is the variety of options it offers for experimenting with models that excel at instruction-following tasks: from refining specific components such as predicting outputs or analyzing long textual conversations, to training and evaluating end-to-end approaches on the entire platform. It allows researchers to iterate on their experiments rapidly while having the confidence of a high-performing model with few limitations, making it an invaluable resource for anyone looking to push the boundaries of artificial intelligence techniques for logical reasoning problems.


    How to use the dataset

    This dataset is an invaluable resource for researching artificial intelligence approaches to logical reasoning problems. This dataset consists of 52K instruction-following samples generated by GPT-4 in English using the same prompts as in Alpaca. Here are some tips on how to make the most out of this dataset:

    • The columns in this dataset provide essential data that can help researchers evaluate their models on an instruction-following task: instruction, input, output, and text. In order to use this data effectively, researchers should be familiar with each column and understand its purpose:
      a) The 'instruction' column provides a statement which an AI model must interpret in order to complete a task correctly;
      b) The 'input' column is pre-generated data that helps an AI model make sense of the instruction;
      c) The 'output' column indicates what kind of result must be returned after the AI model interprets the instruction correctly;
      d) The 'text' column is the full text generated by GPT-4, which gives deeper insight into how the output results arise from the input and instruction.

      Note: It is very important that researchers pay attention to all four columns when working with this dataset, as the four components work together.

      To get better results, consider fine-tuning existing schemes so they become better suited to instruction-following tasks, using these four columns as guidance. It would also be useful if the dataset came with corresponding hyperparameters, so users could fine-tune models more quickly without losing accuracy or other relevant metrics.

      Additionally, readers should review the context closely and consider which model type best suits their use case before attempting any evaluation, since some models may give more accurate results but take longer to process, or vice versa.
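
      As a practical starting point, the sketch below loads train.csv with pandas and inspects the four columns; it assumes the file has been downloaded locally and is meant as an illustration rather than a prescribed workflow.

          import pandas as pd

          # Minimal sketch: load the Alpaca GPT-4 CSV and inspect the columns
          # described above (instruction, input, output, text). Assumes
          # train.csv is in the working directory.
          df = pd.read_csv("train.csv")
          print(df.columns.tolist())
          print(df.loc[0, "instruction"])
          print(df.loc[0, "output"])

          # Example: rows that need no additional input context
          no_input = df[df["input"].isna() | (df["input"].str.strip() == "")]
          print(len(no_input), "instructions without additional input")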

    Research Ideas

    • Training intelligent conversational agents with instruction-following reasoning capabilities.
    • Developing more complex and powerful instructions processing models driven by natural language understanding and reasoning algorithms.
    • Establishing an online platform to help academic, business, or other organizations build auto-grading systems for evaluating the instruction-following skills of their staff at scale and at relatively low cost.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv | Colu...

  16. Data from: QUITE Dataset

    • paperswithcode.com
    Updated Dec 18, 2024
    Cite
    Timo Pierre Schrader; Lukas Lange; Simon Razniewski; Annemarie Friedrich (2024). QUITE Dataset [Dataset]. https://paperswithcode.com/dataset/quite
    Explore at:
    Dataset updated
    Dec 18, 2024
    Authors
    Timo Pierre Schrader; Lukas Lange; Simon Razniewski; Annemarie Friedrich
    Description

    QUITE (Quantifying Uncertainty in natural language Text) is an entirely new benchmark for assessing the capabilities of neural language model-based systems with respect to Bayesian reasoning on a large set of input texts that describe probabilistic relationships in natural language.

  17. SD1K: A Dataset of Challenging Mathematical Problems for Reasoning Large...

    • scidb.cn
    Updated Jul 1, 2025
    Cite
    Zhu Danhao; Huang Fei (2025). SD1K: A Dataset of Challenging Mathematical Problems for Reasoning Large Language Models [Dataset]. http://doi.org/10.57760/sciencedb.j00001.01528
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Zhu Danhao; Huang Fei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains 1,000 entries organized in a structured JSON format. Each entry consists of two fields: instruction and output. The instruction field presents a high-difficulty mathematical problem in natural language, while the output field contains the complete reasoning process generated by a large reasoning model. Specifically, the output includes a think field, which provides a detailed long-chain reasoning solution, and an answer field, which summarizes the final standard answer.
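
    The structure described above can be explored with a few lines of Python; the sketch below is illustrative only, and the file name and the exact nesting of the think and answer fields inside output should be verified against the downloaded data.

        import json

        # Minimal sketch: load the SD1K JSON file (placeholder file name) and
        # inspect one entry. The nesting of "think" and "answer" inside
        # "output" follows the description above and is an assumption.
        with open("sd1k.json", encoding="utf-8") as f:
            entries = json.load(f)   # expected: a list of 1000 objects

        first = entries[0]
        print(first["instruction"])             # the mathematical problem
        print(first["output"]["think"][:200])   # start of the long-chain reasoning
        print(first["output"]["answer"])        # the final standard answer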

  18. Data from: ARN: Analogical Reasoning on Narratives

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 23, 2024
    Cite
    Sourati Hassan Zadeh, Zhivar (2024). ARN: Analogical Reasoning on Narratives [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11044025
    Explore at:
    Dataset updated
    Apr 23, 2024
    Dataset authored and provided by
    Sourati Hassan Zadeh, Zhivar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As a core cognitive skill that enables the transferability of information across domains, analogical reasoning has been extensively studied for both humans and computational models. However, while cognitive theories of analogy often focus on narratives and study the distinction between surface, relational, and system similarities, existing work in natural language processing has a narrower focus on relational analogies between word pairs. This gap raises a natural question: can state-of-the-art large language models (LLMs) detect system analogies between narratives? To gain insight into this question and extend word-based relational analogies to relational system analogies, we devise a comprehensive computational framework that operationalizes dominant theories of analogy, using narrative elements to create surface and system mappings. Leveraging the interplay between these mappings, we create a binary task and benchmark for Analogical Reasoning on Narratives (ARN), covering four categories of far (cross-domain)/near (within-domain) analogies and disanalogies. We show that while all LLMs can largely recognize near analogies, even the largest ones struggle with far analogies in a zero-shot setting, with GPT4.0 scoring below random. Guiding the models through solved examples and chain-of-thought reasoning enhances their analogical reasoning ability. Yet, since even in the few-shot setting the best model only performs halfway between random and humans, ARN opens exciting directions for computational analogical reasoners.

  19. AI Dialog Software Application

    • data.mendeley.com
    Updated Nov 19, 2021
    + more versions
    Cite
    Francis R Belch (2021). AI Dialog Software Application [Dataset]. http://doi.org/10.17632/zhv2wfnprv.3
    Explore at:
    Dataset updated
    Nov 19, 2021
    Authors
    Francis R Belch
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    The experimental AI Dialog Software Application is able to build a Mental Model from a plain English Dialogue input. Subsequently, Comprehension and Logical Capability may be tested using plain English Queries. AI Dialog is used to experimentally test the validity and utility of a novel software application design, as a path to Conversational Artificial Intelligence. The theory behind AI Dialog is fully described in the book: Belch, Francis R. (2021) Artificial Intelligence That Can Comprehend, Available at: Amazon.com.

    There are also two YouTube lectures each of about 1 hour duration describing a radical new theory of Linguistic Semantics used to implement the AI Dialog Software Application. These are:

    Semantics I - A radical new approach - link address - https://www.youtube.com/watch?v=aTVRp2-9niU&t=209s

    Semantics II - Dialogues & Mental Models - link address - https://www.youtube.com/watch?v=6S9VG9sINmg&t=37s

    This is a download of the executable of the AI Dialog Software Application Version 3.2 Alpha Release. This version supersedes Version 3.1 to allow both U.K. and U.S. spelling in user input.

    The AI Dialog Software is protected by international copyright, but is made available to use for non-commercial personal study purposes.

    The application will run on Windows 10® PC, laptop, and tablet systems, and requires about 1 MB.

    The download file is zipped and needs to be unzipped. After this, the content of the folder AI Dialog 3.2 Alpha Release is:

    • Application Files (Folder)
    • Documentation (Folder)
    • NLP2016Autumn (Manifest)
    • Setup (Application)

    In the Documentation folder are two PDF files:

    • Appendix I - Tuition Lessons (PDF)
    • Appendix II - AI Dialog Specification (PDF)

    The first is a hard copy of the tuition lessons. The second is a specification of a subset of English for use with the AI Dialog system. However, there is no need to consult either of these initially, as AI Dialog incorporates a quick start tuition module.

    To install AI Dialog, double click the Setup file. This starts AI Dialog immediately after installation, but places an application icon on the Windows 10® Start list for restarting later.

    After AI Dialog starts, just follow the pop-up message instructions, which lead to a quick start interactive tuition module, fully describing how to use the application.

  20. synthetic_reasoning_natural

    • huggingface.co
    Updated Oct 18, 2023
    + more versions
    Cite
    Unlimited Research Group of AI (2023). synthetic_reasoning_natural [Dataset]. https://huggingface.co/datasets/ura-hcmut/synthetic_reasoning_natural
    Explore at:
    Dataset updated
    Oct 18, 2023
    Dataset authored and provided by
    Unlimited Research Group of AI
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description