Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
NaturalReasoning is a large-scale dataset for general reasoning tasks. It consists of high-quality challenging reasoning questions backtranslated from pretraining corpora DCLM and FineMath. The questions have been deduplicated and decontaminated from popular reasoning benchmarks including MATH, GPQA, MMLU-Pro, MMLU-STEM. For each question, we extract the reference final answer from the original document from the pretraining corpora if possible. We also provide a model-generated response from… See the full description on the dataset page: https://huggingface.co/datasets/facebook/natural_reasoning.
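A minimal sketch of loading this dataset with the Hugging Face `datasets` library; the split name "train" and printing a raw example are assumptions, not details taken from the description above.

```python
# Sketch: load NaturalReasoning from the Hugging Face Hub and look at one example.
# Assumes the `datasets` library is installed; the split name is an assumption.
from datasets import load_dataset

ds = load_dataset("facebook/natural_reasoning", split="train")

# Each record is a backtranslated reasoning question, with a reference answer
# extracted from the source document where available (see description above).
print(ds[0])
```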
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Natural Sciences Reasoning: the "smolest" reasoning dataset
A smol-scale open dataset for reasoning tasks using Hugging Face Inference Endpoints. While intentionally limited in scale, this resource prioritizes:
Reproducible pipeline for reasoning tasks using a variety of models (DeepSeek V3, DeepSeek-R1, Llama-70B-Instruct, etc.)
Knowledge sharing for domains other than Math and Code reasoning
In this repo, you can find:
The prompts and the pipeline (see the config file). The… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/natural-science-reasoning.
Facebook Natural Reasoning - Curriculum Learning
This dataset is a curriculum learning version of the Facebook Natural Reasoning dataset, automatically split into four complexity levels:
Splits
simple_reasoning: 28898 examples - Basic arithmetic, one-step problems
basic_reasoning: 713776 examples - Chained operations, single-variable algebra
intermediate_reasoning: 397706 examples - Proportional reasoning, multi-step problems
complex_reasoning: 5444 examples - … See the full description on the dataset page: https://huggingface.co/datasets/essobi/facebook-natural-reasoning-curriculum.
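The split names above suggest the dataset can be loaded level by level. A minimal sketch follows, assuming the `datasets` library and that the Hub exposes the splits under exactly these names.

```python
# Sketch: load each curriculum level of the dataset described above.
# Assumes the `datasets` library; split names are taken from the list above,
# but their exact availability on the Hub is an assumption.
from datasets import load_dataset

splits = [
    "simple_reasoning",
    "basic_reasoning",
    "intermediate_reasoning",
    "complex_reasoning",
]

for name in splits:
    ds = load_dataset("essobi/facebook-natural-reasoning-curriculum", split=name)
    print(name, len(ds))  # e.g. simple_reasoning 28898
```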
DR.BENCH is a dataset for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation.
qingyangzhang/Natural-Reasoning-STEM-25K dataset hosted on Hugging Face and contributed by the HF Datasets community
https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/C56QUV
Understanding a social situation requires the ability to reason about the underlying emotions and behaviour of others. For example, when we read a personal story, we use our prior commonsense knowledge and social intelligence to infer the emotions and motives of the characters and to anticipate their actions. For machines to understand text related to personal stories and social conversations, they must be able to make commonsense inferences. While most people can reason deeply about the social implications of a text, this is challenging for natural language processing systems because these implications are often subtle and implicit. This dissertation argues that NLP systems must learn to reason more explicitly about the underlying social knowledge in text to perform social commonsense reasoning. We divide this argument into two sub-problems: (i) understanding the underlying social knowledge and (ii) explicitly reasoning about such knowledge for social commonsense reasoning. To address these problems, we propose building NLP systems that integrate neural network based learning with structured knowledge representations.

In the first part of this dissertation, we study the role of structured commonsense knowledge in understanding the social dynamics of characters and their actions in stories. Our motivation for enriching the model with structured commonsense knowledge is to bridge the gap between the surface meaning of texts and the underlying social implication of each event in the stories. We develop a novel model that incorporates commonsense knowledge into neural models and showcases the importance of commonsense knowledge in understanding the social dynamics of story characters. Further, we investigate the role of the temporal dynamics of story events in understanding social situations. We develop a model that can explicitly learn, from personal narrative stories, what social event follows another. We demonstrate that implicitly leveraging such temporal knowledge about story events can support social commonsense reasoning tasks.

In the second part of this dissertation, we investigate methods to explicitly reason about knowledge related to the social dynamics of characters (behavior, mental states) and the cause/effect of social events. We propose a novel model named multi-head knowledge attention that incorporates such social knowledge into state-of-the-art neural NLP models to address two complex commonsense inference tasks. We demonstrate that our method of incorporating knowledge can improve (i) the robustness and interpretability of the model and (ii) the overall performance of the model compared to other knowledge integration methods. We also investigate social commonsense reasoning as a natural language generation task. We design a story completion task that requires natural language generation models to perform both forward and backward reasoning. We study the role of contextualized commonsense knowledge in natural language generation tasks and propose a model that jointly learns to generate contextualized inference rules as well as narrative stories. We demonstrate that our model can outperform state-of-the-art non-contextualized commonsense knowledge-based generation models. We hope that the research presented in this dissertation will open up interesting directions for future research on social commonsense reasoning and related topics.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Baseline models for the paper lilGym: Natural Language Visual Reasoning with Reinforcement Learning.
NLVR contains 92,244 pairs of human-written English sentences grounded in synthetic images. Because the images are synthetically generated, this dataset can be used for semantic parsing.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For more than 20 years, research has proven the beneficial effect of natural frequencies when it comes to solving Bayesian reasoning tasks (Gigerenzer and Hoffrage, 1995). In a recent meta-analysis, McDowell and Jacobs (2017) showed that presenting a task in natural frequency format increases performance rates to 24% compared to only 4% when the same task is presented in probability format. Nevertheless, on average three quarters of participants in their meta-analysis failed to obtain the correct solution for such a task in frequency format. In this paper, we present an empirical study on what participants typically do wrong when confronted with natural frequencies. We found that many of them did not actually use natural frequencies for their calculations, but translated them back into complicated probabilities instead. This switch from the intuitive presentation format to a less intuitive calculation format will be discussed within the framework of psychological theories (e.g., the Einstellung effect).
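To illustrate the format contrast the study builds on, here is a small worked example with made-up numbers (the specific values are illustrative assumptions, not figures from the paper): the same Bayesian task expressed in probability format and in natural frequency format.

```python
# Illustrative Bayesian task (made-up numbers), solved in both formats.

# Probability format: base rate 1%, true-positive rate 80%, false-positive rate 9.6%.
base_rate, tpr, fpr = 0.01, 0.80, 0.096
ppv_probabilities = (base_rate * tpr) / (base_rate * tpr + (1 - base_rate) * fpr)

# Natural frequency format: of 1,000 people, 10 have the condition;
# 8 of those 10 test positive, and about 95 of the other 990 also test positive.
true_positives, false_positives = 8, 95
ppv_frequencies = true_positives / (true_positives + false_positives)

print(round(ppv_probabilities, 3), round(ppv_frequencies, 3))  # both ≈ 0.078
```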
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cognitive psychology has shown that understanding numerical information is deeply related to the format in which this information is presented; percentages are difficult to grasp, whereas frequency formats are intuitively accessible. This plays a vital role in the medical domain, where difficult risk-related probability judgments have to be made both by professionals and their patients. In this article, we demonstrate that the idea of representing statistical information in terms of frequency formats is not only helpful for communicating risks, but can also be applied to primary school stochastics, where percentages and fractions are not yet available. For this purpose, we report on an intervention study conducted in grade 4 of primary school. The results show, on the one hand, that primary school students could already solve Bayesian reasoning tasks in the pretest when natural frequencies were used. On the other hand, the students profited from the intervention, in which they used different representations, namely colored tinker cubes and natural frequencies, to describe and quantify frequencies and probabilities. These results align with findings from cognitive psychology that activities with hands-on material, as well as pointing out the underlying nested-sets structure, can foster Bayesian reasoning. The results are discussed in particular with regard to teaching stochastics in (primary) school.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Previous research on Bayesian reasoning has typically investigated people's ability to assess a posterior probability (i.e., a positive predictive value) based on prior knowledge (i.e., base rate, true-positive rate, and false-positive rate). In this article, we systematically examine the extent to which people understand the effects of changes in the three input probabilities on the positive predictive value, that is, covariational reasoning. In this regard, two different operationalizations for measuring covariational reasoning (i.e., single-choice vs. slider format) are investigated in an empirical study with N = 229 university students. In addition, we aim to answer the question whether a skill in "conventional" Bayesian reasoning is a prerequisite for covariational reasoning.
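The quantity in question follows directly from Bayes' theorem, which makes the covariational question concrete: how does the positive predictive value move when one of the three inputs changes? A minimal sketch with illustrative numbers (not taken from the study) follows.

```python
# Positive predictive value (PPV) from base rate, true-positive rate (TPR)
# and false-positive rate (FPR), via Bayes' theorem.
def ppv(base_rate: float, tpr: float, fpr: float) -> float:
    """P(condition | positive test)."""
    return (base_rate * tpr) / (base_rate * tpr + (1 - base_rate) * fpr)

# Covariational reasoning: vary one input at a time (illustrative numbers only).
print(ppv(0.01, 0.9, 0.05))  # low base rate              -> PPV ≈ 0.15
print(ppv(0.10, 0.9, 0.05))  # higher base rate           -> PPV ≈ 0.67
print(ppv(0.10, 0.9, 0.20))  # higher false-positive rate -> PPV ≈ 0.33
```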
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is derived from the Natural Questions (NQ) dataset which is a large benchmark for open question answering research (https://ai.google.com/research/NaturalQuestions).
This dataset contains a collection of cause-effect pairs along with their context (the text describing the causal relation between the cause and the effect) as well as the original question in the NQ data set. It also contains a collection of "negative" pairs, phrases that are mentioned in the context but have no causal relation.
This dataset is constructed by filtering questions in the NQ dataset that follow a certain pattern indicating that the question is causal. Either the cause or the effect is the (short) answer in the original NQ dataset, and the other side is manually derived from the context.
The data is shared in JSONL format with every line being a processed NQ question with relevant fields described above. In versions 2 and 2.5, each JSON object has the following fields:
Versions 1.5 and 2.5 enforce exactness: causes and effects appear verbatim as they do in the context passage. This differs from versions 1 and 2, where causes and effects were annotated according to human understanding and manual curation, i.e., how a human reader would parse causes and effects in the text, with simple grammatical consistency enforced. For example, for a passage such as "Above - average sea water temperatures caused by global warming is the leading cause of coral bleaching.", versions 1 and 2 might state the cause as "above average sea water temperatures", whereas versions 1.5 and 2.5 will always state "Above - average sea water temperatures", i.e., the cause and effect are exactly as seen in the text. These versions make it easier to match the output of any causal relation extraction (CRE) algorithm against the annotated causes and effects, e.g., running some CRE method X and comparing its results with the annotated spans. Hence versions 1.5 and 2.5 are also called Evaluation versions.
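Since the exact field list is not reproduced in this description, the sketch below only shows how a JSONL release like this is typically read; the file name is an assumption, and the comments describe expected content rather than the documented schema.

```python
# Sketch: iterate over a JSONL release of the cause-effect data described above.
# The file name is an assumption; the schema (field names) is not listed here,
# so the code just prints whatever keys each record actually has.
import json

with open("nq_cause_effect.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        # One processed NQ question per line: expected to carry the context
        # passage, the cause/effect pair, and the original question.
        print(sorted(record.keys()))
        break
```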
License: https://creativecommons.org/licenses/by-sa/3.0/
Contacts:
Gaurav Dass: dassg2 AT rpi.edu
Oktie Hassanazadeh
Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. … We use a three-stage annotation pipeline. The first stage focuses on collecting human-written event descriptions using event cloze tasks, but only for a small number of videos. In the second stage, we augment the data for all videos using neural event description generators trained on the data collected from the first stage. In the third stage, we condense CEGs by collecting binary causal relation labels for all pairs of events from humans.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Paper: "CLadder: Assessing Causal Reasoning in Language Models" (NeurIPS 2023) by Zhijing Jin*, Yuen Chen*, Felix Leeb*, Luigi Gresele*, Ojasv Kamal, Zhiheng Lyu, Kevin Blin, Fernando Gonzalez, Max Kleiman-Weiner, Mrinmaya Sachan, Bernhard Schƶlkopf. (http://arxiv.org/abs/2312.04350) Abstract: The ability to perform causal reasoning is widely considered a core feature of intelligence. In this work, we investigate whether large language models (LLMs) can coherently reason about causality. Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, thus failing to assess whether a model can perform causal inference in accordance with a set of well-defined formal rules. To address this, we propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al. We compose a large dataset, CLadder, with 10K samples: based on a collection of causal graphs and queries (associational, interventional, and counterfactual), we obtain symbolic questions and ground-truth answers, through an oracle causal inference engine. These are then translated into natural language. We evaluate multiple LLMs on our dataset, and we introduce and evaluate a bespoke chain-of-thought prompting strategy, CausalCoT. We show that our task is highly challenging for LLMs, and we conduct an in-depth analysis to gain deeper insight into the causal reasoning abilities of LLMs. Our data is open-sourced at https://github.com/causalNLP/cladder, and our code can be found at https://huggingface.co/datasets/causalnlp/CLadder.
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
This dataset consists of 52K instruction-following data generated by GPT-4 in English using the same prompts as in Alpaca. This data has been crafted specifically to help researchers break ground and explore new strategies for natural language processing, with a special focus on instruction-following reasoning.
What makes this dataset unique and powerful is that it offers a wide variety of options for experimenting with models that excel at instruction-following tasks: from refining specific components, such as predicting outputs or analyzing long textual conversations, to training and evaluating end-to-end approaches on the full dataset. It allows researchers to iterate on their experiments rapidly while having the confidence of a high-performing model with few limitations, making this an invaluable resource for anyone looking to push the boundaries of artificial intelligence techniques for logical reasoning problems.
For more datasets, click here.
This dataset is an invaluable resource for researching artificial intelligence approaches to logical reasoning problems. This dataset consists of 52K instruction-following samples generated by GPT-4 in English using the same prompts as in Alpaca. Here are some tips on how to make the most out of this dataset:
The columns in this dataset provide essential data that can help researchers evaluate their models on a task involving instruction following:
instruction, input, output, and text. In order to use this data effectively, researchers should be familiar with each column and understand its purpose and its contribution to instruction-following principles. a) The instruction column provides a statement which an AI model must interpret in order to complete a task correctly; b) the input column is pre-generated data that helps the AI model make sense of the instruction; c) the output column indicates what kind of result must be returned after the AI model interprets the instruction correctly; and finally, d) the text column is the full text generated by GPT-4, which gives deeper insight into what gave rise to the output from the input and instruction handling. Note: it is very important that researchers pay attention to all four columns when working with this dataset, as the four components work together as an integrated whole.
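A minimal sketch of inspecting these four columns with pandas, assuming the data has been downloaded locally as the train.csv file mentioned further below; the local path is an assumption.

```python
# Sketch: load train.csv and look at the four columns described above.
# Assumes pandas is installed and the CSV has been downloaded locally.
import pandas as pd

df = pd.read_csv("train.csv")

# The four columns discussed above.
print(df[["instruction", "input", "output", "text"]].head())

# Rough check: how often does an example come with a non-empty `input`?
print((df["input"].fillna("").str.len() > 0).mean())
```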
To get better results, one should consider fine-tuning existing models so that they become better suited for instruction-following tasks, using these four columns as guidance. It would also be useful if the dataset came with corresponding hyperparameters, so users could fine-tune more quickly without losing accuracy or any other metric relevant to such scenarios.
Additionally, readers should review the context closely to assess accuracy and decide which model type best suits their use case before attempting any evaluation, since some options may produce more accurate results but take longer to process, or vice versa; the right choice will depend largely on the dataset and the evaluation metrics used.
- Training intelligent conversational agents with instruction-following reasoning capabilities.
- Developing more complex and powerful instructions processing models driven by natural language understanding and reasoning algorithms.
- Establishing an online platform that helps academic, business, or other organizations build auto-grading systems for evaluating instruction-following skills of their staff at large scale and at relatively low cost.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv | Colu...
QUITE (Quantifying Uncertainty in natural language Text) is an entirely new benchmark for assessing the capabilities of neural language model-based systems with respect to Bayesian reasoning, on a large set of input texts that describe probabilistic relationships in natural language.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains 1,000 entries organized in a structured JSON format. Each entry consists of two fields: instruction and output. The instruction field presents a high-difficulty mathematical problem in natural language, while the output field contains the complete reasoning process generated by a large reasoning model. Specifically, the output includes a think field, which provides a detailed long-chain reasoning solution, and an answer field, which summarizes the final standard answer.
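A minimal sketch of reading such a file, based on the field names given above; the file name, and whether output is a nested object holding think and answer (as assumed here) or a single string containing both, are assumptions.

```python
# Sketch: read the JSON file described above and access its fields.
# File name is an assumption; treating `output` as a nested object with
# `think` and `answer` keys is also an assumption based on the description.
import json

with open("math_reasoning_1000.json", encoding="utf-8") as f:
    entries = json.load(f)  # expected: a list of 1,000 entries

first = entries[0]
print(first["instruction"])        # high-difficulty math problem in natural language
print(first["output"]["think"])    # detailed long-chain reasoning
print(first["output"]["answer"])   # final standard answer
```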
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As a core cognitive skill that enables the transferability of information across domains, analogical reasoning has been extensively studied for both humans and computational models. However, while cognitive theories of analogy often focus on narratives and study the distinction between surface, relational, and system similarities, existing work in natural language processing has a narrower focus, largely limited to relational analogies between word pairs. This gap brings a natural question: can state-of-the-art large language models (LLMs) detect system analogies between narratives? To gain insight into this question and extend word-based relational analogies to relational system analogies, we devise a comprehensive computational framework that operationalizes dominant theories of analogy, using narrative elements to create surface and system mappings. Leveraging the interplay between these mappings, we create a binary task and benchmark for Analogical Reasoning on Narratives (ARN), covering four categories of far (cross-domain)/near (within-domain) analogies and disanalogies. We show that while all LLMs can largely recognize near analogies, even the largest ones struggle with far analogies in a zero-shot setting, with GPT4.0 scoring below random. Guiding the models through solved examples and chain-of-thought reasoning enhances their analogical reasoning ability. Yet, since even in the few-shot setting the best model only performs halfway between random and humans, ARN opens exciting directions for computational analogical reasoners.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
The experimental AI Dialog Software Application is able to build a Mental Model from a plain English Dialogue input. Subsequently, Comprehension and Logical Capability may be tested using plain English Queries. AI Dialog is used to experimentally test the validity and utility of a novel software application design, as a path to Conversational Artificial Intelligence. The theory behind AI Dialog is fully described in the book: Belch, Francis R. (2021) Artificial Intelligence That Can Comprehend, Available at: Amazon.com.
There are also two YouTube lectures each of about 1 hour duration describing a radical new theory of Linguistic Semantics used to implement the AI Dialog Software Application. These are:
Semantics I - A radical new approach - link address - https://www.youtube.com/watch?v=aTVRp2-9niU&t=209s
Semantics II - Dialogues & Mental Models - link address - https://www.youtube.com/watch?v=6S9VG9sINmg&t=37s
This is a download of the executable of the AI Dialog Software Application Version 3.2 Alpha Release. This version supersedes Version 3.1 to allow both U.K. and U.S. spelling in user input.
The AI Dialog Software is protected by international copyright, but is made available to use for non-commercial personal study purposes.
The application will run on Windows 10® PC, laptop, and tablet systems, and requires about 1 MB of storage.
The download file is zipped and needs to be unzipped. After this, the content of the folder AI Dialog 3.2 Alpha Release is:
• Application Files (Folder)
• Documentation (Folder)
• NLP2016Autumn (Manifest)
• Setup (Application)
In the Documentation folder are two PDF files:
• Appendix I - Tuition Lessons (PDF)
• Appendix II - AI Dialog Specification (PDF)
The first is a hard copy of the tuition lessons. The second is a specification of a subset of English for use with the AI Dialog system. However, there is no need to consult either of these initially, as AI Dialog incorporates a quick start tuition module.
To install AI Dialog, double click the Setup file. This starts AI Dialog immediately after installation, but places an application icon on the Windows 10® Start list for restarting later.
After AI Dialog starts, just follow the pop-up message instructions, which lead to a quick start interactive tuition module, fully describing how to use the application.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Synthetic reasoning dataset
Original version:
https://huggingface.co/datasets/lighteval/synthetic_reasoning_natural
Translation source code: https://github.com/martinakaduc/ura-llama/tree/main/dataset_scripts/custom_datasets