Dataset Card for reasoning
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/dvilasuero/reasoning/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/reasoning.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
NaturalReasoning is a large-scale dataset for general reasoning tasks. It consists of high-quality, challenging reasoning questions backtranslated from the pretraining corpora DCLM and FineMath. The questions have been deduplicated and decontaminated against popular reasoning benchmarks including MATH, GPQA, MMLU-Pro, and MMLU-STEM. For each question, we extract the reference final answer from the original document in the pretraining corpora when possible. We also provide a model-generated response from… See the full description on the dataset page: https://huggingface.co/datasets/facebook/natural_reasoning.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Natural Reasoning is a large-scale dataset designed for general reasoning tasks. It consists of high-quality, challenging reasoning questions backtranslated from pretraining corpora DCLM and FineMath. The dataset has been carefully deduplicated and decontaminated from popular reasoning benchmarks including MATH, GPQA, MMLU-Pro, and MMLU-STEM.
A 1.1 million subset of the Natural Reasoning dataset is released to the research community to foster the development of strong large language model (LLM) reasoners.
File Format: natural_reasoning.parquet
License: CC-BY-NC-4.0 · Task: Text Generation (Reasoning) · Language: English (en) · Size: 1M < n < 10M · Hosted on Hugging Face
You can load the dataset directly from Hugging Face as follows:
from datasets import load_dataset
ds = load_dataset("facebook/natural_reasoning")
The dataset was constructed from the pretraining corpora DCLM and FineMath. The questions have been filtered to remove contamination and duplication from widely-used reasoning benchmarks like MATH, GPQA, MMLU-Pro, and MMLU-STEM. For each question, the dataset provides a reference final answer extracted from the original document when available, and also includes a model-generated response from Llama3.3-70B-Instruct.
In the 1.1 million subset:
- 18.29% of the questions do not have a reference answer.
- 9.71% of the questions have a single-word answer.
- 21.58% of the questions have a short answer.
- 50.42% of the questions have a long-form reference answer.
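A breakdown like the one above can be computed mechanically. The sketch below buckets toy records by answer length; the reference_answer field name and the word-count cutoffs are illustrative assumptions, not the dataset's actual schema or criteria.

```python
def answer_category(reference_answer):
    """Classify a reference answer as missing, single-word, short, or long-form.

    NOTE: the field name and the 10-word cutoff for "short" are assumptions
    for illustration, not the dataset's documented criteria.
    """
    if not reference_answer:
        return "no_reference"
    n_words = len(reference_answer.split())
    if n_words == 1:
        return "single_word"
    if n_words <= 10:
        return "short"
    return "long_form"

# Toy rows mimicking one possible schema; not taken from the real dataset.
toy_rows = [
    {"reference_answer": ""},
    {"reference_answer": "42"},
    {"reference_answer": "The limit is 1 by L'Hopital's rule."},
    {"reference_answer": " ".join(["word"] * 30)},
]
counts = {}
for row in toy_rows:
    cat = answer_category(row["reference_answer"])
    counts[cat] = counts.get(cat, 0) + 1
print(counts)  # one toy row lands in each bucket
```

Running the same classifier over the full split would yield percentages comparable to those reported above.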
Training on the Natural Reasoning dataset shows superior scaling effects compared to other datasets. When used to train the Llama3.1-8B-Instruct model, it achieved better average performance across three key benchmarks: MATH, GPQA, and MMLU-Pro.
[Figure: Scaling Curve — https://cdn-uploads.huggingface.co/production/uploads/659a395421a7431643caedda/S6aO-agjRRhc0JLkohZ5z.jpeg]
If you use the Natural Reasoning dataset, please cite it with the following BibTeX entry:
@misc{yuan2025naturalreasoningreasoningwild28m,
title={NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions},
author={Weizhe Yuan and Jane Yu and Song Jiang and Karthik Padthe and Yang Li and Dong Wang and Ilia Kulikov and Kyunghyun Cho and Yuandong Tian and Jason E Weston and Xian Li},
year={2025},
eprint={2502.13124},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.13124}
}
Source: Hugging Face
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
reasoning-0.01 subset
A synthetic dataset of reasoning chains for a wide variety of tasks. We leverage data like this across multiple reasoning experiments and projects. Stay tuned for reasoning models and more data. Thanks to Hive Digital Technologies (https://x.com/HIVEDigitalTech) for their compute support in this project and beyond.
License: unknown (https://choosealicense.com/licenses/unknown/)
reedmayhew/claude-3.7-sonnet-reasoning dataset hosted on Hugging Face and contributed by the HF Datasets community
License: CC (https://choosealicense.com/licenses/cc/)
Dataset Description
ReMI was introduced in ReMI: A Dataset for Reasoning with Multiple Images. It contains 13 tasks, namely: EmojiAlgebra, FuncRead, GeomShape, GeomCost, Collisions, Clocks, Schedule, Charts, CodeEdit, Isomorphism, Maps, RefCOCO, and IQ.
Dataset Usage
Data Downloading
All the data examples were divided into two subsets: train and test.
train: contains 2 examples per task (26 in total) to be used as few-shot examples.
test: contains 200 examples… See the full description on the dataset page: https://huggingface.co/datasets/mehrankazemi/ReMI.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
News
[2025/04/22] We split the data and kept only the medical SFT dataset (medical_o1_sft.json). The file medical_o1_sft_mix.json contains a mix of medical and general instruction data.
[2025/02/22] We released the distilled dataset from Deepseek-R1 based on medical verifiable problems. You can use it to initialize your models with the reasoning chain from Deepseek-R1.
[2024/12/25] We open-sourced the medical reasoning dataset for SFT, built on medical verifiable problems and an LLM… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT.
MME-Reasoning 🔥: A Comprehensive Benchmark for Logical Reasoning in MLLMs
Official repository for "MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs". 🌟 For more details, please refer to the project page. [🚀Project Page] [📖 Paper] [🗃️ Github] [🏆 Leaderboard]
💥 News
[2025.05.23] 🔥 We launch MME-Reasoning, a comprehensive benchmark designed to evaluate the reasoning ability of MLLMs. We release the arXiv paper and all data samples… See the full description on the dataset page: https://huggingface.co/datasets/U4R/MME-Reasoning.
Rhushya/synthetic-reasoning-dataset-llama3-1 dataset hosted on Hugging Face and contributed by the HF Datasets community
Menlo/Maze-Reasoning dataset hosted on Hugging Face and contributed by the HF Datasets community
lccccc-1/Multimodal-Visual-Reasoning-Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for my-distiset-986461
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/sdiazlor/my-distiset-986461/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/sdiazlor/python-reasoning-dataset.
License: ODC-By (https://choosealicense.com/licenses/odc-by/)
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
📑 Paper | 🌐 Project Page | 💾 Released Resources | 📦 Repo
This is the resource page of the CodeI/O collection on Hugging Face; your current position is highlighted with a blue block.
Dataset: CodeI/O-PythonEdu-Reasoning (🤗)
Please also check the raw data after our processing if you are interested:… See the full description on the dataset page: https://huggingface.co/datasets/hkust-nlp/CodeIO-PyEdu-Reasoning.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Visual Spatial Reasoning (VSR) corpus is a collection of caption-image pairs with true/false labels. Each caption describes the spatial relation of two individual objects in the image, and a vision-language model (VLM) needs to judge whether the caption is correctly describing the image (True) or not (False).
Dataset Card for "livebench/reasoning"
LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:
LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/reasoning.
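Because every question carries a verifiable ground-truth answer, grading can reduce to a normalized exact-match check. The sketch below illustrates that idea on toy data; the normalization rules here are illustrative assumptions, not LiveBench's actual grading code.

```python
def normalize(text):
    """Lowercase, trim, and collapse whitespace — an assumed normalization."""
    return " ".join(text.strip().lower().split())

def exact_match_score(predictions, references):
    """Fraction of predictions that match the reference after normalization."""
    hits = sum(
        normalize(p) == normalize(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(references)

# Toy predictions and references (hypothetical, not LiveBench data).
preds = ["  Paris ", "blue whale", "7"]
refs = ["paris", "Blue Whale", "8"]
print(exact_match_score(preds, refs))  # 2 of 3 match
```

Objective scoring of this kind is what lets LiveBench grade hard questions without an LLM judge.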
Shuffled mix of:
Large dataset of high-quality web text: https://huggingface.co/datasets/EleutherAI/fineweb-edu-dedup-10b
Medium dataset of QwQ math reasoning: https://huggingface.co/datasets/PrimeIntellect/NuminaMath-QwQ-CoT-5M
Small dataset of DeepSeek-R1 reasoning traces on math, coding, science, and puzzle data: https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k
Intended for disentanglement of advanced reasoning models (SAEs, transcoders). Generation code:… See the full description on the dataset page: https://huggingface.co/datasets/EleutherAI/reasoning-mix.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Synthetic reasoning dataset
Original version:
https://huggingface.co/datasets/lighteval/synthetic_reasoning
Translation source code: https://github.com/martinakaduc/ura-llama/tree/main/dataset_scripts/custom_datasets
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
[!NOTE] We have released a paper for OpenThoughts! See our paper here.
Open-Thoughts-114k
Open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles! Inspect the content with rich formatting with Curator Viewer.
Available Subsets
default subset containing ready-to-train data used to finetune the OpenThinker-7B and OpenThinker-32B models:
ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")
… See the full description on the dataset page: https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
zou-lab/MedCaseReasoning dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
PhysReason is accepted by ACL-2025-main
📋 Overview
PhysReason is a comprehensive physics-based reasoning benchmark consisting of 1,200 physics problems spanning multiple domains, with a focus on both knowledge-based (25%) and reasoning-based (75%) questions. This benchmark addresses the critical gap in evaluating large language models' capabilities in physics-based reasoning, which requires… See the full description on the dataset page: https://huggingface.co/datasets/zhibei1204/PhysReason.