100+ datasets found
  1. reasoning

    • huggingface.co
    Updated Apr 25, 2024
    + more versions
    Cite
    Daniel Vila (2024). reasoning [Dataset]. https://huggingface.co/datasets/dvilasuero/reasoning
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 25, 2024
    Authors
    Daniel Vila
    Description

    Dataset Card for reasoning

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/dvilasuero/reasoning/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/reasoning.

  2. natural_reasoning

    • huggingface.co
    • openaigptbot.com
    Updated Feb 19, 2025
    + more versions
    Cite
    AI at Meta (2025). natural_reasoning [Dataset]. https://huggingface.co/datasets/facebook/natural_reasoning
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 19, 2025
    Dataset authored and provided by
    AI at Meta
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    NaturalReasoning is a large-scale dataset for general reasoning tasks. It consists of high-quality challenging reasoning questions backtranslated from pretraining corpora DCLM and FineMath. The questions have been deduplicated and decontaminated from popular reasoning benchmarks including MATH, GPQA, MMLU-Pro, MMLU-STEM. For each question, we extract the reference final answer from the original document from the pretraining corpora if possible. We also provide a model-generated response from… See the full description on the dataset page: https://huggingface.co/datasets/facebook/natural_reasoning.

  3. facebook/natural_reasoning

    • kaggle.com
    zip
    Updated Feb 27, 2025
    Cite
    Zehra Korkusuz (2025). facebook/natural_reasoning [Dataset]. https://www.kaggle.com/datasets/zehrakorkusuz/natural-reasoning
    Explore at:
    Available download formats: zip (1694591016 bytes)
    Dataset updated
    Feb 27, 2025
    Authors
    Zehra Korkusuz
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Natural Reasoning Dataset

    Source: Hugging Face

    Dataset Overview

    Natural Reasoning is a large-scale dataset designed for general reasoning tasks. It consists of high-quality, challenging reasoning questions backtranslated from pretraining corpora DCLM and FineMath. The dataset has been carefully deduplicated and decontaminated from popular reasoning benchmarks including MATH, GPQA, MMLU-Pro, and MMLU-STEM.

    A 1.1 million subset of the Natural Reasoning dataset is released to the research community to foster the development of strong large language model (LLM) reasoners.

    Dataset Information

    File Format: natural_reasoning.parquet

    How to Use

    You can load the dataset directly from Hugging Face as follows:

    from datasets import load_dataset
    
    ds = load_dataset("facebook/natural_reasoning")
    

    Data Collection and Quality

    The dataset was constructed from the pretraining corpora DCLM and FineMath. The questions have been filtered to remove contamination and duplication from widely-used reasoning benchmarks like MATH, GPQA, MMLU-Pro, and MMLU-STEM. For each question, the dataset provides a reference final answer extracted from the original document when available, and also includes a model-generated response from Llama3.3-70B-Instruct.

    Reference Answer Statistics

    In the 1.1 million subset:
    - 18.29% of the questions do not have a reference answer.
    - 9.71% of the questions have a single-word answer.
    - 21.58% of the questions have a short answer.
    - 50.42% of the questions have a long-form reference answer.
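The shares above can be sanity-checked with a few lines of arithmetic. The sketch below assumes the subset holds exactly 1,100,000 questions, which is only the rounded figure quoted in the description, so the resulting counts are approximate. The helper name `approximate_counts` is illustrative and not part of the dataset or any library.

```python
# Shares reported for the 1.1 million subset, in percent.
shares = {
    "no reference answer": 18.29,
    "single-word answer": 9.71,
    "short answer": 21.58,
    "long-form answer": 50.42,
}

# Assumption: "1.1 million" is a rounded size, so the counts are estimates.
SUBSET_SIZE = 1_100_000


def approximate_counts(shares_pct, total):
    """Convert percentage shares into rounded question counts."""
    return {label: round(total * pct / 100) for label, pct in shares_pct.items()}


total_pct = round(sum(shares.values()), 2)  # the four categories cover the whole subset
with_reference_pct = round(100 - shares["no reference answer"], 2)
counts = approximate_counts(shares, SUBSET_SIZE)

for label, count in counts.items():
    print(f"{label}: about {count:,} questions")
print(f"share with some reference answer: {with_reference_pct}%")
```

Under that assumption, roughly four in five questions carry a reference answer of some length, which matters if you plan to filter on answer availability.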

    Scaling Curve Performance

    Training on the Natural Reasoning dataset shows better scaling than training on other datasets. When used to train the Llama3.1-8B-Instruct model, it yielded better average performance across three key benchmarks: MATH, GPQA, and MMLU-Pro.

    Figure: Scaling Curve (https://cdn-uploads.huggingface.co/production/uploads/659a395421a7431643caedda/S6aO-agjRRhc0JLkohZ5z.jpeg)

    Citation

    If you use the Natural Reasoning dataset, please cite it with the following BibTeX entry:

    @misc{yuan2025naturalreasoningreasoningwild28m,
       title={NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions},
       author={Weizhe Yuan and Jane Yu and Song Jiang and Karthik Padthe and Yang Li and Dong Wang and Ilia Kulikov and Kyunghyun Cho and Yuandong Tian and Jason E Weston and Xian Li},
       year={2025},
       eprint={2502.13124},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
       url={https://arxiv.org/abs/2502.13124}
    }
    

    Source: Hugging Face

  4. reasoning-0.01

    • huggingface.co
    + more versions
    Cite
    SkunkworksAI, reasoning-0.01 [Dataset]. https://huggingface.co/datasets/SkunkworksAI/reasoning-0.01
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    SkunkworksAI
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    reasoning-0.01 subset

    Synthetic dataset of reasoning chains for a wide variety of tasks. We leverage data like this across multiple reasoning experiments/projects. Stay tuned for reasoning models and more data. Thanks to Hive Digital Technologies (https://x.com/HIVEDigitalTech) for their compute support in this project and beyond.

  5. claude-3.7-sonnet-reasoning

    • huggingface.co
    Updated Mar 11, 2025
    + more versions
    Cite
    Reed Mayhew (2025). claude-3.7-sonnet-reasoning [Dataset]. https://huggingface.co/datasets/reedmayhew/claude-3.7-sonnet-reasoning
    Explore at:
    Dataset updated
    Mar 11, 2025
    Authors
    Reed Mayhew
    License

    https://choosealicense.com/licenses/unknown/

    Description

    reedmayhew/claude-3.7-sonnet-reasoning dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. ReMI

    • huggingface.co
    Updated Jun 17, 2024
    Cite
    Mehran Kazemi (2024). ReMI [Dataset]. https://huggingface.co/datasets/mehrankazemi/ReMI
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 17, 2024
    Authors
    Mehran Kazemi
    License

    https://choosealicense.com/licenses/cc/

    Description

    Dataset Description

    ReMI was introduced in "ReMI: A Dataset for Reasoning with Multiple Images". It contains 13 tasks, namely: EmojiAlgebra, FuncRead, GeomShape, GeomCost, Collisions, Clocks, Schedule, Charts, CodeEdit, Isomorphism, Maps, RefCOCO, and IQ.

      Dataset Usage

      Data Downloading

    All the data examples were divided into two subsets: train and test.

    train: contains 2 examples per task (26 in total) to be used as few-shot examples.
    test: contains 200 examples… See the full description on the dataset page: https://huggingface.co/datasets/mehrankazemi/ReMI.
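The size of the train subset follows directly from the task list. As a small check, the sketch below multiplies the 13 task names listed in the description by the 2 few-shot examples per task. The constant names are illustrative and are not identifiers from the dataset itself.

```python
# Task names as listed in the ReMI description.
REMI_TASKS = [
    "EmojiAlgebra", "FuncRead", "GeomShape", "GeomCost", "Collisions",
    "Clocks", "Schedule", "Charts", "CodeEdit", "Isomorphism",
    "Maps", "RefCOCO", "IQ",
]

# The train subset holds 2 few-shot examples per task.
TRAIN_EXAMPLES_PER_TASK = 2

num_tasks = len(REMI_TASKS)
train_total = num_tasks * TRAIN_EXAMPLES_PER_TASK

print(f"{num_tasks} tasks x {TRAIN_EXAMPLES_PER_TASK} examples = {train_total} train examples")
```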

  7. medical-o1-reasoning-SFT

    • huggingface.co
    Updated Apr 22, 2025
    + more versions
    Cite
    FreedomAI (2025). medical-o1-reasoning-SFT [Dataset]. https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT
    Explore at:
    Dataset updated
    Apr 22, 2025
    Dataset authored and provided by
    FreedomAI
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    News

    [2025/04/22] We split the data and kept only the medical SFT dataset (medical_o1_sft.json). The file medical_o1_sft_mix.json contains a mix of medical and general instruction data.
    [2025/02/22] We released the distilled dataset from Deepseek-R1 based on medical verifiable problems. You can use it to initialize your models with the reasoning chain from Deepseek-R1.
    [2024/12/25] We open-sourced the medical reasoning dataset for SFT, built on medical verifiable problems and an LLM… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT.

  8. MME-Reasoning

    • huggingface.co
    Updated May 23, 2025
    Cite
    Alpha-Innovator Lab (2025). MME-Reasoning [Dataset]. https://huggingface.co/datasets/U4R/MME-Reasoning
    Explore at:
    Dataset updated
    May 23, 2025
    Dataset authored and provided by
    Alpha-Innovator Lab
    Description

    MME-Reasoning 🔥: A Comprehensive Benchmark for Logical Reasoning in MLLMs

    Official repository for "MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs". 🌟 For more details, please refer to the project page. [🚀Project Page] [📖 Paper] [🗃️ Github] [🏆 Leaderboard]

      💥 News
    

    [2025.05.23] 🔥 We launch MME-Reasoning, a comprehensive benchmark designed to evaluate the reasoning ability of MLLMs. We release the arxiv paper and all data samples… See the full description on the dataset page: https://huggingface.co/datasets/U4R/MME-Reasoning.

  9. synthetic-reasoning-dataset-llama3-1

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    KC (2025). synthetic-reasoning-dataset-llama3-1 [Dataset]. https://huggingface.co/datasets/Rhushya/synthetic-reasoning-dataset-llama3-1
    Explore at:
    Dataset updated
    Jun 1, 2025
    Authors
    KC
    Description

    Rhushya/synthetic-reasoning-dataset-llama3-1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. Maze-Reasoning

    • huggingface.co
    Updated Feb 7, 2025
    + more versions
    Cite
    Menlo Research (2025). Maze-Reasoning [Dataset]. https://huggingface.co/datasets/Menlo/Maze-Reasoning
    Explore at:
    Dataset updated
    Feb 7, 2025
    Dataset authored and provided by
    Menlo Research
    Description

    Menlo/Maze-Reasoning dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Multimodal-Visual-Reasoning-Dataset

    • huggingface.co
    Updated Apr 10, 2025
    Cite
    chengliu (2025). Multimodal-Visual-Reasoning-Dataset [Dataset]. https://huggingface.co/datasets/lccccc-1/Multimodal-Visual-Reasoning-Dataset
    Explore at:
    Dataset updated
    Apr 10, 2025
    Authors
    chengliu
    Description

    lccccc-1/Multimodal-Visual-Reasoning-Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. python-reasoning-dataset

    • huggingface.co
    Updated Feb 10, 2025
    + more versions
    Cite
    Sara Han Díaz (2025). python-reasoning-dataset [Dataset]. https://huggingface.co/datasets/sdiazlor/python-reasoning-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 10, 2025
    Authors
    Sara Han Díaz
    Description

    Dataset Card for my-distiset-986461

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/sdiazlor/my-distiset-986461/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/sdiazlor/python-reasoning-dataset.

  13. CodeIO-PyEdu-Reasoning

    • huggingface.co
    Updated Feb 20, 2025
    Cite
    HKUST NLP Group (2025). CodeIO-PyEdu-Reasoning [Dataset]. https://huggingface.co/datasets/hkust-nlp/CodeIO-PyEdu-Reasoning
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 20, 2025
    Dataset authored and provided by
    HKUST NLP Group
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

    📑 Paper  |  🌐 Project Page  |  💾 Released Resources  |  📦 Repo 
    

    This is the resource page of the CodeI/O collection on Hugging Face; we highlight your current position with a blue block.

    Dataset | Link
    CodeI/O-PythonEdu-Reasoning | 🤗

    Please also check the raw data after our processing if you are interested:… See the full description on the dataset page: https://huggingface.co/datasets/hkust-nlp/CodeIO-PyEdu-Reasoning.

  14. Data from: visual-spatial-reasoning

    • huggingface.co
    • opendatalab.com
    Updated Oct 6, 2023
    + more versions
    Cite
    Julen Etxaniz (2023). visual-spatial-reasoning [Dataset]. https://huggingface.co/datasets/juletxara/visual-spatial-reasoning
    Explore at:
    Dataset updated
    Oct 6, 2023
    Authors
    Julen Etxaniz
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Visual Spatial Reasoning (VSR) corpus is a collection of caption-image pairs with true/false labels. Each caption describes the spatial relation of two individual objects in the image, and a vision-language model (VLM) needs to judge whether the caption is correctly describing the image (True) or not (False).
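Because each caption carries a true/false label, a model's judgments can be scored as binary classification. The sketch below assumes plain accuracy as the metric, which the description does not state. The function name `vsr_accuracy` and the four sample labels are hypothetical illustrations, not data from the corpus.

```python
def vsr_accuracy(predictions, labels):
    """Fraction of captions whose predicted True/False judgment matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = sum(pred == gold for pred, gold in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical example: four caption-image pairs with made-up labels.
labels = [True, False, True, True]        # gold: does the caption describe the image?
predictions = [True, False, False, True]  # a model's judgments for the same pairs

accuracy = vsr_accuracy(predictions, labels)
print(f"accuracy: {accuracy:.2f}")
```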

  15. reasoning

    • huggingface.co
    Updated Mar 31, 2025
    + more versions
    Cite
    LiveBench (2025). reasoning [Dataset]. https://huggingface.co/datasets/livebench/reasoning
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/reasoning"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/reasoning.

  16. reasoning-mix

    • huggingface.co
    Updated Jan 24, 2025
    Cite
    EleutherAI (2025). reasoning-mix [Dataset]. https://huggingface.co/datasets/EleutherAI/reasoning-mix
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 24, 2025
    Dataset authored and provided by
    EleutherAI (https://eleuther.ai/)
    Description

    Shuffled mix of:

    - Large dataset of high-quality web text: https://huggingface.co/datasets/EleutherAI/fineweb-edu-dedup-10b
    - Medium dataset of QwQ math reasoning: https://huggingface.co/datasets/PrimeIntellect/NuminaMath-QwQ-CoT-5M
    - Small dataset of DeepSeek-R1 reasoning traces on math, coding, science and puzzle data: https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k

    Intended for disentanglement of advanced reasoning models (SAEs, transcoders). Generation code:… See the full description on the dataset page: https://huggingface.co/datasets/EleutherAI/reasoning-mix.

  17. synthetic_reasoning

    • huggingface.co
    Updated Oct 27, 2023
    + more versions
    Cite
    Unlimited Research Group of AI (2023). synthetic_reasoning [Dataset]. https://huggingface.co/datasets/ura-hcmut/synthetic_reasoning
    Explore at:
    Dataset updated
    Oct 27, 2023
    Dataset authored and provided by
    Unlimited Research Group of AI
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

  18. OpenThoughts-114k

    • huggingface.co
    Updated Jan 28, 2025
    + more versions
    Cite
    Open Thoughts (2025). OpenThoughts-114k [Dataset]. https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k
    Explore at:
    Dataset updated
    Jan 28, 2025
    Dataset authored and provided by
    Open Thoughts
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Note: We have released a paper for OpenThoughts! See our paper here.

      Open-Thoughts-114k
    

    Open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles! Inspect the content with rich formatting with Curator Viewer.

      Available Subsets
    

    default subset containing ready-to-train data used to finetune the OpenThinker-7B and OpenThinker-32B models: ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")… See the full description on the dataset page: https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k.

  19. MedCaseReasoning

    • huggingface.co
    Updated Sep 25, 2025
    Cite
    Zou Lab @ Stanford University (2025). MedCaseReasoning [Dataset]. https://huggingface.co/datasets/zou-lab/MedCaseReasoning
    Explore at:
    Dataset updated
    Sep 25, 2025
    Dataset authored and provided by
    Zou Lab @ Stanford University
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    zou-lab/MedCaseReasoning dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. PhysReason

    • huggingface.co
    Updated May 30, 2025
    Cite
    Zhibei (2025). PhysReason [Dataset]. https://huggingface.co/datasets/zhibei1204/PhysReason
    Explore at:
    Dataset updated
    May 30, 2025
    Authors
    Zhibei
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning

    PhysReason is accepted by ACL-2025-main

      📋 Overview
    

    PhysReason is a comprehensive physics-based reasoning benchmark consisting of 1,200 physics problems spanning multiple domains, with a focus on both knowledge-based (25%) and reasoning-based (75%) questions. This benchmark addresses the critical gap in evaluating large language models' capabilities in physics-based reasoning, which requires… See the full description on the dataset page: https://huggingface.co/datasets/zhibei1204/PhysReason.
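The stated split translates into approximate problem counts as follows. This is a rough sketch that assumes the 25% and 75% figures apply to all 1,200 problems and are not rounded; the description is truncated, so the exact per-category counts may differ. The variable names are illustrative only.

```python
# Figures quoted in the PhysReason overview.
TOTAL_PROBLEMS = 1200
KNOWLEDGE_BASED_PCT = 25
REASONING_BASED_PCT = 75

# Assumption: the percentages are exact, so integer arithmetic is sufficient.
knowledge_based = TOTAL_PROBLEMS * KNOWLEDGE_BASED_PCT // 100
reasoning_based = TOTAL_PROBLEMS * REASONING_BASED_PCT // 100

print(f"knowledge-based: about {knowledge_based} problems")
print(f"reasoning-based: about {reasoning_based} problems")
```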
