30 datasets found
  1. openmath-2-math

    • huggingface.co
    Updated Oct 7, 2024
    Cite
    AI2 Adapt Dev (2024). openmath-2-math [Dataset]. https://huggingface.co/datasets/ai2-adapt-dev/openmath-2-math
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 7, 2024
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    AI2 Adapt Dev
    Description

    ai2-adapt-dev/openmath-2-math dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. OpenMathReasoning

    • huggingface.co
    Updated Apr 23, 2025
    + more versions
    Cite
    NVIDIA (2025). OpenMathReasoning [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathReasoning
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OpenMathReasoning

    OpenMathReasoning is a large-scale math reasoning dataset for training large language models (LLMs). This dataset contains:

    • 306K unique mathematical problems sourced from AoPS forums with:
        • 3.2M long chain-of-thought (CoT) solutions
        • 1.7M long tool-integrated reasoning (TIR) solutions
        • 566K samples that select the most promising solution out of many candidates (GenSelect)
    • Additional 193K problems sourced from AoPS forums (problems only, no solutions)

    We used… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathReasoning.

  3. OpenR1-Math-220k

    • huggingface.co
    • kaggle.com
    Updated Feb 12, 2025
    + more versions
    Cite
    Open R1 (2025). OpenR1-Math-220k [Dataset]. https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
    Explore at:
    Croissant
    Dataset updated
    Feb 12, 2025
    Dataset authored and provided by
    Open R1
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    OpenR1-Math-220k

    Dataset description

    OpenR1-Math-220k is a large-scale dataset for mathematical reasoning. It consists of 220k math problems with two to four reasoning traces generated by DeepSeek R1 for problems from NuminaMath 1.5. The traces were verified using Math Verify for most samples and Llama-3.3-70B-Instruct as a judge for 12% of the samples, and each problem contains at least one reasoning trace with a correct answer. The dataset consists of two splits:… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k.
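The description above says each problem carries two to four reasoning traces, at least one of which reaches a correct answer. A minimal sketch of how a consumer might keep only the correct traces follows. The record layout (`problem`, `traces`, `verified`) is hypothetical and chosen for illustration; it is not taken from the dataset's actual schema, which should be checked on the dataset page.

```python
from typing import List, TypedDict


class Record(TypedDict):
    """Hypothetical record layout; not the dataset's actual schema."""
    problem: str
    traces: List[str]      # reasoning traces generated for this problem
    verified: List[bool]   # verified[i] is True if traces[i] reached a correct answer


def correct_traces(record: Record) -> List[str]:
    """Return only the traces flagged as correct, in their original order."""
    return [t for t, ok in zip(record["traces"], record["verified"]) if ok]


def satisfies_description(record: Record) -> bool:
    """Check the stated properties: two to four traces, one flag per trace,
    and at least one trace with a correct answer."""
    n = len(record["traces"])
    return 2 <= n <= 4 and len(record["verified"]) == n and any(record["verified"])


# Made-up example record, for illustration only.
example: Record = {
    "problem": "What is 2 + 2?",
    "traces": ["trace A", "trace B"],
    "verified": [False, True],
}

print(correct_traces(example))
print(satisfies_description(example))
```

Keeping the flags in a list parallel to the traces is only one possible design; a real pipeline would map these names onto whatever columns the dataset exposes.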

  4. OpenMath-MATH-masked

    • huggingface.co
    Updated Nov 24, 2025
    Cite
    NVIDIA (2025). OpenMath-MATH-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked
    Explore at:
    Croissant
    Dataset updated
    Nov 24, 2025
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMath MATH Masked

    We release a masked version of the MATH solutions. This data can be used to aid synthetic generation of additional solutions for the MATH dataset, as it is much less likely to lead to inconsistent reasoning than using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked.

  5. OpenMathInstruct-1

    • huggingface.co
    • opendatalab.com
    • +1 more
    Updated Feb 16, 2024
    + more versions
    Cite
    NVIDIA (2024). OpenMathInstruct-1 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-1
    Explore at:
    Croissant
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMathInstruct-1

    OpenMathInstruct-1 is a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. The problems are from the GSM8K and MATH training subsets, and the solutions are synthetically generated by allowing the Mixtral model to use a mix of text reasoning and code blocks executed by a Python interpreter. The dataset is split into train and validation subsets that we used in the ablation experiments. These two subsets… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-1.

  6. OpenMath-GSM8K-masked

    • huggingface.co
    Updated Nov 24, 2025
    Cite
    NVIDIA (2025). OpenMath-GSM8K-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked
    Explore at:
    Croissant
    Dataset updated
    Nov 24, 2025
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMath GSM8K Masked

    We release a masked version of the GSM8K solutions. This data can be used to aid synthetic generation of additional solutions for the GSM8K dataset, as it is much less likely to lead to inconsistent reasoning than using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked.

  7. openmath-filtered

    • huggingface.co
    Updated Oct 9, 2025
    Cite
    McGill NLP Group (2025). openmath-filtered [Dataset]. https://huggingface.co/datasets/McGill-NLP/openmath-filtered
    Dataset updated
    Oct 9, 2025
    Dataset authored and provided by
    McGill NLP Group
    Description

    McGill-NLP/openmath-filtered dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. openmath-20k

    • huggingface.co
    Cite
    zaaayy, openmath-20k [Dataset]. https://huggingface.co/datasets/zay25/openmath-20k
    Authors
    zaaayy
    Description

    zay25/openmath-20k dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. open-math-reasoning

    • huggingface.co
    Updated May 28, 2025
    Cite
    SM Shah (2025). open-math-reasoning [Dataset]. https://huggingface.co/datasets/SMSHAH/open-math-reasoning
    Dataset updated
    May 28, 2025
    Authors
    SM Shah
    Description

    SMSHAH/open-math-reasoning dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. sft-openmath-eval

    • huggingface.co
    Updated Apr 16, 2025
    + more versions
    Cite
    Seonho Yeom (2025). sft-openmath-eval [Dataset]. https://huggingface.co/datasets/Seono/sft-openmath-eval
    Dataset updated
    Apr 16, 2025
    Authors
    Seonho Yeom
    Description

    SFT Format Dataset

    Overview

    This dataset is converted to SFT (Supervised Fine-Tuning) format. It was created by transforming OpenMathInstruct and Stanford Human Preferences (SHP) datasets.

    Dataset Structure

    Each entry follows this format:
    Instruction: [Problem, question, or conversation history]
    Response: [Solution, answer, or response]

    Usage Guide

    Loading the Dataset

    from datasets import load_dataset

    Load datasets from Hugging Face… See the full description on the dataset page: https://huggingface.co/datasets/Seono/sft-openmath-eval.
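The Instruction/Response entry layout described for this dataset can be sketched as a small formatter and parser. This is an illustrative sketch only: the helper names `format_sft_entry` and `parse_sft_entry` are hypothetical, and the single newline between the two parts is an assumption, since the listing does not show the exact delimiter.

```python
from typing import Tuple

INSTRUCTION_PREFIX = "Instruction: "
RESPONSE_SEPARATOR = "\nResponse: "  # assumed delimiter; not confirmed by the dataset card


def format_sft_entry(instruction: str, response: str) -> str:
    """Build one 'Instruction: ... Response: ...' entry from its two parts."""
    return f"{INSTRUCTION_PREFIX}{instruction.strip()}{RESPONSE_SEPARATOR}{response.strip()}"


def parse_sft_entry(text: str) -> Tuple[str, str]:
    """Split an entry back into (instruction, response).

    Splits at the first occurrence of the separator, so an instruction that
    itself contains the separator would be split too early.
    """
    head, sep, response = text.partition(RESPONSE_SEPARATOR)
    if not sep or not head.startswith(INSTRUCTION_PREFIX):
        raise ValueError("text is not in Instruction/Response format")
    return head[len(INSTRUCTION_PREFIX):], response


entry = format_sft_entry(" What is 2 + 2? ", "4")
print(entry)
print(parse_sft_entry(entry))
```

A round trip through both helpers returns the stripped instruction and response, which is a quick way to check that converted rows are well-formed.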

  11. openmath-reasoning-hard-extracted-solution

    • huggingface.co
    Updated Nov 14, 2025
    + more versions
    Cite
    Matthew Yang (2025). openmath-reasoning-hard-extracted-solution [Dataset]. https://huggingface.co/datasets/d1shs0ap/openmath-reasoning-hard-extracted-solution
    Dataset updated
    Nov 14, 2025
    Authors
    Matthew Yang
    Description

    d1shs0ap/openmath-reasoning-hard-extracted-solution dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. tulu-gsm8k-openmath-instruct-100k

    • huggingface.co
    + more versions
    Cite
    John M., tulu-gsm8k-openmath-instruct-100k [Dataset]. https://huggingface.co/datasets/ketchup123/tulu-gsm8k-openmath-instruct-100k
    Authors
    John M.
    Description

    ketchup123/tulu-gsm8k-openmath-instruct-100k dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. OpenMathInstruct-2

    • huggingface.co
    Updated Oct 3, 2024
    + more versions
    Cite
    NVIDIA (2024). OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
    Explore at:
    Croissant
    Dataset updated
    Oct 3, 2024
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OpenMathInstruct-2

    OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:

    • Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
    • Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.

    … See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.

  14. OpenMath-Difficulty-Annotated

    • huggingface.co
    Updated Oct 26, 2024
    Cite
    HAD (2024). OpenMath-Difficulty-Annotated [Dataset]. https://huggingface.co/datasets/HAD653/OpenMath-Difficulty-Annotated
    Dataset updated
    Oct 26, 2024
    Authors
    HAD
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    📐 OpenMath-Difficulty-Annotated

    🚀 Overview

    OpenMath-Difficulty-Annotated is a curated subset of OpenMathInstruct-2 containing 10,176 math problems, enhanced with precise difficulty metadata. While the original solutions are preserved from NVIDIA's dataset, we employed a 120B Parameter Model (LLM-as-a-Judge) to analyze and grade every single problem on a scale of 1 to 5. This allows developers of Small Language Models (1B-3B) to filter out "Olympiad-level" noise… See the full description on the dataset page: https://huggingface.co/datasets/HAD653/OpenMath-Difficulty-Annotated.
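The filtering use case mentioned above, dropping harder problems before training a small model, could look roughly like the sketch below. The field name `difficulty` and the helper `filter_by_difficulty` are hypothetical, and the rows are made-up examples; the only detail taken from the description is the 1-to-5 grading scale.

```python
from typing import Dict, List


def filter_by_difficulty(rows: List[Dict], max_level: int = 3) -> List[Dict]:
    """Keep rows whose (hypothetical) 'difficulty' grade is at most max_level.

    Grades are assumed to be integers on the 1-to-5 scale named in the
    dataset description.
    """
    if not 1 <= max_level <= 5:
        raise ValueError("max_level must be between 1 and 5")
    return [row for row in rows if row["difficulty"] <= max_level]


# Made-up rows, for illustration only.
rows = [
    {"problem": "Add two fractions.", "difficulty": 1},
    {"problem": "Solve a quadratic equation.", "difficulty": 3},
    {"problem": "Prove an olympiad inequality.", "difficulty": 5},
]

easy_rows = filter_by_difficulty(rows, max_level=3)
print([row["problem"] for row in easy_rows])
```

The cutoff of 3 is arbitrary here; a suitable threshold would depend on the target model and would have to be chosen empirically.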

  15. nvidia-openmath-phi-format-unbalanced

    • huggingface.co
    Updated Dec 1, 2025
    Cite
    yehya (2025). nvidia-openmath-phi-format-unbalanced [Dataset]. https://huggingface.co/datasets/ykarout/nvidia-openmath-phi-format-unbalanced
    Dataset updated
    Dec 1, 2025
    Authors
    yehya
    Description

    ykarout/nvidia-openmath-phi-format-unbalanced dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. sft-openmath-train

    • huggingface.co
    Updated Apr 19, 2025
    + more versions
    Cite
    Seonho Yeom (2025). sft-openmath-train [Dataset]. https://huggingface.co/datasets/Seono/sft-openmath-train
    Dataset updated
    Apr 19, 2025
    Authors
    Seonho Yeom
    Description

    SFT Format Dataset

    Overview

    This dataset is converted to SFT (Supervised Fine-Tuning) format. It was created by transforming OpenMathInstruct and Stanford Human Preferences (SHP) datasets.

    Dataset Structure

    Each entry follows this format:
    Instruction: [Problem, question, or conversation history]
    Response: [Solution, answer, or response]

    Usage Guide

    Loading the Dataset

    from datasets import load_dataset

    Load datasets from Hugging Face… See the full description on the dataset page: https://huggingface.co/datasets/Seono/sft-openmath-train.

  17. sft-dedup-openmath-eval

    • huggingface.co
    Updated Apr 16, 2025
    + more versions
    Cite
    Seonho Yeom (2025). sft-dedup-openmath-eval [Dataset]. https://huggingface.co/datasets/Seono/sft-dedup-openmath-eval
    Dataset updated
    Apr 16, 2025
    Authors
    Seonho Yeom
    Description

    Seono/sft-dedup-openmath-eval dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. OpenMath-Nemotron-7B-AIME25

    • huggingface.co
    Updated Aug 31, 2025
    + more versions
    Cite
    Mika Senghaas (2025). OpenMath-Nemotron-7B-AIME25 [Dataset]. https://huggingface.co/datasets/mikasenghaas/OpenMath-Nemotron-7B-AIME25
    Dataset updated
    Aug 31, 2025
    Authors
    Mika Senghaas
    Description

    mikasenghaas/OpenMath-Nemotron-7B-AIME25 dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. openmath-2-gsm8k

    • huggingface.co
    Updated Feb 18, 2025
    + more versions
    Cite
    AI2 Adapt Dev (2025). openmath-2-gsm8k [Dataset]. https://huggingface.co/datasets/ai2-adapt-dev/openmath-2-gsm8k
    Explore at:
    Croissant
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    AI2 Adapt Dev
    Description

    ai2-adapt-dev/openmath-2-gsm8k dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. OpenMathReasoning-mini

    • huggingface.co
    Updated May 2, 2025
    + more versions
    Cite
    Unsloth AI (2025). OpenMathReasoning-mini [Dataset]. https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Unsloth AI
    Description

    unsloth/OpenMathReasoning-mini dataset hosted on Hugging Face and contributed by the HF Datasets community
