30 datasets found

openmath-2-math
huggingface.co
Updated Oct 7, 2024
Cite
AI2 Adapt Dev (2024). openmath-2-math [Dataset]. https://huggingface.co/datasets/ai2-adapt-dev/openmath-2-math
Dataset updated
Oct 7, 2024
Dataset provided by
Allen Institute for AI: http://allenai.org/
Authors
AI2 Adapt Dev
Description
ai2-adapt-dev/openmath-2-math dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathReasoning
huggingface.co
Updated Apr 23, 2025
Cite
NVIDIA (2025). OpenMathReasoning [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathReasoning
Dataset updated
Apr 23, 2025
Dataset provided by
Nvidia: http://nvidia.com/
Authors
NVIDIA
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
OpenMathReasoning

OpenMathReasoning is a large-scale math reasoning dataset for training large language models (LLMs). This dataset contains:

- 306K unique mathematical problems sourced from AoPS forums, with:
  - 3.2M long chain-of-thought (CoT) solutions
  - 1.7M long tool-integrated reasoning (TIR) solutions
  - 566K samples that select the most promising solution out of many candidates (GenSelect)
- An additional 193K problems sourced from AoPS forums (problems only, no solutions)

We used… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathReasoning.
OpenR1-Math-220k
huggingface.co
kaggle.com
Updated Feb 12, 2025
Cite
Open R1 (2025). OpenR1-Math-220k [Dataset]. https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
Dataset updated
Feb 12, 2025
Dataset authored and provided by
Open R1
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
OpenR1-Math-220k

Dataset description

OpenR1-Math-220k is a large-scale dataset for mathematical reasoning. It consists of 220k math problems from NuminaMath 1.5, each with two to four reasoning traces generated by DeepSeek R1. The traces were verified with Math Verify for most samples and with Llama-3.3-70B-Instruct as a judge for 12% of the samples, and each problem contains at least one reasoning trace with a correct answer. The dataset consists of two splits:… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k.
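The guarantee that every problem keeps at least one correct trace amounts to a filter over per-trace verification results. The sketch below assumes a hypothetical record layout in which each problem carries a list of traces and a parallel list of boolean verification flags. The key names `traces` and `verified` are illustrative only and should be checked against the real dataset columns.

```python
from typing import List, TypedDict


class ProblemRecord(TypedDict):
    # Hypothetical layout: one problem, its generated reasoning traces,
    # and a parallel list saying whether each trace's answer was verified correct.
    problem: str
    traces: List[str]
    verified: List[bool]


def has_correct_trace(record: ProblemRecord) -> bool:
    """True if at least one trace of this problem was verified correct."""
    return any(record["verified"])


def keep_solvable(records: List[ProblemRecord]) -> List[ProblemRecord]:
    """Keep only problems with at least one verified-correct trace.

    Traces are left untouched, so a kept problem may still contain
    incorrect traces alongside the correct one(s).
    """
    return [r for r in records if has_correct_trace(r)]
```

A problem whose flags are `[True, False]` is kept with both traces. A problem whose flags are all `False` is dropped.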
OpenMath-MATH-masked
huggingface.co
Updated Nov 24, 2025
Cite
NVIDIA (2025). OpenMath-MATH-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked
Dataset updated
Nov 24, 2025
Dataset provided by
Nvidia: http://nvidia.com/
Authors
NVIDIA
License
Other: https://choosealicense.com/licenses/other/
Description
OpenMath MATH Masked

We release a masked version of the MATH solutions. This data can be used to aid the synthetic generation of additional solutions for the MATH dataset, since it is much less likely to lead to inconsistent reasoning than using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked.
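The card's description is cut off before it explains how the solutions were masked. The sketch below illustrates only the general idea of hiding a reference solution's concrete intermediate values while keeping its structure. It is a naive regex stand-in, not NVIDIA's masking procedure. Keeping numbers that already appear in the problem statement and using `x1, x2, ...` as symbols are choices made here purely for illustration.

```python
import re
from typing import Dict

# Integers or decimals such as 12 or 2.5.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def mask_solution(problem: str, solution: str) -> str:
    """Naively mask a reference solution (illustrative stand-in only).

    Numbers already stated in the problem are kept. Every other number
    (intermediate results and the final answer) is replaced by a symbol
    x1, x2, ..., and repeated values reuse the same symbol.
    """
    given = set(NUMBER.findall(problem))
    symbols: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        value = match.group(0)
        if value in given:
            return value  # keep quantities taken from the problem
        if value not in symbols:
            symbols[value] = f"x{len(symbols) + 1}"
        return symbols[value]

    return NUMBER.sub(replace, solution)
```

For example, `mask_solution("Add 2 and 3, then double it.", "2 + 3 = 5 and 5 * 2 = 10.")` yields `"2 + 3 = x1 and x1 * 2 = x2."`.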
OpenMathInstruct-1
huggingface.co
opendatalab.com
Updated Feb 16, 2024
Cite
NVIDIA (2024). OpenMathInstruct-1 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-1
Dataset updated
Feb 16, 2024
Dataset provided by
Nvidia: http://nvidia.com/
Authors
NVIDIA
License
Other: https://choosealicense.com/licenses/other/
Description
OpenMathInstruct-1

OpenMathInstruct-1 is a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. The problems are from the GSM8K and MATH training subsets, and the solutions are synthetically generated by allowing the Mixtral model to use a mix of text reasoning and code blocks executed by a Python interpreter. The dataset is split into train and validation subsets that we used in the ablation experiments. These two subsets… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-1.
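A solution that interleaves text with executed code has to mark where each code block starts and ends, and where the interpreter's output was recorded. The sketch below assumes the solution text wraps code in `<llm-code>...</llm-code>` and the recorded output in `<llm-code-output>...</llm-code-output>`. Treat these delimiters as an assumption and confirm them against the dataset card. The sketch pairs each block with its output and can re-run a block to check that the recorded output is reproduced.

```python
import contextlib
import io
import re
from typing import List, Tuple

# Assumed delimiters for code blocks and their recorded interpreter output.
CODE_RE = re.compile(r"<llm-code>(.*?)</llm-code>", re.DOTALL)
OUTPUT_RE = re.compile(r"<llm-code-output>(.*?)</llm-code-output>", re.DOTALL)


def code_and_outputs(solution: str) -> List[Tuple[str, str]]:
    """Pair each code block with the interpreter output recorded after it.

    Pairs are formed in order of appearance. zip() silently drops any
    block that has no matching output.
    """
    codes = [c.strip() for c in CODE_RE.findall(solution)]
    outputs = [o.strip() for o in OUTPUT_RE.findall(solution)]
    return list(zip(codes, outputs))


def rerun(code: str) -> str:
    """Execute one code block in a fresh namespace and return its stdout.

    Only run code you trust: exec() has full access to the interpreter.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()
```

Each block runs in a fresh namespace here. If the blocks in a solution build on one another, you would need to pass a shared dictionary to `exec` instead.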
OpenMath-GSM8K-masked
huggingface.co
Updated Nov 24, 2025
Cite
NVIDIA (2025). OpenMath-GSM8K-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked
Dataset updated
Nov 24, 2025
Dataset provided by
Nvidia: http://nvidia.com/
Authors
NVIDIA
License
Other: https://choosealicense.com/licenses/other/
Description
OpenMath GSM8K Masked

We release a masked version of the GSM8K solutions. This data can be used to aid the synthetic generation of additional solutions for the GSM8K dataset, since it is much less likely to lead to inconsistent reasoning than using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked.
openmath-filtered
huggingface.co
Updated Oct 9, 2025
Cite
McGill NLP Group (2025). openmath-filtered [Dataset]. https://huggingface.co/datasets/McGill-NLP/openmath-filtered
Dataset updated
Oct 9, 2025
Dataset authored and provided by
McGill NLP Group
Description
McGill-NLP/openmath-filtered dataset hosted on Hugging Face and contributed by the HF Datasets community
openmath-20k
huggingface.co
Cite
zaaayy, openmath-20k [Dataset]. https://huggingface.co/datasets/zay25/openmath-20k
Authors
zaaayy
Description
zay25/openmath-20k dataset hosted on Hugging Face and contributed by the HF Datasets community
open-math-reasoning
huggingface.co
Updated May 28, 2025
Cite
SM Shah (2025). open-math-reasoning [Dataset]. https://huggingface.co/datasets/SMSHAH/open-math-reasoning
Dataset updated
May 28, 2025
Authors
SM Shah
Description
SMSHAH/open-math-reasoning dataset hosted on Hugging Face and contributed by the HF Datasets community
sft-openmath-eval
huggingface.co
Updated Apr 16, 2025
Cite
Seonho Yeom (2025). sft-openmath-eval [Dataset]. https://huggingface.co/datasets/Seono/sft-openmath-eval
Dataset updated
Apr 16, 2025
Authors
Seonho Yeom
Description
SFT Format Dataset

Overview

This dataset is in SFT (Supervised Fine-Tuning) format. It was created by transforming the OpenMathInstruct and Stanford Human Preferences (SHP) datasets.

Dataset Structure

Each entry follows this format:

Instruction: [Problem, question, or conversation history]
Response: [Solution, answer, or response]
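The Instruction/Response layout can be produced from any source record with a small formatter. In the sketch below, the source keys are passed in explicitly because they differ between source datasets. For an OpenMathInstruct-style record one might use `question` and `generated_solution`, but those key names are an assumption to check against the source dataset. The card's own conversion code is truncated here.

```python
from typing import Dict, Mapping


def to_sft(record: Mapping[str, str], prompt_key: str, answer_key: str) -> Dict[str, str]:
    """Map a source record onto the Instruction/Response layout."""
    return {
        "instruction": record[prompt_key].strip(),
        "response": record[answer_key].strip(),
    }


def render(example: Dict[str, str]) -> str:
    """Render one SFT example as plain text in the card's layout."""
    return f"Instruction: {example['instruction']}\nResponse: {example['response']}"
```

Keeping the mapping step separate from the rendering step lets the same structured examples feed different chat or prompt templates later.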

Usage Guide

Loading the Dataset

from datasets import load_dataset

Load datasets from Hugging Face… See the full description on the dataset page: https://huggingface.co/datasets/Seono/sft-openmath-eval.
openmath-reasoning-hard-extracted-solution
huggingface.co
Updated Nov 14, 2025
Cite
Matthew Yang (2025). openmath-reasoning-hard-extracted-solution [Dataset]. https://huggingface.co/datasets/d1shs0ap/openmath-reasoning-hard-extracted-solution
Dataset updated
Nov 14, 2025
Authors
Matthew Yang
Description
d1shs0ap/openmath-reasoning-hard-extracted-solution dataset hosted on Hugging Face and contributed by the HF Datasets community
tulu-gsm8k-openmath-instruct-100k
huggingface.co
Cite
John M., tulu-gsm8k-openmath-instruct-100k [Dataset]. https://huggingface.co/datasets/ketchup123/tulu-gsm8k-openmath-instruct-100k
Authors
John M.
Description
ketchup123/tulu-gsm8k-openmath-instruct-100k dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-2
huggingface.co
Updated Oct 3, 2024
Cite
NVIDIA (2024). OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
Dataset updated
Oct 3, 2024
Dataset provided by
Nvidia: http://nvidia.com/
Authors
NVIDIA
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
OpenMathInstruct-2

OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:

- Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
- Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.
OpenMath-Difficulty-Annotated
huggingface.co
Updated Oct 26, 2024
Cite
HAD (2024). OpenMath-Difficulty-Annotated [Dataset]. https://huggingface.co/datasets/HAD653/OpenMath-Difficulty-Annotated
Dataset updated
Oct 26, 2024
Authors
HAD
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
📐 OpenMath-Difficulty-Annotated

🚀 Overview

OpenMath-Difficulty-Annotated is a curated subset of OpenMathInstruct-2 containing 10,176 math problems, each annotated with difficulty metadata. The original solutions from NVIDIA's dataset are preserved, and a 120B-parameter model (LLM-as-a-Judge) was used to analyze and grade every problem on a scale of 1 to 5. This allows developers of Small Language Models (1B-3B) to filter out "Olympiad-level" noise… See the full description on the dataset page: https://huggingface.co/datasets/HAD653/OpenMath-Difficulty-Annotated.
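Once the 1-to-5 grades are loaded, filtering on them is a short selection. The sketch below assumes each record exposes its grade as an integer under a hypothetical `difficulty` key. It also uses a default cutoff of 3 as an illustrative choice for a small model. Neither the key name nor the cutoff comes from the card.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence


def filter_by_difficulty(records: Sequence[Mapping], max_level: int = 3) -> List[Mapping]:
    """Keep records whose 1-5 difficulty grade is at most `max_level`."""
    if not 1 <= max_level <= 5:
        raise ValueError("max_level must be between 1 and 5")
    return [r for r in records if r["difficulty"] <= max_level]


def difficulty_histogram(records: Sequence[Mapping]) -> Dict[int, int]:
    """Count records per difficulty grade, ordered by grade."""
    return dict(sorted(Counter(r["difficulty"] for r in records).items()))
```

Checking the histogram before choosing `max_level` shows how much data a given cutoff would discard.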
nvidia-openmath-phi-format-unbalanced
huggingface.co
Updated Dec 1, 2025
Cite
yehya (2025). nvidia-openmath-phi-format-unbalanced [Dataset]. https://huggingface.co/datasets/ykarout/nvidia-openmath-phi-format-unbalanced
Dataset updated
Dec 1, 2025
Authors
yehya
Description
ykarout/nvidia-openmath-phi-format-unbalanced dataset hosted on Hugging Face and contributed by the HF Datasets community
sft-openmath-train
huggingface.co
Updated Apr 19, 2025
Cite
Seonho Yeom (2025). sft-openmath-train [Dataset]. https://huggingface.co/datasets/Seono/sft-openmath-train
Dataset updated
Apr 19, 2025
Authors
Seonho Yeom
Description
SFT Format Dataset

Overview

This dataset is in SFT (Supervised Fine-Tuning) format. It was created by transforming the OpenMathInstruct and Stanford Human Preferences (SHP) datasets.

Dataset Structure

Each entry follows this format:

Instruction: [Problem, question, or conversation history]
Response: [Solution, answer, or response]

Usage Guide

Loading the Dataset

from datasets import load_dataset

Load datasets from Hugging Face… See the full description on the dataset page: https://huggingface.co/datasets/Seono/sft-openmath-train.
sft-dedup-openmath-eval
huggingface.co
Updated Apr 16, 2025
Cite
Seonho Yeom (2025). sft-dedup-openmath-eval [Dataset]. https://huggingface.co/datasets/Seono/sft-dedup-openmath-eval
Dataset updated
Apr 16, 2025
Authors
Seonho Yeom
Description
Seono/sft-dedup-openmath-eval dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMath-Nemotron-7B-AIME25
huggingface.co
Updated Aug 31, 2025
Cite
Mika Senghaas (2025). OpenMath-Nemotron-7B-AIME25 [Dataset]. https://huggingface.co/datasets/mikasenghaas/OpenMath-Nemotron-7B-AIME25
Dataset updated
Aug 31, 2025
Authors
Mika Senghaas
Description
mikasenghaas/OpenMath-Nemotron-7B-AIME25 dataset hosted on Hugging Face and contributed by the HF Datasets community
openmath-2-gsm8k
huggingface.co
Updated Feb 18, 2025
Cite
AI2 Adapt Dev (2025). openmath-2-gsm8k [Dataset]. https://huggingface.co/datasets/ai2-adapt-dev/openmath-2-gsm8k
Dataset updated
Feb 18, 2025
Dataset provided by
Allen Institute for AI: http://allenai.org/
Authors
AI2 Adapt Dev
Description
ai2-adapt-dev/openmath-2-gsm8k dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathReasoning-mini
huggingface.co
Updated May 2, 2025
Cite
Unsloth AI (2025). OpenMathReasoning-mini [Dataset]. https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini
Dataset updated
May 2, 2025
Dataset authored and provided by
Unsloth AI
Description
unsloth/OpenMathReasoning-mini dataset hosted on Hugging Face and contributed by the HF Datasets community