22 datasets found

OpenMathInstruct-2
huggingface.co
Updated Oct 3, 2024
NVIDIA (2024). OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
OpenMathInstruct-2

OpenMathInstruct-2 is a math instruction-tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training-set problems of GSM8K and MATH are used to construct the dataset in the following ways:

Solution augmentation: generating chain-of-thought solutions for the training-set problems in GSM8K and MATH.
Problem-Solution augmentation: generating new problems, followed by solutions for these new problems.… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.
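The two construction modes can be told apart row by row. This is a minimal sketch, assuming (not stated above) that the `problem_source` column holds `gsm8k` or `math` for original training-set problems (solution augmentation) and `augmented_gsm8k` or `augmented_math` for newly generated problems (problem-solution augmentation). Rows are plain dicts so the example runs without downloading anything.

```python
from collections import Counter

# Assumed problem_source values; verify them against the dataset card.
ORIGINAL_SOURCES = {"gsm8k", "math"}                       # solution augmentation
AUGMENTED_SOURCES = {"augmented_gsm8k", "augmented_math"}  # problem-solution augmentation


def augmentation_type(row: dict) -> str:
    """Map one row to the construction mode implied by its problem_source."""
    source = row["problem_source"]
    if source in ORIGINAL_SOURCES:
        return "solution_augmentation"
    if source in AUGMENTED_SOURCES:
        return "problem_solution_augmentation"
    raise ValueError(f"unexpected problem_source: {source!r}")


def count_by_augmentation(rows) -> Counter:
    """Tally how many rows come from each construction mode."""
    return Counter(augmentation_type(row) for row in rows)


rows = [
    {"problem": "p1", "problem_source": "math"},
    {"problem": "p2", "problem_source": "augmented_math"},
    {"problem": "p3", "problem_source": "augmented_gsm8k"},
]
print(count_by_augmentation(rows))
```

Because iterating a Hugging Face `datasets.Dataset` yields one dict per row, the same function could be applied to a loaded split, for example the `train_1M` split used in the AUGMATH-Deduped recipe further down this page.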
nvidia-OpenMathInstruct-2
huggingface.co
Updated Nov 22, 2024
Yurun Yuan (2024). nvidia-OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/RyanYr/nvidia-OpenMathInstruct-2
Authors
Yurun Yuan
Description
RyanYr/nvidia-OpenMathInstruct-2 dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-2-10k
huggingface.co
Updated Oct 15, 2024
Arthur LAGACHERIE (2024). OpenMathInstruct-2-10k [Dataset]. https://huggingface.co/datasets/Arthur-LAGACHERIE/OpenMathInstruct-2-10k
Authors
Arthur LAGACHERIE
Description
Arthur-LAGACHERIE/OpenMathInstruct-2-10k dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-2-augmented-math
huggingface.co
Updated Feb 12, 2025
JINIAC_competition (2025). OpenMathInstruct-2-augmented-math [Dataset]. https://huggingface.co/datasets/JINIAC-competition/OpenMathInstruct-2-augmented-math
Dataset authored and provided by
JINIAC_competition
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains only the augmented_math portion extracted from nvidia/OpenMathInstruct-2. The license is the same as that of the original data.
OpenMathInstruct-2-Text
huggingface.co
Updated Sep 15, 2025
Zayd Muhammad Kawakibi Zuhri (2025). OpenMathInstruct-2-Text [Dataset]. https://huggingface.co/datasets/zaydzuhri/OpenMathInstruct-2-Text
Authors
Zayd Muhammad Kawakibi Zuhri
Description
zaydzuhri/OpenMathInstruct-2-Text dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-2-2M-formatted
huggingface.co
Updated Apr 25, 2025
Benjamin Minixhofer (2025). OpenMathInstruct-2-2M-formatted [Dataset]. https://huggingface.co/datasets/benjamin/OpenMathInstruct-2-2M-formatted
Authors
Benjamin Minixhofer
Description
benjamin/OpenMathInstruct-2-2M-formatted dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-2-MATH-Questions
huggingface.co
Updated Jun 12, 2025
Ricardo (2025). OpenMathInstruct-2-MATH-Questions [Dataset]. https://huggingface.co/datasets/ricdomolm/OpenMathInstruct-2-MATH-Questions
Authors
Ricardo
Description
import datasets

dataset = datasets.load_from_disk('/fast/rolmedo/datasets/OpenMathInstruct-2')

# filter out problems that are not from augmented_math or math
problem_sources = dataset['problem_source']
keep_ids = [i for i, source in enumerate(problem_sources) if source in ['augmented_math', 'math']]
dataset = dataset.select(keep_ids)

# remove duplicate problems...
to_keep = []
problems_seen = set()
for i, problem in enumerate(dataset['problem']):
    if problem not in problems_seen:…

See the full description on the dataset page: https://huggingface.co/datasets/ricdomolm/OpenMathInstruct-2-MATH-Questions.
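The snippet above is cut off mid-loop, and the author's full code is only on the dataset page. Purely as an illustration of the seen-set pattern it begins (not the author's code), an exact-match deduplication that keeps the first occurrence of each problem can be sketched as follows. The function name `dedupe_indices` is hypothetical.

```python
def dedupe_indices(problems):
    """Return indices of the first occurrence of each distinct problem text.

    Matching is exact string equality; near-duplicates that differ in
    whitespace or wording are not merged.
    """
    seen = set()
    keep = []
    for i, problem in enumerate(problems):
        if problem not in seen:
            seen.add(problem)
            keep.append(i)
    return keep


problems = ["What is 1+1?", "What is 2+2?", "What is 1+1?"]
print(dedupe_indices(problems))  # [0, 1]
```

With the `datasets` library, the returned indices could be passed to `Dataset.select`, the same call the snippet already uses for `keep_ids`.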
OpenMathInstruct-2-QwQ
huggingface.co
Updated Jan 22, 2025
Erdene-Ochir Tuguldur (2025). OpenMathInstruct-2-QwQ [Dataset]. https://huggingface.co/datasets/tugstugi/OpenMathInstruct-2-QwQ
Authors
Erdene-Ochir Tuguldur
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
OpenMathInstruct-2-QwQ

Solutions generated by Qwen/QwQ-32B-Preview for 107k augmented_math problems from nvidia/OpenMathInstruct-2. All solutions have been validated to agree with the expected_answer field of OpenMathInstruct-2.
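An answer-agreement check of the kind described can be sketched as below. The validation actually used for this dataset is not stated; this sketch assumes each solution states its final answer in a LaTeX `\boxed{...}` and compares it to `expected_answer` after only trivial normalization (removing whitespace and surrounding `$`), so equivalent forms such as `0.5` and `\frac{1}{2}` would not match.

```python
def extract_boxed(solution: str):
    r"""Return the contents of the last \boxed{...} in a solution, or None.

    Braces are matched by depth counting, so nested groups such as
    \boxed{\frac{1}{2}} are handled.
    """
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in solution[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces


def normalize(answer: str) -> str:
    """Very light normalization: drop all whitespace and surrounding $ signs."""
    return "".join(answer.split()).strip("$")


def agrees_with_expected(solution: str, expected_answer: str) -> bool:
    """True if the last boxed answer matches expected_answer after normalization."""
    boxed = extract_boxed(solution)
    return boxed is not None and normalize(boxed) == normalize(expected_answer)


print(agrees_with_expected(r"Hence the answer is $\boxed{\frac{1}{2}}$.", r"\frac{1}{2}"))  # True
```

A production check would usually add symbolic or numeric equivalence on top of string matching; the string comparison here keeps the sketch dependency-free.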
OpenMathInstruct-2-1M
huggingface.co
Updated Oct 26, 2024
monology (2024). OpenMathInstruct-2-1M [Dataset]. https://huggingface.co/datasets/monology/OpenMathInstruct-2-1M
Authors
monology
Description
monology/OpenMathInstruct-2-1M dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-2-filtered-shard3
huggingface.co
Updated Oct 26, 2024
tlrm (2024). OpenMathInstruct-2-filtered-shard3 [Dataset]. https://huggingface.co/datasets/tlrm/OpenMathInstruct-2-filtered-shard3
Dataset authored and provided by
tlrm
Description
tlrm/OpenMathInstruct-2-filtered-shard3 dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-2-CoT-JA
huggingface.co
Updated Dec 17, 2024
Hajime Yagihara (2024). OpenMathInstruct-2-CoT-JA [Dataset]. https://huggingface.co/datasets/HachiML/OpenMathInstruct-2-CoT-JA
Authors
Hajime Yagihara
Description
HachiML/OpenMathInstruct-2-CoT-JA dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMAthInstruct-2-AUGMATH-Deduped
huggingface.co
Updated Sep 13, 2025
Ricardo (2025). OpenMAthInstruct-2-AUGMATH-Deduped [Dataset]. https://huggingface.co/datasets/ricdomolm/OpenMAthInstruct-2-AUGMATH-Deduped
Authors
Ricardo
Description
import datasets
import pandas as pd

dataset = datasets.load_dataset('nvidia/OpenMathInstruct-2', split='train_1M')
dataset = dataset.filter(lambda x: x["problem_source"] == "augmented_math")
dataset = dataset.remove_columns(["generated_solution", "problem_source"])

df = dataset.to_pandas()
df = df.drop_duplicates(subset=["problem"])
dataset = datasets.Dataset.from_pandas(df)

dataset = dataset.rename_column("expected_answer", "answer")…

See the full description on the dataset page: https://huggingface.co/datasets/ricdomolm/OpenMAthInstruct-2-AUGMATH-Deduped.
OpenMathInstruct-2-chat
huggingface.co
Updated Nov 5, 2025
samo line (2025). OpenMathInstruct-2-chat [Dataset]. https://huggingface.co/datasets/samoline/OpenMathInstruct-2-chat
Authors
samo line
Description
samoline/OpenMathInstruct-2-chat dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-1
huggingface.co
opendatalab.com
Updated Feb 16, 2024
NVIDIA (2024). OpenMathInstruct-1 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-1
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
https://choosealicense.com/licenses/other/
Description
OpenMathInstruct-1

OpenMathInstruct-1 is a math instruction-tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. The problems are from the GSM8K and MATH training subsets, and the solutions are synthetically generated by allowing the Mixtral model to use a mix of text reasoning and code blocks executed by a Python interpreter. The dataset is split into train and validation subsets that we used in the ablation experiments. These two subsets… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-1.
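The mix of text reasoning and interpreter-executed code can be illustrated with a minimal loop: find each code block in a generated solution, run it, and capture what it prints. This is a sketch only; the `<code>`/`</code>` delimiters are hypothetical placeholders rather than the dataset's actual markup, and the blocks share one namespace to mimic a stateful interpreter session.

```python
import contextlib
import io
import re

# Hypothetical delimiters; the real OpenMathInstruct-1 solution format may differ.
CODE_START = "<code>"
CODE_END = "</code>"
CODE_PATTERN = re.compile(re.escape(CODE_START) + r"(.*?)" + re.escape(CODE_END), re.DOTALL)


def run_code_blocks(solution: str) -> list:
    """Execute each delimited code block in order and return its stdout.

    exec() runs arbitrary code: only use this on trusted input or in a sandbox.
    """
    namespace = {}  # shared across blocks, like one interpreter session
    outputs = []
    for match in CODE_PATTERN.finditer(solution):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1), namespace)
        outputs.append(buffer.getvalue())
    return outputs


solution = (
    "Let x be the number of apples.\n"
    "<code>\nx = 3 * 4\nprint(x)\n</code>\n"
    "Then add 5:\n"
    "<code>\nprint(x + 5)\n</code>\n"
)
print(run_code_blocks(solution))  # ['12\n', '17\n']
```

In a generation pipeline, each captured output would be appended to the text after its code block before the model continues reasoning.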
openmathinstruct2_10000
huggingface.co
Updated May 11, 2025
Junxia Cui (2025). openmathinstruct2_10000 [Dataset]. https://huggingface.co/datasets/autoprogrammer/openmathinstruct2_10000
Authors
Junxia Cui
Description
Tags: math
License: apache-2.0

OpenMathInstruct2 10000

These are 10,000 samples extracted from nvidia/OpenMathInstruct-2.
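How the 10,000 samples were drawn is not stated. One reproducible way to draw a fixed-size subset is a seeded random sample of row indices; the function name `sample_indices` and the seed value below are illustrative assumptions, not the author's method.

```python
import random


def sample_indices(num_rows: int, k: int, seed: int = 0) -> list:
    """Draw k distinct row indices uniformly at random, reproducibly.

    Sorting keeps the subset in the original row order.
    """
    if k > num_rows:
        raise ValueError("k cannot exceed num_rows")
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return sorted(rng.sample(range(num_rows), k))


idx = sample_indices(num_rows=1_000_000, k=10_000, seed=42)
print(len(idx), idx[:5])
```

With the `datasets` library, the indices could be passed to `Dataset.select` to materialize the subset; the same seed always yields the same subset.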
OpenMathInstruct-2-ja-CoT
huggingface.co
Updated Dec 10, 2024
Kendamarron (2024). OpenMathInstruct-2-ja-CoT [Dataset]. https://huggingface.co/datasets/Kendamarron/OpenMathInstruct-2-ja-CoT
Authors
Kendamarron
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Kendamarron/OpenMathInstruct-2-ja-CoT dataset hosted on Hugging Face and contributed by the HF Datasets community
openmathinstruct2-ex25000-seed5
huggingface.co
Giovanni De Muri, openmathinstruct2-ex25000-seed5 [Dataset]. https://huggingface.co/datasets/giovannidemuri/openmathinstruct2-ex25000-seed5
Authors
Giovanni De Muri
Description
giovannidemuri/openmathinstruct2-ex25000-seed5 dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMath-Difficulty-Annotated
huggingface.co
Updated Oct 26, 2024
HAD (2024). OpenMath-Difficulty-Annotated [Dataset]. https://huggingface.co/datasets/HAD653/OpenMath-Difficulty-Annotated
Authors
HAD
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
📐 OpenMath-Difficulty-Annotated

🚀 Overview

OpenMath-Difficulty-Annotated is a curated subset of OpenMathInstruct-2 containing 10,176 math problems, enhanced with difficulty metadata. While the original solutions are preserved from NVIDIA's dataset, we employed a 120B-parameter model (LLM-as-a-Judge) to analyze and grade every problem on a scale of 1 to 5. This allows developers of small language models (1B-3B) to filter out "Olympiad-level" noise… See the full description on the dataset page: https://huggingface.co/datasets/HAD653/OpenMath-Difficulty-Annotated.
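Filtering on the 1-to-5 difficulty grade is a one-line predicate. This sketch assumes the grade is stored as an integer in a column named `difficulty`; that column name is a hypothetical placeholder, so check the dataset card for the real one.

```python
def filter_by_difficulty(rows, max_level: int, field: str = "difficulty"):
    """Keep rows whose 1-5 difficulty grade is at most max_level.

    `field` is a hypothetical column name; adjust it to the dataset's schema.
    """
    if not 1 <= max_level <= 5:
        raise ValueError("max_level must be between 1 and 5")
    return [row for row in rows if row[field] <= max_level]


rows = [
    {"problem": "easy", "difficulty": 1},
    {"problem": "medium", "difficulty": 3},
    {"problem": "olympiad", "difficulty": 5},
]
print([row["problem"] for row in filter_by_difficulty(rows, max_level=3)])  # ['easy', 'medium']
```

On a loaded `datasets.Dataset`, the equivalent would be `Dataset.filter` with the same predicate.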
open-math-instruct-2-en-ja-cot
huggingface.co
Updated Dec 9, 2024
F.Y. (2024). open-math-instruct-2-en-ja-cot [Dataset]. https://huggingface.co/datasets/yasutoshi-lab/open-math-instruct-2-en-ja-cot
Authors
F.Y.
Description
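Dataset information

For this dataset, the Japanese response data in Kendamarron/OpenMathInstruct-2-ja-CoT was translated into English so that the dataset contains English-Japanese pairs. The translation quality of the translated data was rated on a 1-5 scale using LLM-as-a-judge, and only entries with the highest rating, 5, were kept. The result was then split into training and test sets at a 9:1 ratio.

Source dataset information

"Kendamarron/OpenMathInstruct-2-ja-CoT": training split only, 15,282 examples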
Dataset: Dataset({ features: ['problem', 'generated_solution', 'expected_answer', 'problem_source', 'problem_ja', 'thought', 'output', 'evaluation', 'cot_output', 'system']… See the full description on the dataset page: https://huggingface.co/datasets/yasutoshi-lab/open-math-instruct-2-en-ja-cot.
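The filter-then-split procedure described above (keep only rating 5, then split 9:1) can be sketched as follows. The column name `translation_score`, the shuffle, and the seed are assumptions for illustration; the description does not say which column holds the judge rating or how the split was randomized.

```python
import random


def filter_and_split(rows, score_field="translation_score", test_fraction=0.1, seed=0):
    """Keep only rows rated 5, then shuffle and split into (train, test).

    score_field is a hypothetical column name for the 1-5 judge rating.
    """
    kept = [row for row in rows if row[score_field] == 5]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_test = round(len(kept) * test_fraction)
    return kept[n_test:], kept[:n_test]


# 30 synthetic rows: every third row gets rating 4, the rest rating 5.
rows = [{"id": i, "translation_score": 5 if i % 3 else 4} for i in range(30)]
train, test = filter_and_split(rows)
print(len(train), len(test))  # 18 2
```

With the `datasets` library, the split step could instead use `Dataset.train_test_split(test_size=0.1, seed=...)` after a `Dataset.filter` on the rating.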
math-sft
huggingface.co
Updated May 11, 2025
Duong Hoang Le (2025). math-sft [Dataset]. https://huggingface.co/datasets/lehduong/math-sft
Authors
Duong Hoang Le
Description
Concatenation of: OpenMathInstruct-2, OpenMathReasoning, AceMath, OpenR1-Math, Numinamath-CoT, Numinamath 1.5, OpenThoughts2-1M, MetaMathQA, Maths-College
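Concatenating sources like these usually requires mapping each one onto a shared schema first, since column names differ between datasets. This is a minimal sketch of that idea, not the author's pipeline; `concat_sources` is a hypothetical helper, `problem`/`generated_solution` are OpenMathInstruct-2 column names, and the `query`/`response` names for MetaMathQA are assumed for illustration.

```python
def concat_sources(sources: dict, field_maps: dict) -> list:
    """Concatenate several datasets into one list with a shared schema.

    sources maps a dataset name to its rows; field_maps maps the same name to
    (problem_field, solution_field) in that dataset. Each output row records
    which source it came from.
    """
    combined = []
    for name, rows in sources.items():
        problem_field, solution_field = field_maps[name]
        for row in rows:
            combined.append({
                "problem": row[problem_field],
                "solution": row[solution_field],
                "source": name,
            })
    return combined


sources = {
    "OpenMathInstruct-2": [{"problem": "1+1?", "generated_solution": "2"}],
    "MetaMathQA": [{"query": "2+2?", "response": "4"}],  # assumed column names
}
field_maps = {
    "OpenMathInstruct-2": ("problem", "generated_solution"),
    "MetaMathQA": ("query", "response"),
}
print(concat_sources(sources, field_maps))
```

With the `datasets` library, `datasets.concatenate_datasets` performs the final concatenation, but it expects the inputs to share the same features, which is why a renaming step like this comes first.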

nvidia/OpenMathInstruct-2: 116 scholarly articles cite this dataset (per Google Scholar).
