14 datasets found

OpenMathInstruct-1
huggingface.co
opendatalab.com
Updated Feb 16, 2024
Cite
NVIDIA (2024). OpenMathInstruct-1 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-1
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 16, 2024
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
https://choosealicense.com/licenses/other/
Description
OpenMathInstruct-1

OpenMathInstruct-1 is a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. The problems are from the GSM8K and MATH training subsets, and the solutions are synthetically generated by allowing the Mixtral model to use a mix of text reasoning and code blocks executed by a Python interpreter. The dataset is split into train and validation subsets that we used in the ablation experiments. These two subsets… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-1.
OpenMathInstruct-2
huggingface.co
Updated Oct 3, 2024
Cite
NVIDIA (2024). OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
Explore at:
Croissant
Dataset updated
Oct 3, 2024
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
OpenMathInstruct-2

OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:

Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.…
See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.
OpenMathInstruct-1
kaggle.com
zip
Updated May 31, 2024
Cite
张鉴鸾 (2024). OpenMathInstruct-1 [Dataset]. https://www.kaggle.com/datasets/zjp2origin/openmathinstruct-1
Explore at:
Available download formats: zip (2346043456 bytes)
Dataset updated
May 31, 2024
Authors
张鉴鸾
Description
Dataset

This dataset was created by 张鉴鸾

Contents
OpenMathInstruct-1-1.8m-ja-askllm-v1
huggingface.co
Updated Aug 20, 2024
Cite
Team Kuma (2024). OpenMathInstruct-1-1.8m-ja-askllm-v1 [Dataset]. https://huggingface.co/datasets/geniacllm/OpenMathInstruct-1-1.8m-ja-askllm-v1
Explore at:
Croissant
Dataset updated
Aug 20, 2024
Dataset authored and provided by
Team Kuma
License
https://choosealicense.com/licenses/other/
Description
OpenMathInstruct-1-1.8m-ja-askllm-v1

This dataset scores kunishou/OpenMathInstruct-1-1.8m-ja with the Ask-LLM method. In addition to the columns of the original dataset, it adds a column named askllm_score, which stores the Ask-LLM score. The LLM used for Ask-LLM scoring is Rakuten/RakutenAI-7B-instruct, and the prompt is as follows:

### {data} ###

Does the previous paragraph demarcated within ### and ### contain informative signal for pre-training a large-language model? An informative datapoint should be well-formatted, contain some usable knowledge of the… See the full description on the dataset page: https://huggingface.co/datasets/geniacllm/OpenMathInstruct-1-1.8m-ja-askllm-v1.
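The askllm_score column described above lends itself to simple quality filtering. The following is a minimal sketch, assuming rows are available as plain dictionaries; the helper `top_fraction`, the `question` field, the example scores, and the 50% cut are all hypothetical, and only the column name `askllm_score` comes from the description.

```python
from typing import Dict, List


def top_fraction(rows: List[Dict], fraction: float, key: str = "askllm_score") -> List[Dict]:
    """Return the highest-scoring `fraction` of rows, ranked by `key` (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Sort from highest to lowest score.
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    # Keep at least one row when the input is non-empty.
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Made-up rows for illustration; real rows also carry the original dataset columns.
rows = [
    {"question": "q1", "askllm_score": 0.91},
    {"question": "q2", "askllm_score": 0.12},
    {"question": "q3", "askllm_score": 0.55},
    {"question": "q4", "askllm_score": 0.78},
]

kept = top_fraction(rows, 0.5)
print([r["question"] for r in kept])
```

The right cut-off depends on the downstream use; a fixed score threshold would work equally well in place of a fraction.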
OpenMathInstruct-1-1000-processed
huggingface.co
Cite
Marcus Cedric R. Idia, OpenMathInstruct-1-1000-processed [Dataset]. https://huggingface.co/datasets/marcuscedricridia/OpenMathInstruct-1-1000-processed
Authors
Marcus Cedric R. Idia
Description
marcuscedricridia/OpenMathInstruct-1-1000-processed dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMath-MATH-masked
huggingface.co
Updated Nov 24, 2025
Cite
NVIDIA (2025). OpenMath-MATH-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked
Explore at:
Croissant
Dataset updated
Nov 24, 2025
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
https://choosealicense.com/licenses/other/
Description
OpenMath MATH Masked

We release a masked version of the MATH solutions. This data can be used to aid synthetic generation of additional solutions for the MATH dataset, as it is much less likely to lead to inconsistent reasoning compared to using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked.
OpenMathInstruct-2-sampled
huggingface.co
Updated Jan 14, 2025
Cite
Gabriel Martín Blázquez (2025). OpenMathInstruct-2-sampled [Dataset]. https://huggingface.co/datasets/gabrielmbmb/OpenMathInstruct-2-sampled
Explore at:
Croissant
Dataset updated
Jan 14, 2025
Authors
Gabriel Martín Blázquez
Description
gabrielmbmb/OpenMathInstruct-2-sampled dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMath-GSM8K-masked
huggingface.co
Updated Nov 24, 2025
Cite
NVIDIA (2025). OpenMath-GSM8K-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked
Explore at:
Croissant
Dataset updated
Nov 24, 2025
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
https://choosealicense.com/licenses/other/
Description
OpenMath GSM8K Masked

We release a masked version of the GSM8K solutions. This data can be used to aid synthetic generation of additional solutions for the GSM8K dataset, as it is much less likely to lead to inconsistent reasoning compared to using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked.
Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated
huggingface.co
Updated Nov 25, 2024
Cite
Fifth Civil Defender - 5CD (2024). Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated [Dataset]. https://huggingface.co/datasets/5CD-AI/Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated
Explore at:
Croissant
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Fifth Civil Defender - 5CD
Description
5CD-AI/Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct1-ja
huggingface.co
Updated Nov 29, 2025
Cite
Hayato Hongo (2025). OpenMathInstruct1-ja [Dataset]. https://huggingface.co/datasets/HayatoHongo/OpenMathInstruct1-ja
Dataset updated
Nov 29, 2025
Authors
Hayato Hongo
Description
HayatoHongo/OpenMathInstruct1-ja dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-2-ja-CoT
huggingface.co
Updated Dec 10, 2024
Cite
Kendamarron (2024). OpenMathInstruct-2-ja-CoT [Dataset]. https://huggingface.co/datasets/Kendamarron/OpenMathInstruct-2-ja-CoT
Explore at:
Croissant
Dataset updated
Dec 10, 2024
Authors
Kendamarron
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Kendamarron/OpenMathInstruct-2-ja-CoT dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenMathInstruct-ja-phi3
huggingface.co
Updated Oct 28, 2025
Cite
kan hatakeyama (2025). OpenMathInstruct-ja-phi3 [Dataset]. https://huggingface.co/datasets/kanhatakeyama/OpenMathInstruct-ja-phi3
Explore at:
Croissant
Dataset updated
Oct 28, 2025
Authors
kan hatakeyama
License
https://choosealicense.com/licenses/other/
Description
This dataset re-solves kunishou/OpenMathInstruct-1-1.8m-ja with Phi-3. The Python scripts were actually executed, and only the examples whose answers were correct were kept and published.

Generation code
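The execute-and-filter step described above can be sketched as follows. This is not the author's generation code, which is not shown here; it is a minimal illustration that assumes each sample stores a script in a hypothetical `code` field and a reference answer in a hypothetical `expected_answer` field, and that a script reports its answer by printing it.

```python
import subprocess
import sys
from typing import Dict, List, Optional


def run_script(code: str, timeout: float = 10.0) -> Optional[str]:
    """Run a Python script in a child process; return its stripped stdout, or None on failure."""
    # Generated code is untrusted; a real pipeline would run it in a sandbox.
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def keep_correct(samples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only samples whose script output equals the expected answer (string comparison)."""
    return [s for s in samples if run_script(s["code"]) == s["expected_answer"]]


# Made-up samples: one correct, one wrong, one that exits with an error.
samples = [
    {"code": "print(2 + 3)", "expected_answer": "5"},
    {"code": "print(2 * 3)", "expected_answer": "5"},
    {"code": "raise SystemExit(1)", "expected_answer": "5"},
]

kept = keep_correct(samples)
print(len(kept))
```

Exact string comparison is the simplest possible check; numeric answers would usually need normalization (for example, treating 5 and 5.0 as equal) before comparing.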
open-math-instruct-2-en-ja-cot
huggingface.co
Updated Dec 9, 2024
Cite
F.Y. (2024). open-math-instruct-2-en-ja-cot [Dataset]. https://huggingface.co/datasets/yasutoshi-lab/open-math-instruct-2-en-ja-cot
Dataset updated
Dec 9, 2024
Authors
F.Y.
Description
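Dataset information

This dataset was built from Kendamarron/OpenMathInstruct-2-ja-CoT by translating the Japanese responses into English so that each record contains an English-Japanese pair. The translations were rated for quality on a scale of 1 to 5 with LLM-as-a-judge, and only those with the top rating of 5 were kept. The result was then split into training and test sets at a ratio of 9:1.

Source dataset information

"Kendamarron/OpenMathInstruct-2-ja-CoT": training set only, 15,282 records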
Dataset: Dataset({ features: ['problem', 'generated_solution', 'expected_answer', 'problem_source', 'problem_ja', 'thought', 'output', 'evaluation', 'cot_output', 'system']… See the full description on the dataset page: https://huggingface.co/datasets/yasutoshi-lab/open-math-instruct-2-en-ja-cot.
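The filter-then-split procedure described above can be sketched as follows. The `evaluation` field name appears in the feature list; storing the rating as an integer, the helper `filter_and_split`, the shuffle, and the seed are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple


def filter_and_split(
    rows: List[Dict],
    top_score: int = 5,
    train_ratio: float = 0.9,
    seed: int = 0,
) -> Tuple[List[Dict], List[Dict]]:
    """Keep rows rated `top_score`, shuffle them, and split into train and test lists."""
    best = [r for r in rows if r["evaluation"] == top_score]
    # A seeded shuffle keeps the split reproducible.
    rng = random.Random(seed)
    rng.shuffle(best)
    cut = int(len(best) * train_ratio)
    return best[:cut], best[cut:]


# Made-up rows: even-numbered problems get the top rating of 5, the rest get 3.
rows = [{"problem": f"p{i}", "evaluation": 5 if i % 2 == 0 else 3} for i in range(40)]

train, test = filter_and_split(rows)
print(len(train), len(test))
```

With 20 top-rated rows, the 9:1 split yields 18 training rows and 2 test rows.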
SyntheticTextOpenMathInstruct
huggingface.co
Updated Nov 4, 2024
Cite
kan hatakeyama (2024). SyntheticTextOpenMathInstruct [Dataset]. https://huggingface.co/datasets/kanhatakeyama/SyntheticTextOpenMathInstruct
Explore at:
Croissant
Dataset updated
Nov 4, 2024
Authors
kan hatakeyama
License
https://choosealicense.com/licenses/other/
Description
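A corpus written by Phi-3, based on Japanese text randomly sampled from the following data source: OpenMathInstruct-1-1.8m-ja

Code

Here

Some of the computations were run on TSUBAME4.0, the supercomputer at Tokyo Institute of Technology.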
159 scholarly articles cite nvidia/OpenMathInstruct-1 (View in Google Scholar)