14 datasets found
  1. OpenMathInstruct-1

    • huggingface.co
    • opendatalab.com
    • +1 more
    Updated Feb 16, 2024
    + more versions
    Cite
    NVIDIA (2024). OpenMathInstruct-1 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMathInstruct-1

    OpenMathInstruct-1 is a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. The problems are from the GSM8K and MATH training subsets, and the solutions are synthetically generated by allowing the Mixtral model to use a mix of text reasoning and code blocks executed by a Python interpreter. The dataset is split into train and validation subsets that we used in the ablation experiments. These two subsets… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-1.

  2. OpenMathInstruct-2

    • huggingface.co
    Updated Oct 3, 2024
    + more versions
    Cite
    NVIDIA (2024). OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
    Explore at:
    Croissant
    Dataset updated
    Oct 3, 2024
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OpenMathInstruct-2

    OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:

    Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
    Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.

  3. OpenMathInstruct-1

    • kaggle.com
    zip
    Updated May 31, 2024
    Cite
    张鉴鸾 (2024). OpenMathInstruct-1 [Dataset]. https://www.kaggle.com/datasets/zjp2origin/openmathinstruct-1
    Explore at:
    zip (2346043456 bytes)
    Dataset updated
    May 31, 2024
    Authors
    张鉴鸾
    Description

    Dataset

    This dataset was created by 张鉴鸾

    Contents

  4. OpenMathInstruct-1-1.8m-ja-askllm-v1

    • huggingface.co
    Updated Aug 20, 2024
    Cite
    Team Kuma (2024). OpenMathInstruct-1-1.8m-ja-askllm-v1 [Dataset]. https://huggingface.co/datasets/geniacllm/OpenMathInstruct-1-1.8m-ja-askllm-v1
    Explore at:
    Croissant
    Dataset updated
    Aug 20, 2024
    Dataset authored and provided by
    Team Kuma
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMathInstruct-1-1.8m-ja-askllm-v1

    This dataset scores kunishou/OpenMathInstruct-1-1.8m-ja with the Ask-LLM method. In addition to the columns of the original dataset, it adds a column named askllm_score, which stores the Ask-LLM score. The LLM used for Ask-LLM scoring is Rakuten/RakutenAI-7B-instruct, and the prompt is as follows: ### {data} ###

    Does the previous paragraph demarcated within ### and ### contain informative signal for pre-training a large-language model? An informative datapoint should be well-formatted, contain some usable knowledge of the… See the full description on the dataset page: https://huggingface.co/datasets/geniacllm/OpenMathInstruct-1-1.8m-ja-askllm-v1.
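A score column like this is typically used to drop low-scoring rows. The following is a minimal sketch under stated assumptions: only the column name askllm_score comes from the description above, while the sample rows, the 0.5 threshold, the assumption that higher scores mean more informative text, and the helper name keep_informative are hypothetical.

```python
# Hypothetical rows; only the "askllm_score" column name comes from the description.
rows = [
    {"question": "What is 1 + 1?", "askllm_score": 0.91},
    {"question": "???", "askllm_score": 0.12},
    {"question": "What is 12 * 12?", "askllm_score": 0.77},
]


def keep_informative(rows, threshold):
    """Return the rows whose askllm_score is at least `threshold`.

    Assumes higher scores mean more informative text; the threshold
    is an illustrative choice, not a value from the dataset card.
    """
    return [row for row in rows if row["askllm_score"] >= threshold]


kept = keep_informative(rows, threshold=0.5)
for row in kept:
    print(row["question"], row["askllm_score"])
```

The same predicate could be passed to whatever filtering facility the loading library provides; the right threshold depends on the score distribution in the actual data.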

  5. OpenMathInstruct-1-1000-processed

    • huggingface.co
    + more versions
    Cite
    Marcus Cedric R. Idia, OpenMathInstruct-1-1000-processed [Dataset]. https://huggingface.co/datasets/marcuscedricridia/OpenMathInstruct-1-1000-processed
    Authors
    Marcus Cedric R. Idia
    Description

    marcuscedricridia/OpenMathInstruct-1-1000-processed dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. OpenMath-MATH-masked

    • huggingface.co
    Updated Nov 24, 2025
    Cite
    NVIDIA (2025). OpenMath-MATH-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked
    Explore at:
    Croissant
    Dataset updated
    Nov 24, 2025
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMath MATH Masked

    We release a masked version of the MATH solutions. This data can be used to aid synthetic generation of additional solutions for the MATH dataset, as it is much less likely to lead to inconsistent reasoning than using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked.

  7. OpenMathInstruct-2-sampled

    • huggingface.co
    Updated Jan 14, 2025
    + more versions
    Cite
    Gabriel Martín Blázquez (2025). OpenMathInstruct-2-sampled [Dataset]. https://huggingface.co/datasets/gabrielmbmb/OpenMathInstruct-2-sampled
    Explore at:
    Croissant
    Dataset updated
    Jan 14, 2025
    Authors
    Gabriel Martín Blázquez
    Description

    gabrielmbmb/OpenMathInstruct-2-sampled dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. OpenMath-GSM8K-masked

    • huggingface.co
    Updated Nov 24, 2025
    Cite
    NVIDIA (2025). OpenMath-GSM8K-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked
    Explore at:
    Croissant
    Dataset updated
    Nov 24, 2025
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMath GSM8K Masked

    We release a masked version of the GSM8K solutions. This data can be used to aid synthetic generation of additional solutions for the GSM8K dataset, as it is much less likely to lead to inconsistent reasoning than using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked.

  9. Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    Fifth Civil Defender - 5CD (2024). Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated [Dataset]. https://huggingface.co/datasets/5CD-AI/Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated
    Explore at:
    Croissant
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Fifth Civil Defender - 5CD
    Description

    5CD-AI/Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. OpenMathInstruct1-ja

    • huggingface.co
    Updated Nov 29, 2025
    + more versions
    Cite
    Hayato Hongo (2025). OpenMathInstruct1-ja [Dataset]. https://huggingface.co/datasets/HayatoHongo/OpenMathInstruct1-ja
    Dataset updated
    Nov 29, 2025
    Authors
    Hayato Hongo
    Description

    HayatoHongo/OpenMathInstruct1-ja dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. OpenMathInstruct-2-ja-CoT

    • huggingface.co
    Updated Dec 10, 2024
    Cite
    Kendamarron (2024). OpenMathInstruct-2-ja-CoT [Dataset]. https://huggingface.co/datasets/Kendamarron/OpenMathInstruct-2-ja-CoT
    Explore at:
    Croissant
    Dataset updated
    Dec 10, 2024
    Authors
    Kendamarron
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Kendamarron/OpenMathInstruct-2-ja-CoT dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. OpenMathInstruct-ja-phi3

    • huggingface.co
    Updated Oct 28, 2025
    Cite
    kan hatakeyama (2025). OpenMathInstruct-ja-phi3 [Dataset]. https://huggingface.co/datasets/kanhatakeyama/OpenMathInstruct-ja-phi3
    Explore at:
    Croissant
    Dataset updated
    Oct 28, 2025
    Authors
    kan hatakeyama
    License

    https://choosealicense.com/licenses/other/

    Description

    This is kunishou/OpenMathInstruct-1-1.8m-ja re-solved with Phi-3. The Python scripts were actually executed, and only the examples whose answers were correct were kept and published.

    Generation code

  13. open-math-instruct-2-en-ja-cot

    • huggingface.co
    Updated Dec 9, 2024
    Cite
    F.Y. (2024). open-math-instruct-2-en-ja-cot [Dataset]. https://huggingface.co/datasets/yasutoshi-lab/open-math-instruct-2-en-ja-cot
    Dataset updated
    Dec 9, 2024
    Authors
    F.Y.
    Description

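    Dataset information

    This dataset was built from Kendamarron/OpenMathInstruct-2-ja-CoT by translating the Japanese response data into English, so that it contains English-Japanese pairs. The translated data was rated for translation quality on a 1-to-5 scale using LLM-as-a-judge, and only the items with the top rating of 5 were kept. The result was then split into training and test datasets at a 9:1 ratio.

    Source dataset information

    "Kendamarron/OpenMathInstruct-2-ja-CoT": training dataset only, 15282 examples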
    Dataset: Dataset({ features: ['problem', 'generated_solution', 'expected_answer', 'problem_source', 'problem_ja', 'thought', 'output', 'evaluation', 'cot_output', 'system']… See the full description on the dataset page: https://huggingface.co/datasets/yasutoshi-lab/open-math-instruct-2-en-ja-cot.
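The filter-then-split procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the judge rating is stored in the 'evaluation' field listed in the features, and the sample records, the seed, and the helper name filter_and_split are hypothetical.

```python
import random


def filter_and_split(records, top_rating=5, train_fraction=0.9, seed=0):
    """Keep only records with the top judge rating, then split them 9:1.

    Assumes each record stores its rating under "evaluation"; that mapping
    is a guess based on the feature list, not confirmed by the dataset card.
    """
    kept = [r for r in records if r["evaluation"] == top_rating]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)
    cut = int(len(kept) * train_fraction)
    return kept[:cut], kept[cut:]


# Hypothetical records: every second one receives the top rating of 5.
records = [
    {"problem": f"p{i}", "evaluation": 5 if i % 2 == 0 else 3}
    for i in range(40)
]
train, test = filter_and_split(records)
print(len(train), len(test))
```

Shuffling before the cut keeps the test set from being just the tail of the source order; whether the original split was shuffled is not stated in the description.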

  14. SyntheticTextOpenMathInstruct

    • huggingface.co
    Updated Nov 4, 2024
    Cite
    kan hatakeyama (2024). SyntheticTextOpenMathInstruct [Dataset]. https://huggingface.co/datasets/kanhatakeyama/SyntheticTextOpenMathInstruct
    Explore at:
    Croissant
    Dataset updated
    Nov 4, 2024
    Authors
    kan hatakeyama
    License

    https://choosealicense.com/licenses/other/

    Description

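    A corpus written by Phi-3, based on Japanese text randomly sampled from the following data source: OpenMathInstruct-1-1.8m-ja

    Code

    Here

    Some of the computations used TSUBAME4.0, the supercomputer of Tokyo Institute of Technology.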


OpenMathInstruct-1

nvidia/OpenMathInstruct-1

159 scholarly articles cite this dataset.
