14 datasets found
  1. OpenMathInstruct-1

    • huggingface.co
    • opendatalab.com
    Updated Feb 16, 2024
    Cite
    NVIDIA (2024). OpenMathInstruct-1 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-1
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMathInstruct-1

    OpenMathInstruct-1 is a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. The problems are from the GSM8K and MATH training subsets, and the solutions are synthetically generated by allowing the Mixtral model to use a mix of text reasoning and code blocks executed by a Python interpreter. The dataset is split into train and validation subsets that we used in the ablation experiments. These two subsets… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-1.
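The solution format described above interleaves prose with code whose output comes from a Python interpreter. A minimal Python sketch of that loop, assuming (this page does not confirm it) that code blocks are wrapped in <llm-code> and </llm-code> tags: it extracts each block and runs it with exec() in one shared namespace so later blocks can reuse earlier variables. extract_code_blocks and run_code_blocks are hypothetical helpers, the shared namespace is a design choice of this sketch rather than a documented property of the original pipeline, and exec() is not a sandbox, so run only trusted code this way.

```python
import contextlib
import io
import re

# Assumed delimiters; check the dataset card for the tags actually used.
CODE_START = "<llm-code>"
CODE_END = "</llm-code>"


def extract_code_blocks(solution: str) -> list:
    """Return the code blocks embedded in a text-and-code solution, in order."""
    pattern = re.escape(CODE_START) + r"(.*?)" + re.escape(CODE_END)
    return [block.strip() for block in re.findall(pattern, solution, flags=re.DOTALL)]


def run_code_blocks(blocks: list) -> list:
    """Execute blocks in order in one shared namespace, capturing each block's stdout.

    WARNING: exec() is not a sandbox; only run trusted code this way.
    """
    namespace: dict = {}
    outputs = []
    for block in blocks:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(block, namespace)
        outputs.append(buffer.getvalue().strip())
    return outputs


solution = (
    "Each box holds 4 pens and there are 3 boxes.\n"
    "<llm-code>\nboxes = 3\npens_per_box = 4\nprint(boxes * pens_per_box)\n</llm-code>\n"
    "So there are 12 pens."
)
print(run_code_blocks(extract_code_blocks(solution)))  # ['12']
```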

  2. OpenMathInstruct-1-1.8m-ja

    • huggingface.co
    Updated Apr 4, 2025
    Cite
    kunishou (2025). OpenMathInstruct-1-1.8m-ja [Dataset]. https://huggingface.co/datasets/kunishou/OpenMathInstruct-1-1.8m-ja
    Dataset updated
    Apr 4, 2025
    Authors
    kunishou
    License

    https://choosealicense.com/licenses/other/

    Description

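    A commercially usable instruction tuning dataset of 1.8 million examples, created by automatically translating OpenMathInstruct-1 into Japanese.
    OpenMathInstruct-1 is a math dataset consisting of pairs of questions from the GSM8K and MATH benchmark training sets and solutions generated with the Mixtral-8x7B model. Although the solutions are synthetic, incorrect ones were removed by checking that the value derived from each solution equals the GSM8K or MATH answer. See the paper for details of the dataset. Use of this dataset is governed by an NVIDIA license that permits commercial use. If you redistribute this dataset, you must carry over that license. Our understanding is also that a model trained on this dataset does not need to follow that license (Nvidia's OpenMath-Mistral and OpenMath-CodeLlama are also apache 2.0… See the full description on the dataset page: https://huggingface.co/datasets/kunishou/OpenMathInstruct-1-1.8m-ja.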
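The filtering step described above (keeping a solution only if its derived value equals the GSM8K or MATH reference answer) can be sketched in Python as follows. The column names predicted_answer and expected_answer, and the normalization rules (thousands separators, a leading $, a trailing period, a 1e-6 relative tolerance), are illustrative assumptions rather than the exact procedure used to build the dataset.

```python
import math
import re
from typing import Optional

# Digits grouped by commas in threes, e.g. "1,200" or "-12,345.5".
_THOUSANDS = re.compile(r"^-?\d{1,3}(,\d{3})+(\.\d+)?$")


def to_number(answer: str) -> Optional[float]:
    """Parse answers like '1,200', '$45' or '12.' as floats; None if not numeric."""
    text = answer.strip().rstrip(".").lstrip("$")
    if _THOUSANDS.match(text):
        text = text.replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


def answers_match(predicted: str, expected: str, rel_tol: float = 1e-6) -> bool:
    """Numeric comparison when both sides parse, otherwise exact text match."""
    p, e = to_number(predicted), to_number(expected)
    if p is not None and e is not None:
        return math.isclose(p, e, rel_tol=rel_tol)
    return predicted.strip() == expected.strip()


def keep_correct(rows: list) -> list:
    """Keep rows whose predicted answer matches the reference answer (assumed column names)."""
    return [r for r in rows if answers_match(r["predicted_answer"], r["expected_answer"])]


rows = [
    {"predicted_answer": "1,200", "expected_answer": "1200"},
    {"predicted_answer": "7", "expected_answer": "8"},
]
print(len(keep_correct(rows)))  # 1
```

Non-numeric answers such as LaTeX expressions fall back to an exact string match here; a real pipeline would likely need a symbolic comparison for those.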

  3. Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    Fifth Civil Defender - 5CD (2024). Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated [Dataset]. https://huggingface.co/datasets/5CD-AI/Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Fifth Civil Defender - 5CD
    Description

    5CD-AI/Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. OpenMathInstruct-1-Parsed

    • huggingface.co
    Updated Sep 1, 2024
    Cite
    Igor Kilbas (2024). OpenMathInstruct-1-Parsed [Dataset]. https://huggingface.co/datasets/kaleinaNyan/OpenMathInstruct-1-Parsed
    Dataset updated
    Sep 1, 2024
    Authors
    Igor Kilbas
    Description

    This is a parsed subset of the OpenMathInstruct-1 dataset. Each sample in the dataset is formatted as follows:

        {"messages": [
            {"role": "user", "content": "user prompt"},
            {"role": "assistant", "content": "assistant response", "tool_call": "python code"},
            {"role": "tool", "content": "code execution result"},
            {"role": "assistant", "content": "assistant response"}
        ]}

    The dataset is split into two files:

    open-math-short.jsonl - the assistant only makes a single function call.… See the full description on the dataset page: https://huggingface.co/datasets/kaleinaNyan/OpenMathInstruct-1-Parsed.
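A minimal Python sketch for reading records in the messages format above and checking the single-function-call condition stated for open-math-short.jsonl. The helper names load_jsonl, count_tool_calls and is_single_call are hypothetical; the only assumption about the data is the schema shown in the description.

```python
import json


def load_jsonl(lines) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def count_tool_calls(sample: dict) -> int:
    """Count assistant turns that carry a 'tool_call' field."""
    return sum(
        1
        for message in sample["messages"]
        if message.get("role") == "assistant" and "tool_call" in message
    )


def is_single_call(sample: dict) -> bool:
    """True when the assistant makes exactly one function call."""
    return count_tool_calls(sample) == 1


example = {
    "messages": [
        {"role": "user", "content": "What is 17 * 3?"},
        {"role": "assistant", "content": "Let me compute it.", "tool_call": "print(17 * 3)"},
        {"role": "tool", "content": "51"},
        {"role": "assistant", "content": "The answer is 51."},
    ]
}
records = load_jsonl([json.dumps(example), ""])
print(is_single_call(records[0]))  # True
```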

  5. OpenMathInstruct-1-correct

    • huggingface.co
    Updated Feb 16, 2024
    Cite
    Shengyi Costa Huang (2024). OpenMathInstruct-1-correct [Dataset]. https://huggingface.co/datasets/vwxyzjn/OpenMathInstruct-1-correct
    Dataset updated
    Feb 16, 2024
    Authors
    Shengyi Costa Huang
    Description

    vwxyzjn/OpenMathInstruct-1-correct dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. OpenMathInstruct-1

    • huggingface.co
    Updated Feb 16, 2024
    Cite
    samo line (2024). OpenMathInstruct-1 [Dataset]. https://huggingface.co/datasets/samoline/OpenMathInstruct-1
    Dataset updated
    Feb 16, 2024
    Authors
    samo line
    Description

    samoline/OpenMathInstruct-1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. OpenMathInstruct-2

    • huggingface.co
    Updated Oct 3, 2024
    Cite
    NVIDIA (2024). OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
    Dataset updated
    Oct 3, 2024
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OpenMathInstruct-2

    OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:

    • Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
    • Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.
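A hedged Python sketch of how the two construction strategies above might be told apart in the data. It assumes a problem_source column whose values are the benchmark name (e.g. gsm8k, math) for original training problems and carry an augmented_ prefix for newly generated problems; verify the actual column name and values on the dataset page before relying on it.

```python
from collections import Counter


def augmentation_type(row: dict) -> str:
    """Classify a row by how its problem was obtained.

    Assumes a 'problem_source' column: plain benchmark names (e.g. 'gsm8k',
    'math') mark original training problems with generated solutions, while an
    'augmented_' prefix marks newly generated problems.
    """
    if row["problem_source"].startswith("augmented_"):
        return "problem-solution augmentation"
    return "solution augmentation"


def tally(rows) -> Counter:
    """Count rows per augmentation strategy."""
    return Counter(augmentation_type(row) for row in rows)


rows = [
    {"problem_source": "gsm8k"},
    {"problem_source": "augmented_math"},
    {"problem_source": "math"},
]
print(tally(rows))
```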

  8. OpenMathInstruct-1-1000-processed

    • huggingface.co
    Cite
    Marcus Cedric R. Idia, OpenMathInstruct-1-1000-processed [Dataset]. https://huggingface.co/datasets/marcuscedricridia/OpenMathInstruct-1-1000-processed
    Authors
    Marcus Cedric R. Idia
    Description

    marcuscedricridia/OpenMathInstruct-1-1000-processed dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. OpenMathInstruct-1-1.8m-ja-askllm-v1

    • huggingface.co
    Updated Aug 20, 2024
    Cite
    Team Kuma (2024). OpenMathInstruct-1-1.8m-ja-askllm-v1 [Dataset]. https://huggingface.co/datasets/geniacllm/OpenMathInstruct-1-1.8m-ja-askllm-v1
    Dataset updated
    Aug 20, 2024
    Dataset authored and provided by
    Team Kuma
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMathInstruct-1-1.8m-ja-askllm-v1

    This dataset is kunishou/OpenMathInstruct-1-1.8m-ja scored with the Ask-LLM method. In addition to the original dataset's columns, it adds an askllm_score column that stores the Ask-LLM score. The LLM used for Ask-LLM scoring is Rakuten/RakutenAI-7B-instruct, and the prompt is as follows: ### {data} ###

    Does the previous paragraph demarcated within ### and ### contain informative signal for pre-training a large-language model? An informative datapoint should be well-formatted, contain some usable knowledge of the… See the full description on the dataset page: https://huggingface.co/datasets/geniacllm/OpenMathInstruct-1-1.8m-ja-askllm-v1.
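Once every row carries an askllm_score, a typical use is to keep only the highest-scoring examples. The Python sketch below shows top-k and threshold selection over that column; k and the threshold are hypothetical parameters, since the description does not say how the scores should be cut.

```python
import heapq


def top_k_by_score(rows, k: int, score_key: str = "askllm_score") -> list:
    """Return the k rows with the highest score, best first."""
    return heapq.nlargest(k, rows, key=lambda row: row[score_key])


def above_threshold(rows, threshold: float, score_key: str = "askllm_score") -> list:
    """Keep rows whose score is at least the (hypothetical) threshold."""
    return [row for row in rows if row[score_key] >= threshold]


rows = [
    {"id": 0, "askllm_score": 0.91},
    {"id": 1, "askllm_score": 0.18},
    {"id": 2, "askllm_score": 0.67},
]
print([row["id"] for row in top_k_by_score(rows, 2)])  # [0, 2]
print([row["id"] for row in above_threshold(rows, 0.5)])  # [0, 2]
```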

  10. OpenMathInstruct-2-trl

    • huggingface.co
    Updated Jan 14, 2025
    Cite
    Gabriel Martín Blázquez (2025). OpenMathInstruct-2-trl [Dataset]. https://huggingface.co/datasets/gabrielmbmb/OpenMathInstruct-2-trl
    Dataset updated
    Jan 14, 2025
    Authors
    Gabriel Martín Blázquez
    Description

    gabrielmbmb/OpenMathInstruct-2-trl dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. OpenMath-MATH-masked

    • huggingface.co
    Updated Jul 11, 2025
    Cite
    NVIDIA (2025). OpenMath-MATH-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMath MATH Masked

    We release a masked version of the MATH solutions. This data can be used to aid synthetic generation of additional solutions for the MATH dataset, as it is much less likely to lead to inconsistent reasoning than using the original solutions directly. This dataset was used to construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked.

  12. OpenMath-GSM8K-masked

    • huggingface.co
    Updated Jul 11, 2025
    Cite
    NVIDIA (2025). OpenMath-GSM8K-masked [Dataset]. https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    OpenMath GSM8K Masked

    We release a masked version of the GSM8K solutions. This data can be used to aid synthetic generation of additional solutions for the GSM8K dataset, as it is much less likely to lead to inconsistent reasoning than using the original solutions directly. This dataset was used to construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked.
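Because masked solutions are meant to guide generation without handing over the result, one plausible sanity check is to confirm that the final answer no longer appears in the masked text. The Python sketch below is a hypothetical helper, not NVIDIA's documented procedure; it matches the answer only as a standalone number, so 18 is not found inside 118 or 18.5.

```python
import re


def leaks_answer(masked_solution: str, answer: str) -> bool:
    """True if the final answer still appears as a standalone number.

    The lookarounds stop '18' from matching inside '118', '3.18' or '18.5',
    while still matching '18' at the end of a sentence ("... is 18.").
    """
    pattern = r"(?<![\w.])" + re.escape(answer.strip()) + r"(?!\w|\.\d)"
    return re.search(pattern, masked_solution) is not None


print(leaks_answer("She earns M dollars, where M = 9 * 2.", "18"))  # False
print(leaks_answer("So she earns 18 dollars.", "18"))  # True
```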

  13. OpenMathInstruct-1-1.8m-ja_10k

    • huggingface.co
    Updated May 28, 2024
    Cite
    GENIAC Team Ozaki (2024). OpenMathInstruct-1-1.8m-ja_10k [Dataset]. https://huggingface.co/datasets/GENIAC-Team-Ozaki/OpenMathInstruct-1-1.8m-ja_10k
    Dataset updated
    May 28, 2024
    Dataset authored and provided by
    GENIAC Team Ozaki
    Description

    GENIAC-Team-Ozaki/OpenMathInstruct-1-1.8m-ja_10k dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. OpenMathInstruct-ja-phi3

    • huggingface.co
    Cite
    kan hatakeyama, OpenMathInstruct-ja-phi3 [Dataset]. https://huggingface.co/datasets/kanhatakeyama/OpenMathInstruct-ja-phi3
    Authors
    kan hatakeyama
    License

    https://choosealicense.com/licenses/other/

    Description

    This is kunishou/OpenMathInstruct-1-1.8m-ja re-solved with Phi-3. The Python scripts were actually executed, and only the samples whose answers were correct were kept.
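The execute-and-filter step described above can be sketched in Python as below: each script runs in a separate interpreter process with a timeout, and a sample is kept only if its printed output equals the reference answer. The sample keys code and answer, the 10-second timeout, and the exact string comparison are assumptions; a separate process is not a security sandbox, and the actual generation code linked from the dataset page may differ.

```python
import subprocess
import sys


def run_script(code: str, timeout: float = 10.0) -> str:
    """Run code in a fresh Python interpreter process and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()


def keep_if_correct(samples: list, timeout: float = 10.0) -> list:
    """Keep samples whose script prints exactly the reference answer.

    Only stdout is compared (the exit code is ignored); scripts that exceed
    the timeout are skipped.
    """
    kept = []
    for sample in samples:
        try:
            output = run_script(sample["code"], timeout)
        except subprocess.TimeoutExpired:
            continue
        if output == sample["answer"].strip():
            kept.append(sample)
    return kept


samples = [
    {"code": "print(6 * 7)", "answer": "42"},
    {"code": "print(1 + 1)", "answer": "3"},
]
print([s["answer"] for s in keep_if_correct(samples)])  # ['42']
```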

    Generation code


nvidia/OpenMathInstruct-1 is cited by 129 scholarly articles (per Google Scholar).