22 datasets found
  1. OpenMathInstruct-2

    • huggingface.co
    Updated Oct 3, 2024
    Cite
    NVIDIA (2024). OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 3, 2024
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OpenMathInstruct-2

    OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:

    • Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
    • Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.…

    See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.

  2. nvidia-OpenMathInstruct-2

    • huggingface.co
    Updated Nov 22, 2024
    Cite
    Yurun Yuan (2024). nvidia-OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/RyanYr/nvidia-OpenMathInstruct-2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 22, 2024
    Authors
    Yurun Yuan
    Description

    RyanYr/nvidia-OpenMathInstruct-2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. OpenMathInstruct-2-10k

    • huggingface.co
    Updated Oct 15, 2024
    Cite
    Arthur LAGACHERIE (2024). OpenMathInstruct-2-10k [Dataset]. https://huggingface.co/datasets/Arthur-LAGACHERIE/OpenMathInstruct-2-10k
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 15, 2024
    Authors
    Arthur LAGACHERIE
    Description

    Arthur-LAGACHERIE/OpenMathInstruct-2-10k dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. OpenMathInstruct-2-augmented-math

    • huggingface.co
    Updated Feb 12, 2025
    Cite
    JINIAC_competition (2025). OpenMathInstruct-2-augmented-math [Dataset]. https://huggingface.co/datasets/JINIAC-competition/OpenMathInstruct-2-augmented-math
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2025
    Dataset authored and provided by
    JINIAC_competition
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains only the augmented_math portion extracted from nvidia/OpenMathInstruct-2. The license is the same as that of the original data.

  5. OpenMathInstruct-2-Text

    • huggingface.co
    Updated Sep 15, 2025
    Cite
    Zayd Muhammad Kawakibi Zuhri (2025). OpenMathInstruct-2-Text [Dataset]. https://huggingface.co/datasets/zaydzuhri/OpenMathInstruct-2-Text
    Explore at:
    Dataset updated
    Sep 15, 2025
    Authors
    Zayd Muhammad Kawakibi Zuhri
    Description

    zaydzuhri/OpenMathInstruct-2-Text dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. OpenMathInstruct-2-2M-formatted

    • huggingface.co
    Updated Apr 25, 2025
    Cite
    Benjamin Minixhofer (2025). OpenMathInstruct-2-2M-formatted [Dataset]. https://huggingface.co/datasets/benjamin/OpenMathInstruct-2-2M-formatted
    Explore at:
    Dataset updated
    Apr 25, 2025
    Authors
    Benjamin Minixhofer
    Description

    benjamin/OpenMathInstruct-2-2M-formatted dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. OpenMathInstruct-2-MATH-Questions

    • huggingface.co
    Updated Jun 12, 2025
    Cite
    Ricardo (2025). OpenMathInstruct-2-MATH-Questions [Dataset]. https://huggingface.co/datasets/ricdomolm/OpenMathInstruct-2-MATH-Questions
    Explore at:
    Dataset updated
    Jun 12, 2025
    Authors
    Ricardo
    Description

    import datasets

    dataset = datasets.load_from_disk('/fast/rolmedo/datasets/OpenMathInstruct-2')

    # filter out problems that are not from augmented_math or math
    problem_sources = dataset['problem_source']
    keep_ids = [i for i, source in enumerate(problem_sources) if source in ['augmented_math', 'math']]
    dataset = dataset.select(keep_ids)

    # remove duplicate problems...
    to_keep = []
    problems_seen = set()
    for i, problem in enumerate(dataset['problem']):
        if problem not in problems_seen:…

    See the full description on the dataset page: https://huggingface.co/datasets/ricdomolm/OpenMathInstruct-2-MATH-Questions.
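The snippet in this description is cut off inside the deduplication loop. As a rough illustration only, the filter-then-deduplicate idea can be sketched on plain Python dictionaries. The sample records, the `filter_and_dedupe` helper, and the choice to keep the first occurrence of each problem are assumptions made for this sketch, not the dataset author's code.

```python
# Hypothetical in-memory records standing in for dataset rows.
# Only the 'problem' and 'problem_source' column names come from the snippet above.
records = [
    {"problem": "What is 2 + 2?", "problem_source": "math"},
    {"problem": "What is 2 + 2?", "problem_source": "augmented_math"},
    {"problem": "Tom has 3 apples and buys 2 more.", "problem_source": "gsm8k"},
    {"problem": "Solve x^2 = 9.", "problem_source": "augmented_math"},
]


def filter_and_dedupe(rows, allowed_sources=("augmented_math", "math")):
    """Keep rows from the allowed sources, dropping repeated problem texts.

    Assumption: the first occurrence of each problem is the one that is kept.
    """
    seen = set()
    kept = []
    for row in rows:
        if row["problem_source"] not in allowed_sources:
            continue  # wrong source, skip
        if row["problem"] in seen:
            continue  # duplicate problem text, skip
        seen.add(row["problem"])
        kept.append(row)
    return kept


unique_rows = filter_and_dedupe(records)
print([row["problem"] for row in unique_rows])
```

Tracking seen problems in a set keeps the pass linear in the number of rows, which matters for a dataset of this size.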

  8. OpenMathInstruct-2-QwQ

    • huggingface.co
    Updated Jan 22, 2025
    Cite
    Erdene-Ochir Tuguldur (2025). OpenMathInstruct-2-QwQ [Dataset]. https://huggingface.co/datasets/tugstugi/OpenMathInstruct-2-QwQ
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 22, 2025
    Authors
    Erdene-Ochir Tuguldur
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    OpenMathInstruct-2-QwQ

    Qwen/QwQ-32B-Preview solutions of 107k augmented_math problems of nvidia/OpenMathInstruct-2. All solutions are validated and agree with the expected_answer field of OpenMathInstruct-2.
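The description does not say how agreement with `expected_answer` was checked. One common approach, shown here purely as an assumption, is to extract the last `\boxed{...}` expression from a generated solution and compare it to the expected answer after removing whitespace. The helper names `last_boxed` and `agrees` are hypothetical.

```python
def last_boxed(text):
    """Return the contents of the last LaTeX boxed expression in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def agrees(solution, expected_answer):
    """Assumed check: exact string match after stripping all whitespace."""
    answer = last_boxed(solution)
    if answer is None:
        return False
    return "".join(answer.split()) == "".join(expected_answer.split())


print(agrees("So the result is \\boxed{ 42 }.", "42"))
print(agrees("No final answer given.", "42"))
```

A string comparison like this would reject mathematically equivalent forms such as `0.5` and `\frac{1}{2}`, so a real validation pipeline may well use a symbolic comparison instead.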

  9. OpenMathInstruct-2-1M

    • huggingface.co
    Updated Oct 26, 2024
    Cite
    monology (2024). OpenMathInstruct-2-1M [Dataset]. https://huggingface.co/datasets/monology/OpenMathInstruct-2-1M
    Explore at:
    Dataset updated
    Oct 26, 2024
    Authors
    monology
    Description

    monology/OpenMathInstruct-2-1M dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. OpenMathInstruct-2-filtered-shard3

    • huggingface.co
    Updated Oct 26, 2024
    Cite
    tlrm (2024). OpenMathInstruct-2-filtered-shard3 [Dataset]. https://huggingface.co/datasets/tlrm/OpenMathInstruct-2-filtered-shard3
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 26, 2024
    Dataset authored and provided by
    tlrm
    Description

    tlrm/OpenMathInstruct-2-filtered-shard3 dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. OpenMathInstruct-2-CoT-JA

    • huggingface.co
    Updated Dec 17, 2024
    Cite
    Hajime Yagihara (2024). OpenMathInstruct-2-CoT-JA [Dataset]. https://huggingface.co/datasets/HachiML/OpenMathInstruct-2-CoT-JA
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 17, 2024
    Authors
    Hajime Yagihara
    Description

    HachiML/OpenMathInstruct-2-CoT-JA dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. OpenMAthInstruct-2-AUGMATH-Deduped

    • huggingface.co
    Updated Sep 13, 2025
    Cite
    Ricardo (2025). OpenMAthInstruct-2-AUGMATH-Deduped [Dataset]. https://huggingface.co/datasets/ricdomolm/OpenMAthInstruct-2-AUGMATH-Deduped
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 13, 2025
    Authors
    Ricardo
    Description

    import datasets
    import pandas as pd

    dataset = datasets.load_dataset('nvidia/OpenMathInstruct-2', split='train_1M')
    dataset = dataset.filter(lambda x: x["problem_source"] == "augmented_math")
    dataset = dataset.remove_columns(["generated_solution", "problem_source"])

    df = dataset.to_pandas()
    df = df.drop_duplicates(subset=["problem"])
    dataset = datasets.Dataset.from_pandas(df)

    dataset = dataset.rename_column("expected_answer", "answer")…

    See the full description on the dataset page: https://huggingface.co/datasets/ricdomolm/OpenMAthInstruct-2-AUGMATH-Deduped.

  13. OpenMathInstruct-2-chat

    • huggingface.co
    Updated Nov 5, 2025
    Cite
    samo line (2025). OpenMathInstruct-2-chat [Dataset]. https://huggingface.co/datasets/samoline/OpenMathInstruct-2-chat
    Explore at:
    Dataset updated
    Nov 5, 2025
    Authors
    samo line
    Description

    samoline/OpenMathInstruct-2-chat dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. OpenMathInstruct-1

    • huggingface.co
    • opendatalab.com
    • +1 more
    Updated Feb 16, 2024
    Cite
    NVIDIA (2024). OpenMathInstruct-1 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    OpenMathInstruct-1

    OpenMathInstruct-1 is a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. The problems are from the GSM8K and MATH training subsets, and the solutions are synthetically generated by allowing the Mixtral model to use a mix of text reasoning and code blocks executed by a Python interpreter. The dataset is split into train and validation subsets that we used in the ablation experiments. These two subsets… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-1.

  15. openmathinstruct2_10000

    • huggingface.co
    Updated May 11, 2025
    Cite
    Junxia Cui (2025). openmathinstruct2_10000 [Dataset]. https://huggingface.co/datasets/autoprogrammer/openmathinstruct2_10000
    Explore at:
    Dataset updated
    May 11, 2025
    Authors
    Junxia Cui
    Description

    OpenMathInstruct2 10000

    These are 10,000 samples drawn from nvidia/OpenMathInstruct-2. Tags: math. License: apache-2.0.

  16. OpenMathInstruct-2-ja-CoT

    • huggingface.co
    Updated Dec 10, 2024
    Cite
    Kendamarron (2024). OpenMathInstruct-2-ja-CoT [Dataset]. https://huggingface.co/datasets/Kendamarron/OpenMathInstruct-2-ja-CoT
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 10, 2024
    Authors
    Kendamarron
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Kendamarron/OpenMathInstruct-2-ja-CoT dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. openmathinstruct2-ex25000-seed5

    • huggingface.co
    Cite
    Giovanni De Muri, openmathinstruct2-ex25000-seed5 [Dataset]. https://huggingface.co/datasets/giovannidemuri/openmathinstruct2-ex25000-seed5
    Explore at:
    Authors
    Giovanni De Muri
    Description

    giovannidemuri/openmathinstruct2-ex25000-seed5 dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. OpenMath-Difficulty-Annotated

    • huggingface.co
    Updated Oct 26, 2024
    Cite
    HAD (2024). OpenMath-Difficulty-Annotated [Dataset]. https://huggingface.co/datasets/HAD653/OpenMath-Difficulty-Annotated
    Explore at:
    Dataset updated
    Oct 26, 2024
    Authors
    HAD
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    📐 OpenMath-Difficulty-Annotated

      🚀 Overview
    

    OpenMath-Difficulty-Annotated is a curated subset of OpenMathInstruct-2 containing 10,176 math problems, enhanced with precise difficulty metadata. While the original solutions are preserved from NVIDIA's dataset, we employed a 120B Parameter Model (LLM-as-a-Judge) to analyze and grade every single problem on a scale of 1 to 5. This allows developers of Small Language Models (1B-3B) to filter out "Olympiad-level" noise… See the full description on the dataset page: https://huggingface.co/datasets/HAD653/OpenMath-Difficulty-Annotated.
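To illustrate how a 1-to-5 difficulty grade could be used for filtering, here is a minimal sketch on plain dictionaries. The `difficulty` field name, the sample rows, and the threshold of 3 are assumptions for this example; the truncated description does not show the dataset's actual column names.

```python
# Hypothetical rows; 'difficulty' is an assumed field name holding the 1-5 grade.
rows = [
    {"problem": "Compute 7 * 8.", "difficulty": 1},
    {"problem": "Find the roots of x^2 - 5x + 6.", "difficulty": 2},
    {"problem": "Prove the inequality for all positive reals.", "difficulty": 5},
    {"problem": "Evaluate a geometric series.", "difficulty": 3},
]


def keep_up_to(rows, max_difficulty):
    """Return only the rows graded at or below max_difficulty."""
    if not 1 <= max_difficulty <= 5:
        raise ValueError("max_difficulty must be between 1 and 5")
    return [row for row in rows if row["difficulty"] <= max_difficulty]


easy_rows = keep_up_to(rows, max_difficulty=3)
print(len(easy_rows), "of", len(rows), "rows kept")
```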

  19. open-math-instruct-2-en-ja-cot

    • huggingface.co
    Updated Dec 9, 2024
    Cite
    F.Y. (2024). open-math-instruct-2-en-ja-cot [Dataset]. https://huggingface.co/datasets/yasutoshi-lab/open-math-instruct-2-en-ja-cot
    Explore at:
    Dataset updated
    Dec 9, 2024
    Authors
    F.Y.
    Description

    Dataset information

    This dataset was built from Kendamarron/OpenMathInstruct-2-ja-CoT by translating the Japanese response data into English, so that each record contains an English-Japanese pair. The translated data was scored for translation quality on a scale of 1 to 5 using LLM-as-a-judge, and only records with the top score of 5 were kept. The result was then split into training and test sets at a ratio of 9:1.

      Source dataset information

    "Kendamarron/OpenMathInstruct-2-ja-CoT": training set only, 15,282 records
    Dataset: Dataset({ features: ['problem', 'generated_solution', 'expected_answer', 'problem_source', 'problem_ja', 'thought', 'output', 'evaluation', 'cot_output', 'system']… See the full description on the dataset page: https://huggingface.co/datasets/yasutoshi-lab/open-math-instruct-2-en-ja-cot.
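The filter-then-split procedure described for this dataset can be sketched as follows. This is a minimal illustration under stated assumptions: that the `evaluation` feature holds the 1-to-5 judge score, that the split is a seeded random shuffle, and that `filter_and_split` and the synthetic rows are hypothetical stand-ins rather than the author's pipeline.

```python
import random


def filter_and_split(rows, top_score=5, test_fraction=0.1, seed=0):
    """Keep rows with the top judge score, then split them train/test (9:1 by default)."""
    kept = [row for row in rows if row["evaluation"] == top_score]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)
    n_test = round(len(kept) * test_fraction)
    return kept[n_test:], kept[:n_test]


# Synthetic rows: even indices get score 5, odd indices get score 4.
rows = [{"problem": f"p{i}", "evaluation": 5 if i % 2 == 0 else 4} for i in range(40)]
train, test = filter_and_split(rows)
print(len(train), len(test))
```

Filtering before splitting means the 9:1 ratio applies to the retained records only, which matches the order of steps given in the description.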

  20. math-sft

    • huggingface.co
    Updated May 11, 2025
    Cite
    Duong Hoang Le (2025). math-sft [Dataset]. https://huggingface.co/datasets/lehduong/math-sft
    Explore at:
    Dataset updated
    May 11, 2025
    Authors
    Duong Hoang Le
    Description

    A concatenation of: OpenMathInstruct-2, OpenMathReasoning, AceMath, OpenR1-Math, Numinamath-CoT, Numinamath 1.5, OpenThoughts2-1M, MetaMathQA, Maths-College

nvidia/OpenMathInstruct-2

Explore at:
116 scholarly articles cite this dataset (View in Google Scholar)
