16 datasets found
  1. gsm8k

    • huggingface.co
    Updated Aug 11, 2022
    Cite
    OpenAI (2022). gsm8k [Dataset]. https://huggingface.co/datasets/openai/gsm8k
    Dataset updated
    Aug 11, 2022
    Dataset authored and provided by
    OpenAI (http://openai.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for GSM8K

      Dataset Summary
    

    GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

    These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − ×÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
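    As a rough illustration of working with such multi-step solutions, the sketch below pulls the final answer out of a solution string. It assumes that reference solutions end with a line of the form "#### <number>" and may embed calculator annotations such as <<48/2=24>>; this is not stated in the truncated summary above. The helper names and the sample string are hypothetical and only written in that style.

```python
import re
from typing import Optional

# Assumption: the final answer follows a "####" marker at the end of the solution.
ANSWER_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")
# Assumption: intermediate calculations are annotated as <<expression=result>>.
CALC_RE = re.compile(r"<<[^>]*>>")


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the final numeric answer as a string, or None if no marker is found."""
    match = ANSWER_RE.search(solution)
    if match is None:
        return None
    # Drop thousands separators so "1,000" becomes "1000".
    return match.group(1).replace(",", "")


def strip_calculator_annotations(solution: str) -> str:
    """Remove <<...>> annotations, leaving the human-readable reasoning."""
    return CALC_RE.sub("", solution)


# Illustrative solution string written in the assumed format.
example = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)

print(extract_final_answer(example))
print(strip_calculator_annotations(example))
```

    Returning the answer as a string rather than a float avoids rounding surprises when comparing against a model's output; convert it only where numeric comparison is actually needed.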

  2. gsm8k

    • tensorflow.org
    • opendatalab.com
    • +1 more
    Updated Dec 6, 2022
    Cite
    (2022). gsm8k [Dataset]. https://www.tensorflow.org/datasets/catalog/gsm8k
    Dataset updated
    Dec 6, 2022
    Description

    A dataset of 8.5K high quality linguistically diverse grade school math word problems.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('gsm8k', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  3. gsm8k

    • huggingface.co
    Updated Jul 10, 2025
    Cite
    MLX Community (2025). gsm8k [Dataset]. https://huggingface.co/datasets/mlx-community/gsm8k
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    MLX Community
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    OpenAI's GSM8K dataset converted to be compatible with MLX-LM-LoRA. Example usage:

    pip install -U mlx-lm-lora

    python -m mlx_lm_lora.train \
      --model mlx-community/Josiefied-Qwen3-0.6B-abliterated-v1-4bit \
      --train \
      --train-mode grpo \
      --data mlx-community/gsm8k \
      --iters 100 \
      --steps-per-report 1 \
      --batch-size 1 \
      --max-completion-length 512

  4. Arabic-gsm8k

    • huggingface.co
    Updated Aug 9, 2025
    Cite
    Omartificial Intelligence Space (2025). Arabic-gsm8k [Dataset]. https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-gsm8k
    Dataset updated
    Aug 9, 2025
    Authors
    Omartificial Intelligence Space
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for Arabic GSM8K

      Dataset Summary
    

    Arabic GSM8K is an Arabic translation of the GSM8K (Grade School Math 8K) dataset, which contains high-quality linguistically diverse grade school math word problems. The original dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning, and this Arabic version aims to extend these capabilities to Arabic language models and applications. The dataset… See the full description on the dataset page: https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-gsm8k.

  5. gretel-math-gsm8k-v0

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Gretel.ai (2024). gretel-math-gsm8k-v0 [Dataset]. https://huggingface.co/datasets/gretelai/gretel-math-gsm8k-v0
    Dataset updated
    Sep 12, 2024
    Dataset provided by
    Gretel.ai
    License

    https://choosealicense.com/licenses/llama3.1/

    Description

    gretelai/gsm8k-synthetic-diverse-405b

    This dataset is a synthetically generated version inspired by the GSM8K dataset (https://huggingface.co/datasets/openai/gsm8k), created entirely using Gretel Navigator with meta-llama/Meta-Llama-3.1-405B as the agent LLM. It contains ~1500 grade-school-level math word problems with step-by-step solutions, focusing on age group, difficulty, and domain diversity.

      Key Features:
    

    Synthetically Generated: Math problems created using Gretel… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/gretel-math-gsm8k-v0.

  6. gsm8k

    • huggingface.co
    Updated Jan 29, 2025
    Cite
    Scale Frontier Data (2025). gsm8k [Dataset]. https://huggingface.co/datasets/ScaleFrontierData/gsm8k
    Dataset updated
    Jan 29, 2025
    Dataset authored and provided by
    Scale Frontier Data
    Description

    ScaleFrontierData/gsm8k dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. MuMath

    • huggingface.co
    Updated Jun 2, 2024
    Cite
    Iron Man (2024). MuMath [Dataset]. https://huggingface.co/datasets/weihao1/MuMath
    Dataset updated
    Jun 2, 2024
    Authors
    Iron Man
    Description

    MuMath: Multi-perspective Data Augmentation for Mathematical Reasoning in Large Language Models

      Introduction
    

    We have amalgamated and further refined these strengths while broadening the scope of augmentation methods to construct a multi-perspective augmentation dataset for mathematics—termed MuMath (μ-Math) Dataset. Subsequently, we finetune LLaMA-2 on the MuMath dataset to derive the MuMath model.

    Model          Size  GSM8k  MATH
    WizardMath-7B  7B    54.9   10.7

    MetaMath-7B… See the full description on the dataset page: https://huggingface.co/datasets/weihao1/MuMath.

  8. Data from: mgsm

    • huggingface.co
    • opendatalab.com
    Updated Jun 12, 2024
    Cite
    Julen Etxaniz (2024). mgsm [Dataset]. https://huggingface.co/datasets/juletxara/mgsm
    Dataset updated
    Jun 12, 2024
    Authors
    Julen Etxaniz
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems, proposed in the paper "Language Models are Multilingual Chain-of-Thought Reasoners".

    The same 250 problems from GSM8K are each translated by human annotators into 10 languages: Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, and Telugu.

    The inputs and targets for each of the ten languages (and English) are provided as .tsv files. Few-shot exemplars, also manually translated for each language, are included in exemplars.py.
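    A minimal sketch of reading one such .tsv file follows. It assumes each row holds exactly two tab-separated columns (the question, then an integer answer) with no header row; the column layout, the function name read_mgsm_tsv, and the sample rows are assumptions made for illustration, not details confirmed by the description above.

```python
import csv
import io
from typing import List, Tuple


def read_mgsm_tsv(text: str) -> List[Tuple[str, int]]:
    """Parse tab-separated (question, answer) rows into a list of pairs."""
    # QUOTE_NONE keeps quotation marks inside questions as literal characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        question, answer = row
        pairs.append((question, int(answer.replace(",", ""))))
    return pairs


# Made-up rows for illustration; they are not taken from the benchmark.
sample = (
    "Janet has 3 apples and buys 4 more. How many apples does she have?\t7\n"
    "A box holds 12 eggs. How many eggs are in 5 boxes?\t60\n"
)

pairs = read_mgsm_tsv(sample)
for question, answer in pairs:
    print(answer, question)
```

    For a real file, the same function can be fed the result of open(path, encoding="utf-8").read(); explicit UTF-8 matters here because most of the target languages use non-Latin scripts.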

  9. OpenMathInstruct-2

    • huggingface.co
    Updated Oct 3, 2024
    Cite
    NVIDIA (2024). OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
    Dataset updated
    Oct 3, 2024
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    OpenMathInstruct-2

    OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:

    Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
    Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.…
    See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.

  10. MetaMath_DPO_FewShot

    • huggingface.co
    Updated Mar 23, 2024
    Cite
    agicorp (2024). MetaMath_DPO_FewShot [Dataset]. https://huggingface.co/datasets/agicorp/MetaMath_DPO_FewShot
    Dataset updated
    Mar 23, 2024
    Dataset provided by
    Agicorp
    Authors
    agicorp
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for "MetaMath_DPO_FewShot"

    GSM8K (Cobbe et al., 2021) is a dataset of diverse grade school maths word problems, which has been commonly adopted as a measure of the math and reasoning skills of LLMs. The MetaMath dataset is an extension of the training set of GSM8K using data augmentation. It is partitioned into queries and responses, where the query is a question involving mathematical calculation or reasoning, and the response is a logical series of steps and… See the full description on the dataset page: https://huggingface.co/datasets/agicorp/MetaMath_DPO_FewShot.

  11. EvilMath

    • huggingface.co
    Updated Apr 4, 2025
    Cite
    SPY Lab - ETH Zurich (2025). EvilMath [Dataset]. https://huggingface.co/datasets/ethz-spylab/EvilMath
    Dataset updated
    Apr 4, 2025
    Dataset authored and provided by
    SPY Lab - ETH Zurich
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Mathematical problems on harmful topics generated from GSM8K. EvilMath contains harmful questions with objectively verifiable ground truth answers.

      Dataset Description and Design
    

    EvilMath is generated by rewording GSM8K math questions to include harmful terms that are typically refused by safety-aligned models. We reword math problems to contain dangerous terms such as “bombs” or “nuclear weapons,” while preserving the question logic and the necessary information to solve… See the full description on the dataset page: https://huggingface.co/datasets/ethz-spylab/EvilMath.

  12. DOVE_Lite

    • huggingface.co
    Updated Aug 13, 2025
    Cite
    nlphuji (2025). DOVE_Lite [Dataset]. https://huggingface.co/datasets/nlphuji/DOVE_Lite
    Dataset updated
    Aug 13, 2025
    Dataset authored and provided by
    nlphuji
    License

    https://choosealicense.com/licenses/cdla-permissive-2.0/

    Description

    🕊️ DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation


      Updates 📅
    

    2025-08-13: Expansion beyond multiple-choice task: Added comprehensive PromptSuite benchmark evaluations with ~37,000 LLM outputs across 9 diverse tasks including open-ended generation, mathematical reasoning, sentiment analysis, translation, summarization, and code generation (MMLU, GSM8K, SST, WMT14, CNN/DailyMail, MuSiQue… See the full description on the dataset page: https://huggingface.co/datasets/nlphuji/DOVE_Lite.

  13. Omni-MATH

    • huggingface.co
    Updated Sep 14, 2024
    Cite
    Bofei Gao (2024). Omni-MATH [Dataset]. https://huggingface.co/datasets/KbsdJames/Omni-MATH
    Dataset updated
    Sep 14, 2024
    Authors
    Bofei Gao
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for Omni-MATH

    Recent advancements in AI, particularly in large language models (LLMs), have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To mitigate this limitation, we propose a comprehensive and challenging benchmark specifically designed… See the full description on the dataset page: https://huggingface.co/datasets/KbsdJames/Omni-MATH.

  14. Qu-QA-v2

    • huggingface.co
    Updated Dec 11, 2024
    Cite
    ffhs9 (2024). Qu-QA-v2 [Dataset]. https://huggingface.co/datasets/Ereeeeef3/Qu-QA-v2
    Dataset updated
    Dec 11, 2024
    Authors
    ffhs9
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Qu QA v2 Dataset

    Qu QA v2 is a large-scale question-answering (QA) dataset designed for training and evaluating machine learning models. It consists of question-answer pairs in English, making it suitable for general-purpose QA tasks, as well as specialized domains like code-related question answering and GSM8k-style problems.

      Dataset Details
    

    Features:

    input: A string representing the question (dtype: string).
    output: A string representing the answer (dtype: string).…
    See the full description on the dataset page: https://huggingface.co/datasets/Ereeeeef3/Qu-QA-v2.
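    The two listed fields suggest a simple record shape. The sketch below checks that a row carries string-valued "input" and "output" fields; because the feature list above is truncated, it deliberately tolerates extra keys. The helper name is_qa_record and the sample rows are hypothetical.

```python
from typing import Any


def is_qa_record(record: Any) -> bool:
    """Check that a record has string-valued 'input' and 'output' fields."""
    if not isinstance(record, dict):
        return False
    # Extra keys are allowed: the published feature list may be longer.
    return all(isinstance(record.get(key), str) for key in ("input", "output"))


# Made-up rows for illustration; they are not taken from the dataset.
rows = [
    {"input": "What is 12 * 3?", "output": "36"},
    {"input": "Reverse a list in Python.", "output": None},
]

valid = [row for row in rows if is_qa_record(row)]
print(len(valid))
```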

  15. dParallel_LLaDA_Distill_Data

    • huggingface.co
    Updated Oct 1, 2025
    Cite
    Zigeng Chen (2025). dParallel_LLaDA_Distill_Data [Dataset]. https://huggingface.co/datasets/Zigeng/dParallel_LLaDA_Distill_Data
    Dataset updated
    Oct 1, 2025
    Authors
    Zigeng Chen
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    dParallel-LLaDA-Distill Dataset:

    This dataset is used for the certainty-forcing distillation process in dParallel. We use prompts from publicly available training datasets and let the pretrained model generate its own responses as training data. For LLaDA-8B-Instruct, we sample prompts from the GSM8K, PRM12K training set, and part of the Numina-Math dataset. We generate target trajectories using a semi-autoregressive strategy with a sequence length of 256 and block length of 32. We… See the full description on the dataset page: https://huggingface.co/datasets/Zigeng/dParallel_LLaDA_Distill_Data.
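    To make the quoted sizes concrete, the sketch below partitions a 256-token sequence into blocks of 32 tokens, which gives 8 blocks. It shows only this arithmetic, under the assumption that blocks are contiguous and visited left to right; it is not the dParallel or LLaDA decoding procedure, and block_ranges is a hypothetical helper.

```python
from typing import List, Tuple


def block_ranges(seq_len: int, block_len: int) -> List[Tuple[int, int]]:
    """Split [0, seq_len) into contiguous half-open blocks of block_len tokens."""
    if block_len <= 0 or seq_len % block_len != 0:
        raise ValueError("seq_len must be a positive multiple of block_len")
    return [(start, start + block_len) for start in range(0, seq_len, block_len)]


# Values quoted in the description: sequence length 256, block length 32.
blocks = block_ranges(256, 32)
print(len(blocks))            # 8
print(blocks[0], blocks[-1])  # (0, 32) (224, 256)
```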

  16. Llama-3.2-3B-Instruct-evals

    • huggingface.co
    Cite
    Meta Llama, Llama-3.2-3B-Instruct-evals [Dataset]. https://huggingface.co/datasets/meta-llama/Llama-3.2-3B-Instruct-evals
    Dataset provided by
    Meta (http://meta.com/)
    Authors
    Meta Llama
    License

    https://choosealicense.com/licenses/llama3.2/

    Description

    Dataset Card for Meta Evaluation Result Details for Llama-3.2-3B-Instruct

    This dataset contains the results of the Meta evaluation result details for Llama-3.2-3B-Instruct. The dataset has been created from 21 evaluation tasks. The tasks are: hellaswag_chat, infinite_bench, mmlu_hindi_chat, mmlu_portugese_chat, ifeval_loose, nih_multi_needle, mmlu, gsm8k, mgsm, mmlu_thai_chat, mmlu_spanish_chat, gpqa, bfcl_chat, mmlu_french_chat, ifeval_strict, nexus, math, arc_challenge… See the full description on the dataset page: https://huggingface.co/datasets/meta-llama/Llama-3.2-3B-Instruct-evals.

