16 datasets found

gsm8k
huggingface.co
Updated Aug 11, 2022
+ more versions
Cite
OpenAI (2022). gsm8k [Dataset]. https://huggingface.co/datasets/openai/gsm8k
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Aug 11, 2022
Dataset authored and provided by
OpenAI (http://openai.com/)
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card for GSM8K

Dataset Summary

GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
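As a rough illustration of working with multi-step solutions like these, the sketch below assumes each reference solution ends with a line of the form `#### <number>`, which is how the published GSM8K answers mark the final result. The sample text and the helper name `extract_final_answer` are hypothetical and written only for this example.

```python
import re
from typing import Optional

# Hypothetical sample written for illustration; not a record copied from the dataset.
SAMPLE_SOLUTION = (
    "Each box holds 6 eggs, so 4 boxes hold 4 * 6 = 24 eggs.\n"
    "After 5 eggs break, 24 - 5 = 19 eggs remain.\n"
    "#### 19"
)


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the text after the last '####' marker, or None if it is absent."""
    matches = re.findall(r"^####\s*(.+?)\s*$", solution, flags=re.MULTILINE)
    if not matches:
        return None
    # Drop thousands separators so "1,000" compares equal to "1000".
    return matches[-1].replace(",", "")


print(extract_final_answer(SAMPLE_SOLUTION))
```

Comparing the extracted string against a model's final number is one simple way to score answers; stricter or looser matching rules are a design choice left to the evaluator.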
gsm8k
tensorflow.org
opendatalab.com
+1 more
Updated Dec 6, 2022
Cite
(2022). gsm8k [Dataset]. https://www.tensorflow.org/datasets/catalog/gsm8k
Dataset updated
Dec 6, 2022
Description
A dataset of 8.5K high quality linguistically diverse grade school math word problems.

To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('gsm8k', split='train')
    for ex in ds.take(4):
        print(ex)

See the guide for more information on tensorflow_datasets.
gsm8k
huggingface.co
Updated Jul 10, 2025
Cite
MLX Community (2025). gsm8k [Dataset]. https://huggingface.co/datasets/mlx-community/gsm8k
Dataset updated
Jul 10, 2025
Dataset authored and provided by
MLX Community
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
OpenAI's GSM8K dataset converted to be compatible with MLX-LM-LoRA. Example usage:

    pip install -U mlx-lm-lora

    python -m mlx_lm_lora.train \
      --model mlx-community/Josiefied-Qwen3-0.6B-abliterated-v1-4bit \
      --train \
      --train-mode grpo \
      --data mlx-community/gsm8k \
      --iters 100 \
      --steps-per-report 1 \
      --batch-size 1 \
      --max-completion-length 512
Arabic-gsm8k
huggingface.co
Updated Aug 9, 2025
+ more versions
Cite
Omartificial Intelligence Space (2025). Arabic-gsm8k [Dataset]. https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-gsm8k
Dataset updated
Aug 9, 2025
Authors
Omartificial Intelligence Space
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card for Arabic GSM8K

Dataset Summary

Arabic GSM8K is an Arabic translation of the GSM8K (Grade School Math 8K) dataset, which contains high-quality linguistically diverse grade school math word problems. The original dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning, and this Arabic version aims to extend these capabilities to Arabic language models and applications. The dataset… See the full description on the dataset page: https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-gsm8k.
gretel-math-gsm8k-v0
huggingface.co
Updated Sep 12, 2024
+ more versions
Cite
Gretel.ai (2024). gretel-math-gsm8k-v0 [Dataset]. https://huggingface.co/datasets/gretelai/gretel-math-gsm8k-v0
Dataset updated
Sep 12, 2024
Dataset provided by
Gretel.ai
License
https://choosealicense.com/licenses/llama3.1/
Description
gretelai/gsm8k-synthetic-diverse-405b

This dataset is a synthetically generated version inspired by the GSM8K dataset (https://huggingface.co/datasets/openai/gsm8k), created entirely using Gretel Navigator with meta-llama/Meta-Llama-3.1-405B as the agent LLM. It contains ~1500 grade-school-level math word problems with step-by-step solutions, focusing on age group, difficulty, and domain diversity.

Key Features:

Synthetically Generated: Math problems created using Gretel… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/gretel-math-gsm8k-v0.
gsm8k
huggingface.co
Updated Jan 29, 2025
+ more versions
Cite
Scale Frontier Data (2025). gsm8k [Dataset]. https://huggingface.co/datasets/ScaleFrontierData/gsm8k
Dataset updated
Jan 29, 2025
Dataset authored and provided by
Scale Frontier Data
Description
ScaleFrontierData/gsm8k dataset hosted on Hugging Face and contributed by the HF Datasets community
MuMath
huggingface.co
Updated Jun 2, 2024
Cite
Iron Man (2024). MuMath [Dataset]. https://huggingface.co/datasets/weihao1/MuMath
Dataset updated
Jun 2, 2024
Authors
Iron Man
Description
MuMath: Multi-perspective Data Augmentation for Mathematical Reasoning in Large Language Models

Introduction

We have amalgamated and further refined these strengths while broadening the scope of augmentation methods to construct a multi-perspective augmentation dataset for mathematics—termed MuMath (μ-Math) Dataset. Subsequently, we finetune LLaMA-2 on the MuMath dataset to derive the MuMath model.

Model          Size  GSM8k  MATH
WizardMath-7B  7B    54.9   10.7

MetaMath-7B… See the full description on the dataset page: https://huggingface.co/datasets/weihao1/MuMath.
Data from: mgsm
huggingface.co
opendatalab.com
Updated Jun 12, 2024
Cite
Julen Etxaniz (2024). mgsm [Dataset]. https://huggingface.co/datasets/juletxara/mgsm
Dataset updated
Jun 12, 2024
Authors
Julen Etxaniz
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems, proposed in the paper Language models are multilingual chain-of-thought reasoners.

The same 250 problems from GSM8K are each translated by human annotators into 10 languages. The 10 languages are:
- Spanish
- French
- German
- Russian
- Chinese
- Japanese
- Thai
- Swahili
- Bengali
- Telugu

You can find the input and targets for each of the ten languages (and English) as .tsv files. We also include few-shot exemplars that are also manually translated from each language in exemplars.py.
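A minimal sketch of reading such a file, assuming a headerless two-column layout (question, then target) separated by tabs. That layout, the sample rows, and the helper name `read_pairs` are assumptions made for illustration; the actual files should be checked on the dataset page.

```python
import csv
import io

# Hypothetical two-row sample standing in for one language's .tsv file.
SAMPLE_TSV = "What is 2 + 3?\t5\nWhat is 10 - 4?\t6\n"


def read_pairs(handle):
    """Yield (question, target) tuples from a headerless two-column TSV stream."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        question, target = row
        yield question, target


pairs = list(read_pairs(io.StringIO(SAMPLE_TSV)))
print(pairs)
```

For a real file, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`, which matters for the non-Latin scripts among the ten languages.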
OpenMathInstruct-2
huggingface.co
Updated Oct 3, 2024
+ more versions
Cite
NVIDIA (2024). OpenMathInstruct-2 [Dataset]. https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
Dataset updated
Oct 3, 2024
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
OpenMathInstruct-2

OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:

- Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
- Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.
MetaMath_DPO_FewShot
huggingface.co
Updated Mar 23, 2024
+ more versions
Cite
agicorp (2024). MetaMath_DPO_FewShot [Dataset]. https://huggingface.co/datasets/agicorp/MetaMath_DPO_FewShot
Dataset updated
Mar 23, 2024
Dataset provided by
Agicorp
Authors
agicorp
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Card for "MetaMath_DPO_FewShot"

GSM8K (Cobbe et al., 2021) is a dataset of diverse grade school maths word problems, which has been commonly adopted as a measure of the math and reasoning skills of LLMs. The MetaMath dataset is an extension of the training set of GSM8K using data augmentation. It is partitioned into queries and responses, where the query is a question involving mathematical calculation or reasoning, and the response is a logical series of steps and… See the full description on the dataset page: https://huggingface.co/datasets/agicorp/MetaMath_DPO_FewShot.
EvilMath
huggingface.co
Updated Apr 4, 2025
Cite
SPY Lab - ETH Zurich (2025). EvilMath [Dataset]. https://huggingface.co/datasets/ethz-spylab/EvilMath
Dataset updated
Apr 4, 2025
Dataset authored and provided by
SPY Lab - ETH Zurich
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Mathematical problems on harmful topics generated from GSM8K. EvilMath contains harmful questions with objectively verifiable ground truth answers.

Dataset Description and Design

EvilMath is generated by rewording GSM8K math questions to include harmful terms that are typically refused by safety-aligned models. We reword math problems to contain dangerous terms such as “bombs” or “nuclear weapons,” while preserving the question logic and the necessary information to solve… See the full description on the dataset page: https://huggingface.co/datasets/ethz-spylab/EvilMath.
DOVE_Lite
huggingface.co
Updated Aug 13, 2025
Cite
nlphuji (2025). DOVE_Lite [Dataset]. https://huggingface.co/datasets/nlphuji/DOVE_Lite
Dataset updated
Aug 13, 2025
Dataset authored and provided by
nlphuji
License
https://choosealicense.com/licenses/cdla-permissive-2.0/
Description
🕊️ DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation

🌐 Project Website | 📄 Read our paper

Updates 📅

2025-08-13: Expansion beyond multiple-choice task: Added comprehensive PromptSuite benchmark evaluations with ~37,000 LLM outputs across 9 diverse tasks including open-ended generation, mathematical reasoning, sentiment analysis, translation, summarization, and code generation (MMLU, GSM8K, SST, WMT14, CNN/DailyMail, MuSiQue… See the full description on the dataset page: https://huggingface.co/datasets/nlphuji/DOVE_Lite.
Omni-MATH
huggingface.co
Updated Sep 14, 2024
Cite
Bofei Gao (2024). Omni-MATH [Dataset]. https://huggingface.co/datasets/KbsdJames/Omni-MATH
Dataset updated
Sep 14, 2024
Authors
Bofei Gao
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Card for Omni-MATH

Recent advancements in AI, particularly in large language models (LLMs), have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To mitigate this limitation, we propose a comprehensive and challenging benchmark specifically designed… See the full description on the dataset page: https://huggingface.co/datasets/KbsdJames/Omni-MATH.
Qu-QA-v2
huggingface.co
Updated Dec 11, 2024
+ more versions
Cite
ffhs9 (2024). Qu-QA-v2 [Dataset]. https://huggingface.co/datasets/Ereeeeef3/Qu-QA-v2
Dataset updated
Dec 11, 2024
Authors
ffhs9
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Qu QA v2 Dataset

Qu QA v2 is a large-scale question-answering (QA) dataset designed for training and evaluating machine learning models. It consists of question-answer pairs in English, making it suitable for general-purpose QA tasks, as well as specialized domains like code-related question answering and GSM8k-style problems.

Dataset Details

Features:

- input: A string representing the question (dtype: string).
- output: A string representing the answer (dtype: string).… See the full description on the dataset page: https://huggingface.co/datasets/Ereeeeef3/Qu-QA-v2.
dParallel_LLaDA_Distill_Data
huggingface.co
Updated Oct 1, 2025
Cite
Zigeng Chen (2025). dParallel_LLaDA_Distill_Data [Dataset]. https://huggingface.co/datasets/Zigeng/dParallel_LLaDA_Distill_Data
Dataset updated
Oct 1, 2025
Authors
Zigeng Chen
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
dParallel-LLaDA-Distill Dataset:

This dataset is used for the certainty-forcing distillation process in dParallel. We use prompts from publicly available training datasets and let the pretrained model generate its own responses as training data. For LLaDA-8B-Instruct, we sample prompts from the GSM8K, PRM12K training set, and part of the Numina-Math dataset. We generate target trajectories using a semi-autoregressive strategy with a sequence length of 256 and block length of 32. We… See the full description on the dataset page: https://huggingface.co/datasets/Zigeng/dParallel_LLaDA_Distill_Data.
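To make the quoted numbers concrete, a sequence length of 256 with a block length of 32 splits into 8 blocks. The sketch below shows only that partitioning arithmetic, assuming blocks are laid out left to right without overlap; it is not the dParallel generation procedure, and the helper name `block_ranges` is hypothetical.

```python
def block_ranges(sequence_length: int, block_length: int):
    """Return (start, end) index pairs, end exclusive, covering the sequence left to right."""
    if sequence_length <= 0 or block_length <= 0:
        raise ValueError("lengths must be positive")
    if sequence_length % block_length != 0:
        raise ValueError("sequence_length must be a multiple of block_length")
    return [(s, s + block_length) for s in range(0, sequence_length, block_length)]


blocks = block_ranges(256, 32)
print(len(blocks), blocks[0], blocks[-1])
```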
Llama-3.2-3B-Instruct-evals
huggingface.co
+ more versions
Cite
Meta Llama, Llama-3.2-3B-Instruct-evals [Dataset]. https://huggingface.co/datasets/meta-llama/Llama-3.2-3B-Instruct-evals
Dataset provided by
Meta (http://meta.com/)
Authors
Meta Llama
License
https://choosealicense.com/licenses/llama3.2/
Description
Dataset Card for Meta Evaluation Result Details for Llama-3.2-3B-Instruct

This dataset contains the results of the Meta evaluation result details for Llama-3.2-3B-Instruct. The dataset has been created from 21 evaluation tasks. The tasks are: hellaswag_chat, infinite_bench, mmlu_hindi_chat, mmlu_portugese_chat, ifeval_loose, nih_multi_needle, mmlu, gsm8k, mgsm, mmlu_thai_chat, mmlu_spanish_chat, gpqa, bfcl_chat, mmlu_french_chat, ifeval_strict, nexus, math, arc_challenge… See the full description on the dataset page: https://huggingface.co/datasets/meta-llama/Llama-3.2-3B-Instruct-evals.