MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for GSM8K
Dataset Summary
GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − ×÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
A dataset of 8.5K high quality linguistically diverse grade school math word problems.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('gsm8k', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
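Once examples are loaded, a common next step is to pull the final numeric answer out of each solution string. The sketch below assumes the usual GSM8K answer layout, in which intermediate calculations are wrapped in `<<...>>` annotations and the final answer follows a `####` marker on the last line. The helper names `extract_final_answer` and `strip_calculator_annotations` are hypothetical, and the sample string is only an illustration in that style.

```python
import re
from typing import Optional


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the text after the '####' marker at the end, or None if absent."""
    match = re.search(r"####\s*(.+?)\s*$", solution.strip())
    if match is None:
        return None
    # Drop thousands separators so "1,000" compares equal to "1000".
    return match.group(1).replace(",", "")


def strip_calculator_annotations(solution: str) -> str:
    """Remove <<...>> calculator annotations, keeping the readable text."""
    return re.sub(r"<<.*?>>", "", solution)


# Illustrative solution string in the assumed GSM8K style.
example = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)

print(extract_final_answer(example))
print(strip_calculator_annotations(example))
```

Comparing extracted answers as normalized strings keeps the check simple. Converting to a number first would be a reasonable alternative if answers may differ only in formatting.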
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
OpenAI's GSM8K dataset converted to be compatible with MLX-LM-LoRA. Example usage:
pip install -U mlx-lm-lora
python -m mlx_lm_lora.train \
    --model mlx-community/Josiefied-Qwen3-0.6B-abliterated-v1-4bit \
    --train \
    --train-mode grpo \
    --data mlx-community/gsm8k \
    --iters 100 \
    --steps-per-report 1 \
    --batch-size 1 \
    --max-completion-length 512
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Arabic GSM8K
Dataset Summary
Arabic GSM8K is an Arabic translation of the GSM8K (Grade School Math 8K) dataset, which contains high-quality linguistically diverse grade school math word problems. The original dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning, and this Arabic version aims to extend these capabilities to Arabic language models and applications. The dataset… See the full description on the dataset page: https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-gsm8k.
https://choosealicense.com/licenses/llama3.1/
gretelai/gsm8k-synthetic-diverse-405b
This dataset is a synthetically generated version inspired by the GSM8K https://huggingface.co/datasets/openai/gsm8k dataset, created entirely using Gretel Navigator with meta-llama/Meta-Llama-3.1-405B as the agent LLM. It contains ~1500 Grade School-level math word problems with step-by-step solutions, focusing on age group, difficulty, and domain diversity.
Key Features:
Synthetically Generated: Math problems created using Gretel… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/gretel-math-gsm8k-v0.
ScaleFrontierData/gsm8k dataset hosted on Hugging Face and contributed by the HF Datasets community
MuMath: Multi-perspective Data Augmentation for Mathematical Reasoning in Large Language Models
Introduction
We have amalgamated and further refined these strengths while broadening the scope of augmentation methods to construct a multi-perspective augmentation dataset for mathematics—termed MuMath (μ-Math) Dataset. Subsequently, we finetune LLaMA-2 on the MuMath dataset to derive the MuMath model.
Model Size GSM8k MATH
WizardMath-7B 7B 54.9 10.7
MetaMath-7B… See the full description on the dataset page: https://huggingface.co/datasets/weihao1/MuMath.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems, proposed in the paper Language models are multilingual chain-of-thought reasoners.
The same 250 problems from GSM8K were each translated by human annotators into 10 languages: Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, and Telugu.
You can find the inputs and targets for each of the ten languages (and English) as .tsv files.
We also include few-shot exemplars, also manually translated for each language, in exemplars.py.
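As a rough sketch of how such a .tsv file could be read, the snippet below parses tab-separated rows into (input, target) pairs. It assumes two columns per row and no header line, which the description above does not confirm. The function name `read_mgsm_tsv` is hypothetical, and the sample rows are invented for illustration rather than taken from the benchmark.

```python
import csv
import io


def read_mgsm_tsv(handle):
    """Yield (question, target) pairs from a tab-separated file object."""
    # QUOTE_NONE treats quote characters in the questions as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield row[0], row[1]


# Invented sample rows standing in for a real file (assumed two-column layout).
sample = io.StringIO("What is 2 + 3?\t5\nWhat is 10 - 4?\t6\n")
pairs = list(read_mgsm_tsv(sample))
print(pairs)
```

For a file on disk, the same function can be given a handle opened with `open(path, encoding="utf-8", newline="")`.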
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OpenMathInstruct-2
OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:
Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for "MetaMath_DPO_FewShot"
GSM8K (Cobbe et al., 2021) is a dataset of diverse grade school maths word problems, which has been commonly adopted as a measure of the math and reasoning skills of LLMs. The MetaMath dataset is an extension of the training set of GSM8K using data augmentation. It is partitioned into queries and responses, where the query is a question involving mathematical calculation or reasoning, and the response is a logical series of steps and… See the full description on the dataset page: https://huggingface.co/datasets/agicorp/MetaMath_DPO_FewShot.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Mathematical problems on harmful topics generated from GSM8K. EvilMath contains harmful questions with objectively verifiable ground truth answers.
Dataset Description and Design
EvilMath is generated by rewording GSM8K math questions to include harmful terms that are typically refused by safety-aligned models. We reword math problems to contain dangerous terms such as “bombs” or “nuclear weapons,” while preserving the question logic and the necessary information to solve… See the full description on the dataset page: https://huggingface.co/datasets/ethz-spylab/EvilMath.
https://choosealicense.com/licenses/cdla-permissive-2.0/
🕊️ DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
🌐 Project Website | 📄 Read our paper
Updates 📅
2025-08-13: Expansion beyond multiple-choice task: Added comprehensive PromptSuite benchmark evaluations with ~37,000 LLM outputs across 9 diverse tasks including open-ended generation, mathematical reasoning, sentiment analysis, translation, summarization, and code generation (MMLU, GSM8K, SST, WMT14, CNN/DailyMail, MuSiQue… See the full description on the dataset page: https://huggingface.co/datasets/nlphuji/DOVE_Lite.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Omni-MATH
Recent advancements in AI, particularly in large language models (LLMs), have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To mitigate this limitation, we propose a comprehensive and challenging benchmark specifically designed… See the full description on the dataset page: https://huggingface.co/datasets/KbsdJames/Omni-MATH.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Qu QA v2 Dataset
Qu QA v2 is a large-scale question-answering (QA) dataset designed for training and evaluating machine learning models. It consists of question-answer pairs in English, making it suitable for general-purpose QA tasks, as well as specialized domains like code-related question answering and GSM8k-style problems.
Dataset Details
Features:
input: A string representing the question (dtype: string).
output: A string representing the answer (dtype: string).… See the full description on the dataset page: https://huggingface.co/datasets/Ereeeeef3/Qu-QA-v2.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
dParallel-LLaDA-Distill Dataset:
This dataset is used for the certainty-forcing distillation process in dParallel. We use prompts from publicly available training datasets and let the pretrained model generate its own responses as training data. For LLaDA-8B-Instruct, we sample prompts from the GSM8K, PRM12K training set, and part of the Numina-Math dataset. We generate target trajectories using a semi-autoregressive strategy with a sequence length of 256 and block length of 32. We… See the full description on the dataset page: https://huggingface.co/datasets/Zigeng/dParallel_LLaDA_Distill_Data.
https://choosealicense.com/licenses/llama3.2/
Dataset Card for Meta Evaluation Result Details for Llama-3.2-3B-Instruct
This dataset contains the results of the Meta evaluation result details for Llama-3.2-3B-Instruct. The dataset has been created from 21 evaluation tasks. The tasks are: hellaswag_chat, infinite_bench, mmlu_hindi_chat, mmlu_portugese_chat, ifeval_loose, nih_multi_needle, mmlu, gsm8k, mgsm, mmlu_thai_chat, mmlu_spanish_chat, gpqa, bfcl_chat, mmlu_french_chat, ifeval_strict, nexus, math, arc_challenge… See the full description on the dataset page: https://huggingface.co/datasets/meta-llama/Llama-3.2-3B-Instruct-evals.