Llama 3.3 License: https://choosealicense.com/licenses/llama3.3/
SwallowMath
Resources
🐙 GitHub: Explore the project repository, including pipeline code and prompts at rioyokotalab/swallow-code-math. 📑 arXiv: Read our paper for detailed methodology and results at arXiv:2505.02881. 🤗 Sister Dataset: Discover SwallowCode, our companion dataset for code generation.
What is it?
SwallowMath is a high-quality mathematical dataset comprising approximately 2.3 billion tokens derived from the FineMath-4+ dataset through an… See the full description on the dataset page: https://huggingface.co/datasets/tokyotech-llm/swallow-math.
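A minimal loading sketch, assuming the standard Hugging Face datasets API; the "train" split name and the streaming option are assumptions, so check the dataset card for the actual configuration.

from itertools import islice
from datasets import load_dataset

# Sketch: stream SwallowMath instead of downloading the full ~2.3B-token corpus.
# The "train" split name is an assumption; verify it on the dataset card.
swallow_math = load_dataset("tokyotech-llm/swallow-math", split="train", streaming=True)

for example in islice(iter(swallow_math), 3):
    print(example)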
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
μ-MATH (Meta U-MATH) is a meta-evaluation dataset derived from the U-MATH benchmark. It is intended to assess the ability of LLMs to judge free-form mathematical solutions. The dataset includes 1,084 labeled samples generated from 271 U-MATH tasks, covering problems of varying assessment complexity. For fine-grained performance evaluation results, in-depth analyses and detailed discussions on behaviors and biases of LLM judges, check out our paper.
📊 U-MATH benchmark at Huggingface 🔎 μ-MATH… See the full description on the dataset page: https://huggingface.co/datasets/toloka/mu-math.
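Purely for illustration, the sketch below builds a generic judge prompt of the kind μ-MATH is meant to evaluate; the wording and the problem/reference/solution field names are assumptions, not the dataset's actual schema.

# Illustrative only: a generic LLM-judge prompt for grading a free-form math solution.
# The field names and wording are assumptions; consult the mu-math card for the real schema.
JUDGE_TEMPLATE = (
    "You are grading a student's solution to a university-level math problem.\n"
    "Problem: {problem}\n"
    "Reference answer: {reference}\n"
    "Student solution: {solution}\n"
    "Reply with exactly one word: 'correct' or 'incorrect'."
)

def build_judge_prompt(problem: str, reference: str, solution: str) -> str:
    return JUDGE_TEMPLATE.format(problem=problem, reference=reference, solution=solution)

print(build_judge_prompt("Compute the derivative of x^2.", "2x", "d/dx x^2 = 2x"))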
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This is the training dataset for the paper: "On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning" (https://arxiv.org/abs/2505.17508).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description
Synthesizer-8B-math-train-data originates from the paper: CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis available on arXiv. You can visit the repo to learn more about the paper.
Citation
If you find our paper helpful, please cite the original paper: @misc{zhang2025cotbasedsynthesizerenhancingllm, title={CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis}, author={Bohan Zhang and Xiaokang… See the full description on the dataset page: https://huggingface.co/datasets/BoHanMint/Synthesizer-8B-math-train-data.
Other License: https://choosealicense.com/licenses/other/
SAND-Math: A Synthetic Dataset of Difficult Problems to Elevate LLM Math Performance
📃 Paper | 🤗 Dataset
SAND-Math (Synthetic Augmented Novel and Difficult Mathematics) is a high-quality, high-difficulty dataset of mathematics problems and solutions. It is generated using a novel pipeline that addresses the critical bottleneck of scarce, high-difficulty training data for mathematical Large Language Models (LLMs).
Key Features
Novel Problem Generation: Problems are… See the full description on the dataset page: https://huggingface.co/datasets/amd/SAND-MATH.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for GSM8K
Dataset Summary
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+, −, ×, ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
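As a usage sketch: GSM8K reference solutions conventionally end with the final numeric answer after a "####" marker, so evaluation code typically loads the dataset and splits on that marker. The "main" config name used below is the commonly used one and should be verified against the dataset card.

from datasets import load_dataset

# Sketch: load GSM8K and extract the final numeric answer after the "####" marker.
# The "main" config name is an assumption; check the dataset card.
gsm8k = load_dataset("openai/gsm8k", "main", split="test")

def extract_final_answer(solution: str) -> str:
    # Reference solutions end with a line like "#### 42".
    return solution.split("####")[-1].strip()

sample = gsm8k[0]
print(sample["question"])
print(extract_final_answer(sample["answer"]))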
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
One way to steer generations from large language models (LLMs) is to assign a persona: a role that describes how the user expects the LLM to behave (e.g., a helpful assistant, a teacher, a woman). This paper investigates how personas affect diverse aspects of model behavior. We assign 162 personas from 12 categories, spanning variables such as gender, sexual orientation, and occupation, to seven LLMs. We prompt them to answer questions from five datasets covering objective tasks (e.g., questions about math and history) and subjective tasks (e.g., questions about beliefs and values). We also compare persona generations to two baseline settings: a control persona setting with 30 paraphrases of "a helpful assistant" to control for models' prompt sensitivity, and an empty persona setting where no persona is assigned. We find that for all models and datasets, personas show greater variability than the control setting, and that some measures of persona behavior generalize across models.
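To make the three prompting conditions concrete, here is a minimal sketch of persona, control, and empty-persona prompts; the exact system-prompt wording used in the paper is not reproduced here, so this phrasing is an assumption.

# Illustrative sketch of the three prompting conditions described above.
# The system-prompt wording is an assumption, not the paper's exact template.
def build_prompt(question: str, persona: str | None) -> list[dict]:
    messages = []
    if persona is not None:
        messages.append({"role": "system", "content": f"You are {persona}."})
    messages.append({"role": "user", "content": question})
    return messages

question = "What is 17 * 24?"
persona_prompt = build_prompt(question, "a math teacher")        # persona setting
control_prompt = build_prompt(question, "a helpful assistant")   # control paraphrase
empty_prompt = build_prompt(question, None)                      # empty persona setting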
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Measuring Multimodal Mathematical Reasoning with the MATH-Vision Dataset
[💻 Github] [🌐 Homepage] [📊 Leaderboard] [📊 Open Source Leaderboard] [🔍 Visualization] [📖 Paper]
🚀 Data Usage
from datasets import load_dataset
dataset = load_dataset("MathLLMs/MathVision")
print(dataset)
💥 News
[2025.05.16] 💥 We now support the official open-source leaderboard! 🔥🔥🔥 Skywork-R1V2-38B is the best open-source model, scoring 49.7% on MATH-Vision. 🔥🔥🔥… See the full description on the dataset page: https://huggingface.co/datasets/MathLLMs/MathVision.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for ReliableMath
Dataset Description
A mathematical reasoning dataset including both solvable and unsolvable math problems to evaluate LLM reliability on reasoning tasks.
Repository: GitHub | Paper: arXiv | Leaderboard: Leaderboard | Point of Contact: byxue@se.cuhk.edu.hk
The following illustrates (a) how an unreliable LLM may fabricate incorrect or nonsensical content on math problems, and (b) how a reliable LLM can correctly answer solvable problems or… See the full description on the dataset page: https://huggingface.co/datasets/BeyondHsueh/ReliableMath.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Compositional GSM_augmented
Compositional GSM_augmented is a math instruction dataset inspired by Not All LLM Reasoners Are Created Equal. It is based on the nvidia/OpenMathInstruct-2 dataset, so it can be used as a training dataset. It was generated with the meta-llama/Meta-Llama-3.1-70B-Instruct model via Hyperbolic AI (thanks for the free credits!). For the full description of the data, refer to the paper.
Each question in compositional GSM consists of two questions… See the full description on the dataset page: https://huggingface.co/datasets/ChuGyouk/CompositionalGSM_augmented.
Safe (ACL 2025 Main)
TL;DR: A Lean 4 theorem-proving dataset, synthesized with Safe, whose theorems are used to validate the correctness of LLM mathematical reasoning steps. This is the official release accompanying our paper Safe (Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification) and its associated dataset FormalStep. Paper | Code | Dataset
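For flavor, here is a toy Lean 4 theorem of the kind a step-verification dataset targets: a small arithmetic claim stated formally and checked mechanically. It is purely illustrative and not taken from FormalStep.

-- Illustrative only; not an actual FormalStep theorem.
-- A reasoning step such as "2 + 3 = 5" can be restated as a Lean 4 theorem
-- and checked by the kernel, which is the kind of validation Safe relies on.
theorem two_add_three : 2 + 3 = 5 := rfl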
Citation
If you find our work useful, please consider citing our paper.… See the full description on the dataset page: https://huggingface.co/datasets/liuchengwu/FormalStep.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
JEEBench(EMNLP 2023)
Repository for the code and dataset for the paper: "Have LLMs Advanced Enough? A Harder Problem Solving Benchmark For Large Language Models" accepted in EMNLP 2023 as a Main conference paper. https://aclanthology.org/2023.emnlp-main.468/
Citation
If you use our dataset in your research, please cite it using the following @inproceedings{arora-etal-2023-llms, title = "Have {LLM}s Advanced Enough? A Challenging Problem Solving Benchmark For Large… See the full description on the dataset page: https://huggingface.co/datasets/daman1209arora/jeebench.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nemotron-MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Authors: Syeda Nahida Akter, Shrimai Prabhumoye, John Kamalu, Sanjeev Satheesh, Eric Nyberg, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro [Paper] [Blog]
Dataset Description
Figure 1: Math Informed syNthetic Dialogue. We (a) manually design prompts of seven conversational styles, (b) provide the prompt along with raw context as input to an LLM to obtain diverse synthetic… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/Nemotron-MIND.
Dataset Description
The "arxiv_small_nougat" dataset is a collection of 108 recent papers sourced from arXiv, focusing on topics related to Large Language Models (LLM) and Transformers. These papers have been meticulously processed and parsed using Meta's Nougat model, which is specifically designed to retain the integrity of complex elements such as tables and mathematical equations.
Data Format
The dataset contains the parsed content of the selected papers, with special… See the full description on the dataset page: https://huggingface.co/datasets/deep-learning-analytics/arxiv_small_nougat.
BSD 3-Clause License: https://choosealicense.com/licenses/bsd-3-clause/
CMM-Math
💻 GitHub Repo 💻 Paper Link 💻 Math-LLM-7B
📥 Download Supplementary Material
Introduction
Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs based on textual math reasoning datasets (e.g., MATH, GSM8K). Recently, a few researchers have released English multimodal… See the full description on the dataset page: https://huggingface.co/datasets/ecnu-icalk/cmm-math.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for RLPR-Train-Dataset
GitHub | Paper
News:
[2025.06.23] 📃 Our paper detailing the RLPR framework and this dataset is available here.
Dataset Summary
The RLPR-Train-Dataset is a curated collection of 77k high-quality reasoning prompts specifically designed for enhancing Large Language Model (LLM) capabilities in the general domain (non-mathematical). This dataset is derived from the comprehensive collection of prompts from WebInstruct. We… See the full description on the dataset page: https://huggingface.co/datasets/openbmb/RLPR-Train-Dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
🦣 MAmmoTH2: Scaling Instructions from the Web
Project Page: https://tiger-ai-lab.github.io/MAmmoTH2/ | Paper: https://arxiv.org/pdf/2405.03548 | Code: https://github.com/TIGER-AI-Lab/MAmmoTH2
WebInstruct (Subset)
This repo contains the partial dataset used in "MAmmoTH2: Scaling Instructions from the Web". This partial data comes mostly from forums such as Stack Exchange. This subset contains very high-quality data to boost LLM performance through instruction tuning.… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/WebInstructSub.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We request that you not publish examples of this dataset online in plain text, to reduce the risk of leakage into foundation model training corpora.
ReaLMistake
ReaLMistake is a benchmark proposed in the paper "Evaluating LLMs at Detecting Errors in LLM Responses" (COLM 2024). ReaLMistake is a benchmark for evaluating binary error detection methods that detect errors in LLM responses. This benchmark includes natural errors made by GPT-4 and Llama 2 70B on three tasks (math word… See the full description on the dataset page: https://huggingface.co/datasets/ryokamoi/realmistake.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
X-SVAMP
🤗 Paper | 📖 arXiv
Dataset Description
X-SVAMP is an evaluation benchmark for multilingual large language models (LLMs), including questions and answers in 5 languages (English, Chinese, Korean, Italian and Spanish). It is intended to evaluate the math reasoning abilities of LLMs. The dataset was translated from the original English SVAMP by GPT-4-turbo. In our paper, we evaluate LLMs in a zero-shot generative setting: prompt the instruction-tuned LLM with… See the full description on the dataset page: https://huggingface.co/datasets/zhihz0535/X-SVAMP_en_zh_ko_it_es.
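As an illustration of the zero-shot generative setting mentioned above, the sketch below formats a question into a plain prompt and scores the reply by taking its last number; the prompt wording and the answer-parsing rule are assumptions, not the paper's exact protocol.

import re

# Sketch of zero-shot generative evaluation; the prompt wording and the
# "take the last number in the output" parsing rule are assumptions.
def build_zero_shot_prompt(question: str) -> str:
    return ("Solve the following math word problem and give the final numeric answer.\n\n"
            + question)

def parse_numeric_answer(generation: str) -> float | None:
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation)
    return float(numbers[-1]) if numbers else None

def is_correct(generation: str, gold: float, tol: float = 1e-6) -> bool:
    predicted = parse_numeric_answer(generation)
    return predicted is not None and abs(predicted - gold) < tol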
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for Alpaca
Dataset Summary
Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction tuning for language models and make them follow instructions better. The authors built on the data generation pipeline from the Self-Instruct framework and made the following modifications:
The text-davinci-003 engine to generate the instruction data instead… See the full description on the dataset page: https://huggingface.co/datasets/tatsu-lab/alpaca.
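To show how a record is typically turned into a training prompt, here is a minimal formatting sketch using the instruction/input/output fields of the Alpaca release; the template wording mirrors the commonly used Alpaca format and should be checked against the official repository before use.

from datasets import load_dataset

# Sketch: format one Alpaca record into a prompt/response string for instruction tuning.
# Template wording follows the commonly used Alpaca format; verify against the official repo.
alpaca = load_dataset("tatsu-lab/alpaca", split="train")

def format_example(example: dict) -> str:
    if example["input"]:
        prompt = (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes "
            f"the request.\n\n### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n### Response:\n"
        )
    else:
        prompt = (
            "Below is an instruction that describes a task. Write a response that "
            f"appropriately completes the request.\n\n### Instruction:\n{example['instruction']}\n\n"
            "### Response:\n"
        )
    return prompt + example["output"]

print(format_example(alpaca[0]))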