20 datasets found
  1. swallow-math

    • huggingface.co
    Updated May 7, 2025
    Cite
    tokyotech-llm (2025). swallow-math [Dataset]. https://huggingface.co/datasets/tokyotech-llm/swallow-math
    Explore at:
    Dataset updated
    May 7, 2025
    Dataset authored and provided by
    tokyotech-llm
    License

    Llama 3.3 Community License: https://choosealicense.com/licenses/llama3.3/

    Description

    SwallowMath

      Resources
    

    🐙 GitHub: Explore the project repository, including pipeline code and prompts, at rioyokotalab/swallow-code-math.
    📑 arXiv: Read our paper for detailed methodology and results at arXiv:2505.02881.
    🤗 Sister Dataset: Discover SwallowCode, our companion dataset for code generation.

      What is it?
    

    SwallowMath is a high-quality mathematical dataset comprising approximately 2.3 billion tokens derived from the FineMath-4+ dataset through an… See the full description on the dataset page: https://huggingface.co/datasets/tokyotech-llm/swallow-math.

  2. mu-math

    • huggingface.co
    Updated Jan 14, 2025
    Cite
    Toloka (2025). mu-math [Dataset]. https://huggingface.co/datasets/toloka/mu-math
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 14, 2025
    Dataset authored and provided by
    Toloka
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    μ-MATH (Meta U-MATH) is a meta-evaluation dataset derived from the U-MATH benchmark. It is intended to assess the ability of LLMs to judge free-form mathematical solutions. The dataset includes 1,084 labeled samples generated from 271 U-MATH tasks, covering problems of varying assessment complexity. For fine-grained performance evaluation results, in-depth analyses and detailed discussions on behaviors and biases of LLM judges, check out our paper.

    📊 U-MATH benchmark at Huggingface 🔎 μ-MATH… See the full description on the dataset page: https://huggingface.co/datasets/toloka/mu-math.
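Since μ-MATH frames judging as a binary labeling task, judge outputs can be scored with standard classification metrics. The sketch below is illustrative only; the metric choice, variable names, and data are ours, not from the dataset card:

```python
# Illustrative sketch: scoring an LLM judge's binary verdicts against
# gold labels with macro-averaged F1. Not an official μ-MATH scorer;
# the labels below are made up for demonstration.
def macro_f1(gold: list, pred: list) -> float:
    """Macro-F1 over the two classes of a binary labeling task."""
    def f1(positive: bool) -> float:
        tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
        fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
        fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
        if tp == 0:
            return 0.0
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)
    return (f1(True) + f1(False)) / 2

# gold: whether the free-form solution was actually correct;
# pred: the LLM judge's verdict on that solution
gold = [True, True, False, False]
pred = [True, False, False, False]
print(round(macro_f1(gold, pred), 3))  # -> 0.733
```

Macro-averaging weights both verdict classes equally, which matters when correct and incorrect solutions are imbalanced in the sample.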

  3. DAPO-17k-Eng

    • huggingface.co
    Updated Apr 24, 2025
    Cite
    math-dataset (2025). DAPO-17k-Eng [Dataset]. https://huggingface.co/datasets/math-dataset/DAPO-17k-Eng
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset authored and provided by
    math-dataset
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is the training dataset for the paper: "On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning" (https://arxiv.org/abs/2505.17508).

  4. Synthesizer-8B-math-train-data

    • huggingface.co
    Updated May 14, 2025
    Cite
    zhang (2025). Synthesizer-8B-math-train-data [Dataset]. https://huggingface.co/datasets/BoHanMint/Synthesizer-8B-math-train-data
    Explore at:
    Dataset updated
    May 14, 2025
    Authors
    zhang
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Description

    Synthesizer-8B-math-train-data originates from the paper: CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis available on arXiv. You can visit the repo to learn more about the paper.

      Citation
    

    If you find our paper helpful, please cite the original paper: @misc{zhang2025cotbasedsynthesizerenhancingllm, title={CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis}, author={Bohan Zhang and Xiaokang… See the full description on the dataset page: https://huggingface.co/datasets/BoHanMint/Synthesizer-8B-math-train-data.

  5. SAND-MATH

    • huggingface.co
    Updated Jul 29, 2025
    Cite
    AMD (2025). SAND-MATH [Dataset]. https://huggingface.co/datasets/amd/SAND-MATH
    Explore at:
    Dataset updated
    Jul 29, 2025
    Dataset authored and provided by
    AMD
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    SAND-Math: A Synthetic Dataset of Difficult Problems to Elevate LLM Math Performance

    📃 Paper | 🤗 Dataset

    SAND-Math (Synthetic Augmented Novel and Difficult Mathematics) is a high-quality, high-difficulty dataset of mathematics problems and solutions. It is generated using a novel pipeline that addresses the critical bottleneck of scarce, high-difficulty training data for mathematical Large Language Models (LLMs).

      Key Features
    

    Novel Problem Generation: Problems are… See the full description on the dataset page: https://huggingface.co/datasets/amd/SAND-MATH.

  6. gsm8k

    • huggingface.co
    Updated Aug 11, 2022
    + more versions
    Cite
    OpenAI (2022). gsm8k [Dataset]. https://huggingface.co/datasets/openai/gsm8k
    Explore at:
    Croissant
    Dataset updated
    Aug 11, 2022
    Dataset authored and provided by
    OpenAI: https://openai.com/
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for GSM8K

      Dataset Summary
    

    GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

    These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+, −, ×, ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
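Published GSM8K solutions conventionally end with a final-answer line of the form `#### <answer>`. A minimal extraction helper, assuming that delimiter (the function name and example text are ours):

```python
def extract_gsm8k_answer(solution: str) -> str:
    """Pull the final answer out of a GSM8K-style solution string.

    GSM8K solutions end with a line '#### <answer>'; thousands
    separators like '1,000' are stripped for numeric comparison.
    """
    marker = "####"
    if marker not in solution:
        raise ValueError("no '####' answer marker found")
    answer = solution.rsplit(marker, 1)[1].strip()
    return answer.replace(",", "")

example = (
    "Natalia sold 48 clips in April, then half as many in May.\n"
    "48 / 2 = 24 clips in May.\n"
    "48 + 24 = 72 clips altogether.\n"
    "#### 72"
)
print(extract_gsm8k_answer(example))  # -> 72
```

Normalizing the extracted string this way is a common first step before exact-match scoring of model outputs against GSM8K reference answers.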

  7. Persona list by category.

    • plos.figshare.com
    xls
    Updated Jun 30, 2025
    Cite
    Pedro Henrique Luz de Araujo; Benjamin Roth (2025). Persona list by category. [Dataset]. http://doi.org/10.1371/journal.pone.0325664.t002
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 30, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Henrique Luz de Araujo; Benjamin Roth
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    One way to steer generations from large language models (LLMs) is to assign a persona: a role that describes how the user expects the LLM to behave (e.g., a helpful assistant, a teacher, a woman). This paper investigates how personas affect diverse aspects of model behavior. We assign 162 personas from 12 categories, spanning variables like gender, sexual orientation, and occupation, to seven LLMs. We prompt them to answer questions from five datasets covering objective tasks (e.g., questions about math and history) and subjective tasks (e.g., questions about beliefs and values). We also compare personas' generations to two baseline settings: a control persona setting with 30 paraphrases of "a helpful assistant" to control for models' prompt sensitivity, and an empty persona setting where no persona is assigned. We find that for all models and datasets, personas show greater variability than the control setting and that some measures of persona behavior generalize across models.

  8. MathVision

    • huggingface.co
    Updated May 16, 2025
    Cite
    LLMs for Reasoning (2025). MathVision [Dataset]. https://huggingface.co/datasets/MathLLMs/MathVision
    Explore at:
    Dataset updated
    May 16, 2025
    Dataset authored and provided by
    LLMs for Reasoning
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Measuring Multimodal Mathematical Reasoning with the MATH-Vision Dataset

    [💻 Github] [🌐 Homepage] [📊 Leaderboard] [📊 Open Source Leaderboard] [🔍 Visualization] [📖 Paper]

      🚀 Data Usage
    

    from datasets import load_dataset

    dataset = load_dataset("MathLLMs/MathVision")
    print(dataset)

      💥 News
    

    [2025.05.16] 💥 We now support the official open-source leaderboard! 🔥🔥🔥 Skywork-R1V2-38B is the best open-source model, scoring 49.7% on MATH-Vision. 🔥🔥🔥… See the full description on the dataset page: https://huggingface.co/datasets/MathLLMs/MathVision.

  9. ReliableMath

    • huggingface.co
    Updated May 15, 2025
    Cite
    XUE Boyang (2025). ReliableMath [Dataset]. https://huggingface.co/datasets/BeyondHsueh/ReliableMath
    Explore at:
    Dataset updated
    May 15, 2025
    Authors
    XUE Boyang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for ReliableMath

      Dataset Description
    

    A mathematical reasoning dataset including both solvable and unsolvable math problems to evaluate LLM reliability on reasoning tasks.

    Repository: GitHub
    Paper: arXiv
    Leaderboard: Leaderboard
    Point of Contact: byxue@se.cuhk.edu.hk

    The following are the illustrations of (a) an unreliable LLM may fabricate incorrect or nonsensical content on math problems; (b) a reliable LLM can correctly answer solvable problems or… See the full description on the dataset page: https://huggingface.co/datasets/BeyondHsueh/ReliableMath.

  10. CompositionalGSM_augmented

    • huggingface.co
    Updated Oct 15, 2024
    Cite
    ChuGyouk (2024). CompositionalGSM_augmented [Dataset]. https://huggingface.co/datasets/ChuGyouk/CompositionalGSM_augmented
    Explore at:
    Croissant
    Dataset updated
    Oct 15, 2024
    Authors
    ChuGyouk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Compositional GSM_augmented

    Compositional GSM_augmented is a math instruction dataset inspired by Not All LLM Reasoners Are Created Equal. It is based on the nvidia/OpenMathInstruct-2 dataset, so it can be used as a training dataset. It was generated with the meta-llama/Meta-Llama-3.1-70B-Instruct model via Hyperbolic AI (thanks for the free credit!).

    Each question in compositional GSM consists of two questions… See the full description on the dataset page: https://huggingface.co/datasets/ChuGyouk/CompositionalGSM_augmented.

  11. FormalStep

    • huggingface.co
    Updated Jun 9, 2025
    Cite
    Liu Chengwu (2025). FormalStep [Dataset]. https://huggingface.co/datasets/liuchengwu/FormalStep
    Explore at:
    Dataset updated
    Jun 9, 2025
    Authors
    Liu Chengwu
    Description

    Safe (ACL 2025 Main)

    TL;DR: A Lean 4 theorem-proving dataset whose theorems, synthesized using Safe, are used to validate the correctness of LLM mathematical reasoning steps. This is the official release accompanying our paper Safe (Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification) and its associated dataset, FormalStep.
    Paper | Code | Dataset

      Citation
    

    If you find our work useful, please consider citing our paper.… See the full description on the dataset page: https://huggingface.co/datasets/liuchengwu/FormalStep.

  12. jeebench

    • huggingface.co
    Updated Jan 14, 2024
    Cite
    Daman (2024). jeebench [Dataset]. https://huggingface.co/datasets/daman1209arora/jeebench
    Explore at:
    Croissant
    Dataset updated
    Jan 14, 2024
    Authors
    Daman
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    JEEBench (EMNLP 2023)

    Repository for the code and dataset for the paper "Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models", accepted at EMNLP 2023 as a Main conference paper. https://aclanthology.org/2023.emnlp-main.468/

      Citation
    

    If you use our dataset in your research, please cite it using the following @inproceedings{arora-etal-2023-llms, title = "Have {LLM}s Advanced Enough? A Challenging Problem Solving Benchmark For Large… See the full description on the dataset page: https://huggingface.co/datasets/daman1209arora/jeebench.

  13. Nemotron-MIND

    • huggingface.co
    Updated Sep 20, 2024
    Cite
    NVIDIA (2024). Nemotron-MIND [Dataset]. https://huggingface.co/datasets/nvidia/Nemotron-MIND
    Explore at:
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    Nvidia: http://nvidia.com/
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nemotron-MIND: Math Informed syNthetic Dialogues for Pretraining LLMs

    Authors: Syeda Nahida Akter, Shrimai Prabhumoye, John Kamalu, Sanjeev Satheesh, Eric Nyberg, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro [Paper] [Blog]

      Dataset Description
    

    Figure 1: Math Informed syNthetic Dialogue. We (a) manually design prompts of seven conversational styles, (b) provide the prompt along with raw context as input to an LLM to obtain diverse synthetic… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/Nemotron-MIND.

  14. arxiv_small_nougat

    • huggingface.co
    Updated Dec 27, 2023
    Cite
    Priya Dwivedi (2023). arxiv_small_nougat [Dataset]. https://huggingface.co/datasets/deep-learning-analytics/arxiv_small_nougat
    Explore at:
    Croissant
    Dataset updated
    Dec 27, 2023
    Authors
    Priya Dwivedi
    Description

    Dataset Description

    The "arxiv_small_nougat" dataset is a collection of 108 recent papers sourced from arXiv, focusing on topics related to Large Language Models (LLM) and Transformers. These papers have been meticulously processed and parsed using Meta's Nougat model, which is specifically designed to retain the integrity of complex elements such as tables and mathematical equations.

      Data Format
    

    The dataset contains the parsed content of the selected papers, with special… See the full description on the dataset page: https://huggingface.co/datasets/deep-learning-analytics/arxiv_small_nougat.

  15. cmm-math

    • huggingface.co
    Updated Sep 13, 2024
    Cite
    ICALK (2024). cmm-math [Dataset]. https://huggingface.co/datasets/ecnu-icalk/cmm-math
    Explore at:
    Dataset updated
    Sep 13, 2024
    Dataset authored and provided by
    ICALK
    License

    BSD 3-Clause License: https://choosealicense.com/licenses/bsd-3-clause/

    Description

    CMM-Math

    💻 Github Repo 💻 Paper Link 💻 Math-LLM-7B

      Introduction
    

    Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs based on textual math reasoning datasets (e.g., MATH, GSM8K). Recently, a few researchers have released English multimodal… See the full description on the dataset page: https://huggingface.co/datasets/ecnu-icalk/cmm-math.

  16. RLPR-Train-Dataset

    • huggingface.co
    Updated Jul 11, 2025
    + more versions
    Cite
    OpenBMB (2025). RLPR-Train-Dataset [Dataset]. https://huggingface.co/datasets/openbmb/RLPR-Train-Dataset
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    OpenBMB
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for RLPR-Train-Dataset

    GitHub | Paper

      News:
    

    [2025.06.23] 📃 Our paper detailing the RLPR framework and this dataset is available here.

      Dataset Summary
    

    The RLPR-Train-Dataset is a curated collection of 77k high-quality reasoning prompts specifically designed for enhancing Large Language Model (LLM) capabilities in the general domain (non-mathematical). This dataset is derived from the comprehensive collection of prompts from WebInstruct. We… See the full description on the dataset page: https://huggingface.co/datasets/openbmb/RLPR-Train-Dataset.

  17. WebInstructSub

    • huggingface.co
    Updated May 20, 2024
    + more versions
    Cite
    TIGER-Lab (2024). WebInstructSub [Dataset]. https://huggingface.co/datasets/TIGER-Lab/WebInstructSub
    Explore at:
    Croissant
    Dataset updated
    May 20, 2024
    Dataset authored and provided by
    TIGER-Lab
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🦣 MAmmoTH2: Scaling Instructions from the Web

    Project Page: https://tiger-ai-lab.github.io/MAmmoTH2/ Paper: https://arxiv.org/pdf/2405.03548 Code: https://github.com/TIGER-AI-Lab/MAmmoTH2

      WebInstruct (Subset)
    

    This repo contains the partial dataset used in "MAmmoTH2: Scaling Instructions from the Web". This partial data comes mostly from forums like Stack Exchange. This subset contains very high-quality data to boost LLM performance through instruction tuning.… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/WebInstructSub.

  18. realmistake

    • huggingface.co
    Updated May 21, 2025
    Cite
    Ryo Kamoi (2025). realmistake [Dataset]. https://huggingface.co/datasets/ryokamoi/realmistake
    Explore at:
    Dataset updated
    May 21, 2025
    Authors
    Ryo Kamoi
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    We request you not to publish examples of this dataset online in plain text to reduce the risk of leakage into foundation model training corpora.

      ReaLMistake
    

    ReaLMistake is a benchmark proposed in the paper "Evaluating LLMs at Detecting Errors in LLM Responses" (COLM 2024). ReaLMistake is a benchmark for evaluating binary error detection methods that detect errors in LLM responses. This benchmark includes natural errors made by GPT-4 and Llama 2 70B on three tasks (math word… See the full description on the dataset page: https://huggingface.co/datasets/ryokamoi/realmistake.

  19. X-SVAMP_en_zh_ko_it_es

    • huggingface.co
    Updated Jan 28, 2024
    Cite
    Zhihan Zhang (2024). X-SVAMP_en_zh_ko_it_es [Dataset]. https://huggingface.co/datasets/zhihz0535/X-SVAMP_en_zh_ko_it_es
    Explore at:
    Croissant
    Dataset updated
    Jan 28, 2024
    Authors
    Zhihan Zhang
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    X-SVAMP

    🤗 Paper | 📖 arXiv

      Dataset Description
    

    X-SVAMP is an evaluation benchmark for multilingual large language models (LLMs), including questions and answers in 5 languages (English, Chinese, Korean, Italian and Spanish). It is intended to evaluate the math reasoning abilities of LLMs. The dataset is translated by GPT-4-turbo from the original English-version SVAMP. In our paper, we evaluate LLMs in a zero-shot generative setting: prompt the instruction-tuned LLM with… See the full description on the dataset page: https://huggingface.co/datasets/zhihz0535/X-SVAMP_en_zh_ko_it_es.

  20. alpaca

    • huggingface.co
    • opendatalab.com
    Updated Mar 14, 2023
    + more versions
    Cite
    Tatsu Lab (2023). alpaca [Dataset]. https://huggingface.co/datasets/tatsu-lab/alpaca
    Explore at:
    Croissant
    Dataset updated
    Mar 14, 2023
    Dataset authored and provided by
    Tatsu Lab
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for Alpaca

      Dataset Summary
    

    Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction tuning for language models and make them follow instructions better. The authors built on the data generation pipeline from the Self-Instruct framework and made the following modifications:

    The text-davinci-003 engine to generate the instruction data instead… See the full description on the dataset page: https://huggingface.co/datasets/tatsu-lab/alpaca.
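Each Alpaca record is a JSON object with instruction, input, and output fields. For illustration, a sketch of the prompt template commonly used to render such records; the wording follows the released Stanford Alpaca code as we recall it, so treat it as an assumption rather than ground truth:

```python
# Sketch of the Alpaca prompt template (wording assumed from the
# tatsu-lab/stanford_alpaca release, not taken from this listing).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_example(example: dict) -> str:
    """Render one Alpaca record ({'instruction', 'input', 'output'}) as a prompt."""
    if example.get("input"):
        return PROMPT_WITH_INPUT.format(**example)
    return PROMPT_NO_INPUT.format(instruction=example["instruction"])

prompt = format_example(
    {"instruction": "Name three prime numbers.", "input": "", "output": "2, 3, 5"}
)
print(prompt)
```

Records with an empty input field fall back to the shorter template, which is why fine-tuning scripts typically branch on that field exactly as above.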


Note: 26 scholarly articles cite the swallow-math dataset (view in Google Scholar).