Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society
GitHub: https://github.com/lightaime/camel
Website: https://www.camel-ai.org/
arXiv paper: https://arxiv.org/abs/2303.17760
Dataset Summary
The Math dataset is composed of 50K problem-solution pairs obtained using GPT-4. The problem-solution pairs are generated from 25 math topics, with 25 subtopics for each topic and 80 problems for each "topic, subtopic" pair. We provide the data… See the full description on the dataset page: https://huggingface.co/datasets/camel-ai/math.
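As a quick sanity check on the stated size, the topic grid can be enumerated. This is only a minimal sketch: the placeholder topic and subtopic labels are hypothetical, and only the counts (25, 25, 80) come from the summary above.

```python
from itertools import product

NUM_TOPICS = 25
SUBTOPICS_PER_TOPIC = 25
PROBLEMS_PER_PAIR = 80

# Placeholder labels; the real topic names are not listed in this summary.
topics = [f"topic_{i}" for i in range(NUM_TOPICS)]
subtopic_slots = [f"subtopic_{j}" for j in range(SUBTOPICS_PER_TOPIC)]

# One (topic, subtopic) pair per slot, then 80 problems per pair.
pairs = list(product(topics, subtopic_slots))
total_problems = len(pairs) * PROBLEMS_PER_PAIR

print(len(pairs), total_problems)  # 625 50000
```

The product 25 × 25 × 80 = 50,000 agrees with the "50K problem-solution pairs" figure.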
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
OpenR1-Math-220k
Dataset description
OpenR1-Math-220k is a large-scale dataset for mathematical reasoning. It consists of 220k math problems with two to four reasoning traces generated by DeepSeek R1 for problems from NuminaMath 1.5. The traces were verified using Math Verify for most samples and Llama-3.3-70B-Instruct as a judge for 12% of the samples, and each problem contains at least one reasoning trace with a correct answer. The dataset consists of two splits:… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Summary
MATH dataset from https://github.com/hendrycks/math
Citation Information
@article{hendrycksmath2021,
  title={Measuring Mathematical Problem Solving With the MATH Dataset},
  author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},
  journal={NeurIPS},
  year={2021}
}
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Omni-MATH
Recent advancements in AI, particularly in large language models (LLMs), have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To mitigate this limitation, we propose a comprehensive and challenging benchmark specifically designed… See the full description on the dataset page: https://huggingface.co/datasets/KbsdJames/Omni-MATH.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for GSM8K
Dataset Summary
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
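When working with step-by-step solutions of this kind, one usually needs to pull out the final numeric answer. The sketch below assumes a convention that is not stated in the excerpt above, namely that a solution string ends with a line of the form `#### <answer>`. The sample solution text and the helper name `extract_final_answer` are made up for illustration.

```python
import re
from typing import Optional

# Assumed convention: the final answer follows a '####' marker.
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")

def extract_final_answer(solution: str) -> Optional[str]:
    """Return the number after the last '####' marker, or None if absent."""
    matches = _FINAL_ANSWER.findall(solution)
    if not matches:
        return None
    # Drop thousands separators so "1,234" becomes "1234".
    return matches[-1].replace(",", "")

# Hypothetical solution text, not taken from the dataset.
sample = (
    "Each box holds 12 pencils and there are 4 boxes, so 12 * 4 = 48 pencils.\n"
    "After giving away 5, 48 - 5 = 43 pencils remain.\n"
    "#### 43"
)
print(extract_final_answer(sample))  # 43
```

Returning a string rather than a float avoids rounding surprises when answers are later compared for exact match.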
math-ai/minervamath dataset hosted on Hugging Face and contributed by the HF Datasets community
Hendrycks MATH Dataset
Dataset Description
The MATH dataset is a collection of mathematics competition problems designed to evaluate mathematical reasoning and problem-solving capabilities in computational systems. Containing 12,500 high school competition-level mathematics problems, this dataset is notable for including detailed step-by-step solutions alongside each problem.
Dataset Summary
The dataset consists of mathematics problems spanning multiple… See the full description on the dataset page: https://huggingface.co/datasets/nlile/hendrycks-MATH-benchmark.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
StackMathQA
StackMathQA: A Curated Collection of 2 Million Mathematical Questions and Answers Sourced from Stack Exchange
StackMathQA is a meticulously curated collection of 2 million mathematical questions and answers, sourced from various Stack Exchange sites. This repository is designed to serve as a comprehensive resource for researchers, educators, and enthusiasts in the field of mathematics and AI research.
Configs
configs: - config_name: stackmathqa1600k… See the full description on the dataset page: https://huggingface.co/datasets/math-ai/StackMathQA.
Keiran Paster*, Marco Dos Santos*, Zhangir Azerbayev, Jimmy Ba
GitHub | ArXiv | PDF
OpenWebMath is a dataset containing the majority of the high-quality, mathematical text from the internet. It is filtered and extracted from over 200B HTML files on Common Crawl down to a set of 6.3 million documents containing a total of 14.7B tokens. OpenWebMath is intended for use in pretraining and finetuning large language models. You can download the dataset using Hugging Face: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/open-web-math/open-web-math.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card
This dataset contains ~200K grade school math word problems. All the answers in this dataset are generated using Azure GPT4-Turbo. Please refer to Orca-Math: Unlocking the potential of SLMs in Grade School Math for details about the dataset construction.
Dataset Sources
Repository: microsoft/orca-math-word-problems-200k
Paper: Orca-Math: Unlocking the potential of SLMs in Grade School Math
Direct Use
This dataset has been designed to… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Mathematics Aptitude Test of Heuristics, hard subset (MATH-Hard) dataset
Dataset Summary
The Mathematics Aptitude Test of Heuristics (MATH) dataset consists of problems from mathematics competitions, including the AMC 10, AMC 12, AIME, and more. Each problem in MATH has a full step-by-step solution, which can be used to teach models to generate answer derivations and explanations. For MATH-Hard, only the hardest questions were kept (Level 5).… See the full description on the dataset page: https://huggingface.co/datasets/lighteval/MATH-Hard.
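The difficulty filter described above can be sketched on in-memory records. This assumes each record carries a `level` string such as `"Level 5"`, as in the original MATH release. The sample records and the helper `keep_hardest` are hypothetical, and this is not necessarily how MATH-Hard was actually built.

```python
from typing import Dict, List

def keep_hardest(records: List[Dict[str, str]], level: str = "Level 5") -> List[Dict[str, str]]:
    """Keep only the records whose 'level' field equals the requested level."""
    return [record for record in records if record.get("level") == level]

# Made-up records for illustration only.
records = [
    {"problem": "Compute 1 + 1.", "level": "Level 1"},
    {"problem": "A hypothetical hard problem.", "level": "Level 5"},
    {"problem": "Another hypothetical hard problem.", "level": "Level 5"},
]

hard = keep_hardest(records)
print(len(hard))  # 2
```

Using `record.get("level")` means records without a level field are dropped silently instead of raising an error.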
MATH dataset
This repo contains the MATH dataset. I have combined all problems into a single JSON file.
Copyright
These files are derived from the source code of the MATH dataset; the copyright notice is reproduced in full below.
MIT License
Copyright (c) 2021 Dan Hendrycks
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including… See the full description on the dataset page: https://huggingface.co/datasets/fdyrd/MATH.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
svc-huggingface/minerva-math dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Big-Math is the largest open-source dataset of high-quality mathematical problems, curated specifically for reinforcement learning (RL) training in language models. With over 250,000 rigorously filtered and verified problems, Big-Math bridges the gap between quality and quantity, establishing a robust foundation for advancing reasoning in LLMs.
Request Early Access to Private… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
MM_Math Datasets
We introduce our multimodal mathematics dataset, MM-MATH. This dataset is collected from real middle school exams in China, and all the math problems are open-ended to evaluate the mathematical problem-solving abilities of current multimodal models. MM-MATH is annotated with fine-grained three-dimensional labels: difficulty, grade, and knowledge points. The difficulty level is determined based on the average scores of student exams, the grade labels are derived… See the full description on the dataset page: https://huggingface.co/datasets/THU-KEG/MM_Math.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
OpenR1-Math-Raw
Dataset description
OpenR1-Math-Raw is a large-scale dataset for mathematical reasoning. It consists of 516k math problems sourced from AI-MO/NuminaMath-1.5, with 1 to 8 reasoning traces generated by DeepSeek R1. The traces were verified using Math Verify and an LLM-as-Judge based verifier (Llama-3.3-70B-Instruct). The dataset contains:
516,499 problems
1,209,403 R1-generated solutions, with 2.3 solutions per problem on average
re-parsed answers… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenR1-Math-Raw.
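The stated average can be checked directly from the two counts given above. This is just the arithmetic, using only the figures quoted in the description; the variable names are illustrative.

```python
num_problems = 516_499
num_solutions = 1_209_403

# Average number of R1-generated solutions per problem.
avg_solutions = num_solutions / num_problems

print(f"{avg_solutions:.1f}")  # 2.3
```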
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🦣 MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
MathInstruct is a meticulously curated instruction tuning dataset that is lightweight yet generalizable. MathInstruct is compiled from 13 math rationale datasets, six of which are newly curated by this work. It uniquely focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and ensures extensive coverage of diverse mathematical fields. Project Page:… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/MathInstruct.
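To make the CoT/PoT distinction concrete: a chain-of-thought rationale reasons in prose, while a program-of-thought rationale expresses the same reasoning as a short program whose output is the answer. The example below is a hypothetical illustration of the PoT style; the problem and the function name `solve` are invented and are not taken from MathInstruct.

```python
# Hypothetical problem: a train travels 150 km in 2.5 hours, then 90 km in
# 1.5 hours. What is its average speed over the whole trip, in km/h?

def solve() -> float:
    """PoT-style rationale: each reasoning step is a line of code."""
    total_distance_km = 150 + 90   # distance of both legs
    total_time_h = 2.5 + 1.5       # time of both legs
    return total_distance_km / total_time_h

answer = solve()
print(answer)  # 60.0
```

Delegating the arithmetic to an interpreter is the main appeal of this style, since the numeric result is computed rather than written out by the model.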
ankner/math-500 dataset hosted on Hugging Face and contributed by the HF Datasets community
math-ai/amc23 dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
μ-MATH (Meta U-MATH) is a meta-evaluation dataset derived from the U-MATH benchmark. It is intended to assess the ability of LLMs to judge free-form mathematical solutions. The dataset includes 1,084 labeled samples generated from 271 U-MATH tasks, covering problems of varying assessment complexity. For fine-grained performance evaluation results, in-depth analyses and detailed discussions on behaviors and biases of LLM judges, check out our paper.
📊 U-MATH benchmark at Huggingface 🔎 μ-MATH… See the full description on the dataset page: https://huggingface.co/datasets/toloka/mu-math.
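One plausible way to score an LLM judge on labeled samples like these is to compare its verdicts with the gold labels. The sketch below is an assumption about how such scoring might look, not the benchmark's official metric. The boolean encoding, the toy labels, and the helper `judge_scores` are all hypothetical.

```python
from typing import List, Tuple

def judge_scores(gold: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Return (accuracy, F1 on the 'correct' class) for judge verdicts."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # judged correct, is correct
    fp = sum(1 for g, p in pairs if not g and p)      # judged correct, is wrong
    fn = sum(1 for g, p in pairs if g and not p)      # judged wrong, is correct
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Toy labels: True means "the solution is correct".
gold = [True, True, False, False]
pred = [True, False, False, True]
accuracy, f1 = judge_scores(gold, pred)
print(accuracy, f1)  # 0.5 0.5
```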