100+ datasets found

Big-Math-RL-Verified
huggingface.co
Updated Feb 21, 2025
Cite
SynthLabs (2025). Big-Math-RL-Verified [Dataset]. https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified
Dataset updated
Feb 21, 2025
Dataset provided by
Synth Labs
Authors
SynthLabs
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Big-Math is the largest open-source dataset of high-quality mathematical problems, curated specifically for reinforcement learning (RL) training in language models. With over 250,000 rigorously filtered and verified problems, Big-Math bridges the gap between quality and quantity, establishing a robust foundation for advancing reasoning in LLMs.

Request Early Access to Private… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified.
MATH-V Dataset
paperswithcode.com
Updated Sep 3, 2024
Cite
Ke Wang; Junting Pan; Weikang Shi; Zimu Lu; Mingjie Zhan; Hongsheng Li (2024). MATH-V Dataset [Dataset]. https://paperswithcode.com/dataset/math-v
Dataset updated
Sep 3, 2024
Authors
Ke Wang; Junting Pan; Weikang Shi; Zimu Lu; Mingjie Zhan; Hongsheng Li
Description
Math-Vision (Math-V) dataset is a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs.

Through extensive experimentation, we unveil a notable performance gap between current LMMs and human performance on Math-Vision, underscoring the imperative for further advancements in LMMs. Moreover, our detailed categorization allows for a thorough error analysis of LMMs, offering valuable insights to guide future research and development.
orca-math-word-problems-200k
huggingface.co
Updated Mar 4, 2024
Cite
Microsoft (2024). orca-math-word-problems-200k [Dataset]. https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Mar 4, 2024
Dataset authored and provided by
Microsoft (http://microsoft.com/)
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card

This dataset contains ~200K grade school math word problems. All the answers in this dataset are generated using Azure GPT4-Turbo. Please refer to Orca-Math: Unlocking the potential of SLMs in Grade School Math for details about the dataset construction.

Dataset Sources

Repository: microsoft/orca-math-word-problems-200k

Paper: Orca-Math: Unlocking the potential of SLMs in Grade School Math

Direct Use

This dataset has been designed to… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k.
MATH
huggingface.co
Updated Dec 3, 2024
Cite
Rundong Yang (2024). MATH [Dataset]. https://huggingface.co/datasets/fdyrd/MATH
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Dec 3, 2024
Authors
Rundong Yang
Description
MATH dataset

The repo contains the MATH dataset. I have combined all problems into a single JSON file.

Copyright

These files are derived from the source code of the MATH dataset; the copyright notice is reproduced in full below.

MIT License

Copyright (c) 2021 Dan Hendrycks

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including… See the full description on the dataset page: https://huggingface.co/datasets/fdyrd/MATH.
Math Dataset
universe.roboflow.com
zip
Updated Jun 2, 2025
Cite
a (2025). Math Dataset [Dataset]. https://universe.roboflow.com/a-wlidu/math-vaijn
Available download formats: zip
Dataset updated
Jun 2, 2025
Dataset authored and provided by
a
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
N9gga Bounding Boxes
Description
Math

Overview

Math is a dataset for object detection tasks. It contains bounding-box annotations for 3,966 images.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

License

This dataset is available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
Data from: MGSM Dataset
paperswithcode.com
Updated Aug 20, 2023
Cite
Freda Shi; Mirac Suzgun; Markus Freitag; Xuezhi Wang; Suraj Srivats; Soroush Vosoughi; Hyung Won Chung; Yi Tay; Sebastian Ruder; Denny Zhou; Dipanjan Das; Jason Wei (2023). MGSM Dataset [Dataset]. https://paperswithcode.com/dataset/mgsm
Dataset updated
Aug 20, 2023
Authors
Freda Shi; Mirac Suzgun; Markus Freitag; Xuezhi Wang; Suraj Srivats; Soroush Vosoughi; Hyung Won Chung; Yi Tay; Sebastian Ruder; Denny Zhou; Dipanjan Das; Jason Wei
Description
Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems. The same 250 problems from GSM8K are each translated by human annotators into 10 languages. GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
Ranking of LLM tools in solving math problems 2024
statista.com
Updated Oct 25, 2024
Cite
Statista (2024). Ranking of LLM tools in solving math problems 2024 [Dataset]. https://www.statista.com/statistics/1458141/leading-math-llm-tools/
Dataset updated
Oct 25, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 2024
Area covered
Worldwide
Description
As of March 2024, OpenAI o1 was the large language model (LLM) tool that had the best benchmark score in solving math problems, with a score of 94.8 percent. Close behind, in second place, was OpenAI o1-mini, followed by GPT-4o.
small-open-web-math-dataset
huggingface.co
Updated Nov 6, 2011
Cite
Brando Miranda (2011). small-open-web-math-dataset [Dataset]. https://huggingface.co/datasets/brando/small-open-web-math-dataset
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Nov 6, 2011
Authors
Brando Miranda
Description
Small Open Web Math Dataset

A 10k-sample subset of OpenWebMath, focused on high-quality mathematical text.
math_dataset
huggingface.co
tensorflow.org
Updated Jun 12, 2020
Cite
Deepmind (2020). math_dataset [Dataset]. https://huggingface.co/datasets/deepmind/math_dataset
Dataset updated
Jun 12, 2020
Dataset provided by
DeepMind (http://deepmind.com/)
Authors
Deepmind
Description
Mathematics database.

This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

Original paper: Analysing Mathematical Reasoning Abilities of Neural Models (Saxton, Grefenstette, Hill, Kohli).

Example usage: train_examples, val_examples = datasets.load_dataset('math_dataset/arithmetic__mul', split=['train', 'test'], as_supervised=True)
Math Training Market is Growing at Compound Annual Growth Rate (CAGR) of...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Apr 15, 2025
Cite
Cognitive Market Research (2025). Math Training Market is Growing at Compound Annual Growth Rate (CAGR) of 8.20% from 2023 to 2030. [Dataset]. https://www.cognitivemarketresearch.com/math-training-market-report
Available download formats: pdf, excel, csv, ppt
Dataset updated
Apr 15, 2025
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global math training market size will grow at a compound annual growth rate (CAGR) of 8.20% from 2023 to 2030.

Demand in the math training market is rising due to the growing focus on STEM education, advancements in technology, globalization, and the rise of competitive examinations. Demand for online training remains higher in the math training market. The Age 7-15 category held the highest math training market revenue share in 2023. North America will continue to lead, whereas the Asia Pacific math training market will experience the strongest growth until 2030.
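
As a rough check on what the stated rate implies, the compound growth formula can be applied over 2023 to 2030 (seven annual periods). The sketch below computes only the growth multiple; no base market size is assumed, since none is given in this description, and the function name is hypothetical.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return the factor by which a quantity grows at a constant annual rate."""
    return (1.0 + cagr) ** years

# 8.20% per year, compounded over the seven years from 2023 to 2030.
multiple = growth_multiple(0.082, 2030 - 2023)
print(f"Growth multiple over the period: {multiple:.3f}")
```

Under these assumptions the market would end the period at roughly 1.74 times its 2023 size.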

Technological Advancements and Online Learning to Provide Viable Market Output

A key driver of the math training market is the rapid advancement of technology and the widespread adoption of online learning platforms. The availability of interactive and engaging online math courses, along with the convenience of learning from home, has made math training more accessible and appealing to a broader audience. These platforms offer features such as personalized learning paths, gamification, and real-time feedback, enhancing the learning experience. Moreover, the COVID-19 pandemic accelerated the shift toward online education, making online math training a necessity for many students.

In January 2022, zSpace, an edtech company located in the United States, unveiled a novel AR/VR educational device. This cutting-edge technology aims to captivate students by immersing them in a virtual world filled with multidimensional content, all without requiring the use of glasses. The device is particularly beneficial for hybrid or remote learning scenarios.

The flexibility and scalability of online math training solutions make them attractive to both traditional students and working professionals seeking to improve their math skills. As technology continues to evolve, incorporating artificial intelligence and adaptive learning, the Math Training Market is poised to expand further, catering to diverse learning needs. Online math training solutions offer numerous benefits beyond flexibility and scalability. They provide personalized learning experiences through artificial intelligence and adaptive learning algorithms, allowing students to learn at their own pace and focus on areas where they need improvement.

Increasing Emphasis on STEM Education to Propel Market Growth

The growth of the math training market is driven by the increasing emphasis on STEM (Science, Technology, Engineering, and Mathematics) education. In today's technology-driven world, STEM skills, especially strong mathematical abilities, are in high demand. Many educational institutions and governments are recognizing the importance of preparing students for careers in STEM fields. Consequently, math training programs are becoming essential to help students develop strong foundational math skills and advanced mathematical knowledge. The rising interest in coding, data science, and artificial intelligence has further amplified the need for math training, as these fields rely heavily on mathematical concepts.

Market Dynamics of Math Training

Limited Access to Quality Education to Hinder Market Growth

The math training market is constrained by limited access to quality education, particularly in underserved and remote areas. While online learning has expanded access to math training, there are still regions and communities that lack adequate internet connectivity and technology infrastructure. This digital divide creates a barrier for many students, preventing them from benefiting from online math training programs. Additionally, the quality of math education can vary widely between different regions and educational institutions, leading to disparities in math skills and knowledge.

Impact of COVID-19 on the Math Training Market

The COVID-19 pandemic significantly impacted the math training market. With lockdowns, school closures, and social distancing measures in place, traditional classroom-based math training faced disruptions. However, the pandemic also accelerated the adoption of online and remote learning solutions. Many math training providers quickly pivoted to offer virtual classes, webinars, and interactive online platforms to cater to the growing demand for distance education. Homeschooling and the need for supplemental education further boosted enrollment in online math training programs. Additionally, the pandemic highlig...
AIMO External Dataset
kaggle.com
Updated Apr 2, 2024
Cite
moth (2024). AIMO External Dataset [Dataset]. https://www.kaggle.com/datasets/alejopaullier/aimo-external-dataset
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Apr 2, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
moth
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Description

This dataset is a compiled version of two benchmark math dataframes for solving math problems using LLMs, namely:

MATH: "MATH is a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations."

GSM8K: "a dataset of 8.5K high quality linguistically diverse grade school math word problems created by human problem writers. The dataset is segmented into 7.5K training problems and 1K test problems. These problems take between 2 and 8 steps to solve, and solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer. A bright middle school student should be able to solve every problem. It can be used for multi-step mathematical reasoning."

The dataset consists of 21k math problems with their corresponding solutions.

Columns

problem: text with the mathematical problem statement.

level: level of difficulty (GSM8K does not provide this column).

type: math field (GSM8K does not provide this column).

solution: text with the mathematical problem solution.

stage: either "train" or "test". This corresponds to the original dataframe split.

source: either "MATH" or "GSM8K". Source of the problem.
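
The column layout above lends itself to simple filtering by origin and split. The following is a minimal sketch using only the standard library; the sample rows are invented for illustration (they are not taken from the dataset), the helper name `select_rows` is hypothetical, and the sketch assumes that the `level` and `type` fields are empty for GSM8K rows.

```python
import csv
import io

# Hypothetical sample rows that follow the documented column layout.
# The values are made up for illustration only.
SAMPLE_CSV = """problem,level,type,solution,stage,source
What is 2 + 3?,,,2 + 3 = 5,train,GSM8K
Solve x^2 = 4 for x > 0.,Level 1,Algebra,x = 2,test,MATH
What is 7 * 6?,,,7 * 6 = 42,test,GSM8K
"""

def select_rows(rows, source=None, stage=None):
    """Return rows matching the given source and/or stage (None disables a filter)."""
    return [
        row for row in rows
        if (source is None or row["source"] == source)
        and (stage is None or row["stage"] == stage)
    ]

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
gsm8k_test = select_rows(rows, source="GSM8K", stage="test")
math_rows = select_rows(rows, source="MATH")
print(len(gsm8k_test), len(math_rows))
```

Because GSM8K rows carry no `level` or `type`, any code that groups by those columns would need to handle empty values for that source.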
Math Test Results 2013-2023
catalog.data.gov
data.cityofnewyork.us
+1 more
Updated Nov 29, 2024
Cite
data.cityofnewyork.us (2024). Math Test Results 2013-2023 [Dataset]. https://catalog.data.gov/dataset/math-test-results-2013-2023
Dataset updated
Nov 29, 2024
Dataset provided by
data.cityofnewyork.us
Description
This report includes results for the New York State Math exams for the years 2013-2023. For the results for the New York State Math exams for the years 2006-2012, please follow this link.
ThirdGrade ELA Math Scores Michigan 08032017
catalog.data.gov
detroitdata.org
+5 more
Updated Sep 21, 2024
Cite
Data Driven Detroit (2024). ThirdGrade ELA Math Scores Michigan 08032017 [Dataset]. https://catalog.data.gov/dataset/thirdgrade-ela-math-scores-michigan-08032017-922cf
Dataset updated
Sep 21, 2024
Dataset provided by
Data Driven Detroit
Area covered
Michigan
Description
Third grade English Language Arts (ELA) and Math test results for the 2016-2017 school year for the state of Michigan. Data Driven Detroit obtained these datasets from MI School Data, for the State of the Detroit Child tool in July 2017. Test results were originally obtained at the school level and aggregated to the state level by Data Driven Detroit. Student data was suppressed when fewer than five students were tested per school.
Trends in Math Proficiency (2011-2022): Academy Of Math And Science vs....
publicschoolreview.com
Updated Sep 1, 2011
Cite
Public School Review (2011). Trends in Math Proficiency (2011-2022): Academy Of Math And Science vs. Arizona vs. Academy Of Mathematics And Science Inc. (79961) School District [Dataset]. https://www.publicschoolreview.com/academy-of-math-and-science-profile
Dataset updated
Sep 1, 2011
Dataset authored and provided by
Public School Review
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset tracks annual math proficiency from 2011 to 2022 for Academy Of Math And Science vs. Arizona and Academy Of Mathematics And Science Inc. (79961) School District
Distribution of Students Across Grade Levels in Neal Math Science Academy
publicschoolreview.com
Cite
Public School Review, Distribution of Students Across Grade Levels in Neal Math Science Academy [Dataset]. https://www.publicschoolreview.com/neal-math-science-academy-profile
Dataset authored and provided by
Public School Review
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset tracks annual distribution of students across grade levels in Neal Math Science Academy
Ghana Early Grade Math Pilot Impact Evaluation: Endline Head Teacher Dataset...
catalog.data.gov
Updated Jun 25, 2024
Cite
data.usaid.gov (2024). Ghana Early Grade Math Pilot Impact Evaluation: Endline Head Teacher Dataset [Dataset]. https://catalog.data.gov/dataset/ghana-early-grade-math-pilot-impact-evaluation-endline-head-teacher-dataset
Dataset updated
Jun 25, 2024
Dataset provided by
United States Agency for International Development (https://usaid.gov/)
Area covered
Ghana
Description
The Early Grade Math Pilot program in Ghana, implemented under the U.S. Agency for International Development (USAID) Partnership for Education Learning Activity, was evaluated with an independent randomized controlled trial between 2017 and 2018. This data asset contains the two waves of data collected in Ghana during this time period. This dataset contains survey head teacher data from the activity endline.
4th Grade Math Proficiency Rate
catalog.data.gov
s.cnmilf.com
Updated Sep 1, 2023
Cite
data.iowa.gov (2023). 4th Grade Math Proficiency Rate [Dataset]. https://catalog.data.gov/dataset/4th-grade-math-proficiency-rate
Dataset updated
Sep 1, 2023
Dataset provided by
data.iowa.gov
Description
The percentage of 4th grade Iowa students tested who met the standard math score metric associated with the grade and content.
Trends in Total Students (2009-2023): Triad Math And Science Academy
publicschoolreview.com
Cite
Public School Review, Trends in Total Students (2009-2023): Triad Math And Science Academy [Dataset]. https://www.publicschoolreview.com/triad-math-and-science-academy-profile
Dataset authored and provided by
Public School Review
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset tracks the annual total number of students from 2009 to 2023 for Triad Math And Science Academy.
ViMATH Dataset
paperswithcode.com
Updated Mar 4, 2024
Cite
Sang T. Truong; Duc Q. Nguyen; Toan Nguyen; Dong D. Le; Nhi N. Truong; Tho Quan; Sanmi Koyejo (2024). ViMATH Dataset [Dataset]. https://paperswithcode.com/dataset/vimath
Dataset updated
Mar 4, 2024
Authors
Sang T. Truong; Duc Q. Nguyen; Toan Nguyen; Dong D. Le; Nhi N. Truong; Tho Quan; Sanmi Koyejo
Big-Math-RL-Verified-Processed
huggingface.co
Updated Apr 27, 2025
Cite
Open R1 (2025). Big-Math-RL-Verified-Processed [Dataset]. https://huggingface.co/datasets/open-r1/Big-Math-RL-Verified-Processed
Dataset updated
Apr 27, 2025
Dataset authored and provided by
Open R1
Description
Dataset Card for Big-Math-RL-Verified-Processed

This is a processed version of SynthLabsAI/Big-Math-RL-Verified where we have applied the following filters:

Removed samples where llama8b_solve_rate is None

Removed samples that could not be parsed by math-verify (empty lists)

We have also created 5 additional subsets to indicate difficulty level, similar to the MATH dataset. To do so, we computed quintiles on the llama8b_solve_rate values and then filtered the dataset into the… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/Big-Math-RL-Verified-Processed.
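
The quintile-based bucketing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Open R1 processing code: the sample records are invented, the function name `assign_levels` is hypothetical, and the mapping "lowest solve rate means level 5" is an assumption, since the description is truncated before it states the direction.

```python
from bisect import bisect_right
from statistics import quantiles

def assign_levels(samples, key="llama8b_solve_rate"):
    """Drop samples without a solve rate, then tag each with a level from 1 to 5."""
    kept = [s for s in samples if s[key] is not None]
    # Four cut points split the solve rates into five equally sized groups.
    cuts = quantiles([s[key] for s in kept], n=5)
    leveled = []
    for s in kept:
        bucket = bisect_right(cuts, s[key])  # 0 = lowest solve rates, 4 = highest
        # Assumption: a low solve rate means a hard problem, so it gets a high level.
        leveled.append({**s, "level": 5 - bucket})
    return leveled

# Invented records for illustration only.
rates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, None]
samples = [{"id": i, "llama8b_solve_rate": r} for i, r in enumerate(rates)]
leveled = assign_levels(samples)
print([(s["llama8b_solve_rate"], s["level"]) for s in leveled])
```

Filtering `leveled` on the `level` field would then give one subset per difficulty level.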

Big-Math-RL-Verified (SynthLabsAI/Big-Math-RL-Verified): 7 scholarly articles cite this dataset (per Google Scholar).
