100+ datasets found
  1. Big-Math-RL-Verified

    • huggingface.co
    Updated Feb 21, 2025
    Cite
    SynthLabs (2025). Big-Math-RL-Verified [Dataset]. https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified
    Explore at:
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Synth Labs
    Authors
    SynthLabs
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

    Big-Math is the largest open-source dataset of high-quality mathematical problems, curated specifically for reinforcement learning (RL) training in language models. With over 250,000 rigorously filtered and verified problems, Big-Math bridges the gap between quality and quantity, establishing a robust foundation for advancing reasoning in LLMs.

    Request Early Access to Private… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified.
    
  2. MATH-V Dataset

    • paperswithcode.com
    Updated Sep 3, 2024
    Cite
    Ke Wang; Junting Pan; Weikang Shi; Zimu Lu; Mingjie Zhan; Hongsheng Li (2024). MATH-V Dataset [Dataset]. https://paperswithcode.com/dataset/math-v
    Explore at:
    Dataset updated
    Sep 3, 2024
    Authors
    Ke Wang; Junting Pan; Weikang Shi; Zimu Lu; Mingjie Zhan; Hongsheng Li
    Description

    The Math-Vision (Math-V) dataset is a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts, sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, the dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs.

    Through extensive experimentation, we unveil a notable performance gap between current LMMs and human performance on Math-Vision, underscoring the imperative for further advancements in LMMs. Moreover, our detailed categorization allows for a thorough error analysis of LMMs, offering valuable insights to guide future research and development.

  3. orca-math-word-problems-200k

    • huggingface.co
    Updated Mar 4, 2024
    + more versions
    Cite
    Microsoft (2024). orca-math-word-problems-200k [Dataset]. https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 4, 2024
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card

    This dataset contains ~200K grade school math word problems. All the answers in this dataset are generated using Azure GPT4-Turbo. Please refer to Orca-Math: Unlocking the potential of SLMs in Grade School Math for details about the dataset construction.

      Dataset Sources
    

    Repository: microsoft/orca-math-word-problems-200k
    Paper: Orca-Math: Unlocking the potential of SLMs in Grade School Math

      Direct Use
    

    This dataset has been designed to… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k.

  4. MATH

    • huggingface.co
    Updated Dec 3, 2024
    Cite
    Rundong Yang (2024). MATH [Dataset]. https://huggingface.co/datasets/fdyrd/MATH
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 3, 2024
    Authors
    Rundong Yang
    Description

    MATH dataset

    This repo contains the MATH dataset, with all problems combined into a single JSON file.

      Copyright
    

    These files are derived from the source code of the MATH dataset; the copyright notice is reproduced in full below.

    MIT License

    Copyright (c) 2021 Dan Hendrycks

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including… See the full description on the dataset page: https://huggingface.co/datasets/fdyrd/MATH.

  5. Math Dataset

    • universe.roboflow.com
    zip
    Updated Jun 2, 2025
    Cite
    a (2025). Math Dataset [Dataset]. https://universe.roboflow.com/a-wlidu/math-vaijn
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 2, 2025
    Dataset authored and provided by
    a
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    N9gga Bounding Boxes
    Description

    Math

    ## Overview
    
    Math is a dataset for object detection tasks - it contains N9gga annotations for 3,966 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. Data from: MGSM Dataset

    • paperswithcode.com
    Updated Aug 20, 2023
    Cite
    Freda Shi; Mirac Suzgun; Markus Freitag; Xuezhi Wang; Suraj Srivats; Soroush Vosoughi; Hyung Won Chung; Yi Tay; Sebastian Ruder; Denny Zhou; Dipanjan Das; Jason Wei (2023). MGSM Dataset [Dataset]. https://paperswithcode.com/dataset/mgsm
    Explore at:
    Dataset updated
    Aug 20, 2023
    Authors
    Freda Shi; Mirac Suzgun; Markus Freitag; Xuezhi Wang; Suraj Srivats; Soroush Vosoughi; Hyung Won Chung; Yi Tay; Sebastian Ruder; Denny Zhou; Dipanjan Das; Jason Wei
    Description

    Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems. The same 250 problems from GSM8K are each translated by human annotators into 10 languages. GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support question answering on basic mathematical problems that require multi-step reasoning.

  7. Ranking of LLM tools in solving math problems 2024

    • statista.com
    Updated Oct 25, 2024
    Cite
    Statista (2024). Ranking of LLM tools in solving math problems 2024 [Dataset]. https://www.statista.com/statistics/1458141/leading-math-llm-tools/
    Explore at:
    Dataset updated
    Oct 25, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2024
    Area covered
    Worldwide
    Description

    As of March 2024, OpenAI o1 was the large language model (LLM) tool that had the best benchmark score in solving math problems, with a score of 94.8 percent. Close behind, in second place, was OpenAI o1-mini, followed by GPT-4o.

  8. small-open-web-math-dataset

    • huggingface.co
    Updated Nov 6, 2011
    + more versions
    Cite
    Brando Miranda (2011). small-open-web-math-dataset [Dataset]. https://huggingface.co/datasets/brando/small-open-web-math-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 6, 2011
    Authors
    Brando Miranda
    Description

    Small Open Web Math Dataset

    A 10k-sample subset of OpenWebMath, focused on high-quality mathematical text.

  9. math_dataset

    • huggingface.co
    • tensorflow.org
    Updated Jun 12, 2020
    Cite
    Deepmind (2020). math_dataset [Dataset]. https://huggingface.co/datasets/deepmind/math_dataset
    Explore at:
    Dataset updated
    Jun 12, 2020
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Authors
    Deepmind
    Description

    Mathematics database.

    This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

    Original paper: Analysing Mathematical Reasoning Abilities of Neural Models (Saxton, Grefenstette, Hill, Kohli).

    Example usage: train_examples, val_examples = datasets.load_dataset('math_dataset/arithmetic__mul', split=['train', 'test'], as_supervised=True)

  10. Math Training Market is Growing at Compound Annual Growth Rate (CAGR) of...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Apr 15, 2025
    Cite
    Cognitive Market Research (2025). Math Training Market is Growing at Compound Annual Growth Rate (CAGR) of 8.20% from 2023 to 2030. [Dataset]. https://www.cognitivemarketresearch.com/math-training-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global math training market will grow at a compound annual growth rate (CAGR) of 8.20% from 2023 to 2030.

    Demand in the math training market is rising due to the growing focus on STEM education, advancements in technology, globalization, and the rise of competitive examinations.
    Demand for online delivery remains higher in the math training market.
    The Age 7-15 category held the highest math training market revenue share in 2023.
    North America will continue to lead, whereas the Asia Pacific math training market will experience the strongest growth until 2030.
    

    Technological Advancements and Online Learning to Provide Viable Market Output

    A key driver of the math training market is the rapid advancement of technology and the widespread adoption of online learning platforms. The availability of interactive and engaging online math courses, along with the convenience of learning from home, has made math training more accessible and appealing to a broader audience. These platforms offer features such as personalized learning paths, gamification, and real-time feedback, enhancing the learning experience. Moreover, the COVID-19 pandemic accelerated the shift toward online education, making online math training a necessity for many students.

    In January 2022, zSpace, an edtech company located in the United States, unveiled a novel AR/VR educational device. This cutting-edge technology aims to captivate students by immersing them in a virtual world filled with multidimensional content, all without requiring the use of glasses. The device is particularly beneficial for hybrid or remote learning scenarios.

    The flexibility and scalability of online math training solutions make them attractive to both traditional students and working professionals seeking to improve their math skills. As technology continues to evolve, incorporating artificial intelligence and adaptive learning, the Math Training Market is poised to expand further, catering to diverse learning needs. Online math training solutions offer numerous benefits beyond flexibility and scalability. They provide personalized learning experiences through artificial intelligence and adaptive learning algorithms, allowing students to learn at their own pace and focus on areas where they need improvement.

    Increasing Emphasis on STEM Education to Propel Market Growth
    

    The growth of the math training market is driven by the increasing emphasis on STEM (Science, Technology, Engineering, and Mathematics) education. In today's technology-driven world, STEM skills, especially strong mathematical abilities, are in high demand. Many educational institutions and governments are recognizing the importance of preparing students for careers in STEM fields. Consequently, math training programs are becoming essential to help students develop strong foundational math skills and advanced mathematical knowledge. The rising interest in coding, data science, and artificial intelligence has further amplified the need for math training, as these fields rely heavily on mathematical concepts.

    Market Dynamics of Math Training

    Limited Access to Quality Education to Hinder Market Growth
    

    The math training market is constrained by limited access to quality education, particularly in underserved and remote areas. While online learning has expanded access to math training, there are still regions and communities that lack adequate internet connectivity and technology infrastructure. This digital divide creates a barrier for many students, preventing them from benefiting from online math training programs. Additionally, the quality of math education can vary widely between different regions and educational institutions, leading to disparities in math skills and knowledge.

    Impact of COVID-19 on the Math Training Market

    The COVID-19 pandemic significantly impacted the math training market. With lockdowns, school closures, and social distancing measures in place, traditional classroom-based math training faced disruptions. However, the pandemic also accelerated the adoption of online and remote learning solutions. Many math training providers quickly pivoted to offer virtual classes, webinars, and interactive online platforms to cater to the growing demand for distance education. Homeschooling and the need for supplemental education further boosted enrollment in online math training programs. Additionally, the pandemic highlig...

  11. AIMO External Dataset

    • kaggle.com
    Updated Apr 2, 2024
    Cite
    moth (2024). AIMO External Dataset [Dataset]. https://www.kaggle.com/datasets/alejopaullier/aimo-external-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    moth
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset is a compiled version of two benchmark math dataframes for solving math problems using LLMs, namely:

    • MATH: "MATH is a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations."
    • GSM8K: "a dataset of 8.5K high quality linguistically diverse grade school math word problems created by human problem writers. The dataset is segmented into 7.5K training problems and 1K test problems. These problems take between 2 and 8 steps to solve, and solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer. A bright middle school student should be able to solve every problem. It can be used for multi-step mathematical reasoning."

    The dataset consists of 21k math problems with their corresponding solutions.

    Columns

    • problem: text with the mathematical problem statement.
    • level: level of difficulty (GSM8K does not provide this column).
    • type: math field (GSM8K does not provide this column).
    • solution: text with the mathematical problem solution.
    • stage: either "train" or "test". This corresponds to the original dataframe split.
    • source: either "MATH" or "GSM8K". Source of the problem.
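
    The column layout above lends itself to simple filtering by origin and split. The following is a minimal sketch in plain Python, assuming rows are available as dictionaries keyed by the documented column names; the three sample rows and the `select` helper are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical rows mirroring the documented columns; not real dataset records.
rows = [
    {"problem": "What is 2 + 3?", "level": None, "type": None,
     "solution": "2 + 3 = 5", "stage": "train", "source": "GSM8K"},
    {"problem": "Solve x^2 = 4 for x > 0.", "level": "Level 1", "type": "Algebra",
     "solution": "x = 2", "stage": "test", "source": "MATH"},
    {"problem": "Compute 7 * 6.", "level": None, "type": None,
     "solution": "7 * 6 = 42", "stage": "test", "source": "GSM8K"},
]


def select(rows, source=None, stage=None):
    """Return rows matching the given source and/or stage (None means no filter)."""
    return [
        r for r in rows
        if (source is None or r["source"] == source)
        and (stage is None or r["stage"] == stage)
    ]


# Count rows per (source, stage) pair, then pull out the GSM8K test rows.
counts = Counter((r["source"], r["stage"]) for r in rows)
gsm8k_test = select(rows, source="GSM8K", stage="test")
```

    Because GSM8K rows carry no level or type, any code that groups by those columns would need to handle missing values, as the `None` entries in the sample rows suggest.
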
  12. Math Test Results 2013-2023

    • catalog.data.gov
    • data.cityofnewyork.us
    • +1 more
    Updated Nov 29, 2024
    + more versions
    Cite
    data.cityofnewyork.us (2024). Math Test Results 2013-2023 [Dataset]. https://catalog.data.gov/dataset/math-test-results-2013-2023
    Explore at:
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    This report includes results for the New York State Math exams for the years 2013-2023. For the results for the New York State Math exams for the years 2006-2012, please follow this link.

  13. ThirdGrade ELA Math Scores Michigan 08032017

    • catalog.data.gov
    • detroitdata.org
    • +5 more
    Updated Sep 21, 2024
    + more versions
    Cite
    Data Driven Detroit (2024). ThirdGrade ELA Math Scores Michigan 08032017 [Dataset]. https://catalog.data.gov/dataset/thirdgrade-ela-math-scores-michigan-08032017-922cf
    Explore at:
    Dataset updated
    Sep 21, 2024
    Dataset provided by
    Data Driven Detroit
    Area covered
    Michigan
    Description

    Third grade English Language Arts (ELA) and Math test results for the 2016-2017 school year for the state of Michigan. Data Driven Detroit obtained these datasets from MI School Data for the State of the Detroit Child tool in July 2017. Test results were originally obtained at the school level and aggregated to the state level by Data Driven Detroit. Student data was suppressed when fewer than five students were tested per school. Click here for metadata (descriptions of the fields).

  14. Trends in Math Proficiency (2011-2022): Academy Of Math And Science vs....

    • publicschoolreview.com
    Updated Sep 1, 2011
    + more versions
    Cite
    Public School Review (2011). Trends in Math Proficiency (2011-2022): Academy Of Math And Science vs. Arizona vs. Academy Of Mathematics And Science Inc. (79961) School District [Dataset]. https://www.publicschoolreview.com/academy-of-math-and-science-profile
    Explore at:
    Dataset updated
    Sep 1, 2011
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset tracks annual math proficiency from 2011 to 2022 for Academy Of Math And Science vs. Arizona and Academy Of Mathematics And Science Inc. (79961) School District

  15. Distribution of Students Across Grade Levels in Neal Math Science Academy

    • publicschoolreview.com
    Cite
    Public School Review, Distribution of Students Across Grade Levels in Neal Math Science Academy [Dataset]. https://www.publicschoolreview.com/neal-math-science-academy-profile
    Explore at:
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset tracks annual distribution of students across grade levels in Neal Math Science Academy

  16. Ghana Early Grade Math Pilot Impact Evaluation: Endline Head Teacher Dataset...

    • catalog.data.gov
    Updated Jun 25, 2024
    + more versions
    Cite
    data.usaid.gov (2024). Ghana Early Grade Math Pilot Impact Evaluation: Endline Head Teacher Dataset [Dataset]. https://catalog.data.gov/dataset/ghana-early-grade-math-pilot-impact-evaluation-endline-head-teacher-dataset
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (https://usaid.gov/)
    Area covered
    Ghana
    Description

    The Early Grade Math Pilot program in Ghana, implemented under the U.S. Agency for International Development (USAID) Partnership for Education Learning Activity, was evaluated with an independent randomized controlled trial between 2017 and 2018. This data asset contains the two waves of data collected in Ghana during this time period. This dataset contains survey head teacher data from the activity endline.

  17. 4th Grade Math Proficiency Rate

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 1, 2023
    Cite
    data.iowa.gov (2023). 4th Grade Math Proficiency Rate [Dataset]. https://catalog.data.gov/dataset/4th-grade-math-proficiency-rate
    Explore at:
    Dataset updated
    Sep 1, 2023
    Dataset provided by
    data.iowa.gov
    Description

    The percentage of tested 4th grade Iowa students who met the standard math score metric associated with the grade and content.

  18. Trends in Total Students (2009-2023): Triad Math And Science Academy

    • publicschoolreview.com
    + more versions
    Cite
    Public School Review, Trends in Total Students (2009-2023): Triad Math And Science Academy [Dataset]. https://www.publicschoolreview.com/triad-math-and-science-academy-profile
    Explore at:
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset tracks the annual total number of students from 2009 to 2023 for Triad Math And Science Academy

  19. ViMATH Dataset

    • paperswithcode.com
    Updated Mar 4, 2024
    + more versions
    Cite
    Sang T. Truong; Duc Q. Nguyen; Toan Nguyen; Dong D. Le; Nhi N. Truong; Tho Quan; Sanmi Koyejo (2024). ViMATH Dataset [Dataset]. https://paperswithcode.com/dataset/vimath
    Explore at:
    Dataset updated
    Mar 4, 2024
    Authors
    Sang T. Truong; Duc Q. Nguyen; Toan Nguyen; Dong D. Le; Nhi N. Truong; Tho Quan; Sanmi Koyejo

  20. Big-Math-RL-Verified-Processed

    • huggingface.co
    Updated Apr 27, 2025
    Cite
    Open R1 (2025). Big-Math-RL-Verified-Processed [Dataset]. https://huggingface.co/datasets/open-r1/Big-Math-RL-Verified-Processed
    Explore at:
    Dataset updated
    Apr 27, 2025
    Dataset authored and provided by
    Open R1
    Description

    Dataset Card for Big-Math-RL-Verified-Processed

    This is a processed version of SynthLabsAI/Big-Math-RL-Verified where we have applied the following filters:

    • Removed samples where llama8b_solve_rate is None
    • Removed samples that could not be parsed by math-verify (empty lists)

    We have also created 5 additional subsets to indicate difficulty level, similar to the MATH dataset. To do so, we computed quintiles on the llama8b_solve_rate values and then filtered the dataset into the… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/Big-Math-RL-Verified-Processed.
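
    The quintile step described above can be sketched with the Python standard library. This is only an illustration under stated assumptions: the function name `difficulty_levels`, the sample solve rates, and the mapping of higher solve rates to lower level numbers are all hypothetical, and the dataset card's actual procedure may differ.

```python
from bisect import bisect_right
from statistics import quantiles


def difficulty_levels(solve_rates):
    """Assign each solve rate a level from 1 to 5 using quintile cut points.

    Assumption: a higher solve rate means an easier problem, so the top
    quintile maps to level 1 and the bottom quintile to level 5.
    """
    # Drop missing values first, mirroring the "is None" filter described above.
    rates = [r for r in solve_rates if r is not None]
    cuts = quantiles(rates, n=5, method="inclusive")  # four cut points
    # bisect_right yields the quintile index: 0 (lowest rates) to 4 (highest).
    return [5 - bisect_right(cuts, r) for r in rates]


# Hypothetical solve rates; None stands in for a sample without a solve rate.
sample_rates = [0.0, 0.1, None, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
levels = difficulty_levels(sample_rates)
```

    Each level then defines one subset: selecting the rows whose level equals a given value yields the corresponding difficulty split.
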

Cite
SynthLabs (2025). Big-Math-RL-Verified [Dataset]. https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified

Big-Math-RL-Verified

SynthLabsAI/Big-Math-RL-Verified

Explore at:
7 scholarly articles cite this dataset (View in Google Scholar)