Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Big-Math is the largest open-source dataset of high-quality mathematical problems, curated specifically for reinforcement learning (RL) training in language models. With over 250,000 rigorously filtered and verified problems, Big-Math bridges the gap between quality and quantity, establishing a robust foundation for advancing reasoning in LLMs.
Request Early Access to Private… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified.
The Math-Vision (Math-V) dataset is a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs.
Through extensive experimentation, we unveil a notable performance gap between current LMMs and human performance on Math-Vision, underscoring the imperative for further advancements in LMMs. Moreover, our detailed categorization allows for a thorough error analysis of LMMs, offering valuable insights to guide future research and development.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card
This dataset contains ~200K grade school math word problems. All the answers in this dataset are generated using Azure GPT4-Turbo. Please refer to Orca-Math: Unlocking the potential of SLMs in Grade School Math for details about the dataset construction.
Dataset Sources
Repository: microsoft/orca-math-word-problems-200k
Paper: Orca-Math: Unlocking the potential of SLMs in Grade School Math
Direct Use
This dataset has been designed to… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k.
MATH dataset
This repo contains the MATH dataset. I have combined all problems into a single JSON file.
Copyright
These files are derived from the source code of the MATH dataset; the copyright notice is reproduced in full below.
MIT License
Copyright (c) 2021 Dan Hendrycks
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including… See the full description on the dataset page: https://huggingface.co/datasets/fdyrd/MATH.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Math is a dataset for object detection tasks. It contains annotations for 3,966 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems. The same 250 problems from GSM8K are each translated by human annotators into 10 languages. GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
As of March 2024, OpenAI o1 was the large language model (LLM) tool that had the best benchmark score in solving math problems, with a score of 94.8 percent. Close behind, in second place, was OpenAI o1-mini, followed by GPT-4o.
A 10k-sample subset of OpenWebMath, focused on high-quality mathematical text.
Mathematics database.
This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.
Original paper: Analysing Mathematical Reasoning Abilities of Neural Models (Saxton, Grefenstette, Hill, Kohli).
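Example usage (the `as_supervised` argument belongs to TensorFlow Datasets, so the call is written here with `tfds.load` after `import tensorflow_datasets as tfds`): train_examples, val_examples = tfds.load('math_dataset/arithmetic__mul', split=['train', 'test'], as_supervised=True)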
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global math training market size will grow at a compound annual growth rate (CAGR) of 8.20% from 2023 to 2030.
The demand for the math training market is rising due to the growing focus on STEM education, advancements in technology, globalization, and the rise of competitive examinations.
Demand for online training remains higher in the math training market.
The Age 7-15 category held the highest math training market revenue share in 2023.
North America will continue to lead, whereas the Asia Pacific math training market will experience the strongest growth until 2030.
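To make the CAGR figure concrete, the sketch below computes the total growth factor implied by compounding 8.20% per year. It assumes that 2023 to 2030 counts as seven annual compounding periods, which the report summary does not state; `growth_multiple` is a hypothetical helper name.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor implied by compounding `cagr` once a year for `years` years."""
    return (1.0 + cagr) ** years


# Assumption: 2023 -> 2030 is treated as 7 annual compounding periods.
multiple = growth_multiple(0.082, 7)
print(f"{multiple:.2f}x")
```

Under that assumption the market would end the period at roughly 1.74 times its 2023 size. The absolute market size is not given in this summary, so only the ratio can be derived.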
Technological Advancements and Online Learning to Provide Viable Market Output
A key driver of the math training market is the rapid advancement of technology and the widespread adoption of online learning platforms. The availability of interactive and engaging online math courses, along with the convenience of learning from home, has made math training more accessible and appealing to a broader audience. These platforms offer features such as personalized learning paths, gamification, and real-time feedback, enhancing the learning experience. Moreover, the COVID-19 pandemic accelerated the shift toward online education, making online math training a necessity for many students.
In January 2022, zSpace, an edtech company located in the United States, unveiled a novel AR/VR educational device. This cutting-edge technology aims to captivate students by immersing them in a virtual world filled with multidimensional content, all without requiring the use of glasses. The device is particularly beneficial for hybrid or remote learning scenarios.
The flexibility and scalability of online math training solutions make them attractive to both traditional students and working professionals seeking to improve their math skills. As technology continues to evolve, incorporating artificial intelligence and adaptive learning, the Math Training Market is poised to expand further, catering to diverse learning needs. Online math training solutions offer numerous benefits beyond flexibility and scalability. They provide personalized learning experiences through artificial intelligence and adaptive learning algorithms, allowing students to learn at their own pace and focus on areas where they need improvement.
Increasing Emphasis on STEM Education to Propel Market Growth
Another driver of growth in the math training market is the increasing emphasis on STEM (Science, Technology, Engineering, and Mathematics) education. In today's technology-driven world, STEM skills, especially strong mathematical abilities, are in high demand. Many educational institutions and governments are recognizing the importance of preparing students for careers in STEM fields. Consequently, math training programs are becoming essential to help students develop strong foundational math skills and advanced mathematical knowledge. The rising interest in coding, data science, and artificial intelligence has further amplified the need for math training, as these fields rely heavily on mathematical concepts.
Market Dynamics of Math Training
Limited Access to Quality Education to Hinder Market Growth
The math training market is held back by limited access to quality education, particularly in underserved and remote areas. While online learning has expanded access to math training, there are still regions and communities that lack adequate internet connectivity and technology infrastructure. This digital divide creates a barrier for many students, preventing them from benefiting from online math training programs. Additionally, the quality of math education can vary widely between different regions and educational institutions, leading to disparities in math skills and knowledge.
Impact of COVID-19 on the Math Training Market
The COVID-19 pandemic significantly impacted the math training market. With lockdowns, school closures, and social distancing measures in place, traditional classroom-based math training faced disruptions. However, the pandemic also accelerated the adoption of online and remote learning solutions. Many math training providers quickly pivoted to offer virtual classes, webinars, and interactive online platforms to cater to the growing demand for distance education. Homeschooling and the need for supplemental education further boosted enrollment in online math training programs. Additionally, the pandemic highlig...
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is a compiled version of two benchmark math dataframes for solving math problems using LLMs, namely:
- MATH: "MATH is a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations."
- GSM8K: "a dataset of 8.5K high quality linguistically diverse grade school math word problems created by human problem writers. The dataset is segmented into 7.5K training problems and 1K test problems. These problems take between 2 and 8 steps to solve, and solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer. A bright middle school student should be able to solve every problem. It can be used for multi-step mathematical reasoning."
The dataset consists of 21k math problems with their corresponding solutions.
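As a rough illustration of how the columns listed in this entry (problem, level, type, solution, stage, source) might be used, the sketch below filters a few hand-made records by `source` and `stage`. The records are invented placeholders, not rows from the dataset. The `"Level 3"` value format and the use of `None` for the columns GSM8K lacks are assumptions, and `select` is a hypothetical helper.

```python
from typing import Optional

# Placeholder records that mimic the described columns; they are NOT real rows.
records = [
    {"problem": "p1", "level": "Level 3", "type": "Algebra",
     "solution": "s1", "stage": "train", "source": "MATH"},
    {"problem": "p2", "level": None, "type": None,
     "solution": "s2", "stage": "test", "source": "GSM8K"},
    {"problem": "p3", "level": None, "type": None,
     "solution": "s3", "stage": "train", "source": "GSM8K"},
]


def select(rows, source: Optional[str] = None, stage: Optional[str] = None):
    """Return rows matching the given source and/or stage (None means no filter)."""
    return [
        r for r in rows
        if (source is None or r["source"] == source)
        and (stage is None or r["stage"] == stage)
    ]


# Example: keep only the GSM8K problems from the original training split.
gsm8k_train = select(records, source="GSM8K", stage="train")
print(len(gsm8k_train))
```

The same pattern would apply to the real table once it is loaded into whatever container is convenient. Only the column names are taken from the description.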
- problem: text with the mathematical problem statement.
- level: level of difficulty (GSM8K does not provide this column).
- type: math field (GSM8K does not provide this column).
- solution: text with the mathematical problem solution.
- stage: either "train" or "test". This corresponds to the original dataframe split.
- source: either "MATH" or "GSM8K". Source of the problem.
This report includes results for the New York State Math exams for the years 2013-2023. For the results for the New York State Math exams for the years 2006-2012, please follow this link.
Third grade English Language Arts (ELA) and Math test results for the 2016-2017 school year for the state of Michigan. Data Driven Detroit obtained these datasets from MI School Data for the State of the Detroit Child tool in July 2017. Test results were originally obtained at the school level and aggregated to the state level by Data Driven Detroit. Student data was suppressed when fewer than five students were tested per school. Click here for metadata (descriptions of the fields).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual math proficiency from 2011 to 2022 for Academy Of Math And Science vs. Arizona and Academy Of Mathematics And Science Inc. (79961) School District
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual distribution of students across grade levels in Neal Math Science Academy
The Early Grade Math Pilot program in Ghana, implemented under the U.S. Agency for International Development (USAID) Partnership for Education Learning Activity, was evaluated with an independent randomized controlled trial between 2017 and 2018. This data asset contains the two waves of data collected in Ghana during this time period. This dataset contains survey head teacher data from the activity endline.
The percentage of 4th grade Iowa students tested who met the standard math score metric associated with the grade and content.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks the annual total student count from 2009 to 2023 for Triad Math And Science Academy
Dataset Card for Big-Math-RL-Verified-Processed
This is a processed version of SynthLabsAI/Big-Math-RL-Verified where we have applied the following filters:
- Removed samples where llama8b_solve_rate is None
- Removed samples that could not be parsed by math-verify (empty lists)
We have also created 5 additional subsets to indicate difficulty level, similar to the MATH dataset. To do so, we computed quintiles on the llama8b_solve_rate values and then filtered the dataset into the… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/Big-Math-RL-Verified-Processed.
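The quintile step described above can be sketched as follows. This is an illustration under stated assumptions, not the processing script behind the dataset: the card is truncated before it says how boundary values are handled or in which direction levels run, so the sketch assumes that a lower `llama8b_solve_rate` means a harder problem and a higher level number (as in MATH, where level 5 is hardest). `assign_levels` is a hypothetical helper, and the math-verify parsing filter is omitted.

```python
import bisect
import statistics


def assign_levels(rows):
    """Drop rows without a solve rate, then bucket the rest into 5 quintile levels."""
    kept = [r for r in rows if r["llama8b_solve_rate"] is not None]
    rates = [r["llama8b_solve_rate"] for r in kept]
    # Four cut points that split the observed rates into five groups.
    cuts = statistics.quantiles(rates, n=5, method="inclusive")
    leveled = []
    for r in kept:
        # Bucket 0 holds the lowest solve rates, bucket 4 the highest.
        bucket = bisect.bisect_left(cuts, r["llama8b_solve_rate"])
        # Assumption: low solve rate = hard problem = high level number.
        leveled.append({**r, "level": 5 - bucket})
    return leveled


# Toy input: eleven evenly spaced solve rates plus one missing value.
toy = [{"llama8b_solve_rate": k / 10} for k in range(11)]
toy.append({"llama8b_solve_rate": None})
for row in assign_levels(toy):
    print(row)
```

With `bisect_left`, a value exactly equal to a cut point falls into the lower bucket. The real pipeline may draw that boundary differently.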