100+ datasets found
  1. Ranking of LLM tools in solving math problems 2024

    • statista.com
    Updated Jun 25, 2025
    Cite
    Statista (2025). Ranking of LLM tools in solving math problems 2024 [Dataset]. https://www.statista.com/statistics/1458141/leading-math-llm-tools/
    Explore at:
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2024
    Area covered
    Worldwide
    Description

    As of March 2024, OpenAI o1 was the large language model (LLM) tool that had the best benchmark score in solving math problems, with a score of **** percent. Close behind, in second place, was OpenAI o1-mini, followed by GPT-4o.

  2. MathInstruct Dataset: Hybrid Math Instruction

    • kaggle.com
    zip
    Updated Nov 30, 2023
    Cite
    The Devastator (2023). MathInstruct Dataset: Hybrid Math Instruction [Dataset]. https://www.kaggle.com/datasets/thedevastator/mathinstruct-dataset-hybrid-math-instruction-tun
    Explore at:
    Available download formats: zip (60239940 bytes)
    Dataset updated
    Nov 30, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MathInstruct Dataset: Hybrid Math Instruction Tuning

    A curated dataset for math instruction tuning models

    By TIGER-Lab (from Hugging Face) [source]

    About this dataset

    MathInstruct is a comprehensive and meticulously curated dataset specifically designed to facilitate the development and evaluation of models for math instruction tuning. This dataset consists of a total of 13 different math rationale datasets, out of which six have been exclusively curated for this project, ensuring a diverse range of instructional materials. The main objective behind creating this dataset is to provide researchers with an easily accessible and manageable resource that aids in enhancing the effectiveness and precision of math instruction.

    One noteworthy feature of MathInstruct is its lightweight nature, making it highly convenient for researchers to utilize without any hassle. With carefully selected columns such as source and output, users can readily identify the origin or reference material from which each math instruction was obtained, and they can refer to the expected output or solution corresponding to each specific math problem or exercise.

    Overall, MathInstruct offers immense potential in refining hybrid math instruction by facilitating meticulous model development and rigorous evaluation processes. Researchers can leverage this diverse dataset to gain deeper insights into effective teaching methodologies while exploring innovative approaches towards enhancing mathematical learning experiences.

    How to use the dataset

    Title: How to Use the MathInstruct Dataset for Hybrid Math Instruction Tuning

    Introduction: The MathInstruct dataset is a comprehensive collection of math instruction examples, designed to assist in developing and evaluating models for math instruction tuning. This guide will provide an overview of the dataset and explain how to make effective use of it.

    • Understanding the Dataset Structure: The dataset consists of a file named train.csv. This CSV file contains the training data, which includes columns such as source and output. The source column represents the source of the math instruction (textbook, online resource, or teacher), while the output column represents the expected output or solution to a particular math problem or exercise.

    • Accessing the Dataset: To access the MathInstruct dataset, you can download it from Kaggle's website. Once downloaded, you can read and manipulate the data using programming languages like Python with libraries such as pandas (see the loading sketch after this list).

    • Exploring the Columns: a) Source Column: The source column provides information about where each math instruction comes from. It may include references to specific textbooks, online resources, or even teachers who provided instructional material. b) Output Column: The output column specifies what students are expected to achieve as a result of each math instruction. It contains solutions or expected outputs for different math problems or exercises.

    • Utilizing Source Information: By analyzing the different sources mentioned in this dataset, researchers can understand which instructional materials are more effective in teaching specific topics within mathematics. They can also identify common strategies used by teachers across multiple sources.

    • Analyzing Expected Outputs: Researchers can study variations in expected outputs for similar types of problems across different sources. This analysis may help identify differences in approaches across textbooks/resources and enrich our understanding of various teaching methods.

    • Model Development and Evaluation: Researchers can utilize this dataset to develop machine learning models that automatically assess whether a given math instruction leads to the expected output. By training models on this data, one can create automated systems that provide feedback on math problems or suggest alternative instruction sources.

    • Scaling the Dataset: Due to its lightweight nature, the MathInstruct dataset is easily accessible and manageable. Researchers can scale up their training data by combining it with other instructional datasets or expand it further by labeling more examples based on similar guidelines.
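
    As a concrete starting point, here is a minimal loading sketch in Python with pandas. It is an illustration rather than official documentation, and it assumes the Kaggle zip has been extracted so that train.csv sits in the working directory with the source and output columns described above.

     import pandas as pd

     # Load the MathInstruct training file (path assumes the Kaggle zip was
     # extracted into the current working directory).
     df = pd.read_csv("train.csv")

     # Inspect the columns described above.
     print(df.columns.tolist())                    # expected to include 'source' and 'output'
     print(df["source"].value_counts().head(10))   # which rationale datasets dominate
     print(df.iloc[0]["output"][:200])             # preview one expected solution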

    Conclusion: The MathInstruct dataset serves as a valuable resource for developing and evaluating models related to math instruction tuning. By analyzing the source information and expected outputs, researchers can gain insights into effective teaching methods and build automated assessment systems.

    Research Ideas

    • Model development: This dataset can be used for developing and training models for math instruction...
  3. Australian and New Zealand journal of statistics Impact Factor 2024-2025 - ResearchHelpDesk

    • researchhelpdesk.org
    Updated Feb 23, 2022
    Cite
    Research Help Desk (2022). Australian and New Zealand journal of statistics Impact Factor 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/impact-factor-if/211/australian-and-new-zealand-journal-of-statistics
    Explore at:
    Dataset updated
    Feb 23, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    Australian and New Zealand journal of statistics Impact Factor 2024-2025 - ResearchHelpDesk - The Australian & New Zealand Journal of Statistics is an international journal managed jointly by the Statistical Society of Australia and the New Zealand Statistical Association. Its purpose is to report significant and novel contributions in statistics, ranging across articles on statistical theory, methodology, applications and computing. The journal has a particular focus on statistical techniques that can be readily applied to real-world problems, and on application papers with an Australasian emphasis. Outstanding articles submitted to the journal may be selected as Discussion Papers, to be read at a meeting of either the Statistical Society of Australia or the New Zealand Statistical Association.

    The main body of the journal is divided into three sections. The Theory and Methods Section publishes papers containing original contributions to the theory and methodology of statistics, econometrics and probability, and seeks papers motivated by a real problem that demonstrate the proposed theory or methodology in that situation. There is a strong preference for papers motivated by, and illustrated with, real data. The Applications Section publishes papers demonstrating applications of statistical techniques to problems faced by users of statistics in the sciences, government and industry. A particular focus is the application of newly developed statistical methodology to real data and the demonstration of better use of established statistical methodology in an area of application. It seeks to aid teachers of statistics by placing statistical methods in context. The Statistical Computing Section publishes papers containing new algorithms, code snippets, or software descriptions (for open source software only) which enhance the field through the application of computing. Preference is given to papers featuring publicly available code and/or data, and to those motivated by statistical methods for practical problems. In addition, suitable review papers and articles of historical and general interest will be considered. The journal also publishes book reviews on a regular basis.

    Abstracting and Indexing Information:

    • Academic Search (EBSCO Publishing)
    • Academic Search Alumni Edition (EBSCO Publishing)
    • Academic Search Elite (EBSCO Publishing)
    • Academic Search Premier (EBSCO Publishing)
    • CompuMath Citation Index (Clarivate Analytics)
    • Current Index to Statistics (ASA/IMS)
    • Journal Citation Reports/Science Edition (Clarivate Analytics)
    • Mathematical Reviews/MathSciNet/Current Mathematical Publications (AMS)
    • RePEc: Research Papers in Economics
    • Science Citation Index Expanded (Clarivate Analytics)
    • SCOPUS (Elsevier)
    • Statistical Theory & Method Abstracts (Zentralblatt MATH)
    • ZBMATH (Zentralblatt MATH)

  4. Data from: Data Fission: Splitting a Single Data Point

    • tandf.figshare.com
    txt
    Updated Dec 14, 2023
    Cite
    James Leiner; Boyan Duan; Larry Wasserman; Aaditya Ramdas (2023). Data Fission: Splitting a Single Data Point [Dataset]. http://doi.org/10.6084/m9.figshare.24328745.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    James Leiner; Boyan Duan; Larry Wasserman; Aaditya Ramdas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Suppose we observe a random vector X from some distribution in a known family with unknown parameters. We ask the following question: when is it possible to split X into two pieces f(X) and g(X) such that neither part is sufficient to reconstruct X by itself, but both together can recover X fully, and their joint distribution is tractable? One common solution to this problem when multiple samples of X are observed is data splitting, but Rasines and Young offer an alternative approach that uses additive Gaussian noise—this enables post-selection inference in finite samples for Gaussian distributed data and asymptotically when errors are non-Gaussian. In this article, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on several prototypical applications, such as post-selection inference for trend filtering and other regression problems, and effect size estimation after interactive multiple testing. Supplementary materials for this article are available online.
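
    The additive-Gaussian splitting idea can be illustrated numerically. The sketch below is only an illustration (not code from the paper): for X ~ N(mu, sigma^2) it forms f(X) = X + tau*Z and g(X) = X - Z/tau with independent Z ~ N(0, sigma^2), so the two pieces are uncorrelated (and, being jointly Gaussian, independent), yet X = (f(X) + tau^2 * g(X)) / (1 + tau^2) recovers the original data exactly; the choice of tau and the variances here are illustrative assumptions.

     import numpy as np

     rng = np.random.default_rng(0)
     mu, sigma, tau, n = 3.0, 2.0, 1.0, 100_000

     X = rng.normal(mu, sigma, n)    # observed data
     Z = rng.normal(0.0, sigma, n)   # auxiliary Gaussian noise, independent of X

     fX = X + tau * Z                # first piece
     gX = X - Z / tau                # second piece

     # The pieces are uncorrelated (jointly Gaussian, hence independent)...
     print(np.corrcoef(fX, gX)[0, 1])            # approximately 0
     # ...yet together they reconstruct X exactly:
     X_rec = (fX + tau**2 * gX) / (1 + tau**2)
     print(np.allclose(X_rec, X))                # True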

  5. A Review of Published Analyses of Case-Cohort Studies and Recommendations for Future Reporting

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Stephen J. Sharp; Manon Poulaliou; Simon G. Thompson; Ian R. White; Angela M. Wood (2023). A Review of Published Analyses of Case-Cohort Studies and Recommendations for Future Reporting [Dataset]. http://doi.org/10.1371/journal.pone.0101176
    Explore at:
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Stephen J. Sharp; Manon Poulaliou; Simon G. Thompson; Ian R. White; Angela M. Wood
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The case-cohort study design combines the advantages of a cohort study with the efficiency of a nested case-control study. However, unlike more standard observational study designs, there are currently no guidelines for reporting results from case-cohort studies. Our aim was to review recent practice in reporting these studies, and develop recommendations for the future. By searching papers published in 24 major medical and epidemiological journals between January 2010 and March 2013 using PubMed, Scopus and Web of Knowledge, we identified 32 papers reporting case-cohort studies. The median subcohort sampling fraction was 4.1% (interquartile range 3.7% to 9.1%). The papers varied in their approaches to describing the numbers of individuals in the original cohort and the subcohort, presenting descriptive data, and in the level of detail provided about the statistical methods used, so it was not always possible to be sure that appropriate analyses had been conducted. Based on the findings of our review, we make recommendations about reporting of the study design, subcohort definition, numbers of participants, descriptive information and statistical methods, which could be used alongside existing STROBE guidelines for reporting observational studies.

  6. Calculus Video Worked Example Data

    • data.mendeley.com
    Updated Apr 12, 2019
    Cite
    Jamison Judd (2019). Calculus Video Worked Example Data [Dataset]. http://doi.org/10.17632/t3xr5j67fd.1
    Explore at:
    Dataset updated
    Apr 12, 2019
    Authors
    Jamison Judd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary data from a Calculus II class where students were required to watch an instructional video before or after lecture. Dataset includes gender (1=female; 2=male), vgroup (-1=before lecture; 1=after lecture), binary flag for 26 individual videos (1=watched 80% or more of length of video; 0=not watched), videosum (sum of number of videos watched), final_raw (raw grade student received on cumulative final course exam), sat_math (scaled SAT-Math score out of 800), math_place (institutional calculus readiness score out of 100), watched20 (grouping flag for students who watched 20 or more videos).
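
    A minimal pandas sketch for exploring these summary variables is shown below. It is an illustration only: the CSV file name is hypothetical, and it assumes the columns named in the description above.

     import pandas as pd

     # Hypothetical export of the summary data with the columns described above.
     df = pd.read_csv("calculus_video_data.csv")

     # Compare final exam scores for the before-lecture (-1) and after-lecture (1)
     # video groups, and for the watched20 grouping flag.
     print(df.groupby("vgroup")["final_raw"].agg(["mean", "std", "count"]))
     print(df.groupby("watched20")["final_raw"].mean())

     # Relationship between videos watched, readiness measures, and the final exam.
     print(df[["videosum", "final_raw", "sat_math", "math_place"]].corr())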

  7. gsm8k

    • huggingface.co
    Updated Aug 11, 2022
    Cite
    OpenAI (2022). gsm8k [Dataset]. https://huggingface.co/datasets/openai/gsm8k
    Explore at:
    Available download formats: Croissant (a machine-learning dataset format; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 11, 2022
    Dataset authored and provided by
    OpenAI (http://openai.com/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for GSM8K

      Dataset Summary
    

    GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

    These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+, −, ×, ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
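
    A minimal loading sketch with the Hugging Face datasets library is shown below. The "main" configuration name is the commonly used one but should be checked against the dataset page; the final-answer convention (a line beginning with "####") follows the dataset's documented solution format.

     from datasets import load_dataset

     # Load GSM8K from the Hugging Face Hub.
     gsm8k = load_dataset("openai/gsm8k", "main")

     example = gsm8k["train"][0]
     print(example["question"])
     # Solutions end with a line of the form "#### <final answer>".
     print(example["answer"].split("####")[-1].strip())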

  8. Mathematics Dataset

    • github.com
    • opendatalab.com
    Updated Apr 3, 2019
    Cite
    DeepMind (2019). Mathematics Dataset [Dataset]. https://github.com/Wikidepia/mathematics_dataset_id
    Explore at:
    Dataset updated
    Apr 3, 2019
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Description

    This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

    Example questions

     Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
     Answer: 4
     
     Question: Calculate -841880142.544 + 411127.
     Answer: -841469015.544
     
     Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
     Answer: 54*a - 30
    

    It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length and answers to 30 characters in length. Note that the training data for each question type is split into "train-easy", "train-medium", and "train-hard", which allows training models via a curriculum. The data can also be mixed together uniformly from these training datasets to obtain the results reported in the paper. A short loading sketch follows the category list below. Categories:

    • algebra (linear equations, polynomial roots, sequences)
    • arithmetic (pairwise operations and mixed expressions, surds)
    • calculus (differentiation)
    • comparison (closest numbers, pairwise comparisons, sorting)
    • measurement (conversion, working with time)
    • numbers (base conversion, remainders, common divisors and multiples, primality, place value, rounding numbers)
    • polynomials (addition, simplification, composition, evaluating, expansion)
    • probability (sampling without replacement)
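
    The sketch below reads one module of the pre-generated text release. It assumes the common layout in which question and answer lines alternate within each module file; the file path shown is hypothetical, so adjust it to the extracted archive.

     from pathlib import Path

     def read_module(path):
         """Pair up consecutive lines as (question, answer) tuples."""
         lines = Path(path).read_text(encoding="utf-8").splitlines()
         return list(zip(lines[0::2], lines[1::2]))

     # Hypothetical path into the extracted pre-generated data.
     pairs = read_module("train-easy/algebra__linear_1d.txt")
     question, answer = pairs[0]
     print(question)
     print(answer)
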
  9. Math problems IMO

    • kaggle.com
    zip
    Updated Jan 15, 2025
    Cite
    Artem Goncharov (2025). Math problems IMO [Dataset]. https://www.kaggle.com/datasets/artemgoncarov/math-problems-imo
    Explore at:
    Available download formats: zip (66054740 bytes)
    Dataset updated
    Jan 15, 2025
    Authors
    Artem Goncharov
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Data with 100,000 diverse problems from international math olympiads (AIME, IMO, etc.).

    You can use it, for example, for RAG systems or to fine-tune a model. If you find it useful, please upvote. Enjoy working with the data!

  10. Australian and New Zealand journal of statistics Acceptance Rate - ResearchHelpDesk

    • researchhelpdesk.org
    Updated Mar 23, 2022
    Cite
    Research Help Desk (2022). Australian and New Zealand journal of statistics Acceptance Rate - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/acceptance-rate/211/australian-and-new-zealand-journal-of-statistics
    Explore at:
    Dataset updated
    Mar 23, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    Australian and New Zealand journal of statistics Acceptance Rate - ResearchHelpDesk - The Australian & New Zealand Journal of Statistics is an international journal managed jointly by the Statistical Society of Australia and the New Zealand Statistical Association. Its purpose is to report significant and novel contributions in statistics, ranging across articles on statistical theory, methodology, applications and computing. The journal has a particular focus on statistical techniques that can be readily applied to real-world problems, and on application papers with an Australasian emphasis. Outstanding articles submitted to the journal may be selected as Discussion Papers, to be read at a meeting of either the Statistical Society of Australia or the New Zealand Statistical Association.

    The main body of the journal is divided into three sections. The Theory and Methods Section publishes papers containing original contributions to the theory and methodology of statistics, econometrics and probability, and seeks papers motivated by a real problem that demonstrate the proposed theory or methodology in that situation. There is a strong preference for papers motivated by, and illustrated with, real data. The Applications Section publishes papers demonstrating applications of statistical techniques to problems faced by users of statistics in the sciences, government and industry. A particular focus is the application of newly developed statistical methodology to real data and the demonstration of better use of established statistical methodology in an area of application. It seeks to aid teachers of statistics by placing statistical methods in context. The Statistical Computing Section publishes papers containing new algorithms, code snippets, or software descriptions (for open source software only) which enhance the field through the application of computing. Preference is given to papers featuring publicly available code and/or data, and to those motivated by statistical methods for practical problems. In addition, suitable review papers and articles of historical and general interest will be considered. The journal also publishes book reviews on a regular basis.

    Abstracting and Indexing Information:

    • Academic Search (EBSCO Publishing)
    • Academic Search Alumni Edition (EBSCO Publishing)
    • Academic Search Elite (EBSCO Publishing)
    • Academic Search Premier (EBSCO Publishing)
    • CompuMath Citation Index (Clarivate Analytics)
    • Current Index to Statistics (ASA/IMS)
    • Journal Citation Reports/Science Edition (Clarivate Analytics)
    • Mathematical Reviews/MathSciNet/Current Mathematical Publications (AMS)
    • RePEc: Research Papers in Economics
    • Science Citation Index Expanded (Clarivate Analytics)
    • SCOPUS (Elsevier)
    • Statistical Theory & Method Abstracts (Zentralblatt MATH)
    • ZBMATH (Zentralblatt MATH)

  11. Instructor Guide: Integrating Leadership Roles, Artificial Intelligence, PhET Simulation, HHMI-Biointeractive Data Explorer and Google Tools to understand Mathematics and Statistics

    • qubeshub.org
    Updated Jan 4, 2025
    Cite
    Dr Pankaj Mehrotra (2025). Instructor Guide: Integrating Leadership Roles, Artificial Intelligence, PhET Simulation, HHMI-Biointeractive Data Explorer and Google Tools to understand Mathematics and Statistics. [Dataset]. http://doi.org/10.25334/KMDZ-N209
    Explore at:
    Dataset updated
    Jan 4, 2025
    Dataset provided by
    QUBES
    Authors
    Dr Pankaj Mehrotra
    Description

    Mathematical and statistical analysis skills are important skills to include in the course curriculum. Together or individually, these skills can advance knowledge, critical thinking, and creativity. In this guide, I provide an overview of how leadership roles, AI skills, simulation-based learning, and Google tools can be integrated into class activities to help students understand applications of mathematical and statistical concepts such as sum, mean, data, and data analysis. Through these activities, students develop an understanding that mathematics and statistics are interdependent and cross disciplines. Students learn mathematical concepts and their real-life and research applications through PhET Simulation, then collect data and apply data organization, analysis, and statistics through the HHMI-Biointeractive Data Explorer, which introduces key concepts in both mathematics and statistics.

  12. Data from: Exploring Human-Like Mathematical Reasoning: Perspectives on Generalizability and Efficiency

    • curate.nd.edu
    pdf
    Updated Dec 3, 2024
    Cite
    Zhenwen Liang (2024). Exploring Human-Like Mathematical Reasoning: Perspectives on Generalizability and Efficiency [Dataset]. http://doi.org/10.7274/27895872.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Zhenwen Liang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Mathematical reasoning, a fundamental aspect of human cognition, poses significant challenges for artificial intelligence (AI) systems. Despite recent advancements in natural language processing (NLP) and large language models (LLMs), AI's ability to replicate human-like reasoning, generalization, and efficiency remains an ongoing research challenge. In this dissertation, we address key limitations in math word problem (MWP) solving, focusing on the accuracy, generalization ability, and efficiency of AI-based mathematical reasoners by applying human-like reasoning methods and principles.

    This dissertation introduces several innovative approaches in mathematical reasoning. First, a numeracy-driven framework is proposed to enhance math word problem (MWP) solvers by integrating numerical reasoning into model training, surpassing human-level performance on benchmark datasets. Second, a novel multi-solution framework captures the diversity of valid solutions to math problems, improving the generalization capabilities of AI models. Third, a customized knowledge distillation technique, termed Customized Exercise for Math Learning (CEMAL), is developed to create tailored exercises for smaller models, significantly improving their efficiency and accuracy in solving MWPs. Additionally, a multi-view fine-tuning paradigm (MinT) is introduced to enable smaller models to handle diverse annotation styles from different datasets, improving their adaptability and generalization. To further advance mathematical reasoning, a benchmark, MathChat, is introduced to evaluate large language models (LLMs) in multi-turn reasoning and instruction-following tasks, demonstrating significant performance improvements. Finally, new inference-time verifiers, Math-Rev and Code-Rev, are developed to enhance reasoning verification, combining language-based and code-based solutions for improved accuracy in both math and code reasoning tasks.

    In summary, this dissertation provides a comprehensive exploration of these challenges and contributes novel solutions that push the boundaries of AI-driven mathematical reasoning. Potential future research directions are also discussed to further extend the impact of this dissertation.

  13. Data from: Dataset of the study: "Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard"

    • researchdata.bath.ac.uk
    Updated May 20, 2023
    Cite
    Vagelis Plevris; George Papazafeiropoulos; Alejandro Jimenez Rios (2023). Dataset of the study: "Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard" [Dataset]. http://doi.org/10.5281/zenodo.7940781
    Explore at:
    Dataset updated
    May 20, 2023
    Dataset provided by
    Zenodo
    Authors
    Vagelis Plevris; George Papazafeiropoulos; Alejandro Jimenez Rios
    Dataset funded by
    Oslo Metropolitan University
    Description

    This dataset contains the 30 questions that were posed to the chatbots (i) ChatGPT-3.5; (ii) ChatGPT-4; and (iii) Google Bard, in May 2023 for the study “Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard”. These 30 questions describe mathematics and logic problems that have a unique correct answer. The questions are fully described with plain text only, without the need for any images or special formatting. The questions are divided into two sets of 15 questions each (Set A and Set B). The questions of Set A are 15 “Original” problems that cannot be found online, at least in their exact wording, while Set B contains 15 “Published” problems that one can find online by searching on the internet, usually with their solution. Each question is posed three times to each chatbot.

    This dataset contains the following: (i) The full set of the 30 questions, A01-A15 and B01-B15; (ii) the correct answer for each one of them; (iii) an explanation of the solution, for the problems where such an explanation is needed, (iv) the 30 (questions) × 3 (chatbots) × 3 (answers) = 270 detailed answers of the chatbots. For the published problems of Set B, we also provide a reference to the source where each problem was taken from.

  14. Math CoT Arabic English Reasoning

    • kaggle.com
    zip
    Updated May 16, 2025
    Cite
    Miscovery (2025). Math CoT Arabic English Reasoning [Dataset]. https://www.kaggle.com/datasets/miscovery/math-cot-arabic-english-reasoning
    Explore at:
    Available download formats: zip (920398 bytes)
    Dataset updated
    May 16, 2025
    Authors
    Miscovery
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Math CoT Arabic English Dataset

    A high-quality, bilingual (English & Arabic) dataset for Chain-of-Thought (COT) reasoning in mathematics and related disciplines, developed by Miscovery AI.

    Overview

    Math-COT is a unique dataset designed to facilitate and benchmark the development of chain-of-thought reasoning capabilities in language models across mathematical domains. With meticulously crafted examples, explicit reasoning steps, and bilingual support, this dataset offers a robust foundation for training and evaluating mathematical reasoning abilities.

    Key Features

    • 99% Clean & High-Quality Data: Human-reviewed, accurately annotated examples with verified solutions
    • Bilingual Support: Complete English and Arabic parallel content for cross-lingual research and applications
    • Structured Reasoning Steps: Each problem solution is broken down into explicit step-by-step reasoning
    • Diverse Subject Coverage: Spans 21 different categories within mathematics and adjacent fields
    • Comprehensive Format: Includes questions, answers, reasoning chains, and relevant metadata

    Dataset Structure

    Each entry in the dataset contains the following fields:

    {
     "en_question": "Question text in English",
     "ar_question": "Question text in Arabic",
     "en_answer": "Detailed step-by-step solution in English",
     "ar_answer": "Detailed step-by-step solution in Arabic",
     "category": "Mathematical category",
     "en_q_word": "Word count of English question",
     "ar_q_word": "Word count of Arabic question",
     "en_a_word": "Word count of English answer",
     "ar_a_word": "Word count of Arabic answer"
    }
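
    A minimal loading sketch for this structure is shown below. It is an illustration only: the file name is hypothetical (the Kaggle download may ship as CSV or JSON), and it assumes the fields listed above.

     import pandas as pd

     # Hypothetical file name for the extracted Kaggle download.
     df = pd.read_csv("math_cot_arabic_english.csv")

     # Keep only probability problems and compare question lengths across languages.
     prob = df[df["category"] == "Mathematics - Probability"]
     print(prob[["en_q_word", "ar_q_word"]].describe())

     # Show one bilingual question/answer pair.
     row = prob.iloc[0]
     print(row["en_question"])
     print(row["en_answer"][:300])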
    

    Categories

    The dataset covers 21 distinct categories:

    1. Mathematics - Arithmetic
    2. Mathematics - Algebra
    3. Mathematics - Geometry
    4. Mathematics - Trigonometry
    5. Mathematics - Calculus
    6. Mathematics - Linear Algebra
    7. Mathematics - Probability
    8. Mathematics - Statistics
    9. Mathematics - Set Theory
    10. Mathematics - Number Theory
    11. Mathematics - Discrete Math
    12. Mathematics - Topology
    13. Mathematics - Differential Equations
    14. Mathematics - Real Analysis
    15. Math Puzzles
    16. Linguistics
    17. Logic and Reasoning
    18. Philosophy
    19. Sports and Games
    20. Psychology
    21. Cultural Traditions

    Example

    Here's a sample entry from the dataset:

    {
     "en_question": "A bag contains only red and blue balls. If one ball is drawn at random, the probability that it is red is 2/5. If 8 more red balls are added, the probability of drawing a red ball becomes 4/5. How many blue balls are there in the bag?",
     "ar_question": "تحتوي الحقيبة على كرات حمراء وزرقاء فقط. إذا تم سحب كرة واحدة عشوائيًا ، فإن احتمال أن تكون حمراء هو 2/5. إذا تمت إضافة 8 كرات حمراء أخرى ، يصبح احتمال سحب كرة حمراء 4/5. كم عدد الكرات الزرقاء الموجودة في الحقيبة؟",
    

    Usage

    This dataset is especially valuable for:

    • Training and evaluating mathematical reasoning in language models
    • Research on step-by-step problem solving approaches
    • Developing educational AI assistants for mathematics
    • Cross-lingual research on mathematical reasoning
    • Benchmarking Chain-of-Thought (COT) capabilities

    Citation

    If you use this dataset in your research, please cite:

    @dataset{miscoveryai2025mathcot,
     title={Math CoT Arabic English Reasoning: A Bilingual Dataset for Chain-of-Thought Mathematical Reasoning},
     author={Miscovery AI},
     year={2025},
     publisher={Kaggle},
     url={https://www.kaggle.com/datasets/miscovery/math-cot-arabic-english-reasoning}
    }
    

    License

    This project is licensed under the MIT License - see the LICENSE file for details.

    Contact

    For questions, feedback, or issues related to this dataset, please contact Miscovery AI at info@miscovery.com.

  15. Math Dataset

    • kaggle.com
    • opendatalab.com
    zip
    Updated Mar 12, 2024
    Cite
    Awsaf (2024). Math Dataset [Dataset]. https://www.kaggle.com/datasets/awsaf49/math-dataset
    Explore at:
    Available download formats: zip (7412179 bytes)
    Dataset updated
    Mar 12, 2024
    Authors
    Awsaf
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Awsaf

    Released under MIT

    Contents

    Reference: https://github.com/hendrycks/math/
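
    In the upstream MATH release referenced above, problems are typically stored as small per-problem JSON files with problem, level, type, and solution fields. The sketch below assumes that layout and a hypothetical directory path, so verify both against the extracted archive.

     import json
     from pathlib import Path

     # Hypothetical path into the extracted archive; adjust subject/split as needed.
     problems = []
     for path in Path("MATH/train/algebra").glob("*.json"):
         with open(path, encoding="utf-8") as f:
             problems.append(json.load(f))

     print(len(problems))
     print(problems[0]["level"], problems[0]["type"])
     print(problems[0]["problem"])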

  16. ASSISTments Replication Study - 2019-2020 cohort

    • openicpsr.org
    delimited
    Updated Dec 22, 2022
    Cite
    Mingyu Feng; Neil Heffernan; Robert Murphy; Jeremy Roschelle (2022). ASSISTments Replication Study - 2019-2020 cohort [Dataset]. http://doi.org/10.3886/E183645V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Dec 22, 2022
    Dataset provided by
    Digital Promise
    SRI
    WestEd
    Worcester Polytechnic Institute
    Authors
    Mingyu Feng; Neil Heffernan; Robert Murphy; Jeremy Roschelle
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    United States, North Carolina
    Description

    The purpose of the ASSISTments Replication Study is to conduct a replication study of the impact of a fully developed, widely adopted intervention called ASSISTments on middle school student mathematics outcomes. ASSISTments is an online formative assessment platform that provides immediate feedback to students and supports teachers in their use of homework to improve math instruction and learning. Findings from a previous IES-funded efficacy study, conducted in Maine, indicated this intervention led to beneficial impacts on student learning outcomes in 7th grade. The current study examined the impacts of this intervention with a more diverse sample and relied on trained local math coaches (instead of the intervention developers) to provide professional development and support to teachers. Participating schools (and all 7th grade math teachers in the school) in this study were randomly assigned to either a treatment or control group. Teachers participated in the project over a two-year period, the 2018-19 school year and the 2019-20 school year. The 2018-19 school year was to serve as a ramp-up year. Data used in the final analysis was collected during the second year of the study, the 2019-20 school year. The data contained in this project is primarily from the 2019-20 school year and includes student ASSISTments usage data, teacher ASSISTments usage data, student outcome data, and teacher instructional log data. Student outcome data is from the online Mathematics Readiness Test for Grade 8 developed by the Math Diagnostic Test Project (MDTP). The teacher instructional log asked teachers to answer questions about their daily instructional practices over the span of 5 consecutive days of instruction. They were asked to participate in 3 rounds of logs over the course of the 2019-2020 school year. Student and teacher usage data of ASSISTments were collected automatically as they used the system. The usage data was limited to the treatment group only. Other data (outcome data, teacher instructional log data) were collected from both treatment and control groups.

  17. Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry and Mathematics

    • qubeshub.org
    Updated Jan 11, 2022
    Cite
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu (2022). Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry and Mathematics [Dataset]. http://doi.org/10.25334/5PX5-H796
    Explore at:
    Dataset updated
    Jan 11, 2022
    Dataset provided by
    QUBES
    Authors
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu
    Description

    Students typically find linear regression analysis of data sets in a biology classroom challenging. These activities could be used in a Biology, Chemistry, Mathematics, or Statistics course. The collection provides student activity files with Excel instructions and Instructor Activity files with Excel instructions and solutions to problems.

    Students will be able to perform linear regression analysis, find the correlation coefficient, create a scatter plot, and find the r-squared value using MS Excel 365. Students will be able to interpret data sets, describe the relationship between biological variables, and predict the value of an output variable based on the input of a predictor variable.
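
    The same computation the Excel activity walks through (fitted line, correlation coefficient, r-squared, and prediction) can also be scripted. The Python sketch below is not part of the published activity and uses made-up illustrative data.

     import numpy as np
     from scipy import stats

     # Illustrative data only (e.g. substrate concentration vs. reaction rate).
     x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
     y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 5.8])

     result = stats.linregress(x, y)
     print(f"slope = {result.slope:.3f}, intercept = {result.intercept:.3f}")
     print(f"correlation coefficient r = {result.rvalue:.3f}")
     print(f"r-squared = {result.rvalue**2:.3f}")

     # Predict the output for a new value of the predictor variable.
     print(result.slope * 1.75 + result.intercept)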

  18. Data from: Learning Mathematics for Life A Perspective from PISA

    • catalog.data.gov
    • datasets.ai
    Updated Mar 30, 2021
    Cite
    U.S. Department of State (2021). Learning Mathematics for Life A Perspective from PISA [Dataset]. https://catalog.data.gov/dataset/learning-mathematics-for-life-a-perspective-from-pisa
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    United States Department of State (http://state.gov/)
    Area covered
    Pisa
    Description

    People from many countries have expressed interest in the tests students take for the Programme for International Student Assessment (PISA). Learning Mathematics for Life examines the link between the PISA test requirements and student performance. It focuses specifically on the proportions of students who answer questions correctly across a range of difficulty. The questions are classified by content, competencies, context and format, and the connections between these and student performance are then analysed. This analysis has been carried out in an effort to link PISA results to curricular programmes and structures in participating countries and economies. Results from the student assessment reflect differences in country performance in terms of the test questions. These findings are important for curriculum planners, policy makers and in particular teachers – especially mathematics teachers of intermediate and lower secondary school classes.

  19. Data from: MLFMF: Data Sets for Machine Learning for Mathematical Formalization

    • data.niaid.nih.gov
    Updated Oct 26, 2023
    Cite
    Bauer, Andrej; Petković, Matej; Todorovski, Ljupčo (2023). MLFMF: Data Sets for Machine Learning for Mathematical Formalization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10041074
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset provided by
    University of Ljubljana
    Institute of Mathematics, Physics, and Mechanics
    Authors
    Bauer, Andrej; Petković, Matej; Todorovski, Ljupčo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MLFMF

    MLFMF (Machine Learning for Mathematical Formalization) is a collection of data sets for benchmarking recommendation systems used to support the formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. The MLFMF data sets provide solid benchmarking support for further investigation of the numerous machine learning approaches to formalized mathematics. With more than 250,000 entries in total, this is currently the largest collection of formalized mathematical knowledge in machine-learnable format. In addition to benchmarking recommendation systems, the data sets can also be used for benchmarking node classification and link prediction algorithms.

    The four data sets

    Each data set is derived from a library of formalized mathematics written in the proof assistants Agda or Lean. The collection includes:

    • the largest Lean 4 library, Mathlib, and
    • the three largest Agda libraries: the standard library, the library of univalent mathematics Agda-unimath, and the TypeTopology library.

    Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of syntax trees of all the entries in the library. The network contains the (modular) structure of the library and the references between entries, while the syntax trees give complete and easily parsed information about each entry. The Lean library data set was obtained by converting .olean files into s-expressions (see the lean2sexp tool). The Agda data sets were obtained with an s-expression extension of the official Agda repository (use either the master-sexp or the release-2.6.3-sexp branch). For more details, see our arXiv copy of the paper.

    Directory structure

    First, the mlfmf.zip archive needs to be unzipped. It contains a separate directory for every library (for example, the standard library of Agda can be found in the stdlib directory) and some auxiliary files. Every library directory contains:

    • the network file from which the heterogeneous network can be loaded, and
    • a zip of the entries directory that contains (many) files with abstract syntax trees; each of those files describes a single entry of the library.

    In addition to the auxiliary files which are used for loading the data (and described below), the zipped sources of lean2sexp and the Agda s-expression extension are present.

    Loading the data

    In addition to the data files, there is also a simple Python script main.py for loading the data. To run it, you will have to install the packages listed in the file requirements.txt: tqdm and networkx. The easiest way to do so is calling pip install -r requirements.txt. When running main.py for the first time, the script will unzip the entry files into the directory named entries. After that, the script loads the syntax trees of the entries (see the Entry class) and the network (as a networkx.MultiDiGraph object).

    Note: the entry files have the extension .dag (directed acyclic graph), since Lean uses node sharing, which breaks the tree structure (a shared node has more than one parent node).

    More information

    For more information about the data collection process, a detailed data (and data format) description, and baseline experiments that were already performed with these data, see our arXiv copy of the paper. For the code that was used to perform the experiments and the data format description, visit our GitHub repository https://github.com/ul-fmf/mlfmf-data.

    Funding

    Since not all the funders are available in Zenodo's database, we list them here:

    This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-21-1-0024. The authors also acknowledge the financial support of the Slovenian Research Agency via the research core funding No. P2-0103 and No. P1-0294.
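
    Returning to the loading step above: once main.py has loaded a library's network as a networkx.MultiDiGraph, typical first steps are simple structural queries. The sketch below uses a small toy graph as a stand-in for the loaded object (the node names and edge labels are made up), since the actual loader API is documented in the project's repository.

     import networkx as nx

     # Toy stand-in for the MultiDiGraph that main.py loads for a library;
     # node names and edge attributes here are made up for illustration only.
     G = nx.MultiDiGraph()
     G.add_edge("Nat.add_comm", "Nat.add", kind="reference")
     G.add_edge("Nat.add_comm", "Nat.succ_add", kind="reference")
     G.add_edge("Nat.succ_add", "Nat.add", kind="reference")

     print(G.number_of_nodes(), G.number_of_edges())

     # Entries most often referenced by others: a crude relevance signal of the
     # kind the recommendation benchmarks are built around.
     by_in_degree = sorted(G.in_degree(), key=lambda kv: kv[1], reverse=True)
     print(by_in_degree[:5])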

  20. Data from: Impacts of lessons management based on Mathematics words problems on learning

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Maria Alice Veiga Ferreira de Souza (2023). Impacts of lessons management based on Mathematics words problems on learning [Dataset]. http://doi.org/10.6084/m9.figshare.5720452.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Maria Alice Veiga Ferreira de Souza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT: This article presents potential successes and constraints observed in lessons based on written mathematics word problems and their impact on the learning of eighth-year students in Portuguese elementary school classes. The problems were proposed by future teachers during a supervised internship at the University of Lisbon. The data emerged from excerpts of interaction/intervention between a teacher-coach and three interns regarding their lessons based on written mathematics word problems. Successes identified include associating geometric figures with their algebraic expressions and guiding explanations through direct questions on the subject; constraints include confused mathematical concepts, written prompts with no meaning for students, and terms not properly contextualized for the mathematical setting. The research is grounded in the work of authors and researchers in the fields of problem solving, the comprehension of math problem statements, and training in/of teaching practice.
