100+ datasets found

Math Dataset
kaggle.com
opendatalab.com
zip
Updated Mar 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Awsaf (2024). Math Dataset [Dataset]. https://www.kaggle.com/datasets/awsaf49/math-dataset
Explore at:
zip(7412179 bytes)Available download formats
Dataset updated
Mar 12, 2024
Authors
Awsaf
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Awsaf

Released under MIT

Contents

Reference: https://github.com/hendrycks/math/
Airoboros LLMs Math Dataset
kaggle.com
zip
Updated Nov 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Airoboros LLMs Math Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/airoboros-llms-math-dataset
Explore at:
zip(36964941 bytes)Available download formats
Dataset updated
Nov 24, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Airoboros LLMs Math Dataset

Mastering Complex Mathematical Operations in Machine Learning

By Huggingface Hub [source]

About this dataset

The Airoboros-3.1 dataset is the perfect tool to help machine learning models excel in the difficult realm of complicated mathematical operations. This data collection features thousands of conversations between machines and humans, formatted in ShareGPT to maximize optimization in an OS ecosystem. The dataset’s focus on advanced subjects like factorials, trigonometry, and larger numerical values will help drive machine learning models to the next level - facilitating critical acquisition of sophisticated mathematical skills that are essential for ML success. As AI technology advances at such a rapid pace, training neural networks to correspondingly move forward can be a daunting and complicated challenge - but with Airoboros-3.1’s powerful datasets designed around difficult mathematical operations it just became one step closer to achievable!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

To get started, download the dataset from Kaggle and use the train.csv file. This file contains over two thousand examples of conversations between ML models and humans which have been formatted using ShareGPT - fast and efficient OS ecosystem fine-tuning tools designed to help with understanding mathematical operations more easily. The file includes two columns: category and conversations, both of which are marked as strings in the data itself.

Once you have downloaded the train file you can begin setting up your own ML training environment by using any of your preferred frameworks or methods. Your model should focus on predicting what kind of mathematical operations will likely be involved in future conversations by referring back to previous dialogues within this dataset for reference (category column). You can also create your own test sets from this data, adding new conversation topics either by modifying existing rows or creating new ones entirely with conversation topics related to mathematics. Finally, compare your model’s results against other established models or algorithms that are already published online!

Happy training!

Research Ideas

It can be used to build custom neural networks or machine learning algorithms that are specifically designed for complex mathematical operations.

This data set can be used to teach and debug more general-purpose machine learning models to recognize large numbers, and intricate calculations within natural language processing (NLP).

The Airoboros-3.1 dataset can also be utilized as a supervised learning task: models could learn from the conversations provided in the dataset how to respond correctly when presented with complex mathematical operations

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv | Column name | Description | |:------------------|:-----------------------------------------------------------------------------| | category | The type of mathematical operation being discussed. (String) | | conversations | The conversations between the machine learning model and the human. (String) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Huggingface Hub.
h
math-preference-dataset
huggingface.co
Updated Jul 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sara Han Díaz (2024). math-preference-dataset [Dataset]. https://huggingface.co/datasets/sdiazlor/math-preference-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 28, 2024
Authors
Sara Han Díaz
Description
Dataset Card for math-preference-dataset

This dataset has been created with distilabel.

Dataset Summary

This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/sdiazlor/math-preference-dataset/raw/main/pipeline.yaml"

or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/sdiazlor/math-preference-dataset.
MathInstruct Dataset: Hybrid Math Instruction
kaggle.com
zip
Updated Nov 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). MathInstruct Dataset: Hybrid Math Instruction [Dataset]. https://www.kaggle.com/datasets/thedevastator/mathinstruct-dataset-hybrid-math-instruction-tun
Explore at:
zip(60239940 bytes)Available download formats
Dataset updated
Nov 30, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
MathInstruct Dataset: Hybrid Math Instruction Tuning

A curated dataset for math instruction tuning models

By TIGER-Lab (From Huggingface) [source]

About this dataset

MathInstruct is a comprehensive and meticulously curated dataset specifically designed to facilitate the development and evaluation of models for math instruction tuning. This dataset consists of a total of 13 different math rationale datasets, out of which six have been exclusively curated for this project, ensuring a diverse range of instructional materials. The main objective behind creating this dataset is to provide researchers with an easily accessible and manageable resource that aids in enhancing the effectiveness and precision of math instruction.

One noteworthy feature of MathInstruct is its lightweight nature, making it highly convenient for researchers to utilize without any hassle. With carefully selected columns such as source, source, output, output, users can readily identify the origin or reference material from where the math instruction was obtained. Additionally, they can also refer to the expected output or solution corresponding to each specific math problem or exercise.

Overall, MathInstruct offers immense potential in refining hybrid math instruction by facilitating meticulous model development and rigorous evaluation processes. Researchers can leverage this diverse dataset to gain deeper insights into effective teaching methodologies while exploring innovative approaches towards enhancing mathematical learning experiences

How to use the dataset

Title: How to Use the MathInstruct Dataset for Hybrid Math Instruction Tuning

Introduction: The MathInstruct dataset is a comprehensive collection of math instruction examples, designed to assist in developing and evaluating models for math instruction tuning. This guide will provide an overview of the dataset and explain how to make effective use of it.

Understanding the Dataset Structure: The dataset consists of a file named train.csv. This CSV file contains the training data, which includes various columns such as source and output. The source column represents the source of math instruction (textbook, online resource, or teacher), while the output column represents expected output or solution to a particular math problem or exercise.

Accessing the Dataset: To access the MathInstruct dataset, you can download it from Kaggle's website. Once downloaded, you can read and manipulate the data using programming languages like Python with libraries such as pandas.

Exploring the Columns: a) Source Column: The source column provides information about where each math instruction comes from. It may include references to specific textbooks, online resources, or even teachers who provided instructional material. b) Output Column: The output column specifies what students are expected to achieve as a result of each math instruction. It contains solutions or expected outputs for different math problems or exercises.

Utilizing Source Information: By analyzing the different sources mentioned in this dataset, researchers can understand which instructional materials are more effective in teaching specific topics within mathematics. They can also identify common strategies used by teachers across multiple sources.

Analyzing Expected Outputs: Researchers can study variations in expected outputs for similar types of problems across different sources. This analysis may help identify differences in approaches across textbooks/resources and enrich our understanding of various teaching methods.

Model Development and Evaluation: Researchers can utilize this dataset to develop machine learning models that automatically assess whether a given math instruction leads to the expected output. By training models on this data, one can create automated systems that provide feedback on math problems or suggest alternative instruction sources.

Scaling the Dataset: Due to its lightweight nature, the MathInstruct dataset is easily accessible and manageable. Researchers can scale up their training data by combining it with other instructional datasets or expand it further by labeling more examples based on similar guidelines.

Conclusion: The MathInstruct dataset serves as a valuable resource for developing and evaluating models related to math instruction tuning. By analyzing the source information and expected outputs, researchers can gain insights into effective teaching methods and build automated assessment

Research Ideas

Model development: This dataset can be used for developing and training models for math instruction...
Z
Data from: MLFMF: Data Sets for Machine Learning for Mathematical...
data.niaid.nih.gov
Updated Oct 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bauer, Andrej; Petković, Matej; Todorovski, Ljupčo (2023). MLFMF: Data Sets for Machine Learning for Mathematical Formalization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10041074
Explore at:
Dataset updated
Oct 26, 2023
Dataset provided by
University of Ljubljana
Institute of Mathematics, Physics, and Mechanics
Authors
Bauer, Andrej; Petković, Matej; Todorovski, Ljupčo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MLFMF MLFMF (Machine Learning for Mathematical Formalization) is a collection of data sets for benchmarking recommendation systems used to support formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. The MLFMF data sets provide solid benchmarking support for further investigation of the numerous machine learning approaches to formalized mathematics. With more than 250,000 entries in total, this is currently the largest collection of formalized mathematical knowledge in machine learnable format. In addition to benchmarking the recommendation systems, the data sets can also be used for benchmarking node classification and link prediction algorithms. The four data sets Each data set is derived from a library of formalized mathematics written in proof assistants Agda or Lean. The collection includes

the largest Lean 4 library Mathlib, the three largest Agda libraries:

the standard library the library of univalent mathematics Agda-unimath, and the TypeTopology library. Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of syntax trees of all the entries in the library. The network contains the (modular) structure of the library and the references between entries, while the syntax trees give complete and easily parsed information about each entry. The Lean library data set was obtained by converting .olean files into s-expressions (see the lean2sexp tool). The Agda data sets were obtained with an s-expression extension of the official Agda repository (use either master-sexp or release-2.6.3-sexp branch). For more details, see our arXiv copy of the paper. Directory structure First, the mlfmf.zip archive needs to be unzipped. It contains a separate directory for every library (for example, the standard library of Agda can be found in the stdlib directory) and some auxiliary files. Every library directory contains

the network file from which the heterogeneous network can be loaded, a zip of the entries directory that contains (many) files with abstract syntax trees. Each of those files describes a single entry of the library. In addition to the auxiliary files which are used for loading the data (and described below), the zipped sources of lean2sexp and Agda s-expression extension are present. Loading the data In addition to the data files, there is also a simple python script main.py for loading the data. To run it, you will have to install the packages listed in the file requirements.txt: tqdm and networkx. The easiest way to do so is calling pip install -r requirements.txt. When running main.py for the first time, the script will unzip the entry files into the directory named entries. After that, the script loads the syntax trees of the entries (see the Entry class) and the network (as networkx.MultiDiGraph object). Note. The entry files have extension .dag (directed acyclic graph), since Lean uses node sharing, which breaks the tree structure (a shared node has more than one parent node). More information For more information about the data collection process, detailed data (and data format) description, and baseline experiments that were already performed with these data, see our arXiv copy of the paper. For the code that was used to perform the experiments and data format description, visit our github repository https://github.com/ul-fmf/mlfmf-data. Funding Since not all the funders are available in the Zenodo's database, we list them here:

This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-21-1-0024. The authors also acknowledge the financial support of the Slovenian Research Agency via the research core funding No. P2-0103 and No. P1-0294.
d
Math Test Results 2013-2023
catalog.data.gov
data.cityofnewyork.us
+2more
Updated Nov 29, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.cityofnewyork.us (2024). Math Test Results 2013-2023 [Dataset]. https://catalog.data.gov/dataset/math-test-results-2013-2023
Explore at:
Dataset updated
Nov 29, 2024
Dataset provided by
data.cityofnewyork.us
Description
This report includes results for the New York State Math exams for the years 2013-2023. For the results for the New York State Math exams for the years 2006-2012, please follow this link.
h
small-open-web-math-dataset
huggingface.co
Updated Nov 6, 2011
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Brando Miranda (2011). small-open-web-math-dataset [Dataset]. https://huggingface.co/datasets/brando/small-open-web-math-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 6, 2011
Authors
Brando Miranda
Description
Small Open Web Math Dataset

A 10k-sample subset of OpenWebMath, focused on high-quality mathematical text.
w
Dataset of books called Math for all. Participant book grades K-2
workwithdata.com
Updated Apr 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Dataset of books called Math for all. Participant book grades K-2 [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Math+for+all.+Participant+book+grades+K-2
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 2 rows and is filtered where the book is Math for all. Participant book grades K-2. It features 7 columns including author, publication date, language, and book publisher.
d
ThirdGrade ELA Math Scores byTract 08032017
catalog.data.gov
detroitdata.org
+5more
Updated Sep 21, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Driven Detroit (2024). ThirdGrade ELA Math Scores byTract 08032017 [Dataset]. https://catalog.data.gov/dataset/thirdgrade-ela-math-scores-bytract-08032017-eca07
Explore at:
Dataset updated
Sep 21, 2024
Dataset provided by
Data Driven Detroit
Description
Third grade English Language Arts (ELA) and Math test results for the 2016-2017 school year by census tract for the state of Michigan. Data Driven Detroit obtained these datasets from MI School Data, for the State of the Detroit Child tool in July 2017. Test results were originally obtained on a school level and aggregated to census tract by Data Driven Detroit. Student data was suppressed when less than five students were tested per school.Click here for metadata (descriptions of the fields).
Mathematical Problems Dataset: Various
kaggle.com
zip
Updated Dec 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Mathematical Problems Dataset: Various [Dataset]. https://www.kaggle.com/datasets/thedevastator/mathematical-problems-dataset-various-mathematic/code
Explore at:
zip(2498203187 bytes)Available download formats
Dataset updated
Dec 2, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Mathematical Problems Dataset: Various Mathematical Problems and Solutions

Mathematical Problems Dataset: Questions and Answers

By math_dataset (From Huggingface) [source]

About this dataset

This dataset comprises a collection of mathematical problems and their solutions designed for training and testing purposes. Each problem is presented in the form of a question, followed by its corresponding answer. The dataset covers various mathematical topics such as arithmetic, polynomials, and prime numbers. For instance, the arithmetic_nearest_integer_root_test.csv file focuses on problems involving finding the nearest integer root of a given number. Similarly, the polynomials_simplify_power_test.csv file deals with problems related to simplifying polynomials with powers. Additionally, the dataset includes the numbers_is_prime_train.csv file containing math problems that require determining whether a specific number is prime or not. The questions and answers are provided in text format to facilitate analysis and experimentation with mathematical problem-solving algorithms or models

How to use the dataset

Introduction: The Mathematical Problems Dataset contains a collection of various mathematical problems and their corresponding solutions or answers. This guide will provide you with all the necessary information on how to utilize this dataset effectively.

Understanding the columns: The dataset consists of several columns, each representing a different aspect of the mathematical problem and its solution. The key columns are:

question: This column contains the text representation of the mathematical problem or equation.

answer: This column contains the text representation of the solution or answer to the corresponding problem.

Exploring specific problem categories: To focus on specific types of mathematical problems, you can filter or search within the dataset using relevant keywords or terms related to your area of interest. For example, if you are interested in prime numbers, you can search for prime in the question column.

Applying machine learning techniques: This dataset can be used for training machine learning models related to natural language understanding and mathematics. You can explore various techniques such as text classification, sentiment analysis, or even sequence-to-sequence models for solving mathematical problems based on their textual representations.

Generating new questions and solutions: By analyzing patterns in this dataset, you can generate new questions and solutions programmatically using techniques like data augmentation or rule-based methods.

Validation and evaluation: As with any other machine learning task, it is essential to validate your models on separate validation sets not included in this dataset properly. You can also evaluate model performance by comparing predictions against known answers provided in this dataset's answer column.

Sharing insights and findings: After working with this datasets, it would be beneficial for researchers or educators to share their insights, approaches taken during analysis/modelling as Kaggle notebooks/ discussions/ blogs/ tutorials etc., so that others could get benefited from such shared resources too.

Note: Please note that the dataset does not include dates.

By following these guidelines, you can effectively explore and utilize the Mathematical Problems Dataset for various mathematical problem-solving tasks. Happy exploring!

Research Ideas

Developing machine learning algorithms for solving mathematical problems: This dataset can be used to train and test models that can accurately predict the solution or answer to different mathematical problems.

Creating educational resources: The dataset can be used to create a wide variety of educational materials such as problem sets, worksheets, and quizzes for students studying mathematics.

Research in mathematical problem-solving strategies: Researchers and educators can analyze the dataset to identify common patterns or strategies employed in solving different types of mathematical problems. This analysis can help improve teaching methodologies and develop effective problem-solving techniques

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purpos...
h
SAND-MATH
huggingface.co
Updated Jul 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AMD (2025). SAND-MATH [Dataset]. https://huggingface.co/datasets/amd/SAND-MATH
Explore at:
Dataset updated
Jul 29, 2025
Dataset authored and provided by
AMD
License
https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
Description
SAND-Math: A Synthetic Dataset of Difficult Problems to Elevate LLM Math Performance

📃 Paper | 🤗 Dataset SAND-Math (Synthetic Augmented Novel and Difficult Mathematics) is a high-quality, high-difficulty dataset of mathematics problems and solutions. It is generated using a novel pipeline that addresses the critical bottleneck of scarce, high-difficulty training data for mathematical Large Language Models (LLMs).

Key Features

Novel Problem Generation: Problems are… See the full description on the dataset page: https://huggingface.co/datasets/amd/SAND-MATH.
Data from: Statistical Graphs in Mathematical Textbooks of Primary Education...
scielo.figshare.com
datasetcatalog.nlm.nih.gov
jpeg
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal (2023). Statistical Graphs in Mathematical Textbooks of Primary Education in Perú [Dataset]. http://doi.org/10.6084/m9.figshare.6857033.v1
Explore at:
jpegAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.6857033.v1
Dataset updated
May 30, 2023
Dataset provided by
SciELOhttp://www.scielo.org/
Authors
Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract This paper presents the results of the statistical graphs’ analysis according to the curricular guidelines and its implementation in eighteen primary education mathematical textbooks in Perú, which correspond to three complete series and are from different editorials. In them, through a content analysis, we analyzed sections where graphs appeared, identifying the type of activity that arises from the graphs involved, the demanded reading level and the semiotic complexity task involved. The textbooks are partially suited to the curricular guidelines regarding the graphs presentation by educational level and the number of activities proposed by the three editorials are similar. The main activity that is required in textbooks is calculating and building. The predominance of bar graphs, a basic reading level and the representation of an univariate data distribution in the graph are observed in this study.
Z
Dataset for Does High Mathematical Flexibility Correlate with Enhanced...
data-staging.niaid.nih.gov
Updated Feb 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
No description provided (2025). Dataset for Does High Mathematical Flexibility Correlate with Enhanced Self-Regulated Learning (SRL) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14858594
Explore at:
Dataset updated
Feb 19, 2025
Authors
No description provided
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains student flexibility scores across four dimensions: Using multiple strategies, Identifying appropriate strategies from among self-generated strategies, Identifying appropriate strategies from among provided strategies, and Using appropriate strategies, as well as SRL scores across five dimensions: Value, Expectancy, Affect, Cognitive and Metacognitive Strategies, and Resource Management. The data were collected from a sample of 272 secondary students.
h
math-writing-dataset-google
huggingface.co
Updated Jan 3, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andres Marafioti (2025). math-writing-dataset-google [Dataset]. https://huggingface.co/datasets/andito/math-writing-dataset-google
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 3, 2025
Authors
Andres Marafioti
Description
andito/math-writing-dataset-google dataset hosted on Hugging Face and contributed by the HF Datasets community
U
Data from: Dataset of the study: "Chatbots put to the test in math and logic...
researchdata.bath.ac.uk
Updated May 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vagelis Plevris; George Papazafeiropoulos; Alejandro Jimenez Rios (2023). Dataset of the study: "Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard" [Dataset]. http://doi.org/10.5281/zenodo.7940781
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7940781
Dataset updated
May 20, 2023
Dataset provided by
Zenodo
Authors
Vagelis Plevris; George Papazafeiropoulos; Alejandro Jimenez Rios
Dataset funded by
Oslo Metropolitan University
Description
This dataset contains the 30 questions that were posed to the chatbots (i) ChatGPT-3.5; (ii) ChatGPT-4; and (iii) Google Bard, in May 2023 for the study “Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard”. These 30 questions describe mathematics and logic problems that have a unique correct answer. The questions are fully described with plain text only, without the need for any images or special formatting. The questions are divided into two sets of 15 questions each (Set A and Set B). The questions of Set A are 15 “Original” problems that cannot be found online, at least in their exact wording, while Set B contains 15 “Published” problems that one can find online by searching on the internet, usually with their solution. Each question is posed three times to each chatbot.

This dataset contains the following: (i) The full set of the 30 questions, A01-A15 and B01-B15; (ii) the correct answer for each one of them; (iii) an explanation of the solution, for the problems where such an explanation is needed, (iv) the 30 (questions) × 3 (chatbots) × 3 (answers) = 270 detailed answers of the chatbots. For the published problems of Set B, we also provide a reference to the source where each problem was taken from.
m
Dataset of Mathematical Terminology and words with POS tags
data.mendeley.com
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Davlatyor Mengliev (2025). Dataset of Mathematical Terminology and words with POS tags [Dataset]. http://doi.org/10.17632/5s5b9mjwbh.2
Explore at:
Unique identifier
https://doi.org/10.17632/5s5b9mjwbh.2
Dataset updated
Jun 5, 2025
Authors
Davlatyor Mengliev
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As part of the study, a dataset of mathematical words, phrases and terminology in the Uzbek language was formed. 1) This dataset contains 858 unique words and terms in mathematics. 2) A distinctive feature of the dataset is that the words and terms in it have a weighting coefficient for each of the five mathematical areas (Discrete Mathematics, Geometry, Probability Theory, Differential Equations, Higher Mathematics). 3) The penultimate column of the dataset contains the English translation of this word. 4) The last column of the dataset contains information on the part of speech to which this word belongs.
g
Mathematics Dataset
giters.com
opendatalab.com
+1more
Updated Apr 3, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DeepMind (2019). Mathematics Dataset [Dataset]. https://giters.com/edzai/mathematics_dataset
Explore at:
Dataset updated
Apr 3, 2019
Dataset provided by
DeepMind
Description
This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

## Example questions

Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r. Answer: 4 Question: Calculate -841880142.544 + 411127. Answer: -841469015.544 Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)). Answer: 54*a - 30

It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length, and answers to 30 characters in length. Note the training data for each question type is split into "train-easy", "train-medium", and "train-hard". This allows training models via a curriculum. The data can also be mixed together uniformly from these training datasets to obtain the results reported in the paper. Categories:

algebra (linear equations, polynomial roots, sequences)

arithmetic (pairwise operations and mixed expressions, surds)

calculus (differentiation)

comparison (closest numbers, pairwise comparisons, sorting)

measurement (conversion, working with time)

numbers (base conversion, remainders, common divisors and multiples, primality, place value, rounding numbers)

polynomials (addition, simplification, composition, evaluating, expansion)

probability (sampling without replacement)
d
Algebra equation solving performance by LD and non-LD students using...
datadryad.org
zip
Updated Jun 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Henry Borenson (2025). Algebra equation solving performance by LD and non-LD students using hands-on equations (grades 6-8) [Dataset]. http://doi.org/10.5061/dryad.sn02v6xh8
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.sn02v6xh8
Dataset updated
Jun 17, 2025
Dataset provided by
Dryad
Authors
Henry Borenson
Time period covered
Jun 4, 2025
Description
Algebra equation solving performance using hands-on equations for students with learning disabilities and their peers (grades 6-8)

Dataset DOI: 10.5061/dryad.sn02v6xh8

File list

Algebra_Equation_Solving_Performance_Using_Hands-On_Equations_Manipulatives_(Grades_6-8).csv

LD_Study_SAV_File.sav

Borenson_LD_Study_Output_File.pdf

File descriptions

Algebra_Equation_Solving_Performance_Using_Hands-On_Equations_Manipulatives_(Grades_6-8).csv– the data in unformatted form.

LD_Study_SAV_File.sav– the data for direct use in SPSS.

Borenson_LD_Study_Output_File.pdf- results from statistical analysis (see Output file description under Usage Notes for further information).

Usage Notes

Datafile Description

Variables/Columns:

Nr: Anonymized unique identifier for each student.

pretest: Score (0-6 points) on the initial algebra assessment administered before instruction.

posttestM: Score (0-6 points) on the...
Z
MLFMF: Data Sets for Machine Learning for Mathematical Formalization
data-staging.niaid.nih.gov
Updated Oct 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The citation is currently not available for this dataset.
Explore at:
Dataset updated
Oct 26, 2023
Dataset provided by
University of Ljubljana
Institute of Mathematics, Physics, and Mechanics
Authors
Bauer, Andrej; Petković, Matej; Todorovski, Ljupčo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MLFMF MLFMF (Machine Learning for Mathematical Formalization) is a collection of data sets for benchmarking recommendation systems used to support formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. The MLFMF data sets provide solid benchmarking support for further investigation of the numerous machine learning approaches to formalized mathematics. With more than 250,000 entries in total, this is currently the largest collection of formalized mathematical knowledge in machine learnable format. In addition to benchmarking the recommendation systems, the data sets can also be used for benchmarking node classification and link prediction algorithms. The four data sets Each data set is derived from a library of formalized mathematics written in proof assistants Agda or Lean. The collection includes

the largest Lean 4 library Mathlib, the three largest Agda libraries:

the standard library the library of univalent mathematics Agda-unimath, and the TypeTopology library. Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of syntax trees of all the entries in the library. The network contains the (modular) structure of the library and the references between entries, while the syntax trees give complete and easily parsed information about each entry. The Lean library data set was obtained by converting .olean files into s-expressions (see the lean2sexp tool). The Agda data sets were obtained with an s-expression extension of the official Agda repository (use either master-sexp or release-2.6.3-sexp branch). For more details, see our arXiv copy of the paper. Directory structure First, the mlfmf.zip archive needs to be unzipped. It contains a separate directory for every library (for example, the standard library of Agda can be found in the stdlib directory) and some auxiliary files. Every library directory contains

the network file from which the heterogeneous network can be loaded, a zip of the entries directory that contains (many) files with abstract syntax trees. Each of those files describes a single entry of the library. In addition to the auxiliary files which are used for loading the data (and described below), the zipped sources of lean2sexp and Agda s-expression extension are present. Loading the data In addition to the data files, there is also a simple python script main.py for loading the data. To run it, you will have to install the packages listed in the file requirements.txt: tqdm and networkx. The easiest way to do so is calling pip install -r requirements.txt. When running main.py for the first time, the script will unzip the entry files into the directory named entries. After that, the script loads the syntax trees of the entries (see the Entry class) and the network (as networkx.MultiDiGraph object). Note. The entry files have extension .dag (directed acyclic graph), since Lean uses node sharing, which breaks the tree structure (a shared node has more than one parent node). More information For more information about the data collection process, detailed data (and data format) description, and baseline experiments that were already performed with these data, see our arXiv copy of the paper. For the code that was used to perform the experiments and data format description, visit our github repository https://github.com/ul-fmf/mlfmf-data. Funding Since not all the funders are available in the Zenodo's database, we list them here:

This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-21-1-0024. The authors also acknowledge the financial support of the Slovenian Research Agency via the research core funding No. P2-0103 and No. P1-0294.
d
11th Grade Math Proficiency Rate
catalog.data.gov
s.cnmilf.com
Updated Sep 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.iowa.gov (2023). 11th Grade Math Proficiency Rate [Dataset]. https://catalog.data.gov/dataset/11th-grade-math-proficiency-rate
Explore at:
Dataset updated
Sep 1, 2023
Dataset provided by
data.iowa.gov
Description
The percentage of 11th grade Iowa students tested who met standard math score metric associated with the grade and content.

Facebook

Twitter

Click to copy link

Link copied

Cite

Awsaf (2024). Math Dataset [Dataset]. https://www.kaggle.com/datasets/awsaf49/math-dataset

Math Dataset

Measuring Mathematical Problem Solving With the MATH Dataset

Explore at:

zip(7412179 bytes)Available download formats

Dataset updated

Mar 12, 2024

Authors

Awsaf

License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

Dataset

This dataset was created by Awsaf

Released under MIT

Reference: https://github.com/hendrycks/math/

Clear search

Close search

Google apps

Main menu

Math Dataset

Dataset

Contents

Airoboros LLMs Math Dataset

Airoboros LLMs Math Dataset

Mastering Complex Mathematical Operations in Machine Learning

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Acknowledgements

math-preference-dataset

MathInstruct Dataset: Hybrid Math Instruction

MathInstruct Dataset: Hybrid Math Instruction Tuning

A curated dataset for math instruction tuning models

About this dataset

How to use the dataset

Research Ideas

Data from: MLFMF: Data Sets for Machine Learning for Mathematical...

Math Test Results 2013-2023

small-open-web-math-dataset

Small Open Web Math Dataset

Dataset of books called Math for all. Participant book grades K-2

ThirdGrade ELA Math Scores byTract 08032017

Mathematical Problems Dataset: Various

Mathematical Problems Dataset: Various Mathematical Problems and Solutions

Mathematical Problems Dataset: Questions and Answers

About this dataset

How to use the dataset

Research Ideas

Acknowledgements

License

SAND-MATH

Data from: Statistical Graphs in Mathematical Textbooks of Primary Education...

Dataset for Does High Mathematical Flexibility Correlate with Enhanced...

math-writing-dataset-google

Data from: Dataset of the study: "Chatbots put to the test in math and logic...

Dataset of Mathematical Terminology and words with POS tags

Mathematics Dataset

Algebra equation solving performance by LD and non-LD students using...

Algebra equation solving performance using hands-on equations for students with learning disabilities and their peers (grades 6-8)

File list

File descriptions

Usage Notes

MLFMF: Data Sets for Machine Learning for Mathematical Formalization

11th Grade Math Proficiency Rate

Math Dataset

Measuring Mathematical Problem Solving With the MATH Dataset

Dataset

Contents