24 datasets found

MetaMath QA
kaggle.com
zip
Updated Nov 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b
Explore at:
zip(78629842 bytes)Available download formats
Dataset updated
Nov 23, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
MetaMath QA

Mathematical Questions for Large Language Models

By Huggingface Hub [source]

About this dataset

This dataset contains meta-mathematics questions and answers collected from the Mistral-7B question-answering system. The responses, types, and queries are all provided in order to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset provides users with an efficient way to investigate various aspects of question answering models and further understand how they function. Whether you are a professional or beginner, this dataset is sure to offer invaluable insights into the development of more powerful QA systems!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

Data Dictionary

The MetaMathQA dataset contains three columns: response, type, and query. - Response: the response to the query given by the question answering system. (String) - Type: the type of query provided as input to the system. (String) - Query:the question posed to the system for which a response is required. (String)

Preparing data for analysis

It’s important that before you dive into analysis, you first familiarize yourself with what kind data values are present in each column and also check if any preprocessing needs to be done on them such as removing unwanted characters or filling in missing values etc., so that it can be used without any issue while training or testing your model further down in your process flow.

##### Training Models using Mistral 7B

Mistral 7B is an open source framework designed for building machine learning models quickly and easily from tabular (csv) datasets such as those found in this dataset 'MetaMathQA ' . After collecting and preprocessing your dataset accordingly Mistral 7B provides with support for various Machine Learning algorithms like Support Vector Machines (SVM), Logistic Regression , Decision trees etc , allowing one to select from various popular libraries these offered algorithms with powerful overall hyperparameter optimization techniques so soon after selecting algorithm configuration its good practice that one use GridSearchCV & RandomSearchCV methods further tune both optimizations during model building stages . Post selection process one can then go ahead validate performances of selected models through metrics like accuracy score , F1 Metric , Precision Score & Recall Scores .

##### Testing phosphors :

After successful completion building phase right way would be robustly testing phosphors on different evaluation metrics mentioned above Model infusion stage helps here immediately make predictions based on earlier trained model OK auto back new test cases presented by domain experts could hey run quality assurance check again base score metrics mentioned above know asses confidence value post execution HHO updating baseline scores running experiments better preferred methodology AI workflows because Core advantage finally being have relevancy inexactness induced errors altogether impact low

Research Ideas

Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.

Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.

Optimizing search algorithms that surface relevant answer results based on types of queries

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv | Column name | Description | |:--------------|:------------------------------------| | response | The response to the query. (String) | | type | The type of query. (String) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Huggingface Hub.
m
UMath - Dataset of Mathematical Terminology
data.mendeley.com
Updated Jun 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Davlatyor Mengliev (2025). UMath - Dataset of Mathematical Terminology [Dataset]. http://doi.org/10.17632/6m76ftmyv9.3
Explore at:
Unique identifier
https://doi.org/10.17632/6m76ftmyv9.3
Dataset updated
Jun 25, 2025
Authors
Davlatyor Mengliev
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As part of the study, a dataset of mathematical words, phrases and terminology in the Uzbek language was formed. 1) This dataset contains 858 unique words and terms in mathematics. 2) A distinctive feature of the dataset is that the words and terms in it have a weighting coefficient for each of the five mathematical areas (Discrete Mathematics, Geometry, Probability Theory, Differential Equations, Higher Mathematics). 3) The penultimate column of the dataset contains the English translation of this word. 4) The pre-last (8th) column of the dataset contains information on the part of speech to which this word belongs. 5) The last column contains information about source where the each term was extracted

Upd: Current version has English version headers.
GSM8K - Grade School Math 8K Q&A
kaggle.com
zip
Updated Nov 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
Explore at:
zip(3418660 bytes)Available download formats
Dataset updated
Nov 24, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
GSM8K - Grade School Math 8K Q&A

A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

By Huggingface Hub [source]

About this dataset

This Grade School Math 8K Linguistically Diverse Training & Test Set is designed to help you develop and improve your understanding of multi-step reasoning question answering. The dataset contains three separate data files: the socratic_test.csv, main_test.csv, and main_train.csv, each containing a set of questions and answers related to grade school math that consists of multiple steps. Each file contains the same columns: question, answer. The questions contained in this dataset are thoughtfully crafted to lead you through the reasoning journey for arriving at the correct answer each time, allowing you immense opportunities for learning through practice. With over 8 thousand entries for both training and testing purposes in this GSM8K dataset, it takes advanced multi-step reasoning skills to ace these questions! Deepen your knowledge today and master any challenge with ease using this amazing GSM8K set!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset provides a unique opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of 8,000 questions and answers that have been created to simulate real-world scenarios in grade school mathematics. Each question is paired with one answer based on a comprehensive test set. The questions cover topics such as algebra, arithmetic, probability and more.

The dataset consists of two files: main_train.csv and main_test.csv; the former contains questions and answers specifically related to grade school math while the latter includes multi-step reasoning tests for each category of the Ontario Math Curriculum (OMC). In addition, it has three columns - Question (Question), Answer ([Answer]) – meaning that each row contains 3 sequential question/answer pairs making it possible to take a single path from the start of any given answer or branch out from there according to the logic construction required by each respective problem scenario; these columns can be used in combination with text analysis algorithms like ELMo or BERT to explore different formats of representation for responding accurately during natural language processing tasks such as Q&A or building predictive models for numerical data applications like measuring classifying resource efficiency initiatives or forecasting sales volumes in retail platforms..

To use this dataset efficiently you should first get familiar with its structure by reading through its documentation so you are aware all available info regarding items content definition & format requirements then study examples that best suits your specific purpose whether is performing an experiment inspired by education research needs, generate insights related marketing analytics reports making predictions over artificial intelligence project capacity improvements optimization gains etcetera having full access knowledge about available source keeps you up & running from preliminary background work toward knowledge mining endeavor completion success Support User success qualitative exploration sessions make sure learn all variables definitions employed heterogeneous tools before continue Research journey starts experienced Researchers come prepared valuable resource items employed go beyond discovery false alarm halt advancement flow focus unprocessed raw values instead ensure clear cutting vision behind objectives support UserHelp plans going mean project meaningful campaign deliverables production planning safety milestones dovetail short deliveries enable design interfaces session workforce making everything automated fun entry functioning final transformation awaited offshoot Goals outcome parameters monitor life cycle management ensures ongoing projects feedbacks monitored video enactment resources tapped Proficiently balanced activity sheets tracking activities progress deliberation points evaluation radius highlights outputs primary phase visit egress collaboration agendas Client cumulative returns records capture performance illustrated collectively diarized successive setup sweetens conditions researched environments overview debriefing arcane matters turn acquaintances esteemed directives social

Research Ideas

Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.

Generating new grade school math questions and answers using g...
D
Comparative Judgement of Statements About Mathematical Definitions
dataverse.no
dataverse.azure.uit.no
csv, txt
Updated Sep 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tore Forbregd; Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad (2023). Comparative Judgement of Statements About Mathematical Definitions [Dataset]. http://doi.org/10.18710/EOZKTR
Explore at:
csv(43566), csv(2523), csv(37503), txt(3623)Available download formats
Unique identifier
https://doi.org/10.18710/EOZKTR
Dataset updated
Sep 28, 2023
Dataset provided by
DataverseNO
Authors
Tore Forbregd; Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data from a comparative judgement survey consisting of 62 working mathematics educators (ME) at Norwegian universities or city colleges, and 57 working mathematicians at Norwegian universities. A total of 3607 comparisons of which 1780 comparisons by the ME and 1827 ME. The comparative judgement survey consisted of respondents comparing pairs of statements on mathematical definitions compiled from a literature review on mathematical definitions in the mathematics education literature. Each WM was asked to judge 40 pairs of statements with the following question: “As a researcher in mathematics, where your target group is other mathematicians, what is more important about mathematical definitions?” Each ME was asked to judge 41 pairs of statements with the following question: “For a mathematical definition in the context of teaching and learning, what is more important?” The comparative judgement was done with No More Marking software (nomoremarking.com) The data set consists of the following data: comparisons made by ME (ME.csv) comparisons made by WM (WM.csv) Look up table of codes of statements and statement formulations (key.csv) Each line in the comparison represents a comparison, where the "winner" column represents the winner and the "loser" column the loser of the comparison.
m
Dataset of Mathematical Terminology and words with POS tags
data.mendeley.com
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Davlatyor Mengliev (2025). Dataset of Mathematical Terminology and words with POS tags [Dataset]. http://doi.org/10.17632/5s5b9mjwbh.2
Explore at:
Unique identifier
https://doi.org/10.17632/5s5b9mjwbh.2
Dataset updated
Jun 5, 2025
Authors
Davlatyor Mengliev
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As part of the study, a dataset of mathematical words, phrases and terminology in the Uzbek language was formed. 1) This dataset contains 858 unique words and terms in mathematics. 2) A distinctive feature of the dataset is that the words and terms in it have a weighting coefficient for each of the five mathematical areas (Discrete Mathematics, Geometry, Probability Theory, Differential Equations, Higher Mathematics). 3) The penultimate column of the dataset contains the English translation of this word. 4) The last column of the dataset contains information on the part of speech to which this word belongs.
h
Data from: MathCheck
huggingface.co
Updated Jul 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
PremiLab-Math (2024). MathCheck [Dataset]. https://huggingface.co/datasets/PremiLab-Math/MathCheck
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 12, 2024
Dataset authored and provided by
PremiLab-Math
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Exceptional mathematical reasoning ability is one of the key features that demonstrate the power of large language models (LLMs). How to comprehensively define and evaluate the mathematical abilities of LLMs, and even reflect the user experience in real-world scenarios, has emerged as a critical issue. Current benchmarks predominantly concentrate on problem-solving capabilities, which presents a substantial risk of model overfitting and fails to accurately represent genuine mathematical… See the full description on the dataset page: https://huggingface.co/datasets/PremiLab-Math/MathCheck.
y
% of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new...
data.yorkopendata.org
ckan.york.staging.datopian.com
+3more
Updated Mar 18, 2015
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2015). % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new First Entry definition) - (Snapshot) [Dataset]. https://data.yorkopendata.org/dataset/kpi-75a
Explore at:
Dataset updated
Mar 18, 2015
License
Open Government Licence 2.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
License information was derived automatically
Description
% of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new First Entry definition) - (Snapshot) *This indicator has been discontinued due to national changes in GCSEs in 2016.
Z
SCG Dataset from Graph Neural Networks in Supply Chain Analytics and...
data.niaid.nih.gov
Updated Sep 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wasi, Azmine Toushik; Islam, MD Shafikul; Akib, Adipto Raihan; Bappy, Mahathir Mohammad (2024). SCG Dataset from Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset and Benchmarks [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13652825
Explore at:
Dataset updated
Sep 3, 2024
Dataset provided by
Louisiana State University
Shahjalal University of Science and Technology
Authors
Wasi, Azmine Toushik; Islam, MD Shafikul; Akib, Adipto Raihan; Bappy, Mahathir Mohammad
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: Graph Neural Networks (GNNs) have recently gained traction in transportation, bioinformatics, language and image processing, but research on their application to supply chain management remains limited. Supply chains are inherently graph-like, making them ideal for GNN methodologies, which can optimize and solve complex problems. The barriers include a lack of proper conceptual foundations, familiarity with graph applications in SCM, and real-world benchmark datasets for GNN-based supply chain research. To address this, we discuss and connect supply chains with graph structures for effective GNN application, providing detailed formulations, examples, mathematical definitions, and task guidelines. Additionally, we present a multi-perspective real-world benchmark dataset from a leading FMCG company in Bangladesh, focusing on supply chain planning. We discuss various supply chain tasks using GNNs and benchmark several state-of-the-art models on homogeneous and heterogeneous graphs across six supply chain analytics tasks. Our analysis shows that GNN-based models consistently outperform statistical ML and other deep learning models by around 10-30% in regression, 10-30% in classification and detection tasks, and 15-40% in anomaly detection tasks on designated metrics. With this work, we lay the groundwork for solving supply chain problems using GNNs, supported by conceptual discussions, methodological insights, and a comprehensive dataset.
HWRT database of handwritten symbols
zenodo.org
data.niaid.nih.gov
tar
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Martin Thoma; Martin Thoma (2020). HWRT database of handwritten symbols [Dataset]. http://doi.org/10.5281/zenodo.50022
Explore at:
tarAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.50022
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Martin Thoma; Martin Thoma
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
The HWRT database of handwritten symbols contains on-line data of handwritten symbols such as all alphanumeric characters, arrows, greek characters and mathematical symbols like the integral symbol.

The database can be downloaded in form of bzip2-compressed tar files. Each tar file contains:

symbols.csv: A CSV file with the rows symbol_id, latex, training_samples, test_samples. The symbol id is an integer, the row latex contains the latex code of the symbol, the rows training_samples and test_samples contain integers with the number of labeled data.

train-data.csv: A CSV file with the rows symbol_id, user_id, user_agent and data.

test-data.csv: A CSV file with the rows symbol_id, user_id, user_agent and data.

All CSV files use ";" as delimiter and "'" as quotechar. The data is given in YAML format as a list of lists of dictinaries. Each dictionary has the keys "x", "y" and "time". (x,y) are coordinates and time is the UNIX time.

About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!
Z
Empathy dataset
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1more
Updated Dec 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mathematical Research Data Initiative (2024). Empathy dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7683906
Explore at:
Dataset updated
Dec 18, 2024
Authors
Mathematical Research Data Initiative
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The database for this study (Briganti et al. 2018; the same for the Braun study analysis) was composed of 1973 French-speaking students in several universities or schools for higher education in the following fields: engineering (31%), medicine (18%), nursing school (16%), economic sciences (15%), physiotherapy, (4%), psychology (11%), law school (4%) and dietetics (1%). The subjects were 17 to 25 years old (M = 19.6 years, SD = 1.6 years), 57% were females and 43% were males. Even though the full dataset was composed of 1973 participants, only 1270 answered the full questionnaire: missing data are handled using pairwise complete observations in estimating a Gaussian Graphical Model, meaning that all available information from every subject are used.

The feature set is composed of 28 items meant to assess the four following components: fantasy, perspective taking, empathic concern and personal distress. In the questionnaire, the items are mixed; reversed items (items 3, 4, 7, 12, 13, 14, 15, 18, 19) are present. Items are scored from 0 to 4, where “0” means “Doesn’t describe me very well” and “4” means “Describes me very well”; reverse-scoring is calculated afterwards. The questionnaires were anonymized. The reanalysis of the database in this retrospective study was approved by the ethical committee of the Erasmus Hospital.

Size: A dataset of size 1973*28

Number of features: 28

Ground truth: No

Type of Graph: Mixed graph

The following gives the description of the variables:

Feature FeatureLabel Domain Item meaning from Davis 1980

001 1FS Green I daydream and fantasize, with some regularity, about things that might happen to me.

002 2EC Purple I often have tender, concerned feelings for people less fortunate than me.

003 3PT_R Yellow I sometimes find it difficult to see things from the “other guy’s” point of view.

004 4EC_R Purple Sometimes I don’t feel very sorry for other people when they are having problems.

005 5FS Green I really get involved with the feelings of the characters in a novel.

006 6PD Red In emergency situations, I feel apprehensive and ill-at-ease.

007 7FS_R Green I am usually objective when I watch a movie or play, and I don’t often get completely caught up in it.(Reversed)

008 8PT Yellow I try to look at everybody’s side of a disagreement before I make a decision.

009 9EC Purple When I see someone being taken advantage of, I feel kind of protective towards them.

010 10PD Red I sometimes feel helpless when I am in the middle of a very emotional situation.

011 11PT Yellow sometimes try to understand my friends better by imagining how things look from their perspective

012 12FS_R Green Becoming extremely involved in a good book or movie is somewhat rare for me. (Reversed)

013 13PD_R Red When I see someone get hurt, I tend to remain calm. (Reversed)

014 14EC_R Purple Other people’s misfortunes do not usually disturb me a great deal. (Reversed)

015 15PT_R Yellow If I’m sure I’m right about something, I don’t waste much time listening to other people’s arguments. (Reversed)

016 16FS Green After seeing a play or movie, I have felt as though I were one of the characters.

017 17PD Red Being in a tense emotional situation scares me.

018 18EC_R Purple When I see someone being treated unfairly, I sometimes don’t feel very much pity for them. (Reversed)

019 19PD_R Red I am usually pretty effective in dealing with emergencies. (Reversed)

020 20FS Green I am often quite touched by things that I see happen.

021 21PT Yellow I believe that there are two sides to every question and try to look at them both.

022 22EC Purple I would describe myself as a pretty soft-hearted person.

023 23FS Green When I watch a good movie, I can very easily put myself in the place of a leading character.

024 24PD Red I tend to lose control during emergencies.

025 25PT Yellow When I’m upset at someone, I usually try to “put myself in his shoes” for a while.

026 26FS Green When I am reading an interesting story or novel, I imagine how I would feel if the events in the story were happening to me.

027 27PD Red When I see someone who badly needs help in an emergency, I go to pieces.

028 28PT Yellow Before criticizing somebody, I try to imagine how I would feel if I were in their place

More information about the dataset is contained in empathy_description.html file.
d
Data from: A mathematical model of optimized radioiodine-131 therapy of...
catalog.data.gov
odgavaprod.ogopendata.com
Updated Sep 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institutes of Health (2025). A mathematical model of optimized radioiodine-131 therapy of Graves' hyperthyroidism [Dataset]. https://catalog.data.gov/dataset/a-mathematical-model-of-optimized-radioiodine-131-therapy-of-graves-hyperthyroidism
Explore at:
Dataset updated
Sep 6, 2025
Dataset provided by
National Institutes of Health
Description
Background The current status of radioiodine-131 (RaI) dosimetry for Graves' hyperthyroidism is not clear. Recurrent hyperthyroidism and iatrogenic hypothyroidism are two problems which interact such that trying to solve one leads to exacerbation of the other. Optimized RaI therapy has therefore begun to be defined just in terms of early hypothyroidism (ablative therapy) as physicians have given up on reducing hypothyroidism. Methods Optimized therapy is evaluated both in terms of the greatest separation of cure rate from hypothyroidism rate (non-ablative therapy) or in terms of early hypothyroidism (ablative therapy) by mathematical modeling of outcome after radioiodine and critically discussing the three common methods of RaI dosing for Graves' disease. Results Cure follows a logarithmic relationship to activity administered or absorbed dose, while hypothyroidism follows a linear relationship. The effect of including or omitting factors in the calculation of the administered I–131 activity such as the measured thyroid uptake and effective half-life of RaI or giving extra compensation for gland size is discussed. Conclusions Very little benefit can be gained by employing complicated methods of RaI dose selection for non-ablative therapy since the standard activity model shows the best potential for cure and prolonged euthyroidism. For ablative therapy, a standard MBq/g dosing provides the best outcome in terms of cure and early hypothyroidism.
Math StackExchange RepHistory
kaggle.com
zip
Updated Mar 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrey (2025). Math StackExchange RepHistory [Dataset]. https://www.kaggle.com/datasets/andreyvm/math-stackexchange-rephistory
Explore at:
zip(337231010 bytes)Available download formats
Dataset updated
Mar 26, 2025
Authors
Andrey
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Reputation history for Users on Mathematics StackExchange site. See the source Kaggle Notebook for further details.

This dataset has been produced for research purposes using "Stack Exchange Data Dump", Mathematics site (obtained from https://math.stackexchange.com/, see also https://math.stackexchange.com/help/data-dumps) and other datasets obtained from "SEDE" (StackExchange Data Explorer, https://data.stackexchange.com/).

Licence information

This dataset is licensed under Creative Commons CC BY-SA licence 4.0.

User contributions to StackExchange sites are licensed under the Creative Commons CC BY-SA licence (2.5, 3.0, and/or 4.0). For the text of the licence(s) please follow the links at https://stackoverflow.com/help/licensing, which is quoted below:

As noted in the Stack Exchange Terms of Service and in the footer of every page, all publicly accessible user contributions are licensed under Creative Commons Attribution-ShareAlike license as follows: - Content contributed before 2011-04-08 (UTC) is distributed under the terms of CC BY-SA 2.5. - Content contributed from 2011-04-08 up to but not including 2018-05-02 (UTC) is distributed under the terms of CC BY-SA 3.0. - Content contributed on or after 2018-05-02 (UTC) is distributed under the terms of CC BY-SA 4.0. The license applicable for each Question and Answer revision is available on the post timeline. See this post for more information. Please read the terms of service and the full legal text of the license carefully for more details on how your content can be used and for how you can use publicly accessible content contributed to the site by other users.

Related notebooks and datasets

https://www.kaggle.com/code/andreyvm/mathse-download-dump-from-se

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-dump-raw

https://www.kaggle.com/code/andreyvm/mathse-dump-to-parquet

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-dump-parquet

https://www.kaggle.com/code/andreyvm/mathse-recover-rephistory

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-rephistory

https://www.kaggle.com/code/andreyvm/mathse-get-usersforbonus-from-sede

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-usersforbonus

https://www.kaggle.com/code/andreyvm/mathse-get-postsdeleted-from-sede

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-postsdeleted

https://www.kaggle.com/code/andreyvm/mathse-get-misc-data-from-sede

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-misc
Math StackExchange PostsDeleted
kaggle.com
zip
Updated Mar 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrey (2025). Math StackExchange PostsDeleted [Dataset]. https://www.kaggle.com/datasets/andreyvm/math-stackexchange-postsdeleted
Explore at:
zip(25749091 bytes)Available download formats
Dataset updated
Mar 26, 2025
Authors
Andrey
Description
PostsDeleted is the "deleted" part of PostsWithDeleted table.

This dataset has been obtained from StackExchange Data Explorer (SEDE) https://data.stackexchange.com/ and is used for research purposes. See the source Kaggle Notebook for further details.

Licence information

User contributions to StackExchange sites are licensed under the Creative Commons CC BY-SA licence (2.5, 3.0, and/or 4.0). For the text of the licence(s) please follow the links at https://stackoverflow.com/help/licensing, which is quoted below:

As noted in the Stack Exchange Terms of Service and in the footer of every page, all publicly accessible user contributions are licensed under Creative Commons Attribution-ShareAlike license as follows: - Content contributed before 2011-04-08 (UTC) is distributed under the terms of CC BY-SA 2.5. - Content contributed from 2011-04-08 up to but not including 2018-05-02 (UTC) is distributed under the terms of CC BY-SA 3.0. - Content contributed on or after 2018-05-02 (UTC) is distributed under the terms of CC BY-SA 4.0. The license applicable for each Question and Answer revision is available on the post timeline. See this post for more information. Please read the terms of service and the full legal text of the license carefully for more details on how your content can be used and for how you can use publicly accessible content contributed to the site by other users.

Related notebooks and datasets

https://www.kaggle.com/code/andreyvm/mathse-download-dump-from-se

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-dump-raw

https://www.kaggle.com/code/andreyvm/mathse-dump-to-parquet

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-dump-parquet

https://www.kaggle.com/code/andreyvm/mathse-recover-rephistory

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-rephistory

https://www.kaggle.com/code/andreyvm/mathse-get-usersforbonus-from-sede

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-usersforbonus

https://www.kaggle.com/code/andreyvm/mathse-get-postsdeleted-from-sede

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-postsdeleted

https://www.kaggle.com/code/andreyvm/mathse-get-misc-data-from-sede

https://www.kaggle.com/datasets/andreyvm/math-stackexchange-misc
MaxTex: A Large Scale Benchmark Dataset for Mathematical Formula Recognition...
figshare.com
application/x-rar
Updated Apr 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Runqing Yan (2025). MaxTex: A Large Scale Benchmark Dataset for Mathematical Formula Recognition [Dataset]. http://doi.org/10.6084/m9.figshare.27321012.v2
Explore at:
application/x-rarAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.27321012.v2
Dataset updated
Apr 14, 2025
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Runqing Yan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mathematical formula recognition is an important component of document understanding and has broad application value in academic literature processing and intelligent education However, existing research mainly focuses on improving the model architecture to enhance the recognition performance of relatively simple formulas, ignoring the limitations of existing benchmark datasets in terms of scale, quality, and diversity, which limits the development of complex formula recognition technology This article has made two key contributions. Firstly, high-quality printed MaxTex (P) and handwritten scanned MaxTex (H) datasets have been constructed MaxTex (P) contains 223000 samples and avoids symbol redundancy by adopting a unified and efficient morpheme design; Although MaxTex (H) has a moderate scale, it optimizes the morpheme space and covers complex mathematical expressions At the same time, these two datasets have been strictly controlled in terms of sample size, data quality, and annotation accuracy, providing a more reliable benchmark for model training and evaluation Secondly, an innovative character sequence encoding and decoding scheme was designed to solve the problems of missing spaces in existing LaTeX label sequences and dictionary inflation caused by BPE encoding and decoding, while preserving the semantic information of the original character sequenceTo obtain MaxTex (H), please contact us gxqyrq@gmail.com
m
Data from: Dataset of Student Level Prediction in UAE
data.mendeley.com
Updated Dec 18, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
shatha Ghareeb (2020). Dataset of Student Level Prediction in UAE [Dataset]. http://doi.org/10.17632/3g8dtwbjjy.1
Explore at:
Unique identifier
https://doi.org/10.17632/3g8dtwbjjy.1
Dataset updated
Dec 18, 2020
Authors
shatha Ghareeb
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United Arab Emirates
Description
The dataset comprises novel aspects specifically, in terms of student grading in diverse educational cultures within the multiple countries – Researchers and other education sectors will be able to see the impact of having varied curriculums in a country. Dataset compares different levelling cases when student transfer from curriculum to curriculum and the unreliable levelling criteria set by schools currently in an international school. The collected data can be used within the intelligent algorithms specifically machine learning and pattern analysis methods, to develop an intelligent framework applicable in multi-cultural educational systems to aid in a smooth transition “levelling, hereafter” of students who relocate from a particular education curriculum to another; and minimize the impact of switching on the students’ educational performance. The preliminary variables taken into consideration when deciding which data to collect depended on the variables. UAE is a multicultural country with many expats relocating from regions such as Asia, Europe and America. In order to meet expats needs, UAE has established many international private schools, therefore UAE was chosen to be the location of study based on many cases and struggles in levelling declared by the Ministry of Education and schools. For the first time, we present this dataset comprising students’ records for two academic years that included math, English, and science for 3 terms. Selection of subject areas and number of terms was based on influence from other researchers in similar subject matters.
u
Unit process data for field crop production version 1.1
agdatacommons.nal.usda.gov
xlsx
Updated Nov 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joyce Cooper (2025). Unit process data for field crop production version 1.1 [Dataset]. http://doi.org/10.15482/USDA.ADC/1226081
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.15482/USDA.ADC/1226081
Dataset updated
Nov 21, 2025
Dataset provided by
Ag Data Commons
Authors
Joyce Cooper
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
The release of the LCA Commons Unit Process Data: field crop production Version 1.1 includes the following updates:Added meta data to reflect USDA LCA Digital Commons data submission guidance including descriptions of the process (reference to which the size of the inputs and outputs in the process relate, description of the process and technical scope and any aggregation; definition of the technology being used, its operating conditions); temporal representatives; geographic representativeness; allocation methods; process type (U: unit process, S: system process); treatment of missing intermediate flow data; treatment of missing flow data to or from the environment; intermediate flow data sources; mass balance; data treatment (description of the methods and assumptions used to transform primary and secondary data into flow quantities through recalculating, reformatting, aggregation, or proxy data and a description of data quality according to LCADC convention); sampling procedures; and review details. Also, dataset documentation and related archival publications are cited in the APA format.Changed intermediate flow categories and subcategories to reflect the ISIC International Standard Industrial Classification (ISIC).Added “US-” to the US state abbreviations for intermediate flow locations.Corrected the ISIC code for “CUTOFF domestic barge transport; average fuel” (changed to ISIC 5022: Inland freight water transport).Corrected flow names as follows: "Propachlor" renamed "Atrazine". “Bromoxynil octanoate” renamed “Bromoxynil heptanoate”. “water; plant uptake; biogenic” renamed “water; from plant uptake; biogenic” half the instances of “Benzene, pentachloronitro-“ replaced with Etridiazole and half with “Quintozene”. “CUTOFF phosphatic fertilizer, superphos. grades 22% & under; at point-of-sale” replaced with “CUTOFF phosphatic fertilizer, superphos. grades 22% and under; at point-of-sale”.Corrected flow values for “water; from plant uptake; biogenic” and “dry matter except CNPK; from plant uptake; biogenic” in some datasets.Presented data in the International Reference Life Cycle Data System (ILCD)1 format, allowing the parameterization of raw data and mathematical relations to be presented within the datasets and the inclusion of parameter uncertainty data. Note that ILCD formatted data can be converted to the ecospold v1 format using the OpenLCA software.Data quality rankings have been updated to reflect the inclusion of uncertainty data in the ILCD formatted data.Changed all parameter names to “pxxxx” to accommodate mathematical relation character limitations in OpenLCA. Also adjusted select mathematical relations to recognize zero entries. The revised list of parameter names is provided in the documentation attached.Resources in this dataset:Resource Title: Cooper-crop-production-data-parameterization-version-1.1 .File Name: Cooper-crop-production-data-parameterization-version-1.1.xlsxResource Description: Description of parameters that define the Cooper Unit process data for field crop production version 1.1Resource Title: Cooper_Crop_Data_v1.1_ILCD.File Name: Cooper_Crop_Data_v1.1_ILCD.zipResource Description: .zip archive of ILCD xml files that comprise crop production unit process modelsResource Software Recommended: openLCA,url: http://www.openlca.org/Resource Title: Summary of Revisions of the LCA Digital Commons Unit Process Data: field crop production for version 1.1 (August 2013).File Name: Summary of Revisions of the LCA Digital Commons Unit Process Data- field crop production, Version 1.1 (August 2013).pdfResource Description: Documentation of revisions to version 1 data that constitute version 1.1
f
Definitions of mathematical notation used in this paper.
figshare.com
xls
Updated Oct 31, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhifeng Wang; Wanxuan Wu; Chunyan Zeng; Jialiang Shen (2025). Definitions of mathematical notation used in this paper. [Dataset]. http://doi.org/10.1371/journal.pone.0335221.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0335221.t002
Dataset updated
Oct 31, 2025
Dataset provided by
PLOS ONE
Authors
Zhifeng Wang; Wanxuan Wu; Chunyan Zeng; Jialiang Shen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Definitions of mathematical notation used in this paper.
d
Data from: Do humans optimally exploit redundancy to control step...
search.dataone.org
datasetcatalog.nlm.nih.gov
+3more
Updated Jun 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan B. Dingwell; Joby John; Joseph P. Cusumano (2025). Do humans optimally exploit redundancy to control step variability in walking? [Dataset]. http://doi.org/10.5061/dryad.sk55m
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.sk55m
Dataset updated
Jun 21, 2025
Dataset provided by
Dryad Digital Repository
Authors
Jonathan B. Dingwell; Joby John; Joseph P. Cusumano
Time period covered
Jan 1, 2015
Description
It is widely accepted that humans and animals minimize energetic cost while walking. While such principles predict average behavior, they do not explain the variability observed in walking. For robust performance, walking movements must adapt at each step, not just on average. Here, we propose an analytical framework that reconciles issues of optimality, redundancy, and stochasticity. For human treadmill walking, we defined a goal function to formulate a precise mathematical definition of one possible control strategy: maintain constant speed at each stride. We recorded stride times and stride lengths from healthy subjects walking at five speeds. The specified goal function yielded a decomposition of stride-to-stride variations into new gait variables explicitly related to achieving the hypothesized strategy. Subjects exhibited greatly decreased variability for goal-relevant gait fluctuations directly related to achieving this strategy, but far greater variability for goal-irrelevant fl...
f
Definition of variables in the model.
datasetcatalog.nlm.nih.gov
figshare.com
Updated Jan 25, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhang, Chao; Li, Xiahui; Li, Shuai; Wang, Zhe; Zhu, Xianming; Long, Haonan (2024). Definition of variables in the model. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001450239
Explore at:
Dataset updated
Jan 25, 2024
Authors
Zhang, Chao; Li, Xiahui; Li, Shuai; Wang, Zhe; Zhu, Xianming; Long, Haonan
Description
An ultrasonic phased array defect extraction method based on adaptive region growth is proposed, aiming at problems such as difficulty in defect identification and extraction caused by noise interference and complex structure of the detected object during ultrasonic phased array detection. First, bilateral filtering and grayscale processing techniques are employed for the purpose of noise reduction and initial data processing. Following this, the maximum sound pressure within the designated focusing region serves as the seed point. An adaptive region iteration method is subsequently employed to execute automatic threshold capture and region growth. In addition, mathematical morphology is applied to extract the processed defect features. In the final stage, two sets of B-scan images depicting hole defects of varying sizes are utilized for experimental validation of the proposed algorithm’s effectiveness and applicability. The defect features extracted through this algorithm are then compared and analyzed alongside the histogram threshold method, Otsu method, K-means clustering algorithm, and a modified iterative method. The results reveal that the margin of error between the measured results and the actual defect sizes is less than 13%, representing a significant enhancement in the precision of defect feature extraction. Consequently, this method establishes a dependable foundation of data for subsequent tasks, such as defect localization and quantitative and qualitative analysis.
Hex Dictionary V2
kaggle.com
zip
Updated May 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DigitalEuan (2025). Hex Dictionary V2 [Dataset]. https://www.kaggle.com/datasets/digitaleuan/hex-dictionary-v2
Explore at:
zip(203686 bytes)Available download formats
Dataset updated
May 21, 2025
Authors
DigitalEuan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
READ ME

Welcome to the Universal Binary Principle (UBP) Dictionary System - Version 2

Author: Euan Craig, New Zealand 2025

Embark on a revolutionary journey with Version 2 of the UBP Dictionary System, a cutting-edge Python notebook that redefines how words are stored, analyzed, and visualized! Built for Kaggle, this system encodes words as multidimensional hexagonal structures in custom .hexubp files, leveraging sophisticated mathematics to integrate binary toggles, resonance frequencies, spatial coordinates, and more, all rooted in the Universal Binary Principle (UBP). This is not just a dictionary—it’s a paradigm shift in linguistic representation.

What is the UBP Dictionary System? The UBP Dictionary System transforms words into rich, vectorized representations stored in custom .hexubp files—a JSON-based format designed to encapsulate a word’s multidimensional UBP properties. Each .hexubp file represents a word as a hexagonal structure with 12 vertices, encoding: * Binary Toggles: 6-bit patterns capturing word characteristics. * Resonance Frequencies: Derived from the Schumann resonance (7.83 Hz) and UBP Pi (~2.427). * Spatial Vectors: 6D coordinates positioning words in a conceptual “Bitfield.” * Cultural and Harmonic Data: Contextual weights, waveforms, and harmonic properties.

These .hexubp files are generated, managed, and visualized through an interactive Tkinter-based interface, making the system a powerful tool for exploring language through a mathematical lens.

Unique Mathematical Foundation The UBP Dictionary System is distinguished by its deep reliance on mathematics to model language: * UBP Pi (~2.427): A custom constant derived from hexagonal geometry and resonance alignment (calculated as 6/2 * cos(2π * 7.83 * 0.318309886)), serving as the system’s foundational reference. * Resonance Frequencies: Frequencies are computed using word-specific hashes modulated by UBP Pi, with validation against the Schumann resonance (7.83 Hz ± 0.078 Hz), grounding the system in physical phenomena. * 6D Spatial Vectors: Words are positioned in a 6D Bitfield (x, y, z, time, phase, quantum state) based on toggle sums and frequency offsets, enabling spatial analysis of linguistic relationships. * GLR Validation: A non-corrective validation mechanism flags outliers in binary, frequency, and spatial data, ensuring mathematical integrity without compromising creativity.

This mathematical rigor sets the system apart from traditional dictionaries, offering a framework where words are not just strings but dynamic entities with quantifiable properties. It’s a fusion of linguistics, physics, and computational theory, inviting users to rethink language as a multidimensional phenomenon.

Comparison with Other Data Storage Mechanisms The .hexubp format is uniquely tailored for UBP’s multidimensional model. Here’s how it compares to other storage mechanisms, with metrics to highlight its strengths: CSV/JSON (Traditional Dictionaries): * Structure: Flat key-value pairs (e.g., word:definition). * Storage: ~100 bytes per word for simple text (e.g., “and”:“conjunction”). * Query Speed: O(1) for lookups, but no support for vector operations. * Limitations: Lacks multidimensional data (e.g., spatial vectors, frequencies). * .hexubp Advantage: Stores 12 vertices with vectors (~1-2 KB per word), enabling complex analyses like spatial clustering or frequency drift detection.

Relational Databases (SQL): * Structure: Tabular, with columns for word, definition, etc. * Storage: ~200-500 bytes per word, plus index overhead. * Query Speed: O(log n) for indexed queries, slower for vector computations. * Limitations: Rigid schema, inefficient for 6D vectors or dynamic vertices. * .hexubp Advantage: Lightweight, file-based (~1-2 KB per word), with JSON flexibility for UBP’s hexagonal model, no database server required.

Vector Databases (e.g., Word2Vec): * Structure: Fixed-dimension vectors (e.g., 300D for semantic embeddings). * Storage: ~2.4 KB per word (300 floats at 8 bytes each). * Query Speed: O(n) for similarity searches, optimized with indexing. * Limitations: Generic embeddings lack UBP-specific dimensions (e.g., resonance, toggles). * .hexubp Advantage: Smaller footprint (~1-2 KB), with domain-specific dimensions tailored to UBP’s theoretical framework.

Graph Databases: * Structure: Nodes and edges for word relationships. * Storage: ~500 bytes per word, plus edge overhead. * Query Speed: O(k) for traversals, where k is edge count. * Limitations: Overkill for dictionary tasks, complex setup. * .hexubp Advantage: Self-contained hexagonal structure per word, simpler for UBP’s needs, with comparable storage (~1-2 KB).

The .hexubp format balances storage efficiency, flexibility, and UBP-s...

Facebook

Twitter

Click to copy link

Link copied

Cite

The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b

MetaMath QA

Mathematical Questions for Large Language Models

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

zip(78629842 bytes)Available download formats

Dataset updated

Nov 23, 2023

Authors

The Devastator

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

MetaMath QA

Mathematical Questions for Large Language Models

By Huggingface Hub [source]

About this dataset

This dataset contains meta-mathematics questions and answers collected from the Mistral-7B question-answering system. The responses, types, and queries are all provided in order to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset provides users with an efficient way to investigate various aspects of question answering models and further understand how they function. Whether you are a professional or beginner, this dataset is sure to offer invaluable insights into the development of more powerful QA systems!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

Data Dictionary

The MetaMathQA dataset contains three columns: response, type, and query. - Response: the response to the query given by the question answering system. (String) - Type: the type of query provided as input to the system. (String) - Query:the question posed to the system for which a response is required. (String)

Preparing data for analysis

It’s important that before you dive into analysis, you first familiarize yourself with what kind data values are present in each column and also check if any preprocessing needs to be done on them such as removing unwanted characters or filling in missing values etc., so that it can be used without any issue while training or testing your model further down in your process flow.

##### Training Models using Mistral 7B

Mistral 7B is an open source framework designed for building machine learning models quickly and easily from tabular (csv) datasets such as those found in this dataset 'MetaMathQA ' . After collecting and preprocessing your dataset accordingly Mistral 7B provides with support for various Machine Learning algorithms like Support Vector Machines (SVM), Logistic Regression , Decision trees etc , allowing one to select from various popular libraries these offered algorithms with powerful overall hyperparameter optimization techniques so soon after selecting algorithm configuration its good practice that one use GridSearchCV & RandomSearchCV methods further tune both optimizations during model building stages . Post selection process one can then go ahead validate performances of selected models through metrics like accuracy score , F1 Metric , Precision Score & Recall Scores .

##### Testing phosphors :

After successful completion building phase right way would be robustly testing phosphors on different evaluation metrics mentioned above Model infusion stage helps here immediately make predictions based on earlier trained model OK auto back new test cases presented by domain experts could hey run quality assurance check again base score metrics mentioned above know asses confidence value post execution HHO updating baseline scores running experiments better preferred methodology AI workflows because Core advantage finally being have relevancy inexactness induced errors altogether impact low

Research Ideas

Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.

Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.

Optimizing search algorithms that surface relevant answer results based on types of queries

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv | Column name | Description | |:--------------|:------------------------------------| | response | The response to the query. (String) | | type | The type of query. (String) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Huggingface Hub.

Clear search

Close search

Google apps

Main menu

MetaMath QA

MetaMath QA

Mathematical Questions for Large Language Models

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Data Dictionary

Preparing data for analysis

Research Ideas

Acknowledgements

License

Columns

Acknowledgements

UMath - Dataset of Mathematical Terminology

GSM8K - Grade School Math 8K Q&A

GSM8K - Grade School Math 8K Q&A

A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Comparative Judgement of Statements About Mathematical Definitions

Dataset of Mathematical Terminology and words with POS tags

Data from: MathCheck

% of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new...

SCG Dataset from Graph Neural Networks in Supply Chain Analytics and...

HWRT database of handwritten symbols

Empathy dataset

Data from: A mathematical model of optimized radioiodine-131 therapy of...

Math StackExchange RepHistory

Licence information

Related notebooks and datasets

Math StackExchange PostsDeleted

Licence information

Related notebooks and datasets

MaxTex: A Large Scale Benchmark Dataset for Mathematical Formula Recognition...

Data from: Dataset of Student Level Prediction in UAE

Unit process data for field crop production version 1.1

Definitions of mathematical notation used in this paper.

Data from: Do humans optimally exploit redundancy to control step...

Definition of variables in the model.

Hex Dictionary V2

MetaMath QA

Mathematical Questions for Large Language Models

MetaMath QA

Mathematical Questions for Large Language Models

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Data Dictionary

Preparing data for analysis

Research Ideas

Acknowledgements

License

Columns

Acknowledgements