20 datasets found
  1. MetaMath QA

    • kaggle.com
    zip
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b
    Available download formats
    zip (78629842 bytes)
    Dataset updated
    Nov 23, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MetaMath QA

    Mathematical Questions for Large Language Models

    By Huggingface Hub [source]

    About this dataset

    This dataset contains the MetaMathQA collection of mathematical questions and answers, packaged for question-answering experiments with the Mistral-7B language model. Each record provides the query, its type, and the reference response, which makes it straightforward to study how question-answering models behave on mathematical problems and to use the data to improve their performance. Whether you are a professional or a beginner, the well-structured layout offers a convenient starting point for building stronger QA systems.


    How to use the dataset

    Data Dictionary

    The MetaMathQA dataset contains three columns: response, type, and query.

    • Response: the response to the query given by the question-answering system. (String)
    • Type: the type of query provided as input to the system. (String)
    • Query: the question posed to the system for which a response is required. (String)

    Preparing data for analysis

    Before you dive into analysis, familiarize yourself with the kinds of values present in each column and check whether any preprocessing is needed, such as removing unwanted characters or filling in missing values, so the data can be used without issues when you train or test your model further down the pipeline.
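    A minimal sketch of that inspection step in Python, assuming pandas is installed and that the file is the train.csv listed under Columns below (the path and exact column set should be checked against your own download):

    import pandas as pd

    # Load the MetaMathQA training file (adjust the path to your download location).
    df = pd.read_csv("train.csv")

    # Columns described in the data dictionary: response, type, query.
    print(df.dtypes)
    print(df["type"].value_counts())

    # Basic preprocessing checks: missing values and stray whitespace.
    print(df.isna().sum())
    for col in df.select_dtypes("object").columns:
        df[col] = df[col].str.strip()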

    Training models with Mistral 7B

    Mistral 7B is an open-weight 7-billion-parameter large language model rather than a tabular machine-learning framework, so the natural way to use this dataset with it is as fine-tuning or evaluation material: feed the query column (optionally conditioned on the type column) to the model and train or prompt it to produce the reference response. After collecting and preprocessing the data, split it into training, validation, and test sets, tune hyperparameters such as the learning rate, prompt format, and decoding settings against the validation split, and only then assess the final configuration. Model quality can be summarized with metrics such as exact-match accuracy on the final answers, along with precision, recall, and F1 where answers are scored at the token level.

    Testing phase

    After the building phase is complete, test the model robustly against the evaluation metrics mentioned above. Run the trained model on held-out test cases, ideally new ones supplied by domain experts, compare the generated answers with the reference responses, and record the resulting scores as a confidence check. Keeping baseline scores from earlier runs and re-running the same evaluation after every change is the preferred workflow, because it makes it easy to see whether a change genuinely reduces errors rather than merely shifting them.
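    A minimal sketch of such a comparison, assuming you already have parallel lists of generated answers and reference responses (the generation step itself depends on how you serve the model and is not shown):

    def exact_match_accuracy(predictions, references):
        """Fraction of predictions that equal their reference after light normalization."""
        def normalize(text):
            return " ".join(text.strip().lower().split())
        hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
        return hits / len(references)

    # Toy example with hypothetical outputs.
    preds = ["The answer is 42.", "x = 7"]
    refs = ["the answer is 42.", "x = 8"]
    print(exact_match_accuracy(preds, refs))  # 0.5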

    Research Ideas

    • Training natural language processing (NLP) models to better identify patterns and connections between questions, answers, and query types.
    • Studying which features of a query's language lead to successful question-answering results for different query types.
    • Optimizing search algorithms that surface relevant answers based on the type of query.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and the original data source.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------------------------------|
    | response    | The response to the query. (String) |
    | type        | The type of query. (String)         |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  2. GSM8K - Grade School Math 8K Q&A

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
    Available download formats
    zip (3418660 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    GSM8K - Grade School Math 8K Q&A

    A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

    By Huggingface Hub [source]

    About this dataset

    This Grade School Math 8K Linguistically Diverse Training & Test Set is designed to help you develop and evaluate multi-step reasoning for question answering. The dataset contains three data files, socratic_test.csv, main_test.csv, and main_train.csv, each holding grade-school math questions whose solutions require multiple steps. Every file has the same two columns: question and answer. The answers walk through the reasoning needed to reach the correct result, so the more than eight thousand entries for training and testing offer plenty of material for practising and evaluating multi-step reasoning.


    How to use the dataset

    This dataset provides a unique opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of roughly 8,000 questions and answers that simulate real-world grade-school mathematics scenarios, with each question paired with one reference answer. The questions cover topics such as algebra, arithmetic, and probability.

    The dataset is distributed as three files: main_train.csv, main_test.csv, and socratic_test.csv. Each file has two columns, question and answer; the answer spells out the intermediate reasoning steps leading to the final result, so a single row can be read as a short chain of reasoning from the problem statement to the solution. These question/answer pairs can be combined with text-representation models such as ELMo or BERT to explore different input formats for question answering, or used to train and evaluate models that must carry out multi-step numerical reasoning.

    To use this dataset efficiently, first get familiar with its structure by reading the documentation, so that you know the definition and format of every field before you start. Then study the examples that best match your purpose, whether that is an education-research experiment, a marketing-analytics report, or predictions for an AI project. Knowing the available variables and their definitions up front keeps the preliminary background work short and lets the rest of the project focus on analysis and modelling rather than on untangling raw values.
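    A minimal sketch of loading the training split and separating the reasoning from the final answer, assuming the two-column layout described above and the common GSM8K convention that the final answer follows a "####" marker (treat that marker as an assumption and check it against the actual file):

    import pandas as pd

    # Load the training split (file name from the dataset description; adjust the path as needed).
    df = pd.read_csv("main_train.csv")

    def split_answer(answer: str):
        """Split a GSM8K-style answer into reasoning text and the final result.

        Assumes the final answer follows a '####' marker; if the marker is absent,
        the whole string is treated as reasoning and the final answer is None.
        """
        if "####" in answer:
            reasoning, final = answer.rsplit("####", 1)
            return reasoning.strip(), final.strip()
        return answer.strip(), None

    df[["reasoning", "final_answer"]] = df["answer"].apply(
        lambda a: pd.Series(split_answer(a))
    )
    print(df[["question", "final_answer"]].head())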

    Research Ideas

    • Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.
    • Generating new grade school math questions and answers using g...
  3. % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new First Entry definition) - (Snapshot)

    • data.yorkopendata.org
    • ckan.york.staging.datopian.com
    • +3more
    Updated Mar 18, 2015
    Cite
    (2015). % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new First Entry definition) - (Snapshot) [Dataset]. https://data.yorkopendata.org/dataset/kpi-75a
    Dataset updated
    Mar 18, 2015
    License

    Open Government Licence 2.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
    License information was derived automatically

    Description

    % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new First Entry definition) - (Snapshot) *This indicator has been discontinued due to national changes in GCSEs in 2016.

  4. Amazon Web Services Public Data Sets

    • neuinfo.org
    • dknet.org
    • +1more
    Cite
    Amazon Web Services Public Data Sets [Dataset]. http://identifiers.org/RRID:SCR_006318
    Description

    A multidisciplinary repository of public data sets such as the Human Genome and US Census data that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community. Anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. If you have a public domain or non-proprietary data set that you think is useful and interesting to the AWS community, please submit a request and the AWS team will review your submission and get back to you. Typically the data sets in the repository are between 1 GB to 1 TB in size (based on the Amazon EBS volume limit), but they can work with you to host larger data sets as well. You must have the right to make the data freely available.

  5. Comparative Judgement of Statements About Mathematical Definitions

    • dataverse.no
    • dataverse.azure.uit.no
    csv, txt
    Updated Sep 28, 2023
    Cite
    Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad (2023). Comparative Judgement of Statements About Mathematical Definitions [Dataset]. http://doi.org/10.18710/EOZKTR
    Available download formats
    csv (43566), csv (2523), csv (37503), txt (3623)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data from a comparative judgement survey of 62 working mathematics educators (ME) at Norwegian universities or city colleges and 57 working mathematicians (WM) at Norwegian universities. A total of 3607 comparisons were collected, of which 1780 were made by the ME and 1827 by the WM. Respondents compared pairs of statements about mathematical definitions compiled from a literature review on mathematical definitions in the mathematics education literature. Each WM was asked to judge 40 pairs of statements with the following question: “As a researcher in mathematics, where your target group is other mathematicians, what is more important about mathematical definitions?” Each ME was asked to judge 41 pairs of statements with the following question: “For a mathematical definition in the context of teaching and learning, what is more important?” The comparative judgement was carried out with No More Marking software (nomoremarking.com). The data set consists of the following files:

    • comparisons made by ME (ME.csv)
    • comparisons made by WM (WM.csv)
    • look-up table of statement codes and statement formulations (key.csv)

    Each line in a comparison file represents one comparison, where the "winner" column gives the winning statement and the "loser" column the losing statement.
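    A minimal sketch of summarising the ME comparisons, assuming the file and column names given above (a full analysis would typically fit a Bradley-Terry-style model, but simple win counts already show which statements tend to be preferred):

    import pandas as pd

    # Comparisons made by mathematics educators; each row has a "winner" and a "loser" statement code.
    me = pd.read_csv("ME.csv")

    wins = me["winner"].value_counts()
    losses = me["loser"].value_counts()
    summary = pd.DataFrame({"wins": wins, "losses": losses}).fillna(0)
    summary["win_rate"] = summary["wins"] / (summary["wins"] + summary["losses"])

    # key.csv maps statement codes to their formulations and can be joined on for readable output.
    print(summary.sort_values("win_rate", ascending=False).head(10))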

  6. Data from: The Berth Allocation Problem with Channel Restrictions - Datasets

    • researchdata.edu.au
    • researchdatafinder.qut.edu.au
    Updated 2018
    Cite
    Corry Paul; Bierwirth Christian (2018). The Berth Allocation Problem with Channel Restrictions - Datasets [Dataset]. http://doi.org/10.4225/09/5b306f6511d7c
    Dataset updated
    2018
    Dataset provided by
    Queensland University of Technology
    Authors
    Corry Paul; Bierwirth Christian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Jul 10, 6 - Dec 9, 27
    Description

    These datasets relate to the computational study presented in the paper "The Berth Allocation Problem with Channel Restrictions", authored by Paul Corry and Christian Bierwirth. They consist of all the randomly generated problem instances along with the computational results presented in the paper.

    Results across all problem instances assume ship separation parameters of [delta_1, delta_2, delta_3] = [0.25, 0, 0.5].

    Excel Workbook Organisation:

    The data is organised into separate Excel files for each table in the paper, as indicated by the file description. Within each file, each row of data presented in the corresponding table (aggregating 10 replications) is captured in two worksheets, one with the problem instance data and the other with the solution data generated by the several solution methods described in the paper. For example, row 3 of Tab. 2 will have data for 10 problem instances on worksheet T2R3, and corresponding solution data on T2R3X.

    Problem Instance Data Format:

    On each problem instance worksheet (e.g. T2R3), each row of data corresponds to a different problem instance, and there are 10 replications on each worksheet.

    The first column provides a replication identifier which is referenced on the corresponding solution worksheet (e.g. T2R3X).

    Following this, there are n*(2c+1) columns (n = number of ships, c = number of channel segments) with headers p(i)_(j).(k), where i references the operation (channel transit/berth visit) id, j references the ship id, and k references the index of the operation within the ship. All indexing starts at 0. These columns define the transit or dwell times on each segment. A value of -1 indicates a segment on which a berth allocation must be applied, and hence the dwell time is unknown.

    There are then a further n columns with headers r(j), defining the release times of each ship.

    For ChSP problems, there are a final n columns with headers b(j), defining the berth to be visited by each ship. ChSP problems with fixed berth sequencing enforced have an additional n columns with headers toa(j), indicating the order in which ship j sits within its berth sequence. For BAP-CR problems, these columns are not present but are replaced by n*m columns (m = number of berths) with headers p(j).(b), defining the berth processing time of ship j if allocated to berth b.

    Solution Data Format:

    Each row of data corresponds to a different solution.

    Column A references the replication identifier (from the corresponding instance worksheet) that the solution refers to.

    Column B defines the algorithm that was used to generate the solution.

    Column C shows the objective function value (total waiting and excess handling time) obtained.

    Column D shows the CPU time consumed in generating the solution, rounded to the nearest second.

    Column E shows the optimality gap as a proportion. A value of -1 or an empty value indicates that optimality gap is unknown.

    From column F onwards, there are n*(2c+1) columns with the previously described p(i)_(j).(k) headers. The values in these columns define the entry times at each segment.

    For BAP-CR problems only, following this there are a further 2n columns. For each ship j, there will be columns titled b(j) and p.b(j) defining the berth that was allocated to ship j, and the processing time on that berth respectively.
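    A minimal sketch of reading one instance worksheet and its solution worksheet with pandas, based on the workbook organisation described above (the Excel file name is a placeholder for whichever table's file you downloaded; reading .xlsx files requires openpyxl):

    import pandas as pd

    # Row 3 of Tab. 2: instance data on sheet T2R3, solutions on T2R3X.
    instances = pd.read_excel("table2.xlsx", sheet_name="T2R3")
    solutions = pd.read_excel("table2.xlsx", sheet_name="T2R3X")

    # Columns whose header starts with "p(" hold the transit/dwell times; a value of -1
    # marks a segment whose dwell time depends on the berth allocation being solved for.
    p_cols = [c for c in instances.columns if str(c).startswith("p(")]
    print((instances[p_cols] == -1).sum(axis=1))  # berth-visit segments per instance

    # Join each solution row to its problem instance via the replication identifier,
    # which is the first column of both worksheets.
    rep = instances.columns[0]
    merged = solutions.merge(instances, left_on=solutions.columns[0], right_on=rep,
                             suffixes=("_sol", "_inst"))
    print(merged.head())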

  7. HWRT database of handwritten symbols

    • zenodo.org
    • data.niaid.nih.gov
    tar
    Updated Jan 24, 2020
    Cite
    Martin Thoma (2020). HWRT database of handwritten symbols [Dataset]. http://doi.org/10.5281/zenodo.50022
    Available download formats
    tar
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Martin Thoma
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The HWRT database of handwritten symbols contains on-line data of handwritten symbols such as all alphanumeric characters, arrows, Greek characters, and mathematical symbols like the integral symbol.

    The database can be downloaded in form of bzip2-compressed tar files. Each tar file contains:

    • symbols.csv: A CSV file with the columns symbol_id, latex, training_samples, test_samples. The symbol id is an integer, the latex column contains the LaTeX code of the symbol, and the training_samples and test_samples columns contain integers giving the number of labeled samples.
    • train-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.
    • test-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.

    All CSV files use ";" as delimiter and "'" as quotechar. The data is given in YAML format as a list of lists of dictionaries. Each dictionary has the keys "x", "y" and "time"; (x, y) are coordinates and time is the UNIX time.
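    A minimal parsing sketch in Python based on the format described above (PyYAML is assumed to be installed; the file path is relative to an extracted tar file):

    import csv
    import yaml  # PyYAML

    # Parse train-data.csv using the delimiter and quote character stated above.
    with open("train-data.csv", newline="") as f:
        reader = csv.DictReader(f, delimiter=";", quotechar="'")
        first = next(reader)

    # The data column is YAML: a list of strokes, each stroke a list of {x, y, time} points.
    strokes = yaml.safe_load(first["data"])
    print("symbol_id:", first["symbol_id"], "strokes:", len(strokes))
    print("first point:", strokes[0][0])  # e.g. {'x': ..., 'y': ..., 'time': ...}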

    About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!

  8. Data from: Mathematics Education and Distance Learning: a systematic literature review

    • datasetcatalog.nlm.nih.gov
    • scielo.figshare.com
    Updated Mar 25, 2021
    Cite
    Matos, João Filipe; Prates, Uaiana (2021). Mathematics Education and Distance Learning: a systematic literature review [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000873981
    Dataset updated
    Mar 25, 2021
    Authors
    Matos, João Filipe; Prates, Uaiana
    Description

    Abstract: The article presents the development and results of a systematic review of the literature on Mathematics Education and Distance Learning. The review is part of ongoing doctoral research on e-learning and b-learning practices in Brazilian Mathematics Teacher Education Programs. Its main objective was to identify how previous research in Mathematics Education, published between January 2011 and December 2017, defined the e-learning and b-learning teaching models, and at what levels of education these investigations are situated: basic education, initial teacher education, or continuing teacher education. Although motivated by a doctoral project, the review, if reproduced at other school levels, can also add elements and reflections for understanding these course models in Distance Education. We carried out a systematic review based on guidance from different organizations and researchers dedicated to this area of research, following distinct phases: definition of objectives and questions, search equations, and databases; determination of inclusion, exclusion, and methodological validity criteria; and presentation and discussion of the results and data. Google spreadsheets and NVivo11 were used as supporting software. In addition to a higher incidence of studies set in the teacher-training context, the results show great dispersion in how e-learning is conceptualized and a lower occurrence of studies on b-learning models. A significant number of works also point to the need to create conditions, in Distance Teacher Education Programs, for the constitution of (virtual) learning communities.

  9. Large-Scale Dynamic Random Graph - Example

    • figshare.com
    txt
    Updated Jun 4, 2023
    Cite
    Osnat Mokryn; Alex Abbey (2023). Large-Scale Dynamic Random Graph - Example [Dataset]. http://doi.org/10.6084/m9.figshare.20462871.v1
    Available download formats
    txt
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Osnat Mokryn; Alex Abbey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Zhang et al. (https://link.springer.com/article/10.1140/epjb/e2017-80122-8) suggest a temporal random network whose changing dynamics follow a Markov process, allowing for a continuous-time network history that moves from the static definition of a random graph with a fixed number of nodes n and edge probability p to a temporal one. Defining lambda as the probability per time granule that a new edge appears and mu as the probability per time granule that an existing edge disappears, Zhang et al. show that the equilibrium probability of an edge is p = lambda / (lambda + mu). Our implementation, a Python package that we refer to as RandomDynamicGraph (https://github.com/ScanLab-ossi/DynamicRandomGraphs), generates large-scale dynamic random graphs according to the defined density. The package focuses on massive data generation; it uses efficient math calculations, writes to file instead of keeping everything in memory when datasets are too large, and supports multi-processing. Please note the datetime is arbitrary.
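    A minimal sketch (not the package's own API) that simulates a single edge's on/off Markov process and checks the equilibrium probability quoted above; the parameter values are illustrative:

    import random

    lam, mu = 0.02, 0.06   # per-time-granule appearance / disappearance probabilities
    steps = 200_000

    edge_on = False
    on_count = 0
    for _ in range(steps):
        if edge_on:
            if random.random() < mu:
                edge_on = False
        else:
            if random.random() < lam:
                edge_on = True
        on_count += edge_on

    print("empirical:", on_count / steps)      # close to 0.25 for these parameters
    print("theoretical:", lam / (lam + mu))    # 0.25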

  10. Dataset statistics before preprocessing.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 13, 2024
    Cite
    Ghulam Mustafa; Abid Rauf; Muhammad Tanvir Afzal (2024). Dataset statistics before preprocessing. [Dataset]. http://doi.org/10.1371/journal.pone.0303105.t001
    Available download formats
    xls
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ghulam Mustafa; Abid Rauf; Muhammad Tanvir Afzal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In scientific research, assessing the impact and influence of authors is crucial for evaluating their scholarly contributions. The literature offers a multitude of parameters to quantify the productivity and significance of researchers, including publication count, citation count, the well-known h index, and its extensions and variations. With such a plethora of assessment metrics available, it is vital to identify and prioritize the most effective ones. To address the complexity of this task, we employ a Multi-Layer Perceptron (MLP) classifier for classification and ranking. By leveraging the MLP's capacity to discern patterns within datasets, we assign an importance score to each parameter using the proposed modified recursive elimination technique and rank the parameters accordingly. We also present a comprehensive statistical analysis of the top-ranked author assessment parameters, covering 64 distinct metrics; this analysis gives valuable insight into the correlations and dependencies between parameters that may affect assessment outcomes. In the statistical analysis, we combined these parameters using seven well-known statistical methods, such as the arithmetic, harmonic, and geometric means. After combining the parameters, we sorted the list for each pair of parameters and analyzed the top 10, 50, and 100 records, counting the occurrence of award winners. For the experiments, data were collected from the field of mathematics: 525 individuals who are yet to receive their awards along with 525 individuals who have been recognized as potential award winners by certain well-known and prestigious scientific societies in mathematics over the last three decades. The results reveal that, in the ranking of the author assessment parameters, the normalized h index achieved the highest importance score compared to the remaining sixty-three parameters, and that the Trigonometric Mean (TM) outperformed the other six statistical models. Moreover, the analysis of the M Quotient and FG index shows that combining either of these parameters with any other parameter under the various statistical models consistently produces excellent results in terms of the percentage score for returning awardees.

  11. Data from: Term definitions.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Winston A. Haynes; Roger Higdon; Larissa Stanberry; Dwayne Collins; Eugene Kolker (2023). Term definitions. [Dataset]. http://doi.org/10.1371/journal.pcbi.1002967.t001
    Available download formats
    xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Winston A. Haynes; Roger Higdon; Larissa Stanberry; Dwayne Collins; Eugene Kolker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Term definitions.

  12. Parameter and data definitions.

    • figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Paul B. Conn; Jeffrey L. Laake; Devin S. Johnson (2023). Parameter and data definitions. [Dataset]. http://doi.org/10.1371/journal.pone.0042294.t001
    Available download formats
    xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Paul B. Conn; Jeffrey L. Laake; Devin S. Johnson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parameters and data used in the hierarchical model for distance data.

  13. Enhanced_Math_Problem_Solutions

    • huggingface.co
    Updated Oct 14, 2025
    Cite
    Mobiusi Data Technology (2025). Enhanced_Math_Problem_Solutions [Dataset]. https://huggingface.co/datasets/Mobiusi/Enhanced_Math_Problem_Solutions
    Dataset updated
    Oct 14, 2025
    Authors
    Mobiusi Data Technology
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Enhanced_NuminaMath_CoT

      Dataset Description
    

    The Enhanced Mathematics Problem Solutions Dataset is designed to provide a comprehensive and structured collection of mathematical problems and their solutions, aimed at facilitating learning and teaching in educational settings. This dataset features clearly defined fields and presents problems that incorporate logical reasoning and problem-solving processes, making it particularly useful for educators and students alike. Key… See the full description on the dataset page: https://huggingface.co/datasets/Mobiusi/Enhanced_Math_Problem_Solutions.

  14. Administration of a nationally representative learning assessment in Grade 2 or 3 in mathematics (number) - Zambia

    • macro-rankings.com
    csv, excel
    Updated Dec 31, 2018
    Cite
    macro-rankings (2018). Administration of a nationally representative learning assessment in Grade 2 or 3 in mathematics (number) - Zambia [Dataset]. https://www.macro-rankings.com/zambia/administration-of-a-nationally-representative-learning-assessment-in-grade-2-or-3-in-mathematics-(number)
    Available download formats
    excel, csv
    Dataset updated
    Dec 31, 2018
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Zambia
    Description

    Time series data for the statistic Administration of a nationally representative learning assessment in Grade 2 or 3 in mathematics (number) and country Zambia. Indicator Definition:

  15. Top 100 records analysis results.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 13, 2024
    Cite
    Ghulam Mustafa; Abid Rauf; Muhammad Tanvir Afzal (2024). Top 100 records analysis results. [Dataset]. http://doi.org/10.1371/journal.pone.0303105.t006
    Available download formats
    xls
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ghulam Mustafa; Abid Rauf; Muhammad Tanvir Afzal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In scientific research, assessing the impact and influence of authors is crucial for evaluating their scholarly contributions. The literature offers a multitude of parameters to quantify the productivity and significance of researchers, including publication count, citation count, the well-known h index, and its extensions and variations. With such a plethora of assessment metrics available, it is vital to identify and prioritize the most effective ones. To address the complexity of this task, we employ a Multi-Layer Perceptron (MLP) classifier for classification and ranking. By leveraging the MLP's capacity to discern patterns within datasets, we assign an importance score to each parameter using the proposed modified recursive elimination technique and rank the parameters accordingly. We also present a comprehensive statistical analysis of the top-ranked author assessment parameters, covering 64 distinct metrics; this analysis gives valuable insight into the correlations and dependencies between parameters that may affect assessment outcomes. In the statistical analysis, we combined these parameters using seven well-known statistical methods, such as the arithmetic, harmonic, and geometric means. After combining the parameters, we sorted the list for each pair of parameters and analyzed the top 10, 50, and 100 records, counting the occurrence of award winners. For the experiments, data were collected from the field of mathematics: 525 individuals who are yet to receive their awards along with 525 individuals who have been recognized as potential award winners by certain well-known and prestigious scientific societies in mathematics over the last three decades. The results reveal that, in the ranking of the author assessment parameters, the normalized h index achieved the highest importance score compared to the remaining sixty-three parameters, and that the Trigonometric Mean (TM) outperformed the other six statistical models. Moreover, the analysis of the M Quotient and FG index shows that combining either of these parameters with any other parameter under the various statistical models consistently produces excellent results in terms of the percentage score for returning awardees.

  16. 22700+ Software Professional Salary Dataset

    • kaggle.com
    zip
    Updated Jul 9, 2023
    Cite
    Aman Chauhan (2023). 22700+ Software Professional Salary Dataset [Dataset]. https://www.kaggle.com/datasets/whenamancodes/software-professional-salary-dataset
    Available download formats
    zip (532966 bytes)
    Dataset updated
    Jul 9, 2023
    Authors
    Aman Chauhan
    Description

    About Dataset

    Context

    Analytics refers to the methodical examination and calculation of data or statistics. Its purpose is to uncover, interpret, and convey meaningful patterns found within the data. Additionally, analytics involves utilizing these data patterns to make informed decisions. It proves valuable in domains abundant with recorded information, employing a combination of statistics, computer programming, and operations research to measure performance.

    Businesses can leverage analytics to describe, predict, and enhance their overall performance. Various branches of analytics encompass predictive analytics, prescriptive analytics, enterprise decision management, descriptive analytics, cognitive analytics, Big Data Analytics, retail analytics, supply chain analytics, store assortment and stock-keeping unit optimization, marketing optimization and marketing mix modeling, web analytics, call analytics, speech analytics, sales force sizing and optimization, price and promotion modeling, predictive science, graph analytics, credit risk analysis, and fraud analytics. Due to the extensive computational requirements involved (particularly with big data), analytics algorithms and software utilize state-of-the-art methods from computer science, statistics, and mathematics.

    Data Dictionary

    Columns and descriptions:

    • Company Name: the name of the organization or company where an individual is employed; the specific entity that provides job opportunities, associated with a particular industry or sector.
    • Job Title: the official designation or position held by an individual within the company; the specific role or responsibilities assigned to the person in their professional capacity.
    • Salaries Reported: the number of salary reports collected for that role, gathered through sources such as surveys, employee disclosures, or public records.
    • Location: the geographical location or area where the company or job position is situated.
    • Salary: the monetary compensation received by the employee in exchange for their work, typically expressed as a regular wage or a fixed annual income.

    Content

    This Dataset contains information of 22700+ Software Professionals with different features like their Salaries (₹), Name of the Company, Company Rating, Number of times Salaries Reported, and Location of the Company.

    Extra Features Added: 1. Employment Status 2. Job Roles
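    A minimal sketch of a first look at the data, assuming pandas is installed and the zip contains a CSV with the columns listed above (the file name is a guess; check the actual download):

    import pandas as pd

    df = pd.read_csv("Software Professional Salaries.csv")  # hypothetical file name

    # Salaries may be stored as text (e.g. "₹6,00,000/yr"); keep only the digits before converting.
    df["Salary"] = pd.to_numeric(
        df["Salary"].astype(str).str.replace(r"[^0-9.]", "", regex=True), errors="coerce"
    )

    # Median reported salary per job title, using the column names from the data dictionary.
    print(df.groupby("Job Title")["Salary"].median().sort_values(ascending=False).head(10))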

    Acknowledgements

    This Dataset is created from https://www.glassdoor.co.in/. If you want to learn more, you can visit the Website.

    Roles Included:

    Android Developer Android Developer - Intern Android Developer - Contractor Android Developer Contractor Senior Android Developer Android Software Engineer Android Engineer Android Applications Developer - Intern Android Applications Developer Android App Developer - Intern Senior Android Developer and Team Lead Android Tech Lead Product Engineer (Android) Software Engineer - Android Android Software Developer Android Software Developer - Intern Senior Android Developer Contractor Junior Android Developer - Intern Junior Android Developer Android Applications Developer - Contractor Android App Developer Lead Android Developer Android Engineer - Intern Sr. Android Developer Senior Android Engineer Senior Software Engineer - Android Android - Intern Android Android & Flutter Developer - Intern Associate Android Developer Senior Android Applications Developer Android Developer Trainee Sr Android developer Android Trainee Android Trainee - Intern Trainee Android Developer Android Lead Android Lead Developer Android Development - Intern Android Development Android Team Lead Senior, Android Developer Lead Android Engineer Tech Lead- Android Applications Developer Senior Android Software Developer Full Stack Android Developer Android Framework Developer Android Architect Android & Flutter Developer Senior Software Engineer, Android Android App Development Sr Android Engineer Android Team Leader Android Technical Lead SDE2(Android) Web Developer/Android Developer - Intern Android Applications Develpoers Android Platform Developer - Intern Android Test Engineer Senior Engineer - Android Android Framework Engineer Game Developer ( Android, Windows) Android Testing Senior Software Engineer (Android/Mobility) Ace - Android Development Software Developer (Android) - Intern Android Mobile Developer Android and Flutt...

  17. HASYv2 - Symbol Recognizer

    • kaggle.com
    zip
    Updated Oct 11, 2021
    Cite
    fedesoriano (2021). HASYv2 - Symbol Recognizer [Dataset]. https://www.kaggle.com/fedesoriano/hasyv2-symbol-recognizer
    Available download formats
    zip (85506565 bytes)
    Dataset updated
    Oct 11, 2021
    Authors
    fedesoriano
    Description

    Context

    Publicly available datasets have helped the computer vision community to compare new algorithms and develop applications. MNIST [LBBH98] in particular was used thousands of times to train and evaluate models for classification. However, even rather simple models consistently reach about 99.2% accuracy on MNIST [TF-16a], and the best models classify everything correctly except for about 20 instances. This makes meaningful statements about improvements in classifiers hard. Possible reasons why current models are so good on MNIST are that 1) MNIST has only 10 classes, 2) there are very few (probably no) labelling errors in MNIST, 3) every class has 6000 training samples, and 4) the feature dimensionality is comparatively low. Also, applications that need to recognize only Arabic numerals are rare. Similar to MNIST, HASY is of very low resolution. In contrast to MNIST, the HASYv2 dataset contains 369 classes, including Arabic numerals and Latin characters. Furthermore, HASYv2 has far fewer recordings per class than MNIST and is only in black and white, whereas MNIST is in grayscale. HASY could be used to train models for semantic segmentation of non-cursive handwritten documents like mathematical notes or forms.

    Content

    The dataset contains the following:

    • a pickle file: HASYv2
    • a txt file: cite.txt

    The pickle file contains the 168233 observations in a dictionary form. The simplest way to use the HASYv2 dataset is to download the pickle file below (HASYv2). You can use the following lines of code to load the data:

    import pickle

    def unpickle(file):
      # Load the pickled HASYv2 dictionary from disk.
      with open(file, 'rb') as fo:
        data = pickle.load(fo, encoding='bytes')
      return data

    HASYv2 = unpickle("HASYv2")

    The data comes in a dictionary format; you can get the data and the labels separately by extracting the content from the dictionary:

    data = HASYv2['data']
    labels = HASYv2['labels']
    symbols = HASYv2['latex_symbol']

    Note that the shape of the data is (32 x 32 x 3 x 168233), with the first and second dimensions as the height and width respectively; the third dimension corresponds to the channels and the fourth to the observation number.
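    A small follow-up sketch, assuming the extraction above succeeded and numpy is available, that rearranges the array so each observation is a 32x32 RGB image:

    import numpy as np

    # Move the observation axis to the front: (32, 32, 3, 168233) -> (168233, 32, 32, 3).
    images = np.moveaxis(np.asarray(data), -1, 0)

    first_image, first_label = images[0], labels[0]
    print(first_image.shape, first_label)  # (32, 32, 3) and the corresponding label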

    Citation

    fedesoriano. (October 2021). HASYv2 - Symbol Recognizer. Retrieved [Date Retrieved] from https://www.kaggle.com/fedesoriano/hasyv2-symbol-recognizer.

    Source

    The dataset was originally uploaded by Martin Thoma, see https://arxiv.org/abs/1701.08380.

    Thoma, M. (2017). The HASYv2 dataset. ArXiv, abs/1701.08380.

    The original paper describes the HASYv2 dataset. HASY is a publicly available, free-of-charge dataset of single symbols similar to MNIST. It contains 168233 instances of 369 classes. HASY contains two challenges: a classification challenge with 10 pre-defined folds for 10-fold cross-validation, and a verification challenge. The paper is available from https://arxiv.org/pdf/1701.08380.pdf [accessed Oct 11, 2021].

  18. Latest Data Professionals Salary Dataset

    • kaggle.com
    zip
    Updated Jul 9, 2023
    Cite
    Aman Chauhan (2023). Latest Data Professionals Salary Dataset [Dataset]. https://www.kaggle.com/datasets/whenamancodes/data-professionals-salary-dataset-2022/data
    Available download formats
    zip (121318 bytes)
    Dataset updated
    Jul 9, 2023
    Authors
    Aman Chauhan
    Description

    About Dataset

    Context

    Analytics refers to the methodical examination and calculation of data or statistics. Its purpose is to uncover, interpret, and convey meaningful patterns found within the data. Additionally, analytics involves utilizing these data patterns to make informed decisions. It proves valuable in domains abundant with recorded information, employing a combination of statistics, computer programming, and operations research to measure performance.

    Businesses can leverage analytics to describe, predict, and enhance their overall performance. Various branches of analytics encompass predictive analytics, prescriptive analytics, enterprise decision management, descriptive analytics, cognitive analytics, Big Data Analytics, retail analytics, supply chain analytics, store assortment and stock-keeping unit optimization, marketing optimization and marketing mix modeling, web analytics, call analytics, speech analytics, sales force sizing and optimization, price and promotion modeling, predictive science, graph analytics, credit risk analysis, and fraud analytics. Due to the extensive computational requirements involved (particularly with big data), analytics algorithms and software utilize state-of-the-art methods from computer science, statistics, and mathematics.

    Data Dictionary

    Columns and descriptions:

    • Company Name: the name of the organization or company where an individual is employed; the specific entity that provides job opportunities, associated with a particular industry or sector.
    • Job Title: the official designation or position held by an individual within the company; the specific role or responsibilities assigned to the person in their professional capacity.
    • Salaries Reported: the number of salary reports collected for that role, gathered through sources such as surveys, employee disclosures, or public records.
    • Location: the geographical location or area where the company or job position is situated.
    • Salary: the monetary compensation received by the employee in exchange for their work, typically expressed as a regular wage or a fixed annual income.

    Content

    This Dataset consists of salaries for Data Scientists, Machine Learning Engineers, Data Analysts, and Data Engineers in various cities across India (2022).

    • Salary Dataset.csv
    • Partially Cleaned Salary Dataset.csv

    Acknowledgements

    This Dataset is created from https://www.glassdoor.co.in/. If you want to learn more, you can visit the Website.

  19. StudentMathScores

    • kaggle.com
    zip
    Updated Jun 10, 2019
    Cite
    Logan Henslee (2019). StudentMathScores [Dataset]. https://www.kaggle.com/loganhenslee/studentmathscores
    Available download formats
    zip (333321 bytes)
    Dataset updated
    Jun 10, 2019
    Authors
    Logan Henslee
    Description

    CONTEXT

    Practice Scenario: The UIW School of Engineering wants to recruit more students into their program. They will recruit students with great math scores. Also, to increase the chances of recruitment, the department will look for students who qualify for financial aid. Students who qualify for financial aid more than likely come from low socio-economic backgrounds. One way to indicate this is to view how much federal revenue a school district receives through its state. High federal revenue for a school indicates that a large portion of the student base comes from low-income families.

    The question we wish to ask is as follows: name the school districts across the nation where the Child Nutrition Programs (c25) are federally funded between $30,000 and $50,000, and where the average math score for the school district's state is greater than or equal to the nation's average score of 282.

    The SQL query in 'Top5MathTarget.sql' (attached) can be used to answer this question in MySQL. To execute this process, install MySQL on your local system, load the attached datasets from Kaggle into a MySQL schema, and run the query, which joins the separate tables on their key identifiers.

    DATA SOURCE Data is sourced from The U.S Census Bureau and The Nations Report Card (using the NAEP Data Explorer).

    Finance: https://www.census.gov/programs-surveys/school-finances/data/tables.html

    Math Scores: https://www.nationsreportcard.gov/ndecore/xplore/NDE

    COLUMN NOTES

    All data comes from the school year 2017. Individual schools are not represented, only school districts within each state.

    FEDERAL FINANCE DATA DEFINITIONS

    t_fed_rev: Total federal revenue through the state to each school district.

    C14- Federal revenue through the state- Title 1 (no child left behind act).

    C25- Federal revenue through the state- Child Nutrition Act.

    Title 1 is a program implemented in schools to help raise academic achievement for all students. The program is available to schools where at least 40% of the students come from low-income families.

    Child Nutrition Programs ensure that children are getting the food they need to grow and learn. High federal revenue to these programs indicates that a school's students also come from low-income families.

    MATH SCORES DATA DEFINITIONS

    Note: Mathematics, Grade 8, 2017, All Students (Total)

    average_scale_score - The state's average score for eighth graders taking the NAEP math exam.
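    A minimal pandas sketch of the same filter-and-join logic described above (the original workflow uses MySQL; the file names and the "state"/"district" join columns are assumptions, while c25 and average_scale_score come from the definitions above):

    import pandas as pd

    finance = pd.read_csv("district_finance.csv")   # hypothetical export of the Census finance table
    scores = pd.read_csv("state_math_scores.csv")   # hypothetical export of the NAEP state scores

    # Attach each district's state-level average grade-8 math score, then apply both filters:
    # Child Nutrition (c25) funding between $30,000 and $50,000, and a state average >= 282.
    merged = finance.merge(scores, on="state")
    target = merged[merged["c25"].between(30_000, 50_000) & (merged["average_scale_score"] >= 282)]
    print(target[["state", "district", "c25", "average_scale_score"]])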

  20. Student's math score for different teaching style

    • kaggle.com
    zip
    Updated Feb 23, 2022
    Cite
    Soumyadipta Das (2022). Student's math score for different teaching style [Dataset]. https://www.kaggle.com/soumyadiptadas/students-math-score-for-different-teaching-style
    Available download formats
    zip (1810 bytes)
    Dataset updated
    Feb 23, 2022
    Authors
    Soumyadipta Das
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Assume you are the new superintendent of School District which has a Junior High School that consists of approximately 500 students in grades 7-8. Students are randomly assigned to grade-level, subject-specific classroom teachers. The school is diverse socioeconomically with several students qualifying for free or reduced-price meals. The ethnic composition of the school is relatively diverse consisting primarily of African-American, Hispanic, Asian, and Caucasian students.

    There are three teachers who teach 8th-grade math at the school, each doing their own thing when it comes to teaching math. Ms. Ruger, a young African-American lady who is certified to teach science and math, has been teaching for a total of 5 years and has taught math for the past 3 years. Ms. Smith, a Caucasian lady in her 40s who is certified to teach Spanish and math, has taught Spanish for 12 years but has taught math for the past 3 years. Ms. Wesson, an older Caucasian lady and the sister of the school board president, has been teaching PE for 24 years and has been assigned to teach math for the past 3 years. Each teacher was allowed to use their preferred teaching method and to select their own textbook three years ago. All three use different textbooks.

    Ms. Wesson’s approach to teaching math would be broadly defined as the traditional method. The traditional math teacher adheres to a top-down approach in which knowledge originates from the teacher and is disseminated to the students. The teacher is recognized by the students (and often by the teacher herself) as the authority on the subject matter. Traditional math teachers tend to thrive on structure and order, resulting in quiet, calm learning environments. There is research that indicates certain behavioral issues are minimized in a traditional classroom resulting in effective, direct instruction.

    Ms. Ruger and Ms. Smith’s approach to teaching math would be more broadly defined as the standards-based method. The standards-based math teacher adheres to a literal interpretation of well-written standards. The teacher facilitates the learning in a constructivist environment in which students develop, explore, conjecture and test their conjectures within the confines of the standard. The teacher believes there is research that a majority of children learn more and deeper mathematics and are better problem solvers when in the standards-based classroom.

    During a meeting with the math department it was suggested that the three 8th-grade math teachers should be using the same teaching method and the same textbook. Ms. Wesson, being quite vocal, feels strongly that her approach is the better of the two because of the ethnic composition and sociological background of the students. She further believes and proposes that the students should be grouped among the three teachers according to the students’ ethnicity. She suggests that Ms. Ruger who is African-American teach the majority of the African-American students and that she, Ms. Wesson, would primarily teach the Caucasian and Asian students. Ms. Smith, who speaks fluent Spanish, would teach the majority of the Hispanic students. She also proposes that students be grouped within each teacher’s class by their ability with the high-ability students in a group by themselves and the lower-ability students in a group by themselves because she believes, based on a “gut” feeling, that the students will perform better if they are segregated into groups within the classroom. To support her argument she provides a copy of an article she located in the ATU library (see the Ross article entitled “Math and Reading Instruction in Tracked First-Grade Classes”) to each member of the department. She mentions that she has discussed this with her brother, the school board president, and that it will probably be discussed at the next board meeting. She further states that math is math and teachers should be allowed to teach using the style in which they are most comfortable.

    Ms. Smith does not agree with Ms. Wesson’s proposal and shares an article that she has read (see the Thompson article about standards-based math). She states that research indicates students in traditional programs may have better procedural skills, but definitely lack in problem-solving creativity. She proposes that all three teachers should be using the standards-based approach to teaching.

    Knowing that you have less than 30 days before the next board meeting you know that you need to have a proposal prepared based on school performance data. You have access to the latest student standardized math scores and personal data for the students taught by the 3 teachers (see file named 1_Research_Project_Data).  In order to protect confidentially, student names have been replaced by numbers. You try to anticipate and list any question that might be rais...
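    A minimal sketch of the kind of comparison such a proposal could rest on, assuming the score file is exported to CSV with one row per student, a teacher column, and a math score column (the actual column names in 1_Research_Project_Data are not shown here, so treat them as placeholders):

    import pandas as pd
    from scipy import stats

    df = pd.read_csv("1_Research_Project_Data.csv")  # hypothetical CSV export of the score file

    # Descriptive statistics per teacher (and hence per teaching method).
    print(df.groupby("teacher")["math_score"].agg(["count", "mean", "std"]))

    # One-way ANOVA: do the mean standardized math scores differ across the three teachers?
    groups = [g["math_score"].dropna() for _, g in df.groupby("teacher")]
    f_stat, p_value = stats.f_oneway(*groups)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")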
    