100+ datasets found

Mathematics Dataset Dataset
library.toponeai.link
paperswithcode.com
Updated Nov 3, 2024
Cite
David Saxton; Edward Grefenstette; Felix Hill; Pushmeet Kohli (2024). Mathematics Dataset Dataset [Dataset]. https://library.toponeai.link/dataset/mathematics
Dataset updated
Nov 3, 2024
Authors
David Saxton; Edward Grefenstette; Felix Hill; Pushmeet Kohli
Description
This dataset code generates mathematical question-and-answer pairs across a range of question types at roughly school-level difficulty. It is designed to test the mathematical learning and algebraic reasoning skills of learning models.
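
To illustrate what "generating question-and-answer pairs" means, here is a toy, hypothetical generator for a single multiplication question type. It is not the authors' generator. The function names, the question template and the value range are illustrative assumptions; the real dataset covers many modules and phrasings.

```python
import random


def make_multiplication_pair(rng: random.Random, max_value: int = 99) -> tuple[str, str]:
    """Return one (question, answer) pair for integer multiplication.

    Hypothetical template for illustration only.
    """
    a = rng.randint(-max_value, max_value)
    b = rng.randint(-max_value, max_value)
    question = f"What is {a} times {b}?"
    answer = str(a * b)
    return question, answer


def make_pairs(n: int, seed: int = 0) -> list[tuple[str, str]]:
    # A fixed seed makes the generated set reproducible.
    rng = random.Random(seed)
    return [make_multiplication_pair(rng) for _ in range(n)]


if __name__ == "__main__":
    for question, answer in make_pairs(3):
        print(question, "->", answer)
```

Because answers are plain strings computed from the question's operands, a model can be scored by comparing its output string with the stored answer.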
math-pdfs
kaggle.com
Updated Nov 29, 2024
Cite
Nithin Naidu (2024). math-pdfs [Dataset]. https://www.kaggle.com/datasets/nithinnaidu/math-pdfs
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Nov 29, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nithin Naidu
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Contents of the Dataset: Mathematical PDFs

This dataset comprises 500+ mathematical PDF files curated to cover a wide range of mathematical topics. The primary focus is on key concepts, making it a useful resource for students, educators, and researchers. The files have been processed and organized for use in adaptive learning systems and AI-powered educational tools.

Key Features:

Comprehensive Coverage of Topics:

Algebra: Equations, variables, polynomials, and algebraic expressions.

Calculus: Derivatives, integrals, limits, and differential equations.

Geometry: Triangles, circles, angles, and other geometric properties.

Trigonometry: Sine, cosine, tangent, and trigonometric identities.

Statistics: Probability, distributions, mean, variance, and other statistical concepts.

Enhanced Content Processing:

Each document has been pre-processed to extract key concepts, topics, and subtopics, which enables content clustering and topic indexing for topic retrieval.

Use Cases:

Adaptive Learning Systems: Personalized lesson generation and targeted exercises.

AI-Powered Education Platforms: Semantic search and clustering for better topic recommendations.

Content Analysis: Clustering and summarization for data analysis.

File Details:

Format: PDF

Source: Internet Archive, Mathematics Collection

Size: 500+ files totaling approximately X GB

Processing Capabilities:

The dataset has been structured to allow integration with AI models such as Gemini for generating personalized explanations and tracking student progress. It is designed for multiple age groups, giving students and educators flexibility in learning.

About the Source

The dataset was sourced from the Internet Archive's Mathematics Collection, a reputable, open-access repository of educational content. All files comply with public access guidelines and are redistributed here for educational and non-commercial use.

Licensing

The dataset adheres to the applicable licensing guidelines of the source. It is shared under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, allowing others to remix, adapt, and build upon this content for non-commercial purposes.
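
The "Enhanced Content Processing" step mentions topic indexing. Below is a minimal keyword-based sketch that ranks a document's topics. It assumes the PDF text has already been extracted to plain strings. The keyword sets come from the topic list in this description, while the function names and the crude plural stripping are illustrative assumptions, not the curator's actual pipeline.

```python
from collections import Counter

# Hypothetical keyword sets derived from the topic list in the description.
TOPIC_KEYWORDS = {
    "Algebra": {"equation", "variable", "polynomial", "expression"},
    "Calculus": {"derivative", "integral", "limit", "differential"},
    "Geometry": {"triangle", "circle", "angle"},
    "Trigonometry": {"sine", "cosine", "tangent", "identity"},
    "Statistics": {"probability", "distribution", "mean", "variance"},
}


def normalize_word(word: str) -> str:
    # Crude plural stripping: "identities" -> "identity", "limits" -> "limit".
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def tokenize(text: str) -> list[str]:
    # Lowercase and split on any non-letter character.
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [normalize_word(w) for w in words]


def index_topics(text: str) -> list[tuple[str, int]]:
    """Rank topics by keyword hits; topics with zero hits are dropped."""
    counts = Counter(tokenize(text))
    scores = {topic: sum(counts[k] for k in kws) for topic, kws in TOPIC_KEYWORDS.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(topic, score) for topic, score in ranked if score > 0]
```

For example, `index_topics("The derivative and the integral of a polynomial; limits and derivatives.")` ranks Calculus first with four hits, then Algebra with one. An embedding-based clusterer would handle synonyms better, at the cost of a model dependency.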
math_dataset
tensorflow.org
huggingface.co
Updated Jan 4, 2023
Cite
(2023). math_dataset [Dataset]. https://www.tensorflow.org/datasets/catalog/math_dataset
Dataset updated
Jan 4, 2023
Description
Mathematics database.

This dataset code generates mathematical question-and-answer pairs across a range of question types at roughly school-level difficulty. It is designed to test the mathematical learning and algebraic reasoning skills of learning models.

Original paper: Analysing Mathematical Reasoning Abilities of Neural Models (Saxton, Grefenstette, Hill, Kohli).

Example usage:

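```python
import tensorflow_datasets as tfds

# Config names use a double underscore between module and task, e.g. "arithmetic__mul".
train_examples, val_examples = tfds.load(
    'math_dataset/arithmetic__mul',
    split=['train', 'test'],
    as_supervised=True,  # yields (question, answer) tuples
)
```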

To use this dataset:

```python
import tensorflow_datasets as tfds

ds = tfds.load('math_dataset', split='train')
for ex in ds.take(4):
    print(ex)
```

See the guide for more information on tensorflow_datasets.
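
Each example pairs a question with a short answer string, so a simple way to score a model is exact-match accuracy: the fraction of predictions identical to the reference answer. The sketch below is a hypothetical helper, not part of tensorflow_datasets. It assumes answers arrive either as `str` or as the `bytes` that TFDS yields after conversion to NumPy, and it strips surrounding whitespace before comparing, which is an assumption about normalization.

```python
def normalize(answer) -> str:
    # TFDS string tensors become bytes after .numpy(); decode them if needed.
    if isinstance(answer, bytes):
        answer = answer.decode("utf-8")
    return answer.strip()


def exact_match_accuracy(predictions, targets) -> float:
    """Fraction of predictions that equal their target after normalization."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(normalize(p) == normalize(t) for p, t in zip(predictions, targets))
    return hits / len(targets)
```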
Dataset for The effects of a number line intervention on calculation skills
researchdata.edu.au
figshare.mq.edu.au
Updated May 18, 2023
Cite
Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas (2023). Dataset for The effects of a number line intervention on calculation skills [Dataset]. http://doi.org/10.25949/22799717.V1
Unique identifier
https://doi.org/10.25949/22799717.V1
Dataset updated
May 18, 2023
Dataset provided by
Macquarie University
Authors
Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas
Description

Study information

The sample included in this dataset represents five children who participated in a number line intervention study. Originally, six children were included in the study, but one of them met the exclusion criterion after missing several consecutive sessions, so their data are not included in the dataset.

All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible to participate, children had to present with low mathematics achievement, performing at or below the 25th percentile on the Maths Problem Solving and/or Numerical Operations subtests of the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Children were excluded if, as reported by their parents, they had any other diagnosed disorder, such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy, or uncorrected sensory disorders.

The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase comprised two or three measurement points, the treatment phase four to seven measurement points, and all participants had one post-treatment measurement point.

The number of measurement points was distributed across participants as follows:

Participant 1 – 3 baseline, 6 treatment, 1 post-treatment

Participant 3 – 2 baseline, 7 treatment, 1 post-treatment

Participant 5 – 2 baseline, 5 treatment, 1 post-treatment

Participant 6 – 3 baseline, 4 treatment, 1 post-treatment

Participant 7 – 2 baseline, 5 treatment, 1 post-treatment

In each session, across all three phases, children were assessed on a number line estimation task, a single-digit computation task, a multi-digit computational estimation task, a dot comparison task, and a number comparison task. During the treatment phase, all children also completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.

Measures

Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed in all phases of the study. Trained items were assessed independently of the intervention during the baseline and post-treatment phases, and performance on the intervention is used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error: PAE = (|number estimated - target number| / scale of number line) × 100.
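
The PAE formula can be computed directly. Below is a minimal sketch with a hypothetical function name. It assumes the scale is the length of the number line, which is 100 for the 0-100 line used here. The multiplication by 100 happens before the division so that whole-number inputs give exact results.

```python
def percent_absolute_error(estimate: float, target: float, scale: float = 100.0) -> float:
    """PAE = |estimate - target| / scale * 100 for a bounded number line."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return abs(estimate - target) * 100.0 / scale
```

For example, placing 50 at position 47 on the 0-100 line gives a PAE of 3.0, whether the child under- or overshoots.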

Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions presented the smaller addend first (e.g., 3 + 5) and half presented the larger addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen, accompanied by a sound, and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
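
The exact items used in the study are not listed here. The sketch below only enumerates the pool of items that satisfy the stated constraints and shows the counterbalanced presentation order. All function names are hypothetical.

```python
def addition_items() -> list[tuple[int, int]]:
    """All (a, b) with addends 1-9 and a single-digit sum of 2-9."""
    return [(a, b) for a in range(1, 10) for b in range(1, 10) if 2 <= a + b <= 9]


def subtraction_items() -> list[tuple[int, int]]:
    """All (minuend, subtrahend) with minuend 3-9, subtrahend 1-6, difference 1-6."""
    return [(m, s) for m in range(3, 10) for s in range(1, 7) if 1 <= m - s <= 6]


def present(a: int, b: int, smaller_first: bool) -> str:
    # Counterbalancing: show the smaller or the larger addend first.
    lo, hi = sorted((a, b))
    return f"{lo} + {hi}" if smaller_first else f"{hi} + {lo}"
```

The constraints admit 36 ordered addition pairs and 29 subtractions, so the ten items of each type were drawn from a fairly small pool.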

Multi-digit computational estimation. The task included eight additions and eight subtractions with double-digit numbers, each presented with three response options. None of the response options was the exact result; participants were asked to select the option closest to it. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below it, and it remained on the screen until participants responded by clicking on one of the options. Participants did not receive performance-based feedback. Performance on this task is measured by item-based accuracy.
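
Scoring an item in this task means finding which option lies nearest the exact result. Below is a minimal sketch with a hypothetical helper name. It assumes the distractors are farther from the exact result than the correct option. If two options were equally close, `min` would return the first one.

```python
def closest_option(a: int, b: int, op: str, options: list[int]) -> int:
    """Return the option nearest the exact result of `a op b` (op is '+' or '-')."""
    if op == "+":
        exact = a + b
    elif op == "-":
        exact = a - b
    else:
        raise ValueError(f"unsupported operator: {op!r}")
    return min(options, key=lambda option: abs(option - exact))
```

A response is then scored as correct when the clicked option equals `closest_option(...)` for that item.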

Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison), the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.

The Number Line Intervention

During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As feedback on each item, the participant’s estimate remained visible and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5, the message read “Well done, so close!”; and when PAE was higher than 5, the message read “Good try!” Numbers were presented in random order.
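
The feedback rule maps each item's PAE to one of three messages. Below is a minimal sketch with a hypothetical function name. The description does not say which message is shown at exactly 2.5 or exactly 5. Treating both boundaries as "Well done, so close!" is an assumption.

```python
def feedback_message(pae: float) -> str:
    # Boundaries at exactly 2.5 and 5 are assumed to fall in the middle band.
    if pae < 2.5:
        return "Excellent job"
    if pae <= 5:
        return "Well done, so close!"
    return "Good try!"
```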

Variables in the dataset

Age = age in ‘years, months’ at the start of the study

Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)

Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

Math_Problem_Solving_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed of three parts. The first part refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.

The second part of the variable name refers to the task, as follows:

DC = dot comparison

SDC = single-digit computation

NLE_UT = number line estimation (untrained set)

NLE_T = number line estimation (trained set)

CE = multi-digit computational estimation

NC = number comparison

The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).

Thus, the variable Base2_NC_acc corresponds to accuracy on the number comparison task at the second measurement point of the baseline phase, and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task at the third measurement point of the treatment phase.
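
Every task variable follows the phase-and-session, task, measure pattern, so column names can be split programmatically. Below is a minimal sketch with hypothetical names. It assumes the names match the examples exactly: the prefixes Base, Treat or post, a session number, one of the six task codes, and acc or pae. Task codes such as NLE_UT contain an underscore, so the pattern anchors the measure at the end instead of splitting on every underscore.

```python
import re

TASKS = {"DC", "SDC", "NLE_UT", "NLE_T", "CE", "NC"}
# Phase prefix, session number, task code (may contain "_"), measure.
PATTERN = re.compile(r"^(Base|Treat|post)(\d+)_(.+)_(acc|pae)$", re.IGNORECASE)


def parse_variable(name: str) -> dict:
    """Split a task variable such as 'Treat3_NLE_UT_pae' into its parts."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a task variable: {name!r}")
    phase, session, task, measure = match.groups()
    if task not in TASKS:
        raise ValueError(f"unknown task code: {task!r}")
    return {
        "phase": phase.lower(),
        "session": int(session),
        "task": task,
        "measure": measure.lower(),
    }
```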
HindiMathQuest
huggingface.co
Updated Oct 25, 2024
Cite
Dnyanesh Walwadkar (2024). HindiMathQuest [Dataset]. http://doi.org/10.57967/hf/3259
Unique identifier
https://doi.org/10.57967/hf/3259
Dataset updated
Oct 25, 2024
Authors
Dnyanesh Walwadkar
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Overview:

The HindiMathQuest: A Dataset for Mathematical Reasoning and Problem-Solving in Hindi is designed to advance the capabilities of language models in understanding and solving mathematical problems presented in the Hindi language. The dataset covers a comprehensive range of question types, including logical reasoning, numeric calculations, translation-based problems, and complex mathematical tasks typically seen in competitive exams. This dataset is intended to fill a… See the full description on the dataset page: https://huggingface.co/datasets/dnyanesh/HindiMathQuest.
NuminaMath-CoT
huggingface.co
Updated Jul 19, 2024
Cite
Project-Numina (2024). NuminaMath-CoT [Dataset]. https://huggingface.co/datasets/AI-MO/NuminaMath-CoT
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jul 19, 2024
Dataset authored and provided by
Project-Numina
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for NuminaMath CoT

Dataset Summary

Approximately 860k math problems, where each solution is formatted in a Chain of Thought (CoT) manner. The sources of the dataset range from Chinese high school math exercises to US and international mathematics olympiad competition problems. The data were primarily collected from online exam paper PDFs and mathematics discussion forums. The processing steps include (a) OCR from the original PDFs, (b) segmentation into… See the full description on the dataset page: https://huggingface.co/datasets/AI-MO/NuminaMath-CoT.
Trends in Math Proficiency (2010-2022): Grass Range 7-8 vs. Montana vs....
publicschoolreview.com
Updated Jun 3, 2025
Cite
Public School Review (2025). Trends in Math Proficiency (2010-2022): Grass Range 7-8 vs. Montana vs. Grass Range Elementary School District [Dataset]. https://www.publicschoolreview.com/grass-range-7-8-profile
Dataset updated
Jun 3, 2025
Dataset authored and provided by
Public School Review
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Grass Range Elementary School District, Grass Range
Description
This dataset tracks annual math proficiency from 2010 to 2022 for Grass Range 7-8 vs. Montana and Grass Range Elementary School District.
DETAILS OF THE PARAMETER RANGE CORRELATION PROCESS AND DATA SOURCE AND...
figshare.com
zip
Updated Mar 15, 2023
Cite
Muslim Husam (2023). DETAILS OF THE PARAMETER RANGE CORRELATION PROCESS AND DATA SOURCE AND ANALYSIS [Dataset]. http://doi.org/10.6084/m9.figshare.21915813.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.21915813.v1
Dataset updated
Mar 15, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Muslim Husam
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Real driving datasets collected by instrumented vehicles from Japanese highways.
Advanced-Math
huggingface.co
Updated Apr 29, 2025
Cite
haijian (2025). Advanced-Math [Dataset]. https://huggingface.co/datasets/haijian06/Advanced-Math
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Apr 29, 2025
Authors
haijian
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Advanced-Math Dataset

This Advanced-Math dataset is designed to support advanced studies and research in various mathematical fields. It encompasses a wide range of topics, including:

Calculus

Linear Algebra

Probability

Machine Learning

Deep Learning

The dataset primarily focuses on computational problems, which constitute over 80% of the content. Additionally, it includes related logical concept questions to provide a… See the full description on the dataset page: https://huggingface.co/datasets/haijian06/Advanced-Math.
InftyMCCDB-2 dataset
zenodo.org
zip
Updated Jan 24, 2020
Cite
Mahshad Mahdavi (2020). InftyMCCDB-2 dataset [Dataset]. http://doi.org/10.5281/zenodo.3483048
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3483048
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mahshad Mahdavi
Description
The InftyMCCDB-2 dataset is a modified version of InftyCDB-2, which contains mathematical expressions from scanned article pages.

The original dataset has 21,056 math expressions. We removed formulas with matrices and grids, leaving 19,381 formulas. The dataset includes 213 symbol classes and is split into two sets, training (12,551 images) and testing (6,830 images), with approximately the same distribution of symbol classes and relation classes. The expressions range in size from a single symbol to more than 75 symbols, with an average of 7.33 symbols per expression.

The original InftyCDB-2 provides ground truth at the symbol level. We extracted connected-component bounding boxes and generated new ground truth for each image using a labeled adjacency matrix (label graph) representation.

The set of .lg (label graph) ground-truth files is provided, along with a .png image for each expression.
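
The .lg files are plain text, but their exact line layout is not described here. The sketch below assumes the primitive-level label-graph convention used by the CROHME lgeval tools: node lines `N, id, label, weight` and edge lines `E, id1, id2, label, weight`. Check this against the actual files before relying on it. The function name is hypothetical.

```python
from collections import defaultdict


def parse_lg(text: str):
    """Parse node and edge lines of a label-graph file (assumed N/E layout).

    Returns (nodes, edges): nodes maps id -> label, and edges maps
    source id -> {target id: relation label}, an adjacency representation.
    """
    nodes = {}
    edges = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split(",")]
        if fields[0] == "N" and len(fields) >= 3:
            nodes[fields[1]] = fields[2]
        elif fields[0] == "E" and len(fields) >= 4:
            edges[fields[1]][fields[2]] = fields[3]
    return nodes, dict(edges)
```

For an expression like x², such a file would hold two nodes and one superscript edge from the base to the exponent.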
Data from: Grouping strategies in number estimation extend the subitizing...
data.niaid.nih.gov
Updated Nov 30, 2020
Cite
Arrighi Roberto (2020). Grouping strategies in number estimation extend the subitizing range [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4292116
Dataset updated
Nov 30, 2020
Dataset provided by
Burr David C.
Maldonado Moscoso Paula Andrea
Anobile Giovanni
Arrighi Roberto
Castaldi Elisa
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In the calculation folder: each file contains a matrix called “MATR”. Each row of the matrix “MATR” is a trial.

The columns contain the following information:

1st: Trial number

2nd: Subject response

4th: Response time

5th: first number

6th: math symbol (1 = *; 2 = +; 3 = –)

7th: second number

8th: third number

In the calculation folder: each file contains a matrix called “matr”. Each row of the matrix “matr” is a trial.

The columns contain the following information:

1st: subject response in the numerosity task

2nd: the presented numerosity

3rd: subject response in the numerosity task

4th: zero

5th: stimulus duration

6th: Response time in the numerosity task

7th: Grouped (1) or random (2) presentation

8th: 1

9th: 1

10th: Number of items of the upper-left quadrant

11th: Number of items of the lower-left quadrant

12th: Number of items of the upper-right quadrant

13th: Number of items of the lower-right quadrant

14th: odd shape presented (1=diamond; 2=triangle; 3=circle)

15th: subject response in the shape task

16th: 0.2 in the single task; response time in the shape task in the dual task

17th: single (0) or dual (1) task

18th: time stimulus on

19th: time stimulus off
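
The column lists above are 1-based, while Python indexing is 0-based, which is easy to get wrong. Below is a minimal sketch that maps one row of "matr" to named fields. It assumes the row has already been loaded as a sequence of numbers, because the storage format is not stated. The field names are hypothetical. Columns 1 and 3 are both described as the numerosity response, so they are left out rather than guessed.

```python
# 1-based column positions from the description of "matr"; names are hypothetical.
MATR_COLUMNS = {
    2: "presented_numerosity",
    5: "stimulus_duration",
    6: "numerosity_rt",
    7: "arrangement",     # 1 = grouped, 2 = random
    14: "odd_shape",      # 1 = diamond, 2 = triangle, 3 = circle
    15: "shape_response",
    17: "dual_task",      # 0 = single, 1 = dual
}


def row_to_record(row) -> dict:
    """Map one trial (a sequence of numbers) to named fields."""
    record = {name: row[pos - 1] for pos, name in MATR_COLUMNS.items()}
    record["arrangement"] = "grouped" if record["arrangement"] == 1 else "random"
    record["dual_task"] = bool(record["dual_task"])
    # Columns 10-13: item counts in the four quadrants.
    record["quadrant_counts"] = tuple(row[9:13])
    return record
```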
Prime Number Source Code with Dataset
figshare.com
zip
Updated Oct 12, 2024
Cite
Ayman Mostafa (2024). Prime Number Source Code with Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27215508.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.27215508.v1
Dataset updated
Oct 12, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ayman Mostafa
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper addresses the computational methods and challenges associated with prime number generation, a critical component of encryption algorithms for ensuring data security. Efficient prime number generation is a key challenge in various domains, including cryptography, number theory, and computer science. The search for more effective algorithms is driven by the increasing demand for secure communication and data storage, and by the need for efficient algorithms to solve complex mathematical problems. We address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These algorithms are founded on the formulas of odd-composed numbers, which allows them to achieve significant performance improvements over existing prime number generation algorithms. Our experimental results show that the proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, the Sieve of Atkin, the Sieve of Eratosthenes, and the Sieve of Sundaram in mean execution time. Notably, our algorithms can also generate primes within arbitrary ranges with good performance. This improvement in performance and adaptability can benefit applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, the proposed algorithms can help develop more secure and reliable communication systems, enable faster computations in number theory, and support research in computer science and mathematics.
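
For reference, here are sketches of the Sieve of Eratosthenes, one of the baselines named above, and a standard segmented variant for the "primes within a range" task. These are the classical textbook methods, not the author's odd-composed-number algorithms, which ship with the dataset's source code. The function names are illustrative.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Start at p*p: smaller multiples were already crossed off.
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(is_prime) if flag]


def primes_in_range(low: int, high: int) -> list[int]:
    """Segmented sieve: all primes in [low, high]."""
    low = max(low, 2)
    if high < low:
        return []
    is_prime = bytearray([1]) * (high - low + 1)
    for p in primes_up_to(math.isqrt(high)):
        # First multiple of p inside the segment, never p itself.
        start = max(p * p, ((low + p - 1) // p) * p)
        for multiple in range(start, high + 1, p):
            is_prime[multiple - low] = 0
    return [low + i for i, flag in enumerate(is_prime) if flag]
```

The segmented version only sieves with primes up to the square root of `high` and allocates memory proportional to the range width. This is what makes range-to-range generation practical for large bounds.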
AutoMathText
huggingface.co
Cite
math-ai, AutoMathText [Dataset]. https://huggingface.co/datasets/math-ai/AutoMathText
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset authored and provided by
math-ai
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
🎉 This work, introducing the AutoMathText dataset and the AutoDS method, has been accepted to The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025 Findings)! 🎉

AutoMathText

AutoMathText is an extensive and carefully curated dataset encompassing around 200 GB of mathematical texts. It's a compilation sourced from a diverse range of platforms including various websites, arXiv, and GitHub (OpenWebMath, RedPajama, Algebraic Stack). This rich repository… See the full description on the dataset page: https://huggingface.co/datasets/math-ai/AutoMathText.
Grass Range, MT Population Breakdown by Gender and Age Dataset: Male and...
neilsberg.com
csv, json
Updated Feb 24, 2025
Cite
Neilsberg Research (2025). Grass Range, MT Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e1e392ff-f25d-11ef-8c1b-3860777c1fe6/
Available download formats: csv, json
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Montana, Grass Range
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Grass Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Grass Range. The dataset can be used to understand the population distribution of Grass Range by gender and age. For example, it can identify the largest age group for both men and women in Grass Range. It can also show how the gender ratio changes from the youngest to the oldest age group, and the male-to-female ratio within each age group.

Key observations

Largest age group (population): male, 35-39 years (7); female, 70-74 years (36). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender:

Please note that the American Community Survey asks about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are asked to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the Grass Range population analysis. There are 18 expected values, as defined above in the age groups section.

Population (Male): This column shows the male population of Grass Range in each age group.

Population (Female): This column shows the female population of Grass Range in each age group.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Grass Range for each age group.
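
The gender ratio column is males per 100 females. Below is a minimal sketch of the calculation with a hypothetical helper name. Returning `None` for an age group with no females is an assumption, because the description does not say how such groups are reported.

```python
def gender_ratio(males: int, females: int):
    """Males per 100 females; None when the group has no females."""
    if females == 0:
        return None
    return males * 100 / females
```

For example, an age group with 50 males and 40 females has a gender ratio of 125.0.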

Good to know

Margin of Error

Data in the dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Grass Range Population by Gender.
Malayalam Chain of Thought Prompt & Response Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Malayalam Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/malayalam-chain-of-thought-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Welcome to the Malayalam Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

Dataset Content:
This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Malayalam language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.

Each prompt is accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Malayalam speakers, drawing on various sources, including open-source datasets, news articles, websites, and other reliable references.

Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. The dataset also contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, and JSON, in proper Markdown format.

Prompt Diversity:
To ensure broad coverage, we have included prompts on a wide range of topics in mathematics, common-sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.

These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

Response Formats:
To accommodate diverse learning experiences, our dataset includes different answer types depending on the prompt, each with a step-by-step rationale. The detailed rationale helps the language model build a reasoning process for complex questions.

These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

Data Format and Annotation Details:
This fully labeled Malayalam Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
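As a minimal sketch of what one annotated record might look like, the dataclass below mirrors the annotation details listed above. The field names (`prompt_type`, `rich_text`, and so on) and the JSON layout (an array of objects) are hypothetical; the actual keys in the FutureBeeAI JSON/CSV files are not given on this page.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CoTRecord:
    # Hypothetical field names mirroring the listed annotation details;
    # the real JSON/CSV keys may differ.
    id: str
    prompt: str
    prompt_type: str        # e.g. instructional, continuation, few-shot
    prompt_complexity: str  # easy, medium, or hard
    prompt_category: str
    domain: str
    response: str
    rationale: str          # step-by-step reasoning behind the response
    response_type: str      # e.g. text, number, date/time
    rich_text: bool         # whether lists, tables, code, etc. appear


def parse_records(raw_json: str) -> list[CoTRecord]:
    """Parse a JSON array of records, rejecting entries with missing keys."""
    expected = {f.name for f in fields(CoTRecord)}
    records = []
    for item in json.loads(raw_json):
        missing = expected - set(item)
        if missing:
            raise ValueError(f"record {item.get('id')!r} missing {sorted(missing)}")
        records.append(CoTRecord(**{k: item[k] for k in expected}))
    return records
```

Validating keys up front makes a schema mismatch fail loudly instead of producing partially filled records.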

Quality and Accuracy:
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

The Malayalam text is free of spelling and grammatical errors. No copyrighted, toxic, or harmful content was used in constructing this dataset.

Continuous Updates and Customization:
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

License:
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Malayalam Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
ARPO-MATH
huggingface.co
Updated May 29, 2025
Zhebo Wang (2025). ARPO-MATH [Dataset]. https://huggingface.co/datasets/BreynaldDva/ARPO-MATH
Dataset updated
May 29, 2025
Authors
Zhebo Wang
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
TRAIN

Level middle (sample size: 467):

Average length: 61.47 ± 20.95
Range: 25 - 171
Percentiles:
25th = 45.0
50th = 59.0
75th = 73.5

Level hard (sample size: 462):

Average length: 69.82 ± 70.86
Range: 8 - 700
Percentiles:
25th = 33.0
50th = 50.0
75th = 78.0

Level easy (sample size: 471):

Average length: 36.38 ± 9.58
Range: 13 - 72
Percentiles:
25th = 29.0
50th = 36.0
75th = 42.0

TEST

Level easy (sample size: 29):… See the full description on the dataset page: https://huggingface.co/datasets/BreynaldDva/ARPO-MATH.
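The per-level figures above (mean ± standard deviation, range, and 25th/50th/75th percentiles) can be reproduced from a list of per-example lengths. This is a hedged sketch: the page does not say what unit "length" counts or exactly how the statistics were computed, so reading "±" as the sample standard deviation and the percentiles as linearly interpolated are assumptions, and `length_summary` is a hypothetical helper.

```python
import statistics


def length_summary(lengths: list[int]) -> dict:
    """Summarize a list of lengths in the same shape as the per-level figures."""
    # Assumption: "±" is the sample standard deviation, and percentiles use
    # linear interpolation (the "inclusive" method, which matches numpy's default).
    q1, median, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return {
        "sample_size": len(lengths),
        "mean": statistics.mean(lengths),
        "stdev": statistics.stdev(lengths),
        "range": (min(lengths), max(lengths)),
        "percentiles": {25: q1, 50: median, 75: q3},
    }
```

For example, `length_summary([1, 2, 3, 4])` reports percentiles of 1.75, 2.5, and 3.25; interpolation is why a value such as 73.5 can appear for integer lengths.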
Data from: Overcoming the challenge of small effective sample sizes in...
datadryad.org
data.niaid.nih.gov
zip
Updated Sep 8, 2019
Christen H. Fleming; Michael J. Noonan; Emilia Patricia Medici; Justin M. Calabrese (2019). Overcoming the challenge of small effective sample sizes in home-range estimation [Dataset]. http://doi.org/10.5061/dryad.16bc7f2
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.16bc7f2
Dataset updated
Sep 8, 2019
Dataset provided by
Dryad
Authors
Christen H. Fleming; Michael J. Noonan; Emilia Patricia Medici; Justin M. Calabrese
Time period covered
2019
Area covered
Brazil, Pantanal
Description
GPS tracking data on lowland tapirs (file: tapir.zip)
Trends in Math Proficiency (2011-2022): East Range Ii Csd School vs. Maine...
publicschoolreview.com
Updated Nov 13, 2022
Public School Review (2022). Trends in Math Proficiency (2011-2022): East Range Ii Csd School vs. Maine vs. East Range Csd School District [Dataset]. https://www.publicschoolreview.com/east-range-ii-csd-school-profile
Dataset updated
Nov 13, 2022
Dataset authored and provided by
Public School Review
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Maine
Description
This dataset tracks annual math proficiency from 2011 to 2022 for East Range Ii Csd School vs. Maine and East Range Csd School District.
GSM-Plus
huggingface.co
Updated Feb 24, 2024
Qintong Li (2024). GSM-Plus [Dataset]. https://huggingface.co/datasets/qintongli/GSM-Plus
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 24, 2024
Authors
Qintong Li
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Description

GSM-Plus aims to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations. GSM-Plus is an adversarial grade-school math dataset, an extension of GSM8K augmented with various mathematical perturbations. Motivated by the capability taxonomy for solving math problems described in Pólya's principles, we identify five perspectives to guide the development of GSM-Plus:

numerical variation refers to altering the numerical… See the full description on the dataset page: https://huggingface.co/datasets/qintongli/GSM-Plus.
Grass Range, MT Population Pyramid Dataset: Age Groups, Male and Female...
neilsberg.com
csv, json
Updated Feb 22, 2025
Neilsberg Research (2025). Grass Range, MT Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/grass-range-mt-population-by-age/
Available download formats: json, csv
Dataset updated
Feb 22, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Montana, Grass Range
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population, and (c) total population, we first analyzed and categorized the data for each age group. Ages 0 to 84 were grouped into 5-year buckets, and all ages 85 and over were aggregated into a single group. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
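A minimal sketch of the bucketing described above, assuming ages are whole years: the hypothetical `age_group` function maps an age to one of the 18 labels used in this entry's Age groups list. It is an illustration of the grouping rule, not Neilsberg's own processing code.

```python
def age_group(age: int) -> str:
    """Map an age in whole years to one of the 18 age-group labels."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= 85:
        return "85 years and over"  # single open-ended top group
    if age < 5:
        return "Under 5 years"
    low = (age // 5) * 5  # start of the 5-year bucket containing this age
    return f"{low} to {low + 4} years"
```

Ages 0 through 84 fall into 17 five-year buckets, and the open-ended top group makes 18.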
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Grass Range, MT population pyramid, which represents the distribution of the Grass Range population across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Larger counts in the younger age groups (the base of the pyramid) suggest population growth, whereas larger counts in the older age groups (the top) point to declining birth rates. The dataset can also be used to derive the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

Key observations

Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for Grass Range, MT, is 17.1.

Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for Grass Range, MT, is 160.0.

Total dependency ratio for Grass Range, MT is 177.1.

Potential support ratio, which is the number of working-age persons (aged 15-64) per person aged 65 or over, for Grass Range, MT is 0.6.
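The four ratios follow directly from the definitions above. The sketch below assumes the input is the Total Population column keyed by the age-group labels in this entry's Age groups list; `AGE_GROUPS` and `dependency_ratios` are hypothetical names used only for illustration.

```python
# Age-group labels in the order they are listed for this dataset.
AGE_GROUPS = (
    ["Under 5 years"]
    + [f"{lo} to {lo + 4} years" for lo in range(5, 85, 5)]
    + ["85 years and over"]
)


def dependency_ratios(total_by_group: dict[str, int]) -> dict[str, float]:
    """Compute dependency ratios from per-age-group population totals."""
    youth = sum(total_by_group[g] for g in AGE_GROUPS[:3])      # ages 0-14
    working = sum(total_by_group[g] for g in AGE_GROUPS[3:13])  # ages 15-64
    elderly = sum(total_by_group[g] for g in AGE_GROUPS[13:])   # ages 65+
    youth_ratio = 100 * youth / working      # children per 100 working-age
    old_age_ratio = 100 * elderly / working  # elderly per 100 working-age
    return {
        "youth_dependency": youth_ratio,
        "old_age_dependency": old_age_ratio,
        "total_dependency": youth_ratio + old_age_ratio,
        "potential_support": working / elderly,  # working-age per elderly
    }
```

The published values are consistent with these definitions: 17.1 + 160.0 = 177.1 for the total dependency ratio, and 100 / 160.0 ≈ 0.6 for the potential support ratio.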

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group for the Grass Range population analysis. There are 18 expected values, as defined in the Age groups section above.

Population (Male): The male population of Grass Range for the selected age group.

Population (Female): The female population of Grass Range for the selected age group.

Total Population: The total population of Grass Range for the selected age group.
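A minimal sketch for loading the table and sanity-checking it, assuming the CSV headers are exactly the four column names described above and that counts are plain integers without thousands separators; both are assumptions, and `load_pyramid` is a hypothetical helper. It also assumes each row's total equals male plus female.

```python
import csv
import io

# Assumed header row, taken from the column descriptions above.
COLUMNS = ["Age Group", "Population (Male)", "Population (Female)", "Total Population"]


def load_pyramid(csv_text: str) -> list[dict]:
    """Read the pyramid table and check that each total equals male + female."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for row in reader:
        male = int(row["Population (Male)"])
        female = int(row["Population (Female)"])
        total = int(row["Total Population"])
        if male + female != total:
            raise ValueError(f"{row['Age Group']}: {male} + {female} != {total}")
        rows.append({"age_group": row["Age Group"], "male": male,
                     "female": female, "total": total})
    return rows
```

A full file would be expected to yield 18 rows, one per age group.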

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Grass Range Population by Age.

Mathematics Dataset Dataset (David Saxton; Edward Grefenstette; Felix Hill; Pushmeet Kohli, 2024): 349 scholarly articles cite this dataset (per Google Scholar).

