100+ datasets found
  1. P

    Mathematics Dataset Dataset

    • library.toponeai.link
    • paperswithcode.com
    Updated Nov 3, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    David Saxton; Edward Grefenstette; Felix Hill; Pushmeet Kohli (2024). Mathematics Dataset Dataset [Dataset]. https://library.toponeai.link/dataset/mathematics
    Explore at:
    Dataset updated
    Nov 3, 2024
    Authors
    David Saxton; Edward Grefenstette; Felix Hill; Pushmeet Kohli
    Description

    This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

  2. math-pdfs

    • kaggle.com
    Updated Nov 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nithin Naidu (2024). math-pdfs [Dataset]. https://www.kaggle.com/datasets/nithinnaidu/math-pdfs
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Nithin Naidu
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Contents of the Dataset: Mathematical PDFs This dataset comprises 500+ mathematical PDF files meticulously curated to cover a wide range of mathematical topics. The primary focus is on key concepts, making it an excellent resource for students, educators, and researchers. The files have been processed and organized for optimal usability in adaptive learning systems and AI-powered educational tools.

    Key Features: Comprehensive Coverage of Topics:

    Algebra: Equations, variables, polynomials, and algebraic expressions. Calculus: Derivatives, integrals, limits, and differential equations. Geometry: Triangles, circles, angles, and other geometric properties. Trigonometry: Sine, cosine, tangent, and trigonometric identities. Statistics: Probability, distributions, mean, variance, and other statistical concepts. Enhanced Content Processing:

    Each document has been pre-processed to extract key concepts, topics, and subtopics. Enables content clustering and topic indexing for seamless topic retrieval. Use Cases:

    Adaptive Learning Systems: Personalized lesson generation and targeted exercises. AI-Powered Education Platforms: Semantic search and clustering for better topic recommendations. Content Analysis: Clustering and summarization for advanced data analysis. File Details:

    Formats: PDF Source: Internet Archive - Mathematics Collection Size: 500+ files totaling approximately X GB (adjust based on actual size). Processing Capabilities:

    The dataset has been structured to allow integration with AI models like Gemini for generating personalized explanations and tracking student progress. Designed for multi-age groups, providing flexibility in learning for students and educators. About the Source The dataset was sourced from the Internet Archive's Mathematics Collection, a reputable and open-access repository of educational content. All files comply with public access guidelines and are redistributed here for educational and non-commercial use.

    Licensing The dataset adheres to the applicable licensing guidelines of the source. It is shared under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, allowing others to remix, adapt, and build upon this content for non-commercial purposes.

  3. T

    math_dataset

    • tensorflow.org
    • huggingface.co
    Updated Jan 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). math_dataset [Dataset]. https://www.tensorflow.org/datasets/catalog/math_dataset
    Explore at:
    Dataset updated
    Jan 4, 2023
    Description

    Mathematics database.

    This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

    Original paper: Analysing Mathematical Reasoning Abilities of Neural Models (Saxton, Grefenstette, Hill, Kohli).

    Example usage:

    train_examples, val_examples = tfds.load(
      'math_dataset/arithmetic_mul',
      split=['train', 'test'],
      as_supervised=True)
    

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('math_dataset', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more informations on tensorflow_datasets.

  4. r

    Dataset for The effects of a number line intervention on calculation skills

    • researchdata.edu.au
    • figshare.mq.edu.au
    Updated May 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas (2023). Dataset for The effects of a number line intervention on calculation skills [Dataset]. http://doi.org/10.25949/22799717.V1
    Explore at:
    Dataset updated
    May 18, 2023
    Dataset provided by
    Macquarie University
    Authors
    Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas
    Description

    Study information

    The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset.

    All participants were currently attending Year 1 of primary school at an independent school in New South Wales, Australia. For children to be able to eligible to participate they had to present with low mathematics achievement by performing at or below the 25th percentile in the Maths Problem Solving and/or Numerical Operations subtests from the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Participants were excluded from participating if, as reported by their parents, they have any other diagnosed disorders such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or uncorrected sensory disorders.

    The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had 1 post-treatment measurement point.

    The number of measurement points were distributed across participants as follows:

    Participant 1 – 3 baseline, 6 treatment, 1 post-treatment

    Participant 3 – 2 baseline, 7 treatment, 1 post-treatment

    Participant 5 – 2 baseline, 5 treatment, 1 post-treatment

    Participant 6 – 3 baseline, 4 treatment, 1 post-treatment

    Participant 7 – 2 baseline, 5 treatment, 1 post-treatment

    In each session across all three phases children were assessed in their performance on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. Furthermore, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.


    Measures

    Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed on all phases of the study. Trained items were assessed independent of the intervention during baseline and post-treatment phases, and performance on the intervention is used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error (PAE) [(number estimated - target number)/ scale of number line] x100.


    Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the lowest addend first (e.g., 3 + 5) and half of the additions present the highest addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen accompanied by a sound and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.


    Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represent the correct result. Participants were asked to select the option that was closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double and one single digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below. The calculations remained on the screen until participants responded by clicking on one of the options on the screen. Participants did not receive performance-based feedback. Performance on this task is measured by item-based accuracy.


    Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the largest one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison) the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots is kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.


    The Number Line Intervention

    During the intervention sessions, participants estimated the position of 30 Arabic numbers in a 0-100 bounded number line. As a form of feedback, within each item, the participants’ estimate remained visible, and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”, when PAE was between 2.5 and 5 the message read “Well done, so close! and when PAE was higher than 5 the message read “Good try!” Numbers were presented in random order.


    Variables in the dataset

    Age = age in ‘years, months’ at the start of the study

    Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)

    Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

    Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

    Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

    Math_Problem_Solving_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).


    The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed by three sections. The first one refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point on the treatment phase, and post1 to the first measurement point on the post-treatment phase.


    The second part of the variable name refers to the task, as follows:

    DC = dot comparison

    SDC = single-digit computation

    NLE_UT = number line estimation (untrained set)

    NLE_T= number line estimation (trained set)

    CE = multidigit computational estimation

    NC = number comparison

    The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).


    Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.





  5. h

    HindiMathQuest

    • huggingface.co
    Updated Oct 25, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dnyanesh Walwadkar (2024). HindiMathQuest [Dataset]. http://doi.org/10.57967/hf/3259
    Explore at:
    Dataset updated
    Oct 25, 2024
    Authors
    Dnyanesh Walwadkar
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview:

    The HindiMathQuest: A Dataset for Mathematical Reasoning and Problem-Solving in Hindi is designed to advance the capabilities of language models in understanding and solving mathematical problems presented in the Hindi language. The dataset covers a comprehensive range of question types, including logical reasoning, numeric calculations, translation-based problems, and complex mathematical tasks typically seen in competitive exams. This dataset is intended to fill a… See the full description on the dataset page: https://huggingface.co/datasets/dnyanesh/HindiMathQuest.

  6. h

    NuminaMath-CoT

    • huggingface.co
    Updated Jul 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Project-Numina (2024). NuminaMath-CoT [Dataset]. https://huggingface.co/datasets/AI-MO/NuminaMath-CoT
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 19, 2024
    Dataset authored and provided by
    Project-Numina
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for NuminaMath CoT

      Dataset Summary
    

    Approximately 860k math problems, where each solution is formatted in a Chain of Thought (CoT) manner. The sources of the dataset range from Chinese high school math exercises to US and international mathematics olympiad competition problems. The data were primarily collected from online exam paper PDFs and mathematics discussion forums. The processing steps include (a) OCR from the original PDFs, (b) segmentation into… See the full description on the dataset page: https://huggingface.co/datasets/AI-MO/NuminaMath-CoT.

  7. p

    Trends in Math Proficiency (2010-2022): Grass Range 7-8 vs. Montana vs....

    • publicschoolreview.com
    Updated Jun 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Public School Review (2025). Trends in Math Proficiency (2010-2022): Grass Range 7-8 vs. Montana vs. Grass Range Elementary School District [Dataset]. https://www.publicschoolreview.com/grass-range-7-8-profile
    Explore at:
    Dataset updated
    Jun 3, 2025
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Grass Range Elementary School District, Grass Range
    Description

    This dataset tracks annual math proficiency from 2010 to 2022 for Grass Range 7-8 vs. Montana and Grass Range Elementary School District

  8. DETAILS OF THE PARAMETER RANGE CORRELATION PROCESS AND DATA SOURCE AND...

    • figshare.com
    zip
    Updated Mar 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Muslim Husam (2023). DETAILS OF THE PARAMETER RANGE CORRELATION PROCESS AND DATA SOURCE AND ANALYSIS [Dataset]. http://doi.org/10.6084/m9.figshare.21915813.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 15, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Muslim Husam
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Real driving datasets collected by instrumented vehicles from Japanese highways.

  9. h

    Advanced-Math

    • huggingface.co
    Updated Apr 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    haijian (2025). Advanced-Math [Dataset]. https://huggingface.co/datasets/haijian06/Advanced-Math
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 29, 2025
    Authors
    haijian
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Here's a concise README for your Advanced-Math dataset:

      Advanced-Math Dataset
    

    This Advanced-Math dataset is designed to support advanced studies and research in various mathematical fields. It encompasses a wide range of topics, including:

    Calculus Linear Algebra Probability Machine Learning Deep Learning

    The dataset primarily focuses on computational problems, which constitute over 80% of the content. Additionally, it includes related logical concept questions to provide a… See the full description on the dataset page: https://huggingface.co/datasets/haijian06/Advanced-Math.

  10. InftyMCCDB-2 dataset

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mahshad Mahdavi; Mahshad Mahdavi (2020). InftyMCCDB-2 dataset [Dataset]. http://doi.org/10.5281/zenodo.3483048
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Mahshad Mahdavi; Mahshad Mahdavi
    Description

    InftyMCCDB-2 dataset is a modified version of InftyCDB-2 which contains mathematical expressions from scanned article pages.

    The original dataset has 21,056 math expressions. We remove formulas with matrices and grids, leaving 19,381 formulas. The dataset includes 213 symbol classes, and is split into two sets: training (12551 images), and testing (6830 images) with approximately the same distribution of symbol classes and relation classes. The expressions range in size from a single symbol to more than 75 symbols, with an average of 7.33 symbols per expression.

    The original InftyCDB-2 provides ground truth at the symbol level. We extracted connected component bounding boxes, and generated new ground truth for each image using a labeled adjacency matrix (`label graph') representation.

    The set of .lg (label graph) ground truth files are provided, along with a .png image for each expression.

  11. Z

    Data from: Grouping strategies in number estimation extend the subitizing...

    • data.niaid.nih.gov
    Updated Nov 30, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Arrighi Roberto (2020). Grouping strategies in number estimation extend the subitizing range [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4292116
    Explore at:
    Dataset updated
    Nov 30, 2020
    Dataset provided by
    Burr David C.
    Maldonado Moscoso Paula Andrea
    Anobile Giovanni
    Arrighi Roberto
    Castaldi Elisa
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the calculation folder: each file contains a matrix called “MATR”. Each row of the matrix “MATR” is a trial.

    The columns contain the following information:

    1st: Number of trial

    2nd: Subject response

    4th: Response time

    5th: first number

    6th: math symbol (1=*; 2= +; 3= –)

    7th: second number

    8th: third number

    In the calculation folder: each file contains a matrix called “matr”. Each row of the matrix “matr” is a trial.

    The columns contain the following information:

    1st: subject response in the numerosity task

    2nd: the presented numerosity

    3rd: subject response in the numerosity task

    4th: zero

    5th: stimulus duration

    6th: Response time in the numerosity task

    7th: Grouped (1) or random (2) presentation

    8th: 1

    9th: 1

    10th: Number of items of the upper-left quadrant

    11th: Number of items of the lower-left quadrant

    12th: Number of items of the upper-right quadrant

    13th: Number of items of the lower - right quadrant

    14th: odd shape presented (1=diamond; 2=triangle; 3=circle)

    15th: subject response in the shape task

    16th: 0.2 in the single task response time in the shape task when dual task

    17th: single (0) or dual (1) task

    18th: time stimulus on

    19th: time stimulus off

  12. Prime Number Source Code with Dataset

    • figshare.com
    zip
    Updated Oct 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ayman Mostafa (2024). Prime Number Source Code with Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27215508.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Ayman Mostafa
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper addresses the computational methods and challenges associated with prime number generation, a critical component in encryption algorithms for ensuring data security. The generation of prime numbers efficiently is a critical challenge in various domains, including cryptography, number theory, and computer science. The quest to find more effective algorithms for prime number generation is driven by the increasing demand for secure communication and data storage and the need for efficient algorithms to solve complex mathematical problems. Our goal is to address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These innovative algorithms are founded on the formulas of odd-composed numbers, allowing them to achieve remarkable performance improvements compared to existing prime number generation algorithms. Our comprehensive experimental results reveal that our proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, Sieve of Atkin, Sieve of Eratosthenes, and Sieve of Sundaram regarding mean execution time. More notably, our algorithms exhibit the unique ability to provide prime numbers from range to range with a commendable performance. This substantial enhancement in performance and adaptability can significantly impact the effectiveness of various applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, our proposed algorithms can develop more secure and reliable communication systems, enable faster computations in number theory, and support advanced computer science and mathematics research.

  13. h

    AutoMathText

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    math-ai, AutoMathText [Dataset]. https://huggingface.co/datasets/math-ai/AutoMathText
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    math-ai
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    🎉 This work, introducing the AutoMathText dataset and the AutoDS method, has been accepted to The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025 Findings)! 🎉

      AutoMathText
    

    AutoMathText is an extensive and carefully curated dataset encompassing around 200 GB of mathematical texts. It's a compilation sourced from a diverse range of platforms including various websites, arXiv, and GitHub (OpenWebMath, RedPajama, Algebraic Stack). This rich repository… See the full description on the dataset page: https://huggingface.co/datasets/math-ai/AutoMathText.

  14. N

    Grass Range, MT Population Breakdown by Gender and Age Dataset: Male and...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2025). Grass Range, MT Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e1e392ff-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Montana, Grass Range
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Grass Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Grass Range. The dataset can be utilized to understand the population distribution of Grass Range by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Grass Range. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Grass Range.

    Key observations

    Largest age group (population): Male # 35-39 years (7) | Female # 70-74 years (36). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Scope of gender :

    Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.

    Variables / Data Columns

    • Age Group: This column displays the age group for the Grass Range population analysis. Total expected values are 18 and are define above in the age groups section.
    • Population (Male): The male population in the Grass Range is shown in the following column.
    • Population (Female): The female population in the Grass Range is shown in the following column.
    • Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Grass Range for each age group.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Grass Range Population by Gender. You can refer the same here

  15. F

    Malayalam Chain of Thought Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Malayalam Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/malayalam-chain-of-thought-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the Malayalam Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

    Dataset Content:

    This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Malayalam language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.

    Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Malayalam people, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.

    Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.

    Prompt Diversity:

    To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.

    These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

    Response Formats:

    To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale aids the language model in building reasoning process for complex questions.

    These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Malayalam Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.

    Quality and Accuracy:

    Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The Malayalam version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Malayalam Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

  16. h

    ARPO-MATH

    • huggingface.co
    Updated May 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zhebo Wang (2025). ARPO-MATH [Dataset]. https://huggingface.co/datasets/BreynaldDva/ARPO-MATH
    Explore at:
    Dataset updated
    May 29, 2025
    Authors
    Zhebo Wang
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    TRAIN

    Level middle (sample size: 467):

    Average length: 61.47 ± 20.95
    Range: 25 - 171
    Percentiles:
    25th = 45.0
    50th = 59.0
    75th = 73.5

    Level hard (sample size: 462):

    Average length: 69.82 ± 70.86
    Range: 8 - 700
    Percentiles:
    25th = 33.0
    50th = 50.0
    75th = 78.0

    Level easy (sample size: 471):

    Average length: 36.38 ± 9.58
    Range: 13 - 72
    Percentiles:
    25th = 29.0
    50th = 36.0
    75th = 42.0

    TEST
    

    Level easy (sample size: 29):… See the full description on the dataset page: https://huggingface.co/datasets/BreynaldDva/ARPO-MATH.

  17. d

    Data from: Overcoming the challenge of small effective sample sizes in...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Sep 8, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Christen H. Fleming; Michael J. Noonan; Emilia Patricia Medici; Justin M. Calabrese (2019). Overcoming the challenge of small effective sample sizes in home-range estimation [Dataset]. http://doi.org/10.5061/dryad.16bc7f2
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 8, 2019
    Dataset provided by
    Dryad
    Authors
    Christen H. Fleming; Michael J. Noonan; Emilia Patricia Medici; Justin M. Calabrese
    Time period covered
    2019
    Area covered
    Brazil, Pantanal
    Description

    GPS tracking data on lowland tapirtapir.zip

  18. p

    Trends in Math Proficiency (2011-2022): East Range Ii Csd School vs. Maine...

    • publicschoolreview.com
    Updated Nov 13, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Public School Review (2022). Trends in Math Proficiency (2011-2022): East Range Ii Csd School vs. Maine vs. East Range Csd School District [Dataset]. https://www.publicschoolreview.com/east-range-ii-csd-school-profile
    Explore at:
    Dataset updated
    Nov 13, 2022
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Maine
    Description

    This dataset tracks annual math proficiency from 2011 to 2022 for East Range Ii Csd School vs. Maine and East Range Csd School District

  19. h

    GSM-Plus

    • huggingface.co
    Updated Feb 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Qintong Li (2024). GSM-Plus [Dataset]. https://huggingface.co/datasets/qintongli/GSM-Plus
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 24, 2024
    Authors
    Qintong Li
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Description

    GSM-Plus aims to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations. GSM-Plus is an adversarial grade school math dataset, an extension of GSM8K augmented with various mathematical perturbations. Motivated by the capability taxonomy for solving math problems mentioned in Polya’s principles, we identify 5 perspectives to guide the development of GSM-PLUS:

    numerical variation refers to altering the numerical… See the full description on the dataset page: https://huggingface.co/datasets/qintongli/GSM-Plus.

  20. N

    Grass Range, MT Population Pyramid Dataset: Age Groups, Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2025). Grass Range, MT Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/grass-range-mt-population-by-age/
    Explore at:
    json, csvAvailable download formats
    Dataset updated
    Feb 22, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Montana, Grass Range
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population and (b) total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the data for the Grass Range, MT population pyramid, which represents the Grass Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

    Key observations

    • Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for Grass Range, MT, is 17.1.
    • Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for Grass Range, MT, is 160.0.
    • Total dependency ratio for Grass Range, MT is 177.1.
    • Potential support ratio, which is the number of youth (working age population) per elderly, for Grass Range, MT is 0.6.
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group for the Grass Range population analysis. Total expected values are 18 and are define above in the age groups section.
    • Population (Male): The male population in the Grass Range for the selected age group is shown in the following column.
    • Population (Female): The female population in the Grass Range for the selected age group is shown in the following column.
    • Total Population: The total population of the Grass Range for the selected age group is shown in the following column.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Grass Range Population by Age. You can refer the same here

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
David Saxton; Edward Grefenstette; Felix Hill; Pushmeet Kohli (2024). Mathematics Dataset Dataset [Dataset]. https://library.toponeai.link/dataset/mathematics

Mathematics Dataset Dataset

Explore at:
349 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Nov 3, 2024
Authors
David Saxton; Edward Grefenstette; Felix Hill; Pushmeet Kohli
Description

This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

Search
Clear search
Close search
Google apps
Main menu