64 datasets found
  1. Data from: Statistical Graphs in Mathematical Textbooks of Primary Education...

    • scielo.figshare.com
    • datasetcatalog.nlm.nih.gov
    jpeg
    Updated May 30, 2023
    Cite
    Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal (2023). Statistical Graphs in Mathematical Textbooks of Primary Education in Perú [Dataset]. http://doi.org/10.6084/m9.figshare.6857033.v1
    Available download formats
    jpeg
    Dataset updated
    May 30, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: This paper presents the results of an analysis of statistical graphs according to the curricular guidelines and of their implementation in eighteen primary education mathematics textbooks in Perú, which correspond to three complete series from different publishers. Through a content analysis, we examined the sections where graphs appeared, identifying the type of activity that arises from each graph, the reading level demanded, and the semiotic complexity of the task involved. The textbooks are partially suited to the curricular guidelines regarding the presentation of graphs by educational level, and the numbers of activities proposed by the three publishers are similar. The main activities required in the textbooks are calculating and building. The study observes a predominance of bar graphs, a basic reading level, and the representation of a univariate data distribution in the graph.

  2. Mathematical Problems Dataset: Various

    • kaggle.com
    zip
    Updated Dec 2, 2023
    Cite
    The Devastator (2023). Mathematical Problems Dataset: Various [Dataset]. https://www.kaggle.com/datasets/thedevastator/mathematical-problems-dataset-various-mathematic/code
    Available download formats
    zip (2498203187 bytes)
    Dataset updated
    Dec 2, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Mathematical Problems Dataset: Various Mathematical Problems and Solutions

    Mathematical Problems Dataset: Questions and Answers

    By math_dataset (From Huggingface) [source]

    About this dataset

    This dataset comprises a collection of mathematical problems and their solutions designed for training and testing purposes. Each problem is presented in the form of a question, followed by its corresponding answer. The dataset covers various mathematical topics such as arithmetic, polynomials, and prime numbers. For instance, the arithmetic_nearest_integer_root_test.csv file focuses on problems involving finding the nearest integer root of a given number. Similarly, the polynomials_simplify_power_test.csv file deals with problems related to simplifying polynomials with powers. Additionally, the dataset includes the numbers_is_prime_train.csv file containing math problems that require determining whether a specific number is prime or not. The questions and answers are provided in text format to facilitate analysis and experimentation with mathematical problem-solving algorithms or models

    How to use the dataset

    • Introduction: The Mathematical Problems Dataset contains a collection of various mathematical problems and their corresponding solutions or answers. This guide will provide you with all the necessary information on how to utilize this dataset effectively.

    • Understanding the columns: The dataset consists of several columns, each representing a different aspect of the mathematical problem and its solution. The key columns are:

      • question: This column contains the text representation of the mathematical problem or equation.
      • answer: This column contains the text representation of the solution or answer to the corresponding problem.
    • Exploring specific problem categories: To focus on specific types of mathematical problems, you can filter or search within the dataset using relevant keywords or terms related to your area of interest. For example, if you are interested in prime numbers, you can search for prime in the question column.

    • Applying machine learning techniques: This dataset can be used for training machine learning models related to natural language understanding and mathematics. You can explore various techniques such as text classification, sentiment analysis, or even sequence-to-sequence models for solving mathematical problems based on their textual representations.

    • Generating new questions and solutions: By analyzing patterns in this dataset, you can generate new questions and solutions programmatically using techniques like data augmentation or rule-based methods.

    • Validation and evaluation: As with any other machine learning task, it is essential to validate your models properly on separate validation sets, which this dataset does not include. You can also evaluate model performance by comparing predictions against the known answers in this dataset's answer column.

    • Sharing insights and findings: After working with this dataset, researchers and educators are encouraged to share their insights and the approaches they took during analysis and modelling as Kaggle notebooks, discussions, blog posts, or tutorials, so that others can benefit from them.
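
    The keyword-filtering step in the list above can be sketched with Python's standard library. The sample rows are hypothetical stand-ins written in the style of the dataset, not rows copied from it; only the question and answer column names come from the description, and filter_questions is an illustrative helper name.

```python
import csv
import io

# Hypothetical sample in the question/answer layout described above.
SAMPLE_CSV = """question,answer
Is 13 prime?,True
What is 7 + 5?,12
Is 21 a prime number?,False
"""

def filter_questions(csv_text, keyword):
    """Return rows whose question contains the keyword (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = keyword.lower()
    return [row for row in reader if needle in row["question"].lower()]

prime_rows = filter_questions(SAMPLE_CSV, "prime")
for row in prime_rows:
    print(row["question"], "->", row["answer"])
```

    For one of the real files, the same function could be given the text of that file (for example the contents of numbers_is_prime_train.csv read with open(..., newline="", encoding="utf-8")), assuming the file has the two columns named above.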

    Note: The dataset does not include dates.

    By following these guidelines, you can effectively explore and utilize the Mathematical Problems Dataset for various mathematical problem-solving tasks. Happy exploring!
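
    Comparing predictions against the answer column, as the guidelines suggest, could look like the following minimal sketch. Exact string match after trimming whitespace is an assumption made here for illustration; the description does not prescribe a metric, and the predictions and reference answers below are hypothetical.

```python
def exact_match_accuracy(predictions, answers):
    """Fraction of predictions equal to the reference answer after trimming whitespace."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    hits = sum(p.strip() == a.strip() for p, a in zip(predictions, answers))
    return hits / len(answers)

# Hypothetical model outputs compared with hypothetical reference answers.
reference = ["True", "12", "False"]
predicted = ["True", "13", "False "]
accuracy = exact_match_accuracy(predicted, reference)
print(f"exact-match accuracy: {accuracy:.2f}")
```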

    Research Ideas

    • Developing machine learning algorithms for solving mathematical problems: This dataset can be used to train and test models that can accurately predict the solution or answer to different mathematical problems.
    • Creating educational resources: The dataset can be used to create a wide variety of educational materials such as problem sets, worksheets, and quizzes for students studying mathematics.
    • Research in mathematical problem-solving strategies: Researchers and educators can analyze the dataset to identify common patterns or strategies employed in solving different types of mathematical problems. This analysis can help improve teaching methodologies and develop effective problem-solving techniques

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purpos...

  3. collatz-conjecture-10k

    • kaggle.com
    Updated Oct 8, 2025
    Cite
    Taekyung (2025). collatz-conjecture-10k [Dataset]. https://www.kaggle.com/datasets/taeryy/collatz-conjecture-10k
    Available download formats
    Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    Kaggle
    Authors
    Taekyung
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    🔢 The Collatz conjecture

    • If n is even, divide it by 2.
    • If n is odd, multiply it by 3 and add 1.
    • Repeat, and see whether the sequence reaches 1.
    • Does this hold for all natural numbers?

    This concisely stated conjecture has not yet been proven. It is also famous for its beautiful graphs.
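
    The rule above can be written as a short Python function. This is a minimal sketch; collatz_sequence is an illustrative name, not something taken from the dataset or its notebook. Because the conjecture is unproven, the loop is only known to terminate for the starting values that have been checked.

```python
def collatz_sequence(n):
    """Return the Collatz sequence starting at n and ending at the first 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n != 1:
        # Even: halve. Odd: triple and add one.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence

print(collatz_sequence(7))
# [7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```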

    📂 Dataset Structure

    File: col_10k.txt
    Each line contains one Collatz sequence (comma-separated integers).
    Example row: 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
    Number of sequences: 10,000
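
    A line in this format can be parsed and sanity-checked as sketched below. The helper names are hypothetical, and the only input used is the example row quoted above; reading col_10k.txt itself would amount to applying parse_sequence to each line of the file.

```python
def parse_sequence(line):
    """Parse one comma-separated line into a list of integers."""
    return [int(token) for token in line.split(",") if token.strip()]

def is_collatz_sequence(values):
    """Check that each term follows from the previous one and the last term is 1."""
    if not values or values[-1] != 1:
        return False
    for current, following in zip(values, values[1:]):
        expected = current // 2 if current % 2 == 0 else 3 * current + 1
        if following != expected:
            return False
    return True

example = "7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1"
values = parse_sequence(example)
print(len(values), is_collatz_sequence(values))  # 17 True
```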

    💻 Starter Notebook

    https://www.kaggle.com/code/taeryy/use-collatz-sequence-data

    📜 License

    CC-BY 4.0 Feel free to use this dataset for any purpose, just credit the source: Created by Taery (Kaggle: taeryy)

  4. Named Math Formulas

    • kaggle.com
    • huggingface.co
    zip
    Updated Dec 30, 2023
    Cite
    Marília Prata (2023). Named Math Formulas [Dataset]. https://www.kaggle.com/datasets/mpwolke/cusersmarildownloadsdata-json/code
    Available download formats
    zip (19910 bytes)
    Dataset updated
    Dec 30, 2023
    Authors
    Marília Prata
    Description

    "Mathematical dataset based on 71 famous mathematical identities. Each entry consists of a name of the identity (name), a representation of that identity (formula), a label whether the representation belongs to the identity (label), and an id of the mathematical identity (formula_name_id). The false pairs are intentionally challenging, e.g., a^2+2^b=c^2 as falsified version of the Pythagorean Theorem. All entries have been generated by using data.json as starting point and applying the randomizing and falsifying algorithms here. The formulas in the dataset are not just pure mathematical, but contain also textual descriptions of the mathematical identity. At most 400000 versions are generated per identity. There are ten times more falsified versions than true ones, such that the dataset can be used for a training with changing false examples every epoch."

    https://huggingface.co/datasets/ddrg/named_math_formulas
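
    One way to read "changing false examples every epoch" is to keep every true entry and draw a fresh sample of falsified entries for each epoch. The sketch below assumes that reading and a 1:1 ratio per epoch, neither of which is stated in the description. The entries are hypothetical placeholders that only reuse the field names name, formula, label, and formula_name_id.

```python
import random

def epoch_examples(entries, epoch, seed=0):
    """Keep every true pair and draw an equally sized, epoch-specific sample of false pairs."""
    positives = [e for e in entries if e["label"]]
    negatives = [e for e in entries if not e["label"]]
    # Seeding with the epoch number gives a different but reproducible draw per epoch.
    rng = random.Random(seed * 1_000_003 + epoch)
    k = min(len(positives), len(negatives))
    batch = positives + rng.sample(negatives, k)
    rng.shuffle(batch)
    return batch

# Hypothetical entries: 2 true versions and 20 falsified ones (the 1:10 ratio from the description).
entries = [
    {"name": "Pythagorean Theorem", "formula": "a^2 + b^2 = c^2", "label": True, "formula_name_id": 0},
    {"name": "Pythagorean Theorem", "formula": "c^2 = a^2 + b^2", "label": True, "formula_name_id": 0},
]
entries += [
    {"name": "Pythagorean Theorem", "formula": f"falsified variant {i}", "label": False, "formula_name_id": 0}
    for i in range(20)
]

for epoch in range(3):
    batch = epoch_examples(entries, epoch)
    print(epoch, sorted(e["formula"] for e in batch if not e["label"]))
```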

  5. MathVista

    • huggingface.co
    Updated Oct 16, 2023
    Cite
    AI for Math Reasoning (2023). MathVista [Dataset]. https://huggingface.co/datasets/AI4Math/MathVista
    Available download formats
    Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Oct 16, 2023
    Dataset authored and provided by
    AI for Math Reasoning
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for MathVista

    Dataset Description

    MathVista is a consolidated Mathematical reasoning benchmark within Visual contexts. It consists of three newly created datasets, IQTest, FunctionQA, and PaperQA, which address the missing visual domains and are tailored to evaluate logical… See the full description on the dataset page: https://huggingface.co/datasets/AI4Math/MathVista.

  6. Math Formula Retrieval

    • kaggle.com
    • huggingface.co
    zip
    Updated Dec 2, 2023
    Cite
    The Devastator (2023). Math Formula Retrieval [Dataset]. https://www.kaggle.com/datasets/thedevastator/math-formula-pair-classification-dataset/data
    Available download formats
    zip (2021716728 bytes)
    Dataset updated
    Dec 2, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Math Formula Retrieval

    Math Formula Pair Classification Dataset

    By ddrg (From Huggingface) [source]

    About this dataset

    The dataset's columns include formula1, formula2, and label (in binary format), which provide the information needed for analysis and evaluation.

    The train.csv file contains a subset of the dataset specifically curated for training purposes. It includes an extensive range of math formula pairs along with their corresponding labels and unique ID names. This allows researchers and data scientists to construct models that can predict whether two given formulas fall within the same category or not.

    On the other hand, test.csv serves as an evaluation set. It consists of additional pairs of math formulas accompanied by their respective labels and unique IDs. By evaluating model performance on this test set after training it on train.csv data, researchers can assess how well their models generalize to unseen instances.

    By leveraging this informative dataset, researchers can unlock new possibilities in mathematics-related fields such as pattern recognition algorithms development or enhancing educational tools that involve automatic identification and categorization tasks based on mathematical formulas

    How to use the dataset

    Introduction

    Dataset Description

    train.csv

    The train.csv file contains a set of labeled math formula pairs along with their corresponding labels and formula name IDs. It consists of the following columns:

      • formula1: The first mathematical formula in the pair (text).
      • formula2: The second mathematical formula in the pair (text).
      • label: The classification label indicating whether the pair of formulas belong to the same category or not (binary). A label value of 1 indicates that both formulas belong to the same category, while a label value of 0 indicates different categories.

    test.csv

    The purpose of the test.csv file is to provide a set of formula pairs along with their labels and formula name IDs for testing and evaluation purposes. It has an identical structure to train.csv, containing columns like formula1, formula2, label, etc.

    Task

    The main task using this dataset is binary classification, where your objective is to predict whether two mathematical formulas belong to the same category or not based on their textual representation. You can use various machine learning algorithms such as logistic regression, decision trees, random forests, or neural networks for training models on this dataset.

    Exploring & Analyzing Data

    Before building your model, it's crucial to explore and analyze your data. Here are some steps you can take:

    • Load both CSV files (train.csv and test.csv) into your preferred data analysis framework or programming language (e.g., Python with libraries like pandas).
    • Examine the dataset's structure, including the number of rows, columns, and data types.
    • Check for missing values in the dataset and handle them accordingly.
    • Visualize the distribution of labels to understand whether it is balanced or imbalanced.
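
    The exploration steps above can be sketched with the standard library alone. The rows below are hypothetical and only follow the formula1, formula2, and label layout described for train.csv; summarize is an illustrative helper, and counting labels stands in for the visualization step.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the formula1/formula2/label layout described above.
SAMPLE_TRAIN = """formula1,formula2,label
a^2 + b^2 = c^2,c^2 = a^2 + b^2,1
a^2 + b^2 = c^2,E = mc^2,0
sin^2 x + cos^2 x = 1,,0
"""

def summarize(csv_text):
    """Report row count, column names, missing-value counts, and label distribution."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    missing = {c: sum(1 for r in rows if not (r[c] or "").strip()) for c in columns}
    labels = Counter(r["label"] for r in rows)
    return {"rows": len(rows), "columns": columns, "missing": missing, "labels": dict(labels)}

summary = summarize(SAMPLE_TRAIN)
print(summary)
```

    For the real files, the text passed to summarize would come from reading train.csv or test.csv with open(..., newline="", encoding="utf-8"). A strongly uneven label count would indicate an imbalanced dataset.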

    Model Building

    Once you have analyzed and preprocessed your dataset, you can start building your classification model using various machine learning algorithms:

    • Split your train.csv data into training and validation sets for model evaluation during training.
    • Choose a suitable
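
    The first step of the list, splitting the training data into training and validation parts, can be sketched as follows. The 80/20 ratio, the fixed seed, and the helper name are assumptions chosen for illustration; the rows are hypothetical placeholders.

```python
import random

def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split off a validation share."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    # The first n_validation shuffled rows become the validation set.
    return shuffled[n_validation:], shuffled[:n_validation]

rows = [{"formula1": f"f{i}", "formula2": f"g{i}", "label": i % 2} for i in range(10)]
train_rows, validation_rows = train_validation_split(rows)
print(len(train_rows), len(validation_rows))  # 8 2
```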

    Research Ideas

    • Math Formula Similarity: This dataset can be used to develop a model that classifies whether two mathematical formulas are similar or not. This can be useful in various applications such as plagiarism detection, identifying duplicate formulas in databases, or suggesting similar formulas based on user input.
    • Formula Categorization: The dataset can be used to train a model that categorizes mathematical formulas into different classes or categories. For example, the model can classify formulas into algebraic expressions, trigonometric equations, calculus problems, or geometric theorems. This categorization can help organize and search through large collections of mathematical formulas.
    • Formula Recommendation: Using this dataset, one could build a recommendation system that suggests related math formulas based on user input. By analyzing the similarities between different formula pairs and their corresponding labels, the system could provide recommendations for relevant mathematical concepts that users may need while solving problems or studying specific topics in mathematics

    Acknowle...

  7. GSM8K - Grade School Math 8K Q&A

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
    Available download formats
    zip (3418660 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    GSM8K - Grade School Math 8K Q&A

    A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

    By Huggingface Hub [source]

    About this dataset

    This Grade School Math 8K Linguistically Diverse Training & Test Set is designed to help you develop and improve your understanding of multi-step reasoning question answering. The dataset contains three separate data files: the socratic_test.csv, main_test.csv, and main_train.csv, each containing a set of questions and answers related to grade school math that consists of multiple steps. Each file contains the same columns: question, answer. The questions contained in this dataset are thoughtfully crafted to lead you through the reasoning journey for arriving at the correct answer each time, allowing you immense opportunities for learning through practice. With over 8 thousand entries for both training and testing purposes in this GSM8K dataset, it takes advanced multi-step reasoning skills to ace these questions! Deepen your knowledge today and master any challenge with ease using this amazing GSM8K set!
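
    When scoring a model on the answer column, it is often enough to compare final numbers. Answers in the original GSM8K release end with a line of the form "#### <final answer>"; that these CSV files keep the marker is an assumption here, and the sample answer below is a hypothetical one written in that style.

```python
import re

# Matches the number that follows the '####' marker, allowing thousands separators.
FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")

def final_answer(answer_text):
    """Return the number after the '####' marker as a string, or None if it is absent."""
    match = FINAL_ANSWER.search(answer_text)
    if match is None:
        return None
    return match.group(1).replace(",", "")

# Hypothetical answer text in the step-by-step style, not a row from the dataset.
sample = "Tom has 3 boxes with 4 pens each, so he has 3 * 4 = 12 pens.\n#### 12"
print(final_answer(sample))  # 12
```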

    How to use the dataset

    This dataset provides a unique opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of 8,000 questions and answers that have been created to simulate real-world scenarios in grade school mathematics. Each question is paired with one answer based on a comprehensive test set. The questions cover topics such as algebra, arithmetic, probability and more.

    The dataset consists of two files, main_train.csv and main_test.csv. The former contains questions and answers related to grade school math, while the latter includes multi-step reasoning tests for each category of the Ontario Math Curriculum (OMC). The columns are listed as Question and Answer. According to the description, each row contains 3 sequential question/answer pairs, so that a single path can be followed from the start of any given answer, or branched from there according to the logic each problem requires. These columns can be combined with text-analysis models such as ELMo or BERT to explore different representations for natural language processing tasks such as Q&A, or for building predictive models for numerical data applications.

    To use this dataset efficiently, first get familiar with its structure by reading through its documentation, so that you are aware of the available information on item content, definitions, and format requirements. Then study the examples that best suit your purpose, whether that is an experiment inspired by education research, insights for marketing analytics reports, or predictions for an artificial intelligence project. Make sure you learn the definitions of all variables before you continue.

    Research Ideas

    • Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.
    • Generating new grade school math questions and answers using g...

  8. Data from: Algorithm and System Co-design for Efficient Subgraph-based Graph...

    • data.niaid.nih.gov
    Updated Apr 10, 2025
    Cite
    Yin, Haoteng; Zhang, Muhan; Wang, Yanbang; Wang, Jianguo; Li, Pan (2025). Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_15186012
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Cornell University
    Peking University
    Purdue University West Lafayette
    Authors
    Yin, Haoteng; Zhang, Muhan; Wang, Yanbang; Wang, Jianguo; Li, Pan
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Following the format of the Open Graph Benchmark (OGB), we design four prediction tasks of relations (mag-write, mag-cite) and higher-order patterns (tags-math, DBLP-coauthor) and construct the corresponding datasets over heterogeneous graphs and hypergraphs [1]. The original ogb-mag dataset only contains features for 'paper'-type nodes. We add the node embedding provided by [2] as raw features for other node types in MAG(P-A)/(P-P). For these four tasks, the model is evaluated by one positive query paired with a certain number of randomly sampled negative queries (1:1000 by default, except for tags-math 1:100).
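
    Evaluating one positive query against sampled negatives amounts to ranking the positive among all candidates. The description does not name the metric, so the mean reciprocal rank below is an assumption, chosen because it is commonly used for this kind of ranking setup. The scores are hypothetical, and only three negatives per query are used instead of 1000 to keep the example short.

```python
def positive_rank(positive_score, negative_scores):
    """Rank of the positive query among all candidates (1 = best); ties count against it."""
    return 1 + sum(1 for s in negative_scores if s >= positive_score)

def mean_reciprocal_rank(queries):
    """Average of 1/rank over (positive_score, negative_scores) pairs."""
    ranks = [positive_rank(p, negs) for p, negs in queries]
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hypothetical model scores: each positive is compared with its sampled negatives.
queries = [
    (0.9, [0.1, 0.4, 0.8]),  # positive ranked 1st
    (0.5, [0.7, 0.2, 0.6]),  # positive ranked 3rd
]
mrr = mean_reciprocal_rank(queries)
print(round(mrr, 4))  # 0.6667
```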

  9. Model comparison.

    • plos.figshare.com
    xls
    Updated Aug 1, 2024
    Cite
    Joseph Park; George Sugihara; Gerald Pao (2024). Model comparison. [Dataset]. http://doi.org/10.1371/journal.pone.0305408.t001
    Available download formats
    xls
    Dataset updated
    Aug 1, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joseph Park; George Sugihara; Gerald Pao
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Effective control requires knowledge of the process dynamics to guide the system toward desired states. In many control applications this knowledge is expressed mathematically or through data–driven models, however, as complexity grows obtaining a satisfactory mathematical representation is increasingly difficult. Further, many data–driven approaches consist of abstract internal representations that may have no obvious connection to the underlying dynamics and control, or, require extensive model design and training. Here, we remove these constraints by demonstrating model predictive control from generalized state space embedding of the process dynamics providing a data–driven, explainable method for control of nonlinear, complex systems. Generalized embedding and model predictive control are demonstrated on nonlinear dynamics generated by an agent based model of 1200 interacting agents. The method is generally applicable to any type of controller and dynamic system representable in a state space.

  10. Data from: A brief introduction to nomography: graphical representation of...

    • tandf.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Leslie Glasser; Ron Doerfler (2023). A brief introduction to nomography: graphical representation of mathematical relationships [Dataset]. http://doi.org/10.6084/m9.figshare.7139414.v1
    Available download formats
    docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Leslie Glasser; Ron Doerfler
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nomographs (or nomograms, or alignment charts) are graphical representations of mathematical relationships (extending to empirical relationships of data) which are used by simply applying a straightedge across the plot through points on scales representing independent variables, which then crosses the corresponding datum point for the dependent variable; the choice among independent and dependent variable is arbitrary so that each variable may be determined in terms of the others. Examples of nomographs in common current use compute the lift available for a hot-air balloon, the boiling points of solvents under reduced pressure in the chemistry laboratory, and the relative forces in a centrifuge in a biochemical laboratory. Sundials represent another ancient yet widely familiar example. With the advent and ready accessibility of the computer, printed mathematical tables, slide rules and nomographs became generally redundant. However, nomographs provide insight into mathematical relationships, are useful for rapid and repeated application, even in the absence of calculational facilities, and can reliably be used in the field. Many nomographs for various purposes may be found online. This paper describes the origins and development of nomographs, illustrating their use with some relevant examples. A supplementary interactive Excel file demonstrates their application for some simple mathematical operations.

  11. Global Math Calculation Software Market Research Report: By Application...

    • wiseguyreports.com
    Updated Sep 15, 2025
    Cite
    (2025). Global Math Calculation Software Market Research Report: By Application (Education, Engineering, Finance, Data Analysis), By Deployment Type (Cloud-Based, On-Premises), By End User (Students, Professionals, Educational Institutions, Businesses), By Features (Graphing Capabilities, Statistical Analysis, Equation Solving, Simulation) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/math-calculation-software-market
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 2113.7 (USD Million)
    MARKET SIZE 2025: 2263.7 (USD Million)
    MARKET SIZE 2035: 4500.0 (USD Million)
    SEGMENTS COVERED: Application, Deployment Type, End User, Features, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: increasing demand for automation, rise in online education, growth in data analysis tools, adoption of cloud-based solutions, emphasis on STEM education
    MARKET FORECAST UNITS: USD Million
    KEY COMPANIES PROFILED: IBM, Oracle, Maplesoft, Algebraix, MathWorks, Tableau, SAP, PTC, Microsoft, Wolfram Research, ESRI, SAS Institute
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: AI integration in math tools, Mobile-friendly calculation apps, Enhanced data visualization features, Cloud-based collaboration solutions, Gamification of math learning
    COMPOUND ANNUAL GROWTH RATE (CAGR): 7.1% (2025 - 2035)
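
    The stated growth rate can be checked against the listed market sizes. The sketch below assumes the standard compound-annual-growth-rate formula, (end / start) ** (1 / years) - 1, applied to the 2025 and 2035 figures over the ten-year forecast period; the report does not say how its figure was computed.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Market size in USD million for 2025 and 2035, as listed above.
rate = cagr(2263.7, 4500.0, 2035 - 2025)
print(f"{rate:.1%}")  # 7.1%
```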

  12. Data from: Zero Data Exposure: A New Framework for Enabling Generative AI on...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Oct 28, 2025
    Cite
    Priyanuj Boruah (2025). Zero Data Exposure: A New Framework for Enabling Generative AI on Private Enterprise Data [Dataset]. http://doi.org/10.7910/DVN/FZMD31
    Available download formats
    Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Oct 28, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Priyanuj Boruah
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This research addresses the central security challenge preventing the adoption of Generative AI in the enterprise: the unacceptable risk of exposing sensitive data. It introduces a formal framework for evaluating privacy-preserving AI strategies along three critical axes: (1) Security Guarantee Level (SGL), a measure of theoretical data privacy; (2) Contextual Fidelity, the analytical utility of the private data representation; and (3) Performance Overhead, the computational cost. This framework serves as a definitive guide for CTOs, CISOs, and data leaders to make evidence-based decisions, moving beyond simplistic assessments to a holistic, security-first evaluation of any proposed AI solution. Applying this framework to four distinct classes of methods, from simple statistical summaries to complex generative models, our analysis conclusively demonstrates that all common public approaches are fundamentally flawed for high-stakes enterprise use. It proves that methods offering the highest security guarantees (SGL-1) are analytically weak, while the only method capable of high fidelity (Generative Models) is architecturally insecure (SGL-2) and catastrophically slow. This paper formally identifies this critical "research gap" and establishes a clear, rigorous benchmark for the new class of solution required to unlock the full potential of AI on private data. This dataset contains the full PDF of the paper and a machine-readable CSV file of the analytical results.

  13. Marking scheme for representation test and mathematical problem solving.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Putri Yuanita; Hutkemri Zulnaidi; Effandi Zakaria (2023). Marking scheme for representation test and mathematical problem solving. [Dataset]. http://doi.org/10.1371/journal.pone.0204847.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Putri Yuanita; Hutkemri Zulnaidi; Effandi Zakaria
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Marking scheme for representation test and mathematical problem solving.

  14. Dataset of books called Ontic : a knowledge representation system for...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Ontic : a knowledge representation system for mathematics [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Ontic+%3A+a+knowledge+representation+system+for+mathematics
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Ontic : a knowledge representation system for mathematics. It features 7 columns including author, publication date, language, and book publisher.

  15. Supplementary materials for the PhD thesis "The effect on learners...

    • ordo.open.ac.uk
    qt
    Updated Oct 1, 2025
    Cite
    Jonathan P. San Diego (2025). Supplementary materials for the PhD thesis "The effect on learners strategies of varying computer-based representations: evidence from gazes, actions, utterances and sketches" [Dataset]. http://doi.org/10.21954/ou.rd.30257695.v1
    Explore at:
    Available download formats: qt
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    The Open University
    Authors
    Jonathan P. San Diego
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises the files contained on a CD-ROM which was attached to the thesis when it was submitted in 2008. It was uploaded to ORDO in 2025 for preservation purposes. For more information, please refer to the thesis "The effect on learners strategies of varying computer-based representations: evidence from gazes, actions, utterances and sketches" on ORO.

    Thesis Abstract

    Computer-based Multiple External Representations (MERs) have been found in some cases to help and in others to hinder the learning process. This thesis examines how varying the external representations that are presented in a computer environment influences the strategies that learners choose when tackling mathematics tasks. It has been noted (Ainsworth, 2006) that learners fail to transfer insights from one representation to another. Previous work analysing video data of learners' problem-solving with computer-based MERs emphasises the need to identify which representation is being considered by a learner as utterances are made, and to examine more closely learners' movement between representations. This research focuses on the relationship between strategy and representation during learners' problem solving.

    A set of analytical techniques was developed to characterise learner strategies, to identify how different computer-based MERs influence strategy choices, and to explore how these choices change over the course of task completion. Rich data were collected using a variety of technologies: learners' shifts in attention were recorded using an unobtrusive eye-tracking device and screen capture software; keyboard and mouse actions were logged automatically; utterances and gestures were video recorded; notes and sketches were recorded in real-time using a Tablet PC. This research suggests how integrated analysis of learners' gazes, actions, writing, sketches and utterances can better illuminate subtle cognitive strategies.

    The study involved completion of three tasks by eighteen participants using multiple mathematical representations (numbers, graphs and algebra) presented in different computer-based 'instantiations': Static (non-moving, non-changing, non-Interactive); Dynamic (capable of animation following keyboard inputs); Interactive (directly manipulable using a mouse).

    Having computer-based MERs available to learners provides an opportunity to use representations with which they are comfortable. A detailed analysis showed that both representation and instantiation have an impact on strategy choice. It identified differences in expression of inferences, construction of visual images, and attention to representations between different types of instantiation. One of the important findings of the research is that learners are less likely to use imagining strategies when representational instantiation is Interactive. These results may provide some explanation of how interactivity helps or hinders learners' understanding of multiple representations.

  16. Student Performance

    • kaggle.com
    zip
    Updated Oct 7, 2022
    Cite
    Aman Chauhan (2022). Student Performance [Dataset]. https://www.kaggle.com/datasets/whenamancodes/student-performance
    Explore at:
    Available download formats: zip (106753 bytes)
    Dataset updated
    Oct 7, 2022
    Authors
    Aman Chauhan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such a prediction is much more useful (see paper source for more details).

    Attributes for both Maths.csv (Math course) and Portuguese.csv (Portuguese language course) datasets:

    Column      Description
    school      student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
    sex         student's sex (binary: 'F' - female or 'M' - male)
    age         student's age (numeric: from 15 to 22)
    address     student's home address type (binary: 'U' - urban or 'R' - rural)
    famsize     family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
    Pstatus     parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
    Medu        mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
    Fedu        father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
    Mjob        mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    Fjob        father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    reason      reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
    guardian    student's guardian (nominal: 'mother', 'father' or 'other')
    traveltime  home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
    studytime   weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
    failures    number of past class failures (numeric: n if 1<=n<3, else 4)
    schoolsup   extra educational support (binary: yes or no)
    famsup      family educational support (binary: yes or no)
    paid        extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
    activities  extra-curricular activities (binary: yes or no)
    nursery     attended nursery school (binary: yes or no)
    higher      wants to take higher education (binary: yes or no)
    internet    Internet access at home (binary: yes or no)
    romantic    with a romantic relationship (binary: yes or no)
    famrel      quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
    freetime    free time after school (numeric: from 1 - very low to 5 - very high)
    goout       going out with friends (numeric: from 1 - very low to 5 - very high)
    Dalc        workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
    Walc        weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
    health      current health status (numeric: from 1 - very bad to 5 - very good)
    absences    number of school absences (numeric: from 0 to 93)

    These grades relate to the course subject, Math or Portuguese:

    Grade       Description
    G1          first period grade (numeric: from 0 to 20)
    G2          second period grade (numeric: from 0 to 20)
    G3          final grade (numeric: from 0 to 20, output target)
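
    The note above, that G3 is harder but more useful to predict without G1 and G2, can be illustrated with a small sketch that separates the target from the features and leaves the two earlier grades out. The sample rows are made up for illustration and are not taken from the dataset, the helper name split_features_target is hypothetical, and a comma-delimited file with a header row is assumed.

```python
import csv
import io

# Hypothetical sample rows, invented for illustration (not real dataset records).
# Assumes a comma-delimited file whose header uses the column names listed above.
SAMPLE_CSV = """school,sex,age,studytime,failures,absences,G1,G2,G3
GP,F,17,2,0,4,12,13,13
MS,M,16,1,1,10,8,7,8
GP,M,18,3,0,0,15,16,16
"""


def split_features_target(rows, target="G3", drop=("G1", "G2")):
    """Return (features, targets), leaving out the target and the dropped columns."""
    excluded = {target, *drop}
    features = [{k: v for k, v in row.items() if k not in excluded} for row in rows]
    targets = [int(row[target]) for row in rows]
    return features, targets


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
features, targets = split_features_target(rows)
```

    Passing drop=() would keep G1 and G2 as predictors, which gives the easier variant of the task described in the note.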


  17. A Collection of Dwellings to Represent the U.S. Housing Stock (2024 Update)...

    • nist.gov
    • data.nist.gov
    • +2 more
    Updated Jun 3, 2024
    Cite
    National Institute of Standards and Technology (2024). A Collection of Dwellings to Represent the U.S. Housing Stock (2024 Update) Associated Python Scripts [Dataset]. http://doi.org/10.18434/mds2-3488
    Explore at:
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    National Institute of Standards and Technology http://www.nist.gov/
    License

    https://www.nist.gov/open/license

    Area covered
    United States
    Description

    This is a compilation of Python scripts used when developing the Collection of Dwellings to Represent the U.S. Housing Stock (2024 Update) NIST TN.

  18. Data from: Rekenen in Beeld

    • ssh.datastations.nl
    csv, pdf, xlsx, zip
    Updated Dec 31, 2013
    Cite
    K. Hoogland; K. Hoogland (2013). Rekenen in Beeld [Dataset]. http://doi.org/10.17026/DANS-ZA6-5Q6C
    Explore at:
    Available download formats: pdf (393474), pdf (1550665), csv (4094), zip (26355), pdf (1976198), xlsx (29737071), csv (17011324), pdf (1671241)
    Dataset updated
    Dec 31, 2013
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    K. Hoogland; K. Hoogland
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The project is conducted under the programme Onderwijsbewijs II (Evidence-based Research in Education). The research question is: What is the effect on students' performance of changing the representation of the problem situation in mathematical contextual problems? In this randomized controlled experiment the representation of the problem situation in 24 items is systematically changed from descriptive to depictive. In each test, half of the items are randomly assigned one representation or the other. The article "Word Problems versus Image-Rich Problems: Changing the Representation of Reality in Contextual Mathematical Problems" reports on this research.

    The project is a study within the Onderwijsbewijs II programme, which focuses on evidence-based research into educational practices. The study systematically examines the effect on students' results when, in contextual arithmetic problems, the narrative representation of the problem situation is replaced by a pictorial representation. In the test that was developed, participants are randomly given half of the test items with a descriptive representation of the problem situation and half with a pictorial representation. This research design meets the requirements of a randomized controlled trial. The article "Word Problems versus Image-Rich Problems: Changing the Representation of Reality in Contextual Mathematical Problems" reports on this study.

    The code book is in English and in Dutch.
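
    The assignment step described above, where half of the 24 items in each test receive one representation and half the other, might be sketched as follows. The description does not say how the randomization was actually carried out, so the function name, the item numbering, and the use of a seeded generator are assumptions made for illustration.

```python
import random


def assign_representations(item_ids, rng):
    """Randomly give half of the items the depictive version, the rest the descriptive one."""
    depictive = set(rng.sample(item_ids, len(item_ids) // 2))
    return {i: ("depictive" if i in depictive else "descriptive") for i in item_ids}


items = list(range(1, 25))  # the 24 test items, numbered 1..24 (numbering is hypothetical)
assignment = assign_representations(items, random.Random(0))

# Count how many items received each representation.
labels = list(assignment.values())
counts = {label: labels.count(label) for label in ("descriptive", "depictive")}
```

    Calling the function once per participant, each time with a differently seeded generator, would give every test its own random split while keeping the 12/12 balance.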

  19. Mathematics and Visualization - distribution

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    Cite
    (2025). Mathematics and Visualization - distribution [Dataset]. https://exaly.com/journal/33600/mathematics-and-visualization
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Nov 1, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The graph shows the number of publications with a given number of citations of ^. The data are presented on a logarithmic scale for the sake of clarity.

  20. Math Self-beliefs comparison study

    • dataverse.harvard.edu
    • dataone.org
    Updated Oct 7, 2022
    Cite
    Sandra J. Miles (2022). Math Self-beliefs comparison study [Dataset]. http://doi.org/10.7910/DVN/0SJCH2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 7, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Sandra J. Miles
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The purpose of this data set is to examine distinctions across different measurement instruments used to measure student self-perceptions and attitudes towards mathematics. 225 college undergraduate students took one randomized survey containing the following scales: 1) Mathematics self-efficacy and Anxiety Questionnaire (MSEAQ, May, 2009), 2) Self-Descriptive Questionnaire-III (SDQ-III, Marsh & O’Neill, 1984) [All 10 items for the math self-concept subset were included in this survey. The other SDQ items represent incomplete question sets], 3) Mathematical Self-efficacy Scale – Middle School (UPMSES, Usher & Pajares, 2009), 4) Mathematical Self-Concept Scale (GMSCS, Gourgey, 1982), 5) Nine items from the Fennema Sherman Mathematics Attitude Scales (FSMAS, Mulhern & Rae, 1998)[These items do not represent the entire scale as developed], and 6) Three original items measuring student beliefs related to seeking help in mathematics classes. The survey originally included two items that were specific to the undergraduate course the students were currently enrolled in, but they have been removed from the dataset. Additionally, for a subset of the students the dataset includes information on gender, race, major (STEM vs. nonSTEM), course grade, course exam average, and homework completion percentage. The separate scales were all compiled into one survey programmed in Qualtrics. Questions were randomized within scales and scales were randomized within the entire survey. The survey was available for a one-week time period during which participants could complete it at their leisure. Qualtrics allowed multiple entries so that the entire survey could be completed in multiple settings. The data reporting demographic and course information was obtained through school records.
