64 datasets found

Data from: Statistical Graphs in Mathematical Textbooks of Primary Education...
scielo.figshare.com
datasetcatalog.nlm.nih.gov
jpeg
Updated May 30, 2023
Cite
Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal (2023). Statistical Graphs in Mathematical Textbooks of Primary Education in Perú [Dataset]. http://doi.org/10.6084/m9.figshare.6857033.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.6857033.v1
Dataset updated
May 30, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: This paper presents an analysis of statistical graphs, in light of the curricular guidelines and their implementation, in eighteen primary education mathematics textbooks in Perú, which correspond to three complete series from different publishers. Through a content analysis, we examined the sections where graphs appeared, identifying the type of activity that arises from the graphs, the reading level demanded, and the semiotic complexity of the task involved. The textbooks are partially suited to the curricular guidelines regarding the presentation of graphs by educational level, and the number of activities proposed by the three publishers is similar. The main activities required in the textbooks are calculating and building. The study observes a predominance of bar graphs, a basic reading level, and the representation of a univariate data distribution in the graph.
Mathematical Problems Dataset: Various
kaggle.com
zip
Updated Dec 2, 2023
Cite
The Devastator (2023). Mathematical Problems Dataset: Various [Dataset]. https://www.kaggle.com/datasets/thedevastator/mathematical-problems-dataset-various-mathematic/code
Explore at:
Available download formats: zip (2498203187 bytes)
Dataset updated
Dec 2, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Mathematical Problems Dataset: Various Mathematical Problems and Solutions

Mathematical Problems Dataset: Questions and Answers

By math_dataset (From Huggingface) [source]

About this dataset

This dataset comprises a collection of mathematical problems and their solutions designed for training and testing purposes. Each problem is presented in the form of a question, followed by its corresponding answer. The dataset covers various mathematical topics such as arithmetic, polynomials, and prime numbers. For instance, the arithmetic_nearest_integer_root_test.csv file focuses on problems involving finding the nearest integer root of a given number. Similarly, the polynomials_simplify_power_test.csv file deals with problems related to simplifying polynomials with powers. Additionally, the dataset includes the numbers_is_prime_train.csv file containing math problems that require determining whether a specific number is prime or not. The questions and answers are provided in text format to facilitate analysis and experimentation with mathematical problem-solving algorithms or models

How to use the dataset

Introduction: The Mathematical Problems Dataset contains a collection of various mathematical problems and their corresponding solutions or answers. This guide will provide you with all the necessary information on how to utilize this dataset effectively.

Understanding the columns: The dataset consists of several columns, each representing a different aspect of the mathematical problem and its solution. The key columns are:

question: This column contains the text representation of the mathematical problem or equation.

answer: This column contains the text representation of the solution or answer to the corresponding problem.

Exploring specific problem categories: To focus on specific types of mathematical problems, you can filter or search within the dataset using relevant keywords or terms related to your area of interest. For example, if you are interested in prime numbers, you can search for prime in the question column.

Applying machine learning techniques: This dataset can be used for training machine learning models related to natural language understanding and mathematics. You can explore various techniques such as text classification, sentiment analysis, or even sequence-to-sequence models for solving mathematical problems based on their textual representations.

Generating new questions and solutions: By analyzing patterns in this dataset, you can generate new questions and solutions programmatically using techniques like data augmentation or rule-based methods.

Validation and evaluation: As with any other machine learning task, it is essential to validate your models properly on separate validation sets that are not included in this dataset. You can also evaluate model performance by comparing predictions against the known answers in this dataset's answer column.

Sharing insights and findings: After working with this dataset, researchers and educators are encouraged to share their insights and the approaches they took during analysis or modelling as Kaggle notebooks, discussions, blogs, or tutorials, so that others can benefit from them.

Note: The dataset does not include dates.

By following these guidelines, you can effectively explore and utilize the Mathematical Problems Dataset for various mathematical problem-solving tasks. Happy exploring!
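
The keyword filtering and answer comparison described in the guide above can be sketched as follows. This is a minimal illustration that assumes a CSV with `question` and `answer` columns, as the description states; the sample rows, the helper names, and the constant-answer "model" are hypothetical placeholders and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample rows standing in for one of the dataset's CSV files.
SAMPLE_CSV = """question,answer
Is 7 a prime number?,True
What is 12 + 30?,42
Is 9 prime?,False
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def filter_by_keyword(rows, keyword):
    """Keep rows whose question mentions the keyword (case-insensitive)."""
    return [r for r in rows if keyword.lower() in r["question"].lower()]

def exact_match_accuracy(rows, predict):
    """Fraction of rows where predict(question) equals the stored answer."""
    if not rows:
        return 0.0
    hits = sum(predict(r["question"]) == r["answer"] for r in rows)
    return hits / len(rows)

rows = load_rows(SAMPLE_CSV)
prime_rows = filter_by_keyword(rows, "prime")

# Placeholder "model" that always answers "True"; swap in a real model here.
accuracy = exact_match_accuracy(prime_rows, lambda question: "True")
print(len(prime_rows), accuracy)
```

Exact string matching is the simplest comparison; answers that differ only in formatting would need normalisation before being compared.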

Research Ideas

Developing machine learning algorithms for solving mathematical problems: This dataset can be used to train and test models that can accurately predict the solution or answer to different mathematical problems.

Creating educational resources: The dataset can be used to create a wide variety of educational materials such as problem sets, worksheets, and quizzes for students studying mathematics.

Research in mathematical problem-solving strategies: Researchers and educators can analyze the dataset to identify common patterns or strategies employed in solving different types of mathematical problems. This analysis can help improve teaching methodologies and develop effective problem-solving techniques

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purpos...
collatz-conjecture-10k
kaggle.com
Updated Oct 8, 2025
Cite
Taekyung (2025). collatz-conjecture-10k [Dataset]. https://www.kaggle.com/datasets/taeryy/collatz-conjecture-10k
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Oct 8, 2025
Dataset provided by
Kaggle
Authors
Taekyung
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
🔢 The collatz conjecture

If n is even, divide it by 2.

If n is odd, multiply it by 3 and add 1.

Repeat, and see whether the sequence reaches 1.

Does it apply to all natural numbers?

This simply stated conjecture has not yet been proven. The problem is also famous for its beautiful graphs.

📂 Dataset Structure

File: col_10k.txt
Each line contains one Collatz sequence (comma-separated integers).
Example row: 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
Number of sequences: 10,000
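
The rule and the row format described above can be checked with a short sketch. The function names are illustrative choices; the only thing taken from the dataset description is the example row for 7.

```python
def collatz_sequence(n):
    """Return the Collatz sequence starting at n and stopping at the first 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    seq = [n]
    while n != 1:
        # Even: halve. Odd: multiply by 3 and add 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

def parse_row(line):
    """Parse one comma-separated line in the format described for col_10k.txt."""
    return [int(token) for token in line.split(",")]

example = "7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1"
row = parse_row(example)

# The parsed example row should match the sequence generated from 7.
print(row == collatz_sequence(7))
```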

💻 Starter Notebook

https://www.kaggle.com/code/taeryy/use-collatz-sequence-data

📜 License

CC-BY 4.0. Feel free to use this dataset for any purpose; just credit the source: Created by Taery (Kaggle: taeryy)
Named Math Formulas
kaggle.com
huggingface.co
zip
Updated Dec 30, 2023
Cite
Marília Prata (2023). Named Math Formulas [Dataset]. https://www.kaggle.com/datasets/mpwolke/cusersmarildownloadsdata-json/code
Explore at:
Available download formats: zip (19910 bytes)
Dataset updated
Dec 30, 2023
Authors
Marília Prata
Description
"Mathematical dataset based on 71 famous mathematical identities. Each entry consists of a name of the identity (name), a representation of that identity (formula), a label whether the representation belongs to the identity (label), and an id of the mathematical identity (formula_name_id). The false pairs are intentionally challenging, e.g., a^2+2^b=c^2as falsified version of the Pythagorean Theorem. All entries have been generated by using data.json as starting point and applying the randomizing and falsifying algorithms here. The formulas in the dataset are not just pure mathematical, but contain also textual descriptions of the mathematical identity. At most 400000 versions are generated per identity. There are ten times more falsified versions than true ones, such that the dataset can be used for a training with changing false examples every epoch."

https://huggingface.co/datasets/ddrg/named_math_formulas
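
To illustrate why the falsified pairs are described as challenging, the sketch below compares the Pythagorean identity with the falsified version quoted above, a^2+2^b=c^2, on a few integer triples. The function names and the choice of triples are assumptions made for illustration and are not part of the dataset.

```python
def pythagorean(a, b, c):
    """True identity: a^2 + b^2 = c^2."""
    return a**2 + b**2 == c**2

def falsified(a, b, c):
    """Falsified version quoted in the description: a^2 + 2^b = c^2."""
    return a**2 + 2**b == c**2

# Pythagorean triples chosen for illustration.
triples = [(3, 4, 5), (5, 12, 13), (8, 15, 17)]
for a, b, c in triples:
    print((a, b, c), pythagorean(a, b, c), falsified(a, b, c))
```

For (3, 4, 5) both checks succeed, because 2^4 equals 4^2; the falsified form fails only on the other triples. A single numeric spot check is therefore not enough to tell the true identity from the falsified one.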
MathVista
huggingface.co
Updated Oct 16, 2023
Cite
AI for Math Reasoning (2023). MathVista [Dataset]. https://huggingface.co/datasets/AI4Math/MathVista
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Oct 16, 2023
Dataset authored and provided by
AI for Math Reasoning
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for MathVista

Dataset Description

MathVista is a consolidated Mathematical reasoning benchmark within Visual contexts. It consists of three newly created datasets, IQTest, FunctionQA, and PaperQA, which address the missing visual domains and are tailored to evaluate logical… See the full description on the dataset page: https://huggingface.co/datasets/AI4Math/MathVista.
Math Formula Retrieval
kaggle.com
huggingface.co
zip
Updated Dec 2, 2023
Cite
The Devastator (2023). Math Formula Retrieval [Dataset]. https://www.kaggle.com/datasets/thedevastator/math-formula-pair-classification-dataset/data
Explore at:
Available download formats: zip (2021716728 bytes)
Dataset updated
Dec 2, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Math Formula Retrieval

Math Formula Pair Classification Dataset

By ddrg (From Huggingface) [source]

About this dataset

With a total of six columns, including formula1, formula2, and label (binary format), the dataset provides the information needed for analysis and evaluation.

The train.csv file contains a subset of the dataset specifically curated for training purposes. It includes an extensive range of math formula pairs along with their corresponding labels and unique ID names. This allows researchers and data scientists to construct models that can predict whether two given formulas fall within the same category or not.

On the other hand, test.csv serves as an evaluation set. It consists of additional pairs of math formulas accompanied by their respective labels and unique IDs. By evaluating model performance on this test set after training it on train.csv data, researchers can assess how well their models generalize to unseen instances.

By leveraging this informative dataset, researchers can unlock new possibilities in mathematics-related fields such as pattern recognition algorithms development or enhancing educational tools that involve automatic identification and categorization tasks based on mathematical formulas

How to use the dataset

Introduction

Dataset Description

train.csv

The train.csv file contains a set of labeled math formula pairs along with their corresponding labels and formula name IDs. It consists of the following columns:
- formula1: The first mathematical formula in the pair (text).
- formula2: The second mathematical formula in the pair (text).
- label: The classification label indicating whether the pair of formulas belong to the same category or not (binary). A label value of 1 indicates that both formulas belong to the same category, while a label value of 0 indicates different categories.

test.csv

The purpose of the test.csv file is to provide a set of formula pairs along with their labels and formula name IDs for testing and evaluation purposes. It has an identical structure to train.csv, containing columns like formula1, formula2, label, etc.

Task

The main task using this dataset is binary classification, where your objective is to predict whether two mathematical formulas belong to the same category or not based on their textual representation. You can use various machine learning algorithms such as logistic regression, decision trees, random forests, or neural networks for training models on this dataset.

Exploring & Analyzing Data

Before building your model, it's crucial to explore and analyze your data. Here are some steps you can take:

Load both CSV files (train.csv and test.csv) into your preferred data analysis framework or programming language (e.g., Python with libraries like pandas).

Examine the dataset's structure, including the number of rows, columns, and data types.

Check for missing values in the dataset and handle them accordingly.

Visualize the distribution of labels to understand whether it is balanced or imbalanced.
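
The exploration steps above can be sketched as follows. The description suggests pandas; this version uses only the Python standard library so that it runs without extra installs. The sample rows are hypothetical stand-ins for train.csv, and only the column names formula1, formula2, and label come from the description.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the column layout described for train.csv.
SAMPLE = """formula1,formula2,label
a^2 + b^2 = c^2,c^2 = a^2 + b^2,1
a^2 + b^2 = c^2,x + y = y + x,0
,x + y = y + x,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Structure: number of rows and column names.
columns = list(rows[0].keys())
print(len(rows), columns)

# Missing values: count empty cells per column.
missing = {c: sum(1 for r in rows if not r[c].strip()) for c in columns}
print(missing)

# Label balance: how many pairs carry each label value.
label_counts = Counter(r["label"] for r in rows)
print(label_counts)
```

With the real files, replacing `io.StringIO(SAMPLE)` with an opened file handle should be the only change needed, assuming the files are plain comma-separated text with a header row.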

Model Building

Once you have analyzed and preprocessed your dataset, you can start building your classification model using various machine learning algorithms:

Split your train.csv data into training and validation sets for model evaluation during training.

Choose a suitable

Research Ideas

Math Formula Similarity: This dataset can be used to develop a model that classifies whether two mathematical formulas are similar or not. This can be useful in various applications such as plagiarism detection, identifying duplicate formulas in databases, or suggesting similar formulas based on user input.

Formula Categorization: The dataset can be used to train a model that categorizes mathematical formulas into different classes or categories. For example, the model can classify formulas into algebraic expressions, trigonometric equations, calculus problems, or geometric theorems. This categorization can help organize and search through large collections of mathematical formulas.

Formula Recommendation: Using this dataset, one could build a recommendation system that suggests related math formulas based on user input. By analyzing the similarities between different formula pairs and their corresponding labels, the system could provide recommendations for relevant mathematical concepts that users may need while solving problems or studying specific topics in mathematics

Acknowle...
GSM8K - Grade School Math 8K Q&A
kaggle.com
zip
Updated Nov 24, 2023
Cite
The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
Explore at:
Available download formats: zip (3418660 bytes)
Dataset updated
Nov 24, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
GSM8K - Grade School Math 8K Q&A

A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

By Huggingface Hub [source]

About this dataset

This Grade School Math 8K Linguistically Diverse Training & Test Set is designed to help you develop and improve your understanding of multi-step reasoning question answering. The dataset contains three separate data files: the socratic_test.csv, main_test.csv, and main_train.csv, each containing a set of questions and answers related to grade school math that consists of multiple steps. Each file contains the same columns: question, answer. The questions contained in this dataset are thoughtfully crafted to lead you through the reasoning journey for arriving at the correct answer each time, allowing you immense opportunities for learning through practice. With over 8 thousand entries for both training and testing purposes in this GSM8K dataset, it takes advanced multi-step reasoning skills to ace these questions! Deepen your knowledge today and master any challenge with ease using this amazing GSM8K set!

How to use the dataset

This dataset provides a unique opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of 8,000 questions and answers that have been created to simulate real-world scenarios in grade school mathematics. Each question is paired with one answer based on a comprehensive test set. The questions cover topics such as algebra, arithmetic, probability and more.

The dataset consists of two files: main_train.csv and main_test.csv. The former contains questions and answers specifically related to grade school math, while the latter includes multi-step reasoning tests for each category of the Ontario Math Curriculum (OMC). In addition, it is described as having three columns, of which Question and Answer are named, meaning that each row contains 3 sequential question/answer pairs, making it possible to follow a single path from the start of any given answer or to branch out from there according to the logic required by each problem scenario. These columns can be used in combination with text analysis models such as ELMo or BERT to explore different representations for natural language processing tasks such as Q&A, or to build predictive models for numerical data applications such as classifying resource efficiency initiatives or forecasting sales volumes in retail platforms.

To use this dataset efficiently, first become familiar with its structure by reading through its documentation, so that you are aware of all available information on content definitions and format requirements. Then study the examples that best suit your specific purpose, whether that is an experiment inspired by education research, insights for marketing analytics reports, or predictions for an artificial intelligence project. Make sure you learn the definitions of all variables before you continue, and work from the unprocessed raw values with a clear view of the objectives behind your project.

Research Ideas

Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.

Generating new grade school math questions and answers using g...
Data from: Algorithm and System Co-design for Efficient Subgraph-based Graph...
data.niaid.nih.gov
Updated Apr 10, 2025
Cite
Yin, Haoteng; Zhang, Muhan; Wang, Yanbang; Wang, Jianguo; Li, Pan (2025). Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_15186012
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Cornell University
Peking University
Purdue University West Lafayette
Authors
Yin, Haoteng; Zhang, Muhan; Wang, Yanbang; Wang, Jianguo; Li, Pan
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Following the format of the Open Graph Benchmark (OGB), we design four prediction tasks of relations (mag-write, mag-cite) and higher-order patterns (tags-math, DBLP-coauthor) and construct the corresponding datasets over heterogeneous graphs and hypergraphs [1]. The original ogb-mag dataset only contains features for 'paper'-type nodes. We add the node embedding provided by [2] as raw features for other node types in MAG(P-A)/(P-P). For these four tasks, the model is evaluated by one positive query paired with a certain number of randomly sampled negative queries (1:1000 by default, except for tags-math 1:100).
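
The evaluation protocol described above, one positive query ranked against a set of sampled negatives, can be sketched as follows. The description does not name the metric; mean reciprocal rank (MRR) and Hits@K are used here as an assumption, since they are common choices for this kind of ranking setup. The scores are made-up placeholders, and a real run would use 1000 negatives per positive (100 for tags-math) rather than the handful shown.

```python
def rank_of_positive(pos_score, neg_scores):
    """1-based rank of the positive among the negatives.

    Ties are counted against the positive (a pessimistic convention).
    """
    return 1 + sum(1 for s in neg_scores if s >= pos_score)

def mean_reciprocal_rank(queries):
    """Average of 1/rank over (pos_score, neg_scores) pairs."""
    return sum(1.0 / rank_of_positive(p, n) for p, n in queries) / len(queries)

def hits_at_k(queries, k):
    """Fraction of queries whose positive is ranked within the top k."""
    return sum(rank_of_positive(p, n) <= k for p, n in queries) / len(queries)

# Placeholder model scores: (positive score, scores of sampled negatives).
queries = [
    (0.9, [0.1, 0.95, 0.5]),  # one negative outranks the positive: rank 2
    (1.0, [0.2, 0.3, 0.4]),   # positive ranked first: rank 1
]

print(mean_reciprocal_rank(queries), hits_at_k(queries, 1))
```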
Model comparison.
plos.figshare.com
xls
Updated Aug 1, 2024
Cite
Joseph Park; George Sugihara; Gerald Pao (2024). Model comparison. [Dataset]. http://doi.org/10.1371/journal.pone.0305408.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0305408.t001
Dataset updated
Aug 1, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Joseph Park; George Sugihara; Gerald Pao
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Effective control requires knowledge of the process dynamics to guide the system toward desired states. In many control applications this knowledge is expressed mathematically or through data-driven models; however, as complexity grows, obtaining a satisfactory mathematical representation is increasingly difficult. Further, many data-driven approaches consist of abstract internal representations that may have no obvious connection to the underlying dynamics and control, or require extensive model design and training. Here, we remove these constraints by demonstrating model predictive control from generalized state space embedding of the process dynamics, providing a data-driven, explainable method for control of nonlinear, complex systems. Generalized embedding and model predictive control are demonstrated on nonlinear dynamics generated by an agent-based model of 1200 interacting agents. The method is generally applicable to any type of controller and dynamic system representable in a state space.
Data from: A brief introduction to nomography: graphical representation of...
tandf.figshare.com
docx
Updated May 30, 2023
Cite
Leslie Glasser; Ron Doerfler (2023). A brief introduction to nomography: graphical representation of mathematical relationships [Dataset]. http://doi.org/10.6084/m9.figshare.7139414.v1
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.7139414.v1
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Leslie Glasser; Ron Doerfler
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Nomographs (or nomograms, or alignment charts) are graphical representations of mathematical relationships (extending to empirical relationships of data) which are used by simply applying a straightedge across the plot through points on scales representing independent variables, which then crosses the corresponding datum point for the dependent variable; the choice among independent and dependent variable is arbitrary so that each variable may be determined in terms of the others. Examples of nomographs in common current use compute the lift available for a hot-air balloon, the boiling points of solvents under reduced pressure in the chemistry laboratory, and the relative forces in a centrifuge in a biochemical laboratory. Sundials represent another ancient yet widely familiar example. With the advent and ready accessibility of the computer, printed mathematical tables, slide rules and nomographs became generally redundant. However, nomographs provide insight into mathematical relationships, are useful for rapid and repeated application, even in the absence of calculational facilities, and can reliably be used in the field. Many nomographs for various purposes may be found online. This paper describes the origins and development of nomographs, illustrating their use with some relevant examples. A supplementary interactive Excel file demonstrates their application for some simple mathematical operations.

Global Math Calculation Software Market Research Report: By Application...

wiseguyreports.com

Updated Sep 15, 2025

Cite

(2025). Global Math Calculation Software Market Research Report: By Application (Education, Engineering, Finance, Data Analysis), By Deployment Type (Cloud-Based, On-Premises), By End User (Students, Professionals, Educational Institutions, Businesses), By Features (Graphing Capabilities, Statistical Analysis, Equation Solving, Simulation) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/math-calculation-software-market

Explore at:

Dataset updated

Sep 15, 2025

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Sep 25, 2025

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2023
REGIONS COVERED	North America, Europe, APAC, South America, MEA
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2024	2113.7(USD Million)
MARKET SIZE 2025	2263.7(USD Million)
MARKET SIZE 2035	4500.0(USD Million)
SEGMENTS COVERED	Application, Deployment Type, End User, Features, Regional
COUNTRIES COVERED	US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
KEY MARKET DYNAMICS	increasing demand for automation, rise in online education, growth in data analysis tools, adoption of cloud-based solutions, emphasis on STEM education
MARKET FORECAST UNITS	USD Million
KEY COMPANIES PROFILED	IBM, Oracle, Maplesoft, Algebraix, MathWorks, Tableau, SAP, PTC, Microsoft, Wolfram Research, ESRI, SAS Institute
MARKET FORECAST PERIOD	2025 - 2035
KEY MARKET OPPORTUNITIES	AI integration in math tools, Mobile-friendly calculation apps, Enhanced data visualization features, Cloud-based collaboration solutions, Gamification of math learning
COMPOUND ANNUAL GROWTH RATE (CAGR)	7.1% (2025 - 2035)
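
The growth rate in the table can be cross-checked against the listed market sizes. The sketch below applies the standard compound annual growth rate formula, (end / start)^(1 / years) - 1, to the 2025 and 2035 figures; the function name is an illustrative choice, and the report's own calculation method is not stated.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Market sizes from the table, in USD million.
size_2025 = 2263.7
size_2035 = 4500.0

rate = cagr(size_2025, size_2035, 2035 - 2025)
print(f"{rate:.1%}")
```

The result rounds to the 7.1% reported for the 2025 - 2035 forecast period.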

Data from: Zero Data Exposure: A New Framework for Enabling Generative AI on...
dataverse.harvard.edu
search.dataone.org
Updated Oct 28, 2025
Cite
Priyanuj Boruah (2025). Zero Data Exposure: A New Framework for Enabling Generative AI on Private Enterprise Data [Dataset]. http://doi.org/10.7910/DVN/FZMD31
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/FZMD31
Dataset updated
Oct 28, 2025
Dataset provided by
Harvard Dataverse
Authors
Priyanuj Boruah
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
This research addresses the central security challenge preventing the adoption of Generative AI in the enterprise: the unacceptable risk of exposing sensitive data. It introduces a formal framework for evaluating privacy-preserving AI strategies along three critical axes: (1) Security Guarantee Level (SGL), a measure of theoretical data privacy; (2) Contextual Fidelity, the analytical utility of the private data representation; and (3) Performance Overhead, the computational cost. This framework serves as a definitive guide for CTOs, CISOs, and data leaders to make evidence-based decisions, moving beyond simplistic assessments to a holistic, security-first evaluation of any proposed AI solution. Applying this framework to four distinct classes of methods, from simple statistical summaries to complex generative models, our analysis conclusively demonstrates that all common public approaches are fundamentally flawed for high-stakes enterprise use. It proves that methods offering the highest security guarantees (SGL-1) are analytically weak, while the only method capable of high fidelity (Generative Models) is architecturally insecure (SGL-2) and catastrophically slow. This paper formally identifies this critical "research gap" and establishes a clear, rigorous benchmark for the new class of solution required to unlock the full potential of AI on private data. This dataset contains the full PDF of the paper and a machine-readable CSV file of the analytical results.
Marking scheme for representation test and mathematical problem solving.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Putri Yuanita; Hutkemri Zulnaidi; Effandi Zakaria (2023). Marking scheme for representation test and mathematical problem solving. [Dataset]. http://doi.org/10.1371/journal.pone.0204847.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0204847.t003
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Putri Yuanita; Hutkemri Zulnaidi; Effandi Zakaria
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Marking scheme for representation test and mathematical problem solving.
Dataset of books called Ontic : a knowledge representation system for...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Ontic : a knowledge representation system for mathematics [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Ontic+%3A+a+knowledge+representation+system+for+mathematics
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is Ontic : a knowledge representation system for mathematics. It features 7 columns including author, publication date, language, and book publisher.
Supplementary materials for the PhD thesis "The effect on learners...
ordo.open.ac.uk
qt
Updated Oct 1, 2025
Cite
Jonathan P. San Diego (2025). Supplementary materials for the PhD thesis "The effect on learners strategies of varying computer-based representations: evidence from gazes, actions, utterances and sketches" [Dataset]. http://doi.org/10.21954/ou.rd.30257695.v1
Available download formats: qt
Unique identifier
https://doi.org/10.21954/ou.rd.30257695.v1
Dataset updated
Oct 1, 2025
Dataset provided by
The Open University
Authors
Jonathan P. San Diego
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises the files contained on a CD-ROM which was attached to the thesis when it was submitted in 2008. It was uploaded to ORDO in 2025 for preservation purposes. For more information, please refer to the thesis "The effect on learners strategies of varying computer-based representations: evidence from gazes, actions, utterances and sketches" on ORO.
Thesis Abstract
Computer-based Multiple External Representations (MERs) have been found in some cases to help and in others to hinder the learning process. This thesis examines how varying the external representations that are presented in a computer environment influences the strategies that learners choose when tackling mathematics tasks. It has been noted (Ainsworth, 2006) that learners fail to transfer insights from one representation to another. Previous work analysing video data of learners' problem-solving with computer-based MERs emphasises the need to identify which representation is being considered by a learner as utterances are made, and to examine more closely learners' movement between representations. This research focuses on the relationship between strategy and representation during learners' problem solving.
A set of analytical techniques was developed to characterise learner strategies, to identify how different computer-based MERs influence strategy choices, and to explore how these choices change over the course of task completion. Rich data were collected using a variety of technologies: learners' shifts in attention were recorded using an unobtrusive eye-tracking device and screen capture software; keyboard and mouse actions were logged automatically; utterances and gestures were video recorded; notes and sketches were recorded in real-time using a Tablet PC. This research suggests how integrated analysis of learners' gazes, actions, writing, sketches and utterances can better illuminate subtle cognitive strategies.
The study involved completion of three tasks by eighteen participants using multiple mathematical representations (numbers, graphs and algebra) presented in different computer-based 'instantiations': Static (non-moving, non-changing, non-Interactive); Dynamic (capable of animation following keyboard inputs); Interactive (directly manipulable using a mouse).
Having computer-based MERs available to learners provides an opportunity to use representations with which they are comfortable. A detailed analysis showed that both representation and instantiation have an impact on strategy choice. It identified differences in expression of inferences, construction of visual images, and attention to representations between different types of instantiation. One of the important findings of the research is that learners are less likely to use imagining strategies when representational instantiation is Interactive. These results may provide some explanation of how interactivity helps or hinders learners' understanding of multiple representations.

Student Performance

kaggle.com

zip

Updated Oct 7, 2022

Cite

Aman Chauhan (2022). Student Performance [Dataset]. https://www.kaggle.com/datasets/whenamancodes/student-performance

Available download formats: zip (106753 bytes)

Dataset updated

Oct 7, 2022

Authors

Aman Chauhan

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

These data address student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

Attributes for both Maths.csv (Math course) and Portuguese.csv (Portuguese language course) datasets:

Columns	Description
school	student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
sex	student's sex (binary: 'F' - female or 'M' - male)
age	student's age (numeric: from 15 to 22)
address	student's home address type (binary: 'U' - urban or 'R' - rural)
famsize	family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
Pstatus	parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
Medu	mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Fedu	father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Mjob	mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
Fjob	father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
reason	reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
guardian	student's guardian (nominal: 'mother', 'father' or 'other')
traveltime	home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
studytime	weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
failures	number of past class failures (numeric: n if 1<=n<3, else 4)
schoolsup	extra educational support (binary: yes or no)
famsup	family educational support (binary: yes or no)
paid	extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
activities	extra-curricular activities (binary: yes or no)
nursery	attended nursery school (binary: yes or no)
higher	wants to take higher education (binary: yes or no)
internet	Internet access at home (binary: yes or no)
romantic	with a romantic relationship (binary: yes or no)
famrel	quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
freetime	free time after school (numeric: from 1 - very low to 5 - very high)
goout	going out with friends (numeric: from 1 - very low to 5 - very high)
Dalc	workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
Walc	weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
health	current health status (numeric: from 1 - very bad to 5 - very good)
absences	number of school absences (numeric: from 0 to 93)

These grades relate to the course subject, Math or Portuguese:

Grade	Description
G1	first period grade (numeric: from 0 to 20)
G2	second period grade (numeric: from 0 to 20)
G3	final grade (numeric: from 0 to 20, output target)
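
The note on G1, G2, and G3 can be made concrete with a minimal Python sketch that builds two feature sets for predicting G3: one that keeps the earlier period grades and one that drops them. The column names come from the attribute tables; the helper name `feature_columns`, the comma delimiter, and the two sample rows are assumptions made for illustration and are not values taken from Maths.csv or Portuguese.csv.

```python
import csv
import io

TARGET = "G3"                 # final grade, the output target
PRIOR_GRADES = {"G1", "G2"}   # period grades strongly correlated with G3


def feature_columns(header, use_prior_grades):
    """Return the input columns for predicting G3 (hypothetical helper)."""
    excluded = {TARGET}
    if not use_prior_grades:
        # Harder but more useful setting: predict G3 without G1 and G2.
        excluded |= PRIOR_GRADES
    return [name for name in header if name not in excluded]


# Illustrative rows only; NOT real records from the dataset.
SAMPLE = (
    "school,sex,age,studytime,G1,G2,G3\n"
    "GP,F,17,2,11,12,12\n"
    "MS,M,16,1,8,9,9\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
header = list(rows[0].keys())

with_grades = feature_columns(header, use_prior_grades=True)
without_grades = feature_columns(header, use_prior_grades=False)
print(with_grades)
print(without_grades)
```

Comparing a model trained on `with_grades` against one trained on `without_grades` is one way to see how much of the predictive power comes from the earlier period grades.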


A Collection of Dwellings to Represent the U.S. Housing Stock (2024 Update)...
nist.gov
data.nist.gov
+2 more
Updated Jun 3, 2024
Cite
National Institute of Standards and Technology (2024). A Collection of Dwellings to Represent the U.S. Housing Stock (2024 Update) Associated Python Scripts [Dataset]. http://doi.org/10.18434/mds2-3488
Unique identifier
https://doi.org/10.18434/mds2-3488, https://identifiers.org/ark:/88434/mds2-3488
Dataset updated
Jun 3, 2024
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
License
https://www.nist.gov/open/license
Area covered
United States
Description
This is a compilation of Python scripts used when developing the Collection of Dwellings to Represent the U.S. Housing Stock (2024 Update) NIST TN.
Data from: Rekenen in Beeld
ssh.datastations.nl
csv, pdf, xlsx, zip
Updated Dec 31, 2013
Cite
K. Hoogland; K. Hoogland (2013). Rekenen in Beeld [Dataset]. http://doi.org/10.17026/DANS-ZA6-5Q6C
Available download formats: pdf (393474), pdf (1550665), csv (4094), zip (26355), pdf (1976198), xlsx (29737071), csv (17011324), pdf (1671241)
Unique identifier
https://doi.org/10.17026/DANS-ZA6-5Q6C
Dataset updated
Dec 31, 2013
Dataset provided by
DANS Data Station Social Sciences and Humanities
Authors
K. Hoogland; K. Hoogland
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The project was conducted under the programme Onderwijsbewijs II (Evidence-Based Research in Education), which focuses on evidence-based research into educational practices. The research question is: what is the effect on students' performance of changing the representation of the problem situation in mathematical contextual problems? In this randomized controlled experiment, the representation of the problem situation in 24 items is systematically changed from descriptive to depictive. In each test, the items are randomly assigned one representation or the other, so that each participant receives half of the items with a descriptive representation and half with a depictive representation. This research design meets the requirements of a randomized controlled trial.
The article "Word Problems versus Image-Rich Problems: Changing the Representation of Reality in Contextual Mathematical Problems" reports on this research.
The code book is in English and in Dutch.
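The random assignment of representations described here can be illustrated with a minimal Python sketch. It is one plausible reading of the design, not the procedure the study used: the function name `assign_representations`, the item numbering 1 to 24, and the fixed seed are assumptions made for illustration.

```python
import random


def assign_representations(item_ids, rng):
    """Give half of the items a depictive version and the rest a descriptive one."""
    items = list(item_ids)
    # Draw half of the items without replacement for the depictive condition.
    depictive = set(rng.sample(items, len(items) // 2))
    return {
        item: "depictive" if item in depictive else "descriptive"
        for item in items
    }


# 24 items, as in the description; the seed only makes the sketch repeatable.
assignment = assign_representations(range(1, 25), random.Random(0))
counts = {
    label: sum(1 for value in assignment.values() if value == label)
    for label in ("descriptive", "depictive")
}
print(counts)
```

Passing a separate `random.Random` instance per test would give each participant an independent split while keeping the two halves equal in size.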
Mathematics and Visualization - distribution
exaly.com
csv, json
Updated Nov 1, 2025
Cite
(2025). Mathematics and Visualization - distribution [Dataset]. https://exaly.com/journal/33600/mathematics-and-visualization
Available download formats: json, csv
Dataset updated
Nov 1, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The graph shows the number of publications with a given number of citations of ^. The data are presented on a logarithmic scale for the sake of clarity.
Math Self-beliefs comparison study
dataverse.harvard.edu
dataone.org
Updated Oct 7, 2022
Cite
Sandra J. Miles (2022). Math Self-beliefs comparison study [Dataset]. http://doi.org/10.7910/DVN/0SJCH2
Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/0SJCH2
Dataset updated
Oct 7, 2022
Dataset provided by
Harvard Dataverse
Authors
Sandra J. Miles
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The purpose of this data set is to examine distinctions across different measurement instruments used to measure student self-perceptions and attitudes towards mathematics. 225 college undergraduate students took one randomized survey containing the following scales: 1) Mathematics self-efficacy and Anxiety Questionnaire (MSEAQ, May, 2009), 2) Self-Descriptive Questionnaire-III (SDQ-III, Marsh & O’Neill, 1984) [All 10 items for the math self-concept subset were included in this survey. The other SDQ items represent incomplete question sets], 3) Mathematical Self-efficacy Scale – Middle School (UPMSES, Usher & Pajares, 2009), 4) Mathematical Self-Concept Scale (GMSCS, Gourgey, 1982), 5) Nine items from the Fennema Sherman Mathematics Attitude Scales (FSMAS, Mulhern & Rae, 1998)[These items do not represent the entire scale as developed], and 6) Three original items measuring student beliefs related to seeking help in mathematics classes. The survey originally included two items that were specific to the undergraduate course the students were currently enrolled in, but they have been removed from the dataset. Additionally, for a subset of the students the dataset includes information on gender, race, major (STEM vs. nonSTEM), course grade, course exam average, and homework completion percentage. The separate scales were all compiled into one survey programmed in Qualtrics. Questions were randomized within scales and scales were randomized within the entire survey. The survey was available for a one-week time period during which participants could complete it at their leisure. Qualtrics allowed multiple entries so that the entire survey could be completed in multiple settings. The data reporting demographic and course information was obtained through school records.
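
The two-level randomization described here (questions shuffled within scales, scales shuffled within the survey) can be sketched in Python as follows. The survey itself was programmed in Qualtrics, so this is only an illustration of the ordering logic; the function name `randomize_survey`, the item labels, and the item counts are hypothetical, and only the scale abbreviations come from the description.

```python
import random


def randomize_survey(scales, rng):
    """Shuffle the order of scales, then shuffle the items inside each scale."""
    order = list(scales)
    rng.shuffle(order)                # scales randomized within the survey
    survey = []
    for name in order:
        items = list(scales[name])    # copy so the input is left unchanged
        rng.shuffle(items)            # questions randomized within the scale
        survey.append((name, items))
    return survey


# Hypothetical item labels; the real scales have their own item sets.
scales = {
    "MSEAQ": ["m1", "m2", "m3"],
    "GMSCS": ["g1", "g2"],
    "FSMAS": ["f1", "f2", "f3"],
}
survey = randomize_survey(scales, random.Random(42))
for name, items in survey:
    print(name, items)
```

Because items are shuffled only inside their own scale, each scale's questions stay together as a block even though both the block order and the question order vary.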

