100+ datasets found
  1. Data from: English and maths

    • gov.uk
    Updated Nov 28, 2019
    Cite
    Department for Education (2019). English and maths [Dataset]. https://www.gov.uk/government/statistical-data-sets/fe-data-library-skills-for-life
    Explore at:
    Dataset updated
    Nov 28, 2019
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Education
    Description

    English and maths (formerly Skills for Life) qualifications are designed to give people the reading, writing, maths and communication skills they need in everyday life, to operate effectively in work and to help them succeed on other training courses.

    These data provide information on participation and achievements for English and maths qualifications and are broken down into a number of key reports.

    Can’t find what you’re looking for?

    If you need help finding data please refer to the table finder tool to search for specific breakdowns available for FE statistics.

    Current data

    English and maths data tool for participation and achievements 2018/19: https://assets.publishing.service.gov.uk/media/5f0c5c923a6f4003935c2c6f/201819-Nov_EandM_Part_and_Achieve.xlsx

    MS Excel Spreadsheet, 10.9 MB

    This file may not be suitable for users of assistive technology.

    Request an accessible format.

    If you use assistive technology (such as a screen reader) and need a version of this document in a more accessible format, please email alternative.formats@education.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.
    

    Archive

  2. Mathematics Dataset

    • github.com
    • opendatalab.com
    • +1 more
    Updated Apr 3, 2019
    Cite
    DeepMind (2019). Mathematics Dataset [Dataset]. https://github.com/Wikidepia/mathematics_dataset_id
    Explore at:
    Dataset updated
    Apr 3, 2019
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Description

    This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

    Example questions

     Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
     Answer: 4
     
     Question: Calculate -841880142.544 + 411127.
     Answer: -841469015.544
     
     Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
     Answer: 54*a - 30
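
The three sample answers above can be checked mechanically. The sketch below is not part of the dataset's own tooling; it uses only Python's standard library, and the helper name `solve_2x2` is hypothetical.

```python
from decimal import Decimal
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*r + b1*c = c1 and a2*r + b2*c = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("system has no unique solution")
    r = Fraction(c1 * b2 - c2 * b1, det)
    c = Fraction(a1 * c2 - a2 * c1, det)
    return r, c


# Question 1: -42*r + 27*c = -1167 and 130*r + 4*c = 372
r, c = solve_2x2(-42, 27, -1167, 130, 4, 372)

# Question 2: Decimal keeps the sum exact instead of relying on floats
total = Decimal("-841880142.544") + Decimal("411127")


# Question 3: build f(w(a)) from the definitions in the question
def x(g):
    return 9 * g + 1


def q(c_):
    return 2 * c_ + 1


def f(i):
    return 3 * i - 39


def w(j):
    return q(x(j))


# A linear expression that agrees at many points equals 54*a - 30
composition_matches = all(f(w(a)) == 54 * a - 30 for a in range(-10, 11))
```

Exact types (`Fraction`, `Decimal`) are used so that the comparison with the listed answers does not depend on floating-point rounding.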
    

    It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length, and answers to 30 characters in length. Note the training data for each question type is split into "train-easy", "train-medium", and "train-hard". This allows training models via a curriculum. The data can also be mixed together uniformly from these training datasets to obtain the results reported in the paper. Categories:

    • algebra (linear equations, polynomial roots, sequences)
    • arithmetic (pairwise operations and mixed expressions, surds)
    • calculus (differentiation)
    • comparison (closest numbers, pairwise comparisons, sorting)
    • measurement (conversion, working with time)
    • numbers (base conversion, remainders, common divisors and multiples, primality, place value, rounding numbers)
    • polynomials (addition, simplification, composition, evaluating, expansion)
    • probability (sampling without replacement)
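
As an illustration of the two training regimes mentioned above (a curriculum ordered by difficulty, or uniform mixing of the three training splits), here is a minimal sketch. The in-memory lists stand in for the splits; how the real files are stored and parsed is not stated here, so the sample pairs, the function names, and the reading of "uniform" as "pick a split uniformly for each draw" are all assumptions.

```python
import random

MAX_QUESTION_CHARS = 160  # limits quoted in the description above
MAX_ANSWER_CHARS = 30

# Hypothetical stand-ins for the "train-easy", "train-medium", "train-hard" splits
splits = {
    "train-easy": [("Calculate 1 + 2.", "3")],
    "train-medium": [("Calculate -841880142.544 + 411127.", "-841469015.544")],
    "train-hard": [
        ("Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.", "4")
    ],
}


def within_limits(pair):
    """Check a (question, answer) pair against the stated length limits."""
    question, answer = pair
    return (len(question) <= MAX_QUESTION_CHARS
            and len(answer) <= MAX_ANSWER_CHARS)


def curriculum_order(splits):
    """Yield pairs easy first, then medium, then hard."""
    for name in ("train-easy", "train-medium", "train-hard"):
        yield from splits[name]


def uniform_mix(splits, n, seed=0):
    """Draw n pairs, choosing a split uniformly at random for each draw."""
    rng = random.Random(seed)
    names = sorted(splits)
    return [rng.choice(splits[rng.choice(names)]) for _ in range(n)]


ordered = list(curriculum_order(splits))
mixed = uniform_mix(splits, n=6)
```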
  3. Math-Students Performance Data

    • kaggle.com
    zip
    Updated Apr 2, 2025
    + more versions
    Cite
    Adil Shamim (2025). Math-Students Performance Data [Dataset]. https://www.kaggle.com/datasets/adilshamim8/math-students
    Explore at:
    Available download formats: zip (7367 bytes)
    Dataset updated
    Apr 2, 2025
    Authors
    Adil Shamim
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    About the Math-Students Dataset

    This dataset, originally sourced from the UCI Machine Learning Repository, offers a rich collection of data on student performance in a math program. It provides detailed insights into both the academic achievements and the socio-demographic backgrounds of the students, making it an excellent resource for educational data mining and predictive analytics.

    Key Features & Attributes

    • Demographics & Background:

      • School: Identifies the student's school (e.g., Gabriel Pereira or Mousinho da Silveira).
      • Sex & Age: Basic demographic information to help explore performance trends among different groups.
      • Address & Family Size: Details about the student’s home environment, including whether they live in an urban or rural area and their family size.
    • Parental & Household Information:

      • Parental Cohabitation & Education: Data on whether parents live together and their education levels, which can correlate with student support and academic outcomes.
      • Parental Occupation: Information on the mother’s and father’s jobs, providing further context on socioeconomic factors.
    • Educational & Behavioral Variables:

      • Study Time & Failures: Weekly study time and history of past class failures help gauge academic dedication and potential challenges.
      • Support & Extracurricular Activities: Records on whether the student has received extra educational support or participates in extracurricular activities, which can influence overall performance.
      • School-Related Factors: Travel time to school, attendance (absences), and participation in additional paid classes contribute to a holistic view of the educational environment.
    • Lifestyle & Social Factors:

      • Internet Access, Free Time & Socializing: Variables like internet availability, free time, and how often students go out with friends help capture lifestyle and behavioral patterns.
      • Health & Well-being: Self-reported health status and alcohol consumption patterns during weekdays and weekends offer insights into personal well-being, which may impact academic performance.
    • Academic Performance:

      • Grades: The dataset includes three key assessments—G1 (first period grade), G2 (second period grade), and G3 (final grade). G3, the final grade, serves as the primary target variable for predictive models.

    Potential Applications

    • Predictive Modeling:
      Researchers and data scientists can build regression models to predict final grades (G3) based on the numerous socio-demographic and educational features.
    • Exploratory Data Analysis:
      The dataset is ideal for exploring relationships between family background, lifestyle choices, and academic success. For example, one could analyze how study time or parental education levels correlate with performance.
    • Educational Interventions:
      By identifying key factors that contribute to academic outcomes, educators and policymakers can develop targeted interventions to support at-risk students.
    • Comparative Studies:
      While this dataset focuses on math scores, its structure is similar to the Portuguese language course dataset. This similarity provides opportunities for cross-domain comparisons in educational research.
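
To make the predictive-modeling idea concrete, the sketch below fits a one-variable least-squares line that predicts the final grade G3 from the second-period grade G2. It is only an illustration: the grade pairs are invented rather than taken from the dataset, `fit_line` is a hypothetical helper, and a real analysis would use many more features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical (G2, G3) pairs, NOT rows from the dataset
g2 = [8, 10, 12, 14, 16]
g3 = [7, 10, 12, 15, 16]

intercept, slope = fit_line(g2, g3)

# Predicted final grade for a student whose second-period grade is 11
predicted_g3 = intercept + slope * 11
```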

    Additional Insights

    • Data Complexity & Quality:
      Despite its moderate size, the dataset is rich in both categorical and numerical variables. This diversity requires careful preprocessing and feature engineering but also offers the chance to uncover complex interactions between various factors.
    • Research Impact:
      The dataset has been widely used in the field of educational data mining. Its comprehensive nature has provided a basis for numerous studies examining the interplay between academic performance and a range of external factors.
    • Historical Context:
      Originating from a study presented at the 5th FUBUTEC 2008 conference, the dataset has contributed valuable insights into secondary school performance and continues to serve as a benchmark for educational analytics research.
  4. A level and other 16 to 18 results - English and maths progress -...

    • explore-education-statistics.service.gov.uk
    Updated Nov 4, 2021
    + more versions
    Cite
    Department for Education (2021). A level and other 16 to 18 results - English and maths progress - institution type and gender [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/44999f80-8a3b-4f80-8c17-d1b56df37df0
    Explore at:
    Dataset updated
    Nov 4, 2021
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    English and maths progress, by institution type and student gender.

  5. MetaMath QA

    • kaggle.com
    zip
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b
    Explore at:
    Available download formats: zip (78629842 bytes)
    Dataset updated
    Nov 23, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MetaMath QA

    Mathematical Questions for Large Language Models

    By Huggingface Hub [source]

    About this dataset

    This dataset contains meta-mathematics questions and answers collected from the Mistral-7B question-answering system. The responses, types, and queries are all provided in order to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset provides users with an efficient way to investigate various aspects of question answering models and further understand how they function. Whether you are a professional or beginner, this dataset is sure to offer invaluable insights into the development of more powerful QA systems!

    How to use the dataset

    Data Dictionary

    The MetaMathQA dataset contains three columns: response, type, and query.

    • Response: the response to the query given by the question answering system. (String)
    • Type: the type of query provided as input to the system. (String)
    • Query: the question posed to the system for which a response is required. (String)

    Preparing data for analysis

    Before you begin analysis, familiarize yourself with the kinds of values present in each column and check whether any preprocessing is needed, such as removing unwanted characters or filling in missing values, so that the data can be used without issue when training or testing a model later in your workflow.
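
A first inspection pass of the kind described above might look like the following sketch. It uses only Python's standard library and reads a small in-memory sample shaped like train.csv; the two rows and the `example_type_*` values are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample with the three documented columns
sample_csv = io.StringIO(
    "response,type,query\n"
    '"The answer is 4.",example_type_a,"What is 2 + 2?"\n'
    ',example_type_b,"  What is 3 * 3?  "\n'
)

rows = list(csv.DictReader(sample_csv))
columns = list(rows[0].keys())

# Count empty values per column before deciding how to handle them
missing = {
    col: sum(1 for row in rows if not row[col].strip())
    for col in columns
}

# Example cleanup step: strip stray whitespace from every field
cleaned = [{col: value.strip() for col, value in row.items()} for row in rows]
```

With the real file, `io.StringIO(...)` would be replaced by `open("train.csv", newline="", encoding="utf-8")`; the encoding is an assumption.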

    Training Models using Mistral 7B

    Mistral 7B is an open-weight large language model from Mistral AI, not a framework for fitting classical models to tabular (CSV) data. If you want to train classical classifiers such as Support Vector Machines (SVM), logistic regression, or decision trees on features derived from the MetaMathQA dataset, use a library that provides them. In scikit-learn, for example, GridSearchCV and RandomizedSearchCV can tune hyperparameters once an algorithm has been chosen. The selected models can then be validated with metrics such as accuracy, F1, precision, and recall.

    Testing models

    After the building phase, test the trained model thoroughly against the evaluation metrics mentioned above. Use the trained model to make predictions on new test cases, for example cases supplied by domain experts, then rerun the quality assurance checks against the baseline scores, assess confidence in the results, and update the baselines as further experiments are run.

    Research Ideas

    • Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.
    • Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.
    • Optimizing search algorithms that surface relevant answer results based on types of queries

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description                         |
    |:------------|:------------------------------------|
    | response    | The response to the query. (String) |
    | type        | The type of query. (String)         |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  6. Trends in International Mathematics and Science Study 2007 - Armenia,...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Jun 14, 2022
    + more versions
    Cite
    TIMSS International Study Center (2022). Trends in International Mathematics and Science Study 2007 - Armenia, Australia, Austria...and 55 more [Dataset]. https://datacatalog.ihsn.org/catalog/2376
    Explore at:
    Dataset updated
    Jun 14, 2022
    Dataset authored and provided by
    TIMSS International Study Center
    Time period covered
    2007
    Area covered
    Australia
    Description

    Abstract

    TIMSS measures trends in mathematics and science achievement at the fourth and eighth grades in participating countries around the world, as well as monitoring curricular implementation and identifying promising instructional practices. Conducted on a regular 4-year cycle, TIMSS has assessed mathematics and science in 1995, 1999, 2003, and 2007, with planning underway for 2011. TIMSS collects a rich array of background information to provide comparative perspectives on trends in achievement in the context of different educational systems, school organizational approaches, and instructional practices. To support and promote secondary analyses aimed at improving mathematics and science education at the fourth and eighth grades, the TIMSS 2007 international database makes available to researchers, analysts, and other users the data collected and processed by the TIMSS project. This database comprises student achievement data as well as student, teacher, school, and curricular background data for 59 countries and 8 benchmarking participants. Across both grades, the database includes data from 433,785 students, 46,770 teachers, 14,753 school principals, and the National Research Coordinators of each country. All participating countries gave the IEA permission to release their national data.

    Geographic coverage

    The survey had national coverage

    Analysis unit

    Units of analysis in the study include documents, schools and individuals

    Universe

    The TIMSS target populations are all fourth and eighth graders in each participating country. The teachers in the TIMSS 2007 international database do not constitute representative samples of teachers in the participating countries. Rather, they are the teachers of nationally representative samples of students. Therefore, analyses with teacher data should be made with students as the units of analysis and reported in terms of students who are taught by teachers with a particular attribute. Teacher data are analyzed by linking the students to their teachers. The student-teacher linkage data files are used for this purpose.
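
The student-as-unit reporting described above can be sketched as follows. The records, the field names, and the use of a per-student weight are illustrative assumptions; they are not TIMSS variable names or the layout of the actual linkage files.

```python
# Hypothetical teacher attributes keyed by teacher ID
teachers = {
    "T1": {"has_math_degree": True},
    "T2": {"has_math_degree": False},
}

# Hypothetical linkage rows: one student (with a weight) tied to one teacher
linkage = [
    {"student": "S1", "teacher": "T1", "weight": 10.0},
    {"student": "S2", "teacher": "T1", "weight": 30.0},
    {"student": "S3", "teacher": "T2", "weight": 60.0},
]


def share_of_students(linkage, teachers, attribute):
    """Weighted share of students whose teacher has the given attribute."""
    total = sum(row["weight"] for row in linkage)
    matching = sum(
        row["weight"]
        for row in linkage
        if teachers[row["teacher"]][attribute]
    )
    return matching / total


# Reported as "share of students taught by teachers with a math degree",
# not as "share of teachers with a math degree"
share = share_of_students(linkage, teachers, "has_math_degree")
```

Here half of the teachers have the attribute, but the student-level figure is 0.4, which is why the two ways of reporting should not be confused.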

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The TIMSS target populations are all fourth and eighth graders in each participating country. To obtain accurate and representative samples, TIMSS used a two-stage sampling procedure whereby a random sample of schools is selected at the first stage and one or two intact fourth or eighth grade classes are sampled at the second stage. This is a very effective and efficient sampling approach, but the resulting student sample has a complex structure that must be taken into consideration when analyzing the data. In particular, sampling weights need to be applied and a re-sampling technique such as the jackknife employed to estimate sampling variances correctly.
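
To show the idea behind jackknife variance estimation with sampling weights, here is a minimal sketch of the generic delete-one jackknife applied to a weighted mean. It is a textbook form swapped in for illustration, not necessarily the exact variant TIMSS applies to its school-based sample, and the scores and weights are invented.

```python
def weighted_mean(values, weights):
    """Mean of values, each counted in proportion to its sampling weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_variance(values, weights):
    """Delete-one jackknife variance of the weighted mean."""
    n = len(values)
    replicates = []
    for i in range(n):
        # Recompute the estimate with observation i left out
        kept_values = values[:i] + values[i + 1:]
        kept_weights = weights[:i] + weights[i + 1:]
        replicates.append(weighted_mean(kept_values, kept_weights))
    center = sum(replicates) / n
    return (n - 1) / n * sum((r - center) ** 2 for r in replicates)


# Hypothetical scores with equal weights
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]

estimate = weighted_mean(scores, weights)
variance = jackknife_variance(scores, weights)
```

With equal weights the result matches the familiar s²/n for a simple mean (2.5 / 5 here), which is a quick sanity check on the implementation.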

    In addition, TIMSS 2007 uses Item Response Theory (IRT) scaling to summarize student achievement on the assessment and to provide accurate measures of trends from previous assessments. The TIMSS IRT scaling approach used multiple imputation, or "plausible values", methodology to obtain proficiency scores in mathematics and science for all students. Each student record in the TIMSS 2007 international database contains imputed scores in mathematics and science overall, as well as for each of the content domain subscales and cognitive domain subscales. Because each imputed score is a prediction based on limited information, it almost certainly includes some small amount of error. To allow analysts to incorporate this error into analyses of the TIMSS achievement data, the TIMSS database provides five separate imputed scores for each scale. Each analysis should be replicated five times, using a different plausible value each time, and the results combined into a single result that includes information on standard errors that incorporate both sampling and imputation error.
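
One standard way to combine the five replicated analyses is Rubin's combining rules for multiple imputation, sketched below. Whether TIMSS averages the sampling variance over all five plausible values or takes it from one of them is not stated here, so averaging is an assumption, and the five estimates and variances are invented.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results using Rubin's rules.

    estimates: one point estimate per plausible value
    sampling_variances: the sampling variance of each estimate
    Returns (combined estimate, combined standard error).
    """
    m = len(estimates)
    point = mean(estimates)
    within = mean(sampling_variances)   # sampling error
    between = variance(estimates)       # imputation error, divisor m - 1
    total_variance = within + (1 + 1 / m) * between
    return point, total_variance ** 0.5


# Hypothetical results from running the same analysis five times
estimates = [500.0, 502.0, 498.0, 501.0, 499.0]
sampling_variances = [4.0, 4.0, 4.0, 4.0, 4.0]

point, standard_error = combine_plausible_values(estimates, sampling_variances)
```

The `between` term is what a single-plausible-value analysis would miss: it grows when the five analyses disagree, widening the reported standard error.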

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The study used the following questionnaires: Fourth Grade Student Questionnaire, Fourth Grade Teacher Questionnaire, Fourth Grade School Questionnaire, Eighth Grade Student Questionnaire, Eighth Grade Mathematics Teacher Questionnaire, Eighth Grade Science Teacher Questionnaire, and Eighth Grade School Questionnaire. Information on the variables obtained or derived from questions in the survey is available in the TIMSS 2007 user guide for the international database: Data Supplement 3: Variables derived from the Student, Teacher, and School Questionnaire data.

  7. 2013-14 Schools Offering Mathematics and Science Classes Estimations

    • catalog.data.gov
    Updated Sep 1, 2023
    + more versions
    Cite
    Office for Civil Rights (OCR) (2023). 2013-14 Schools Offering Mathematics and Science Classes Estimations [Dataset]. https://catalog.data.gov/dataset/2013-14-schools-offering-mathematics-and-science-classes-estimations-0b2cb
    Explore at:
    Dataset updated
    Sep 1, 2023
    Dataset provided by
    Office for Civil Rights (OCR)
    Description

    This Excel file contains data for schools offering classes in mathematics and science for all states. The file contains one spreadsheet for total schools.

  8. Named Math Formulas

    • kaggle.com
    • huggingface.co
    zip
    Updated Dec 30, 2023
    Cite
    Marília Prata (2023). Named Math Formulas [Dataset]. https://www.kaggle.com/datasets/mpwolke/cusersmarildownloadsdata-json/code
    Explore at:
    Available download formats: zip (19910 bytes)
    Dataset updated
    Dec 30, 2023
    Authors
    Marília Prata
    Description

    "Mathematical dataset based on 71 famous mathematical identities. Each entry consists of a name of the identity (name), a representation of that identity (formula), a label whether the representation belongs to the identity (label), and an id of the mathematical identity (formula_name_id). The false pairs are intentionally challenging, e.g., a^2+2^b=c^2 as a falsified version of the Pythagorean Theorem. All entries have been generated by using data.json as starting point and applying the randomizing and falsifying algorithms here. The formulas in the dataset are not just pure mathematical, but contain also textual descriptions of the mathematical identity. At most 400000 versions are generated per identity. There are ten times more falsified versions than true ones, such that the dataset can be used for a training with changing false examples every epoch."

    https://huggingface.co/datasets/ddrg/named_math_formulas

  9. Course Enrolment in Grade 9 Math by Course Type

    • data.ontario.ca
    • open.canada.ca
    txt, xlsx
    Updated Oct 23, 2025
    + more versions
    Cite
    Education (2025). Course Enrolment in Grade 9 Math by Course Type [Dataset]. https://data.ontario.ca/dataset/course-enrolment-in-grade-9-math-by-course-type
    Explore at:
    Available download formats: xlsx(20645), txt(16331), xlsx(20550), txt(16568), xlsx(20248), xlsx(20359), xlsx(20293), txt(18246), xlsx(20807), txt(18666), xlsx(20179), txt(17883), txt(18258), txt(18260), txt(17942), xlsx(21154), xlsx(21960), xlsx(20018), txt(18151), xlsx(20500), xlsx(21783), txt(17830), txt(18840), xlsx(20772), xlsx(21108), txt(17956), txt(14794), txt(15242)
    Dataset updated
    Oct 23, 2025
    Dataset authored and provided by
    Education
    License

    https://www.ontario.ca/page/open-government-licence-ontario

    Time period covered
    Oct 23, 2025
    Area covered
    Ontario
    Description

    Public and Catholic board-level course enrolment in Grade 9 Math by course type (academic, applied and locally developed) for each academic year. School boards report this data using the Ontario School Information System (OnSIS).

    Includes:

    • academic year
    • board number
    • board name
    • course grade
    • course type
    • number of students
    • percentage of students

    Data excludes private schools, school authorities, publicly funded hospital and provincial schools, Education and Community Partnership Program (ECPP) facilities, summer, night and adult continuing education day schools.

    Enrolment totals include withdrawn and dropped courses.

    Students enrolled in more than one course are counted for each course.

    Cells are suppressed in categories with fewer than 10 students. Enrolment totals are rounded to the nearest five.
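
The suppression and rounding rules can be illustrated with a short sketch. The function name is hypothetical, and applying the threshold to the raw count before rounding is an assumption, since the order of the two steps is not stated.

```python
SUPPRESSION_THRESHOLD = 10  # counts below this are not published


def publish_count(n):
    """Return None for a suppressed cell, else n rounded to the nearest five."""
    if n < SUPPRESSION_THRESHOLD:
        return None
    # Integer arithmetic: adding 2 before flooring rounds to the nearest 5
    return (n + 2) // 5 * 5


published = [publish_count(n) for n in (7, 12, 13, 248)]
```

Anyone summing published cells should expect small discrepancies against published totals, because each value is rounded independently and suppressed cells are missing.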

  10. Data Related to the Rwanda Quality Basic Education for Human Capital...

    • data.mendeley.com
    Updated Aug 17, 2023
    + more versions
    Cite
    celine byukusenge (2023). Data Related to the Rwanda Quality Basic Education for Human Capital Development Project Impact Assessment: Upper primary and lower secondary Teachers’ performance and Pedagogical Beliefs in Mathematics and Science Cohort II [Dataset]. http://doi.org/10.17632/g36zrks68z.1
    Explore at:
    Dataset updated
    Aug 17, 2023
    Authors
    celine byukusenge
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Rwanda
    Description

    The Rwanda Quality Basic Education for Human Capital Development (RQBEHCD) project is financed by the World Bank Group through the government of Rwanda to support Mathematics and Science teachers in upper primary and lower secondary schools. The project was confirmed in 2019 and initiated in 2020. The dataset deposited here comprises two types of data: (1) teacher performance scores per subject taught [Math (for both primary and secondary school teachers); Physics, Chemistry, and Biology, taught in secondary school; and Science and Elementary Technology (SET), taught in upper primary school], and (2) teacher belief scores. The data were collected before and after a five-month continuous professional development (CPD) training program that ran from March to July 2023. The training program comprised four channels: ICT integration in teaching math and science, content knowledge (SCK), Math and Science laboratory activities, and innovative pedagogy. The data were collected from seven districts of Rwanda that were involved in the second cohort of training (2022-2023).

  11. Graduate in sciences, mathematics and technology by period. Spain and EU-28...

    • ine.es
    csv, html, json +4
    Updated Jun 26, 2017
    Cite
    INE - Instituto Nacional de Estadística (2017). Graduate in sciences, mathematics and technology by period. Spain and EU-28 (% with respect to the total of graduates of each sex) [Dataset]. https://www.ine.es/jaxiT3/Tabla.htm?t=12729&L=1
    Explore at:
    Available download formats: text/pc-axis, xls, csv, json, txt, html, xlsx
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    National Statistics Institute (http://www.ine.es/)
    Authors
    INE - Instituto Nacional de Estadística
    License

    https://www.ine.es/aviso_legal

    Time period covered
    Jan 1, 2008 - Jan 1, 2012
    Area covered
    Spain, European Union
    Variables measured
    Sex, Source, Spain and EU, Type of data, Educational Concept, Study Sector Groupings
    Description

    Women and Men in Spain: Graduate in sciences, mathematics and technology by period. Spain and EU-28 (% with respect to the total of graduates of each sex). Annual. National.

  12. Math Formula Retrieval

    • kaggle.com
    • huggingface.co
    zip
    Updated Dec 2, 2023
    Cite
    The Devastator (2023). Math Formula Retrieval [Dataset]. https://www.kaggle.com/datasets/thedevastator/math-formula-pair-classification-dataset/data
    Explore at:
    Available download formats: zip (2021716728 bytes)
    Dataset updated
    Dec 2, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Math Formula Retrieval

    Math Formula Pair Classification Dataset

    By ddrg (From Huggingface) [source]

    About this dataset

    The dataset provides the columns formula1, formula2, and label (binary format), which carry the information needed for analysis and evaluation.

    The train.csv file contains a subset of the dataset specifically curated for training purposes. It includes an extensive range of math formula pairs along with their corresponding labels and unique ID names. This allows researchers and data scientists to construct models that can predict whether two given formulas fall within the same category or not.

    On the other hand, test.csv serves as an evaluation set. It consists of additional pairs of math formulas accompanied by their respective labels and unique IDs. By evaluating model performance on this test set after training it on train.csv data, researchers can assess how well their models generalize to unseen instances.

    By leveraging this dataset, researchers can explore mathematics-related applications such as developing pattern recognition algorithms or enhancing educational tools that automatically identify and categorize mathematical formulas.

    How to use the dataset

    Introduction

    Dataset Description

    train.csv

    The train.csv file contains a set of labeled math formula pairs along with their corresponding labels and formula name IDs. It consists of the following columns:

    • formula1: The first mathematical formula in the pair (text).
    • formula2: The second mathematical formula in the pair (text).
    • label: The classification label indicating whether the pair of formulas belong to the same category or not (binary). A label value of 1 indicates that both formulas belong to the same category, while a label value of 0 indicates different categories.

    test.csv

    The purpose of the test.csv file is to provide a set of formula pairs along with their labels and formula name IDs for testing and evaluation purposes. It has an identical structure to train.csv, containing columns like formula1, formula2, label, etc.

    Task

    The main task using this dataset is binary classification, where your objective is to predict whether two mathematical formulas belong to the same category or not based on their textual representation. You can use various machine learning algorithms such as logistic regression, decision trees, random forests, or neural networks for training models on this dataset.

    Exploring & Analyzing Data

    Before building your model, it's crucial to explore and analyze your data. Here are some steps you can take:

    • Load both CSV files (train.csv and test.csv) into your preferred data analysis framework or programming language (e.g., Python with libraries like pandas).
    • Examine the dataset's structure, including the number of rows, columns, and data types.
    • Check for missing values in the dataset and handle them accordingly.
    • Visualize the distribution of labels to understand whether it is balanced or imbalanced.
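
The exploration steps above can be sketched with Python's standard library alone. The in-memory sample below mimics the documented columns of train.csv; the three formula pairs are invented for illustration and are not rows from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample with the documented columns formula1, formula2, label
sample_train = io.StringIO(
    "formula1,formula2,label\n"
    '"a^2 + b^2 = c^2","c^2 = a^2 + b^2",1\n'
    '"a^2 + b^2 = c^2","a^2 + 2^b = c^2",0\n'
    '"sin(x)^2 + cos(x)^2 = 1","a^2 + b^2 = c^2",0\n'
)

rows = list(csv.DictReader(sample_train))

# Structure: number of rows and column names
n_rows = len(rows)
columns = list(rows[0].keys())

# Missing values: empty or absent fields across all rows
n_missing = sum(
    1 for row in rows for value in row.values()
    if value is None or not value.strip()
)

# Label distribution: shows whether the classes are balanced
label_counts = Counter(row["label"] for row in rows)
```

For the real files, replace `io.StringIO(...)` with `open("train.csv", newline="", encoding="utf-8")` (the encoding is an assumption) and repeat the same checks for test.csv.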

    Model Building

    Once you have analyzed and preprocessed your dataset, you can start building your classification model using various machine learning algorithms:

    • Split your train.csv data into training and validation sets for model evaluation during training.
    • Choose a suitable

    Research Ideas

    • Math Formula Similarity: This dataset can be used to develop a model that classifies whether two mathematical formulas are similar or not. This can be useful in various applications such as plagiarism detection, identifying duplicate formulas in databases, or suggesting similar formulas based on user input.
    • Formula Categorization: The dataset can be used to train a model that categorizes mathematical formulas into different classes or categories. For example, the model can classify formulas into algebraic expressions, trigonometric equations, calculus problems, or geometric theorems. This categorization can help organize and search through large collections of mathematical formulas.
    • Formula Recommendation: Using this dataset, one could build a recommendation system that suggests related math formulas based on user input. By analyzing the similarities between different formula pairs and their corresponding labels, the system could provide recommendations for relevant mathematical concepts that users may need while solving problems or studying specific topics in mathematics

    Acknowle...

  13. Supporting data for "The Use of Variation and Connections in Chinese...

    • datahub.hku.hk
    Updated Aug 28, 2024
    Cite
    Wei Xin (2024). Supporting data for "The Use of Variation and Connections in Chinese Mathematics Lessons" [Dataset]. http://doi.org/10.25442/hku.26830453.v1
    Dataset updated
    Aug 28, 2024
    Dataset provided by
    HKU Data Repository
    Authors
    Wei Xin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    The current study is dedicated to obtaining a more thorough understanding of the use of variation and connections in naturalistic mathematics teaching practices in China. The research object is the mathematics topic of functions in the senior secondary school curriculum, which requires approximately 8–16 lessons to fit the specific situations of different classes. The participants were six ordinary mathematics teachers in three locally renowned schools located in three different cities in China. Various data collection methods were applied in this research to gather more information on natural, real-world teaching design regarding the use of variation and connections. First, all lessons were observed and video-recorded to capture more detail; the recordings can be viewed and examined repeatedly. The essential information has been extracted and integrated, and can be found in the file "Video Note". Second, semi-structured interviews were conducted with teachers to gather their basic information, explore their intentions and reflections about lessons, and validate the ideas of the researcher. This information can be found in the file "Interview". Third, students' performances were also collected from tests, which can be found in the file "Test". The data of all types were categorized by teacher, i.e., the video set of lessons taught by each teacher, the interview of each teacher, and the overall test results of the class taught by each teacher. Therefore, there are usually six cases corresponding to six teachers in all files.

  14. Data and program files associated with the publication: Effective Programs...

    • openicpsr.org
    delimited, zip
    Updated Jan 4, 2021
    + more versions
    Cite
    Marta Pellegrini; Cynthia Lake; Amanda Neitzel; Robert E. Slavin (2021). Data and program files associated with the publication: Effective Programs in Elementary Mathematics: A Meta-Analysis [Dataset]. http://doi.org/10.3886/E130284V2
    Available download formats: zip, delimited
    Dataset updated
    Jan 4, 2021
    Dataset provided by
    University of Florence
    Johns Hopkins University
    Authors
    Marta Pellegrini; Cynthia Lake; Amanda Neitzel; Robert E. Slavin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data include information about 85 rigorous experimental studies that evaluated 64 programs in grades K-5 mathematics. These data were collected by the research team from studies included in a systematic review of programs for elementary mathematics. The data contain study and finding level information to examine what types of programs are most effective.

  15. Mathematical Formula Handwriting OCR Data

    • kaggle.com
    zip
    Updated Jun 11, 2024
    + more versions
    Cite
    Frank Wong (2024). Mathematical Formula Handwriting OCR Data [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/mathematical-formula-handwriting-ocr-data
    Available download formats: zip (11844082 bytes)
    Dataset updated
    Jun 11, 2024
    Authors
    Frank Wong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    5,156 Images - Mathematical Formula Handwriting OCR Data

    Description

    5,156 Images - Mathematical Formula Handwriting OCR Data. The writing environment includes A4 paper, square paper, lined paper, white board, etc. The data diversity includes multiple writing papers, multiple types of mathematical formulas, and multiple photographic angles. The collection angles are looking-up angle and eye-level angle. The dataset can be used for tasks such as mathematical formula handwriting OCR. For more details, please refer to the link: https://www.nexdata.ai/datasets/ocr/1323?source=Kaggle

    Data size

    5,156 images

    Collecting environment

    A4 paper, square paper, lined paper, white board, etc.

    Data diversity

    including multiple writing papers, multiple types of mathematical formulas, multiple photographic angles

    Device

    cellphone

    Photographic angle

    looking up angle, eye-level angle

    Data format

    the image data format is .jpg

    Annotation content

    different types of handwritten mathematical formula data were collected

    Accuracy rate

    according to the Collection content, the collecting accuracy is over 97%

    Licensing Information

    Commercial License

  16. Nexdata | Korean Test Questions Structured Analysis Processing Data | 2.4...

    • datarade.ai
    Updated Nov 7, 2025
    Cite
    Nexdata (2025). Nexdata | Korean Test Questions Structured Analysis Processing Data | 2.4 million [Dataset]. https://datarade.ai/data-products/nexdata-korean-test-questions-structured-analysis-processin-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    Korea (Republic of)
    Description

    Korean Test Questions Structured Analysis Processing Data: around 2.4 million questions, containing question types, questions, answers, explanations, etc. Subjects include [Primary School] Korean, Mathematics, English, Social Studies, Science; [Middle School] Korean, English, Mathematics, Science, Social Studies; [High School] Korean, English, Mathematics, Physics, Chemistry, Biology, History, Geography. Question types include single-choice, fill-in, true-or-false, and short-answer questions, etc. This dataset can be used for large-scale subject knowledge enhancement tasks.

    Data content

    Korean K12 and university test questions

    Amount

    around 2.4 million questions

    Data fields

    question types, questions, answers, explanations, etc.

    Subject and Grade Level

    K12, university; contains math, physics, chemistry, biology

    Question Types

    single-choice question, fill-in question, true or false question, short answer question, etc.

    Format

    Jsonl

    Language

    Korean

  17. File S1 - A Risk Assessment Model for Type 2 Diabetes in Chinese

    • plos.figshare.com
    doc
    Updated May 30, 2023
    Cite
    Senlin Luo; Longfei Han; Ping Zeng; Feng Chen; Limin Pan; Shu Wang; Tiemei Zhang (2023). File S1 - A Risk Assessment Model for Type 2 Diabetes in Chinese [Dataset]. http://doi.org/10.1371/journal.pone.0104046.s001
    Available download formats: doc
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Senlin Luo; Longfei Han; Ping Zeng; Feng Chen; Limin Pan; Shu Wang; Tiemei Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplemental Material. File S1 contains seven tables and one figure. They are: (1) Table S1 the optimal number of fastclus; (2) Table S2 Hosmer and Lemeshow test for three logistic regressions; (3) Table S3 frequency of selected variables occurrence in all decision trees; (4) Table S4 risk factors and beta coefficient derived from multivariate logistic regression; (5) S5 the characteristics of different clusters (mean±SD); (6) Table S6 the results of jackknife cross-validation in model population; (7) Table S7 the list of 96 variables in risk variable selection; (8) Figure S1 receiver operating characteristic curve of RR. (DOC)

  18. Data from: Numerical experiments to "Error Analysis of Exponential...

    • radar.kit.edu
    • service.tib.eu
    • +1more
    tar
    Updated Jun 21, 2023
    Cite
    Benjamin Dörich (2023). Numerical experiments to "Error Analysis of Exponential Integrators for Nonlinear Wave-Type Equations" [Dataset]. http://doi.org/10.35097/1284
    Available download formats: tar (96256 bytes)
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Karlsruhe Institute of Technology
    Authors
    Benjamin Dörich
    Description

    This code has been used for the numerical experiments in the thesis "Error Analysis of Exponential Integrators for Nonlinear Wave-Type Equations" by Benjamin Dörich, see https://www.doi.org/10.5445/IR/1000130187.

  19. Compendium – LBOI section 3: Education

    • digital.nhs.uk
    xls
    Updated May 23, 2013
    + more versions
    Cite
    (2013). Compendium – LBOI section 3: Education [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/compendium-local-basket-of-inequality-indicators-lboi/current/section-3-education
    Available download formats: xls (332.3 kB)
    Dataset updated
    May 23, 2013
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Apr 1, 1999 - Dec 31, 2012
    Area covered
    England
    Description

    The percentage of all pupils who returned valid KS2 test results who achieved level 4 or above in KS2 Maths and, separately, KS2 English. Please note that this data has also been stratified by ethnicity and eligibility for free school meals. Education plays a number of roles in influencing inequalities in health, if health is viewed in its widest sense. Firstly, it has an important role in influencing inequalities in socioeconomic position. Educational qualifications are a determinant of an individual's labour market position, which in turn influences income, housing and other material resources. These are related to health and health inequalities. As a consequence, education is a traditional route out of poverty for those living in disadvantage. The roles of education set out above imply a range of outcomes which are not readily measurable. However, inequality is observed when looking at educational achievement. Children from disadvantaged backgrounds, as measured by being in receipt of free school meals, have lower educational achievement than other children. This indicator relates to the Public Service Agreement (PSA) performance management framework 2008-2011, as follows:

    • PSA Delivery Agreement 10 Indicator 2: Increase the proportion achieving Level 4 in both English and Maths at KS2 to 78% by 2011 (baseline 2007 of 71%);
    • PSA Delivery Agreement 11 Indicator 2: Achievement gap between pupils eligible for free school meals and their peers achieving the expected level at KS2 and KS4 (national target not specified, baseline 2006 of 24 percentage points at KS2 and 28 percentage points at KS4).

    The National Curriculum standards have been designed so that most pupils will progress approximately one level every two years. This means that by the end of KS2, pupils are expected to achieve level 4. Previously, target levels of attainment referred to English and Maths separately; however, these are now targeted together, although statistics continue to be released for each subject separately.

    Legacy unique identifier: P01089

  20. OER and Mathematics Skills 2014-2015 - Chile

    • datafirst.uct.ac.za
    Updated Jul 18, 2016
    + more versions
    Cite
    Research on Open Educational Resources for Development (ROER4D) (2016). OER and Mathematics Skills 2014-2015 - Chile [Dataset]. https://www.datafirst.uct.ac.za/dataportal/index.php/catalog/576
    Dataset updated
    Jul 18, 2016
    Dataset authored and provided by
    Research on Open Educational Resources for Development (ROER4D)
    Time period covered
    2014 - 2015
    Area covered
    Chile
    Description

    Abstract

    This study examines the effect of the use of two Open Educational Resources (OER) (a Khan Academy online tutorial and an open textbook hosted on Wikibooks) on logical-mathematical outcomes for first and second-year students in higher education institutions in Chile. It also investigates perceptions of instructors and students about the use of OER, in order to understand how these resources are used and valued. Quantitative and qualitative methods were used to collect student performance data via a student survey, student focus groups, interviews with instructors, and sourcing institutional records.

    Only the institutional records, focus group data and interview data are included in the final dataset. Student survey data is not made available for confidentiality reasons. Findings indicate that students in a contact-study mathematics course who used a Khan Academy online mathematics tutorial obtained better examination results than students who did not use any additional resources, or those who used the open textbook. Moreover, it was also found that instructors and students have positive perceptions about the use of Khan Academy and Wikibooks materials. This study is Sub-project 9 of the Research on Open Educational Resources for Development (ROER4D) project, hosted by the Centre for Innovation in Learning and Teaching (CILT) at the University of Cape Town, South Africa, and Wawasan Open University, Malaysia.

    Geographic coverage

    The interviews and survey were conducted at one institution in Chile and are not representative of the country as a whole.

    Analysis unit

    Individuals

    Universe

    The survey covered students and instructors in the single institution involved in the study.

    Kind of data

    Focus group and survey data

    Mode of data collection

    Face-to-face and internet [f2f-int]
