17 datasets found
  1. Data from: MathCheck

    • huggingface.co
    Updated Jul 12, 2024
    Cite
    PremiLab-Math (2024). MathCheck [Dataset]. https://huggingface.co/datasets/PremiLab-Math/MathCheck
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 12, 2024
    Dataset authored and provided by
    PremiLab-Math
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Exceptional mathematical reasoning ability is one of the key features that demonstrate the power of large language models (LLMs). How to comprehensively define and evaluate the mathematical abilities of LLMs, and even reflect the user experience in real-world scenarios, has emerged as a critical issue. Current benchmarks predominantly concentrate on problem-solving capabilities, which presents a substantial risk of model overfitting and fails to accurately represent genuine mathematical… See the full description on the dataset page: https://huggingface.co/datasets/PremiLab-Math/MathCheck.

  2. MetaMath QA

    • kaggle.com
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b
    Explore at:
    Croissant
    Dataset updated
    Nov 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MetaMath QA

    Mathematical Questions for Large Language Models

    By Huggingface Hub [source]

    About this dataset

    This dataset contains meta-mathematics questions and answers collected from the Mistral-7B question-answering system. The responses, types, and queries are all provided in order to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset provides users with an efficient way to investigate various aspects of question answering models and further understand how they function. Whether you are a professional or beginner, this dataset is sure to offer invaluable insights into the development of more powerful QA systems!

    How to use the dataset

    Data Dictionary

    The MetaMathQA dataset contains three columns:
    • response: the response to the query given by the question-answering system. (String)
    • type: the type of query provided as input to the system. (String)
    • query: the question posed to the system for which a response is required. (String)

    Preparing data for analysis

    Before diving into analysis, familiarize yourself with the kinds of data values present in each column, and check whether any preprocessing is needed (such as removing unwanted characters or filling in missing values) so that the data can be used without issue when training or testing your model further down your process flow.
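    As a minimal illustration of such a pre-analysis check, the sketch below counts missing values per column on a few invented rows shaped like the MetaMathQA schema (type, query, response); in practice you would read the downloaded train.csv instead.

```python
import csv
import io

# Invented sample rows mirroring the MetaMathQA schema; in practice,
# read the downloaded train.csv with open(...) instead of StringIO.
sample = io.StringIO(
    "type,query,response\n"
    "GSM_Rephrased,What is 2 + 3?,The answer is 5.\n"
    "MATH_AnsAug,Simplify 4/8.,\n"  # empty response, to illustrate cleaning
)

rows = list(csv.DictReader(sample))

# Count missing (empty) values per column before training or testing.
columns = list(rows[0].keys())
missing = {c: sum(1 for r in rows if not r[c].strip()) for c in columns}
print(missing)  # {'type': 0, 'query': 0, 'response': 1}
```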

    Training models with Mistral 7B

    Mistral 7B is an open-weight large language model well suited to fine-tuning on question-answer pairs such as those in this dataset. After collecting and preprocessing the data accordingly, standard practice is to tune training hyperparameters (for example learning rate, batch size, and number of epochs) against a held-out validation split, and then to validate the performance of the selected model with metrics such as accuracy, precision, recall, and F1 score.

    Testing models

    After the training phase, models should be tested robustly against the evaluation metrics mentioned above: make predictions on held-out test cases (including new cases supplied by domain experts), run quality-assurance checks against the baseline metric scores, and update the baselines as experiments improve. Running such controlled experiments is preferred practice in AI workflows because it keeps the impact of errors from inexact or irrelevant responses low.

    Research Ideas

    • Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.
    • Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.
    • Optimizing search algorithms that surface relevant answer results based on types of queries.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you may copy, modify, distribute, and perform the work, even for commercial purposes, without asking permission.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------------------------------|
    | response | The response to the query. (String) |
    | type | The type of query. (String) |
    | query | The question posed to the system. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  3. Data from: The IBEM Dataset: a large printed scientific image dataset for indexing and searching mathematical expressions

    • zenodo.org
    zip
    Updated May 25, 2023
    Cite
    Dan Anitei; Joan Andreu Sánchez; José Miguel Benedí (2023). The IBEM Dataset: a large printed scientific image dataset for indexing and searching mathematical expressions [Dataset]. http://doi.org/10.5281/zenodo.7963703
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dan Anitei; Joan Andreu Sánchez; José Miguel Benedí
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The IBEM dataset consists of 600 documents with a total number of 8272 pages, containing 29603 isolated and 137089 embedded Mathematical Expressions (MEs). The objective of the IBEM dataset is to facilitate the indexing and searching of MEs in massive collections of STEM documents. The dataset was built by parsing the LaTeX source files of documents from the KDD Cup Collection. Several experiments can be carried out with the IBEM dataset ground-truth (GT): ME detection and extraction, ME recognition, etc.

    The dataset consists of the following files:

    • “IBEM.json”: file containing the IBEM GT information. The data is organized first by page, then by the type of expression (“embedded” or “displayed”), and lastly by the GT of each individual ME. For each ME we provide:
      • xy page-level coordinates, reported as relative (%) to the width/height of the page image.
      • “split” attribute indicating the number of fragments into which the ME has been split. MEs can be split over various lines, columns, or pages. The LaTeX transcript of a split ME has been replicated exactly (the entire LaTeX definition) for each fragment.
      • “latex”: the original transcript as extracted from the LaTeX source files of the documents. This definition can contain user-defined macros. To make these expressions compilable, each page includes the preamble of the source files, containing the defined macros and the packages used by the authors of the documents.
      • “latex_expand”: the transcript reconstructed from the output stream of the LuaLaTeX engine, in which user-defined macros have been expanded. It has the same visual representation as the original transcript, except that the LaTeX definitions are tokenized, the order of sub/superscript elements has been fixed, and matrices have been transformed into arrays.
      • “latex_norm”: the transcript resulting from applying an extra normalization process to the “latex_expand” expression, which removes font information such as slant, style, and weight.
    • “partitions/*.lst”: files containing the lists of pages forming the partition sets.
    • “pages/*.jpg”: individual pages extracted from the documents.
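    To make the layout concrete, here is a sketch of walking a structure shaped like the description above (pages, then “embedded”/“displayed”, then per-ME ground truth). The key names and values are illustrative assumptions, not the official schema of IBEM.json.

```python
# Toy stand-in for IBEM.json; key names and values are assumptions
# made for illustration and may differ from the released file.
ibem = {
    "doc0001_page001": {
        "embedded": [
            {
                "coords": [12.0, 30.5, 18.2, 33.0],  # relative (%) xy coords
                "split": 1,                          # number of fragments
                "latex": r"E=mc^2",
                "latex_expand": r"E=mc^{2}",
                "latex_norm": r"E=mc^{2}",
            }
        ],
        "displayed": [],
    }
}

# Count MEs per expression type across all pages.
counts = {"embedded": 0, "displayed": 0}
for page, expr_types in ibem.items():
    for kind in counts:
        counts[kind] += len(expr_types.get(kind, []))
print(counts)  # {'embedded': 1, 'displayed': 0}
```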

    The dataset is partitioned into various sets as provided for the ICDAR 2021 Competition on Mathematical Formula Detection; the ground truth related to this competition is included in this dataset version. More information about the competition can be found in the following paper:

    D. Anitei, J.A. Sánchez, J.M. Fuentes, R. Paredes, and J.M. Benedí. ICDAR 2021 Competition on Mathematical Formula Detection. In ICDAR, pages 783–795, 2021.

    For ME recognition tasks, we recommend rendering the “latex_expand” version of the formulae in order to create standalone expressions that have the same visual representation as MEs found in the original documents (see attached python script “extract_GT.py”). Extracting MEs from the documents based on coordinates is more complex, as special care is needed to concatenate the fragments of split expressions. Baseline results for ME recognition tasks will soon be made available.

  4. Comparative Judgement of Statements About Mathematical Definitions

    • dataverse.no
    • dataverse.azure.uit.no
    csv, txt
    Updated Sep 28, 2023
    Cite
    Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad (2023). Comparative Judgement of Statements About Mathematical Definitions [Dataset]. http://doi.org/10.18710/EOZKTR
    Explore at:
    txt(3623), csv(2523), csv(37503), csv(43566)Available download formats
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data from a comparative judgement survey of 62 working mathematics educators (MEs) at Norwegian universities or university colleges and 57 working mathematicians (WMs) at Norwegian universities. A total of 3,607 comparisons were made, 1,780 by the MEs and 1,827 by the WMs. Respondents compared pairs of statements about mathematical definitions, compiled from a literature review on mathematical definitions in the mathematics education literature. Each WM was asked to judge 40 pairs of statements with the question: “As a researcher in mathematics, where your target group is other mathematicians, what is more important about mathematical definitions?” Each ME was asked to judge 41 pairs of statements with the question: “For a mathematical definition in the context of teaching and learning, what is more important?” The comparative judgement was done with the No More Marking software (nomoremarking.com). The data set consists of the following files:
    • comparisons made by the MEs (ME.csv)
    • comparisons made by the WMs (WM.csv)
    • a look-up table mapping statement codes to statement formulations (key.csv)
    Each line in a comparison file represents one comparison: the “winner” column holds the winning statement and the “loser” column the losing statement.
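    The winner/loser rows described above lend themselves to a simple win tally; the sketch below uses invented statement codes, and a full analysis would instead fit a comparative-judgement model such as Bradley-Terry.

```python
from collections import Counter

# Invented comparison rows shaped like ME.csv / WM.csv: one row per
# comparison, with the winning and losing statement codes.
comparisons = [
    {"winner": "S01", "loser": "S07"},
    {"winner": "S01", "loser": "S03"},
    {"winner": "S03", "loser": "S07"},
]

wins = Counter(c["winner"] for c in comparisons)
losses = Counter(c["loser"] for c in comparisons)

# Unsmoothed win rate per statement code.
rates = {s: wins[s] / (wins[s] + losses[s]) for s in set(wins) | set(losses)}
print(sorted(rates.items()))  # [('S01', 1.0), ('S03', 0.5), ('S07', 0.0)]
```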

  5. NaturalProofs Dataset

    • paperswithcode.com
    • opendatalab.com
    • +2more
    Updated May 28, 2025
    + more versions
    Cite
    Sean Welleck; Jiacheng Liu; Ronan Le Bras; Hannaneh Hajishirzi; Yejin Choi; Kyunghyun Cho (2025). NaturalProofs Dataset [Dataset]. https://paperswithcode.com/dataset/naturalproofs
    Explore at:
    Dataset updated
    May 28, 2025
    Authors
    Sean Welleck; Jiacheng Liu; Ronan Le Bras; Hannaneh Hajishirzi; Yejin Choi; Kyunghyun Cho
    Description

    The NaturalProofs Dataset is a large-scale dataset for studying mathematical reasoning in natural language. NaturalProofs consists of roughly 20,000 theorem statements and proofs, 12,500 definitions, and 1,000 additional pages (e.g. axioms, corollaries) derived from ProofWiki, an online compendium of mathematical proofs written by a community of contributors.

  6. MaxTex: A Large Scale Benchmark Dataset for Mathematical Formula Recognition

    • figshare.com
    application/x-rar
    Updated Apr 14, 2025
    Cite
    Runqing Yan (2025). MaxTex: A Large Scale Benchmark Dataset for Mathematical Formula Recognition [Dataset]. http://doi.org/10.6084/m9.figshare.27321012.v2
    Explore at:
    application/x-rarAvailable download formats
    Dataset updated
    Apr 14, 2025
    Dataset provided by
    figshare
    Authors
    Runqing Yan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mathematical formula recognition is an important component of document understanding and has broad application value in academic literature processing and intelligent education. However, existing research mainly focuses on improving model architectures to enhance the recognition of relatively simple formulas, ignoring the limitations of existing benchmark datasets in terms of scale, quality, and diversity, which restricts the development of complex formula recognition technology. This article makes two key contributions. First, high-quality printed MaxTex (P) and handwritten scanned MaxTex (H) datasets have been constructed. MaxTex (P) contains 223,000 samples and avoids symbol redundancy by adopting a unified and efficient morpheme design; MaxTex (H), although moderate in scale, optimizes the morpheme space and covers complex mathematical expressions. Both datasets have been strictly controlled in terms of sample size, data quality, and annotation accuracy, providing a more reliable benchmark for model training and evaluation. Second, an innovative character-sequence encoding and decoding scheme was designed to solve the problems of missing spaces in existing LaTeX label sequences and dictionary inflation caused by BPE encoding, while preserving the semantic information of the original character sequence. To obtain MaxTex (H), please contact gxqyrq@gmail.com.

  7. SCG Dataset from Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset and Benchmarks

    • data.niaid.nih.gov
    Updated Sep 3, 2024
    Cite
    Bappy, Mahathir Mohammad (2024). SCG Dataset from Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset and Benchmarks [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13652825
    Explore at:
    Dataset updated
    Sep 3, 2024
    Dataset provided by
    Wasi, Azmine Toushik
    Akib, Adipto Raihan
    Bappy, Mahathir Mohammad
    Islam, MD Shafikul
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Graph Neural Networks (GNNs) have recently gained traction in transportation, bioinformatics, language and image processing, but research on their application to supply chain management remains limited. Supply chains are inherently graph-like, making them ideal for GNN methodologies, which can optimize and solve complex problems. The barriers include a lack of proper conceptual foundations, familiarity with graph applications in SCM, and real-world benchmark datasets for GNN-based supply chain research. To address this, we discuss and connect supply chains with graph structures for effective GNN application, providing detailed formulations, examples, mathematical definitions, and task guidelines. Additionally, we present a multi-perspective real-world benchmark dataset from a leading FMCG company in Bangladesh, focusing on supply chain planning. We discuss various supply chain tasks using GNNs and benchmark several state-of-the-art models on homogeneous and heterogeneous graphs across six supply chain analytics tasks. Our analysis shows that GNN-based models consistently outperform statistical ML and other deep learning models by around 10-30% in regression, 10-30% in classification and detection tasks, and 15-40% in anomaly detection tasks on designated metrics. With this work, we lay the groundwork for solving supply chain problems using GNNs, supported by conceptual discussions, methodological insights, and a comprehensive dataset.

  8. Data from: Dataset of Student Level Prediction in UAE

    • data.mendeley.com
    Updated Dec 18, 2020
    Cite
    Shatha Ghareeb (2020). Dataset of Student Level Prediction in UAE [Dataset]. http://doi.org/10.17632/3g8dtwbjjy.1
    Explore at:
    Dataset updated
    Dec 18, 2020
    Authors
    Shatha Ghareeb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Arab Emirates
    Description

    The dataset comprises novel aspects of student grading across diverse educational cultures in multiple countries; researchers and other education sectors will be able to see the impact of having varied curriculums in one country. The dataset compares different levelling cases when students transfer from one curriculum to another, and the unreliable levelling criteria currently set by international schools. The collected data can be used with intelligent algorithms, specifically machine learning and pattern-analysis methods, to develop an intelligent framework applicable in multi-cultural educational systems that aids the smooth transition (“levelling” hereafter) of students who relocate from one education curriculum to another and minimises the impact of switching on the students’ educational performance. The UAE is a multicultural country with many expats relocating from regions such as Asia, Europe, and America. To meet expats’ needs, the UAE has established many international private schools; it was therefore chosen as the location of study, based on the many levelling cases and struggles reported by the Ministry of Education and schools. For the first time, we present a dataset comprising students’ records for two academic years, covering math, English, and science over three terms. The selection of subject areas and number of terms was influenced by other researchers in similar subject matters.

  9. ASDiv Dataset

    • paperswithcode.com
    Updated Aug 14, 2024
    + more versions
    Cite
    Shen-yun Miao; Chao-Chun Liang; Keh-Yih Su (2024). ASDiv Dataset [Dataset]. https://paperswithcode.com/dataset/asdiv
    Explore at:
    Dataset updated
    Aug 14, 2024
    Authors
    Shen-yun Miao; Chao-Chun Liang; Keh-Yih Su
    Description

    We present ASDiv (Academia Sinica Diverse MWP Dataset), a diverse (in terms of both language patterns and problem types) English math word problem (MWP) corpus for evaluating the capability of various MWP solvers. Existing MWP corpora for studying AI progress remain limited either in language usage patterns or in problem types. We thus present a new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem types taught in elementary school. Each MWP is annotated with its problem type and grade level (for indicating the level of difficulty). Furthermore, we propose a metric to measure the lexicon usage diversity of a given MWP corpus, and demonstrate that ASDiv is more diverse than existing corpora. Experiments show that our proposed corpus reflects the true capability of MWP solvers more faithfully.

  10. MMLU Dataset

    • paperswithcode.com
    Updated Jan 5, 2025
    + more versions
    Cite
    Dan Hendrycks; Collin Burns; Steven Basart; Andy Zou; Mantas Mazeika; Dawn Song; Jacob Steinhardt (2025). MMLU Dataset [Dataset]. https://paperswithcode.com/dataset/mmlu
    Explore at:
    Dataset updated
    Jan 5, 2025
    Authors
    Dan Hendrycks; Collin Burns; Steven Basart; Andy Zou; Mantas Mazeika; Dawn Song; Jacob Steinhardt
    Description

    MMLU (Massive Multitask Language Understanding) is a new benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to how we evaluate humans. The benchmark covers 57 subjects across STEM, the humanities, the social sciences, and more. It ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem solving ability. Subjects range from traditional areas, such as mathematics and history, to more specialized areas like law and ethics. The granularity and breadth of the subjects makes the benchmark ideal for identifying a model’s blind spots.

  11. StudentMathScores

    • kaggle.com
    Updated Jun 10, 2019
    Cite
    Logan Henslee (2019). StudentMathScores [Dataset]. https://www.kaggle.com/loganhenslee/studentmathscores
    Explore at:
    Croissant
    Dataset updated
    Jun 10, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Logan Henslee
    Description

    CONTEXT

    Practice Scenario: The UIW School of Engineering wants to recruit more students into their program. They will recruit students with great math scores. Also, to increase the chances of recruitment, the department will look for students who qualify for financial aid. Students who qualify for financial aid more than likely come from low socio-economic backgrounds. One way to indicate this is to view how much federal revenue a school district receives through its state. High federal revenue for a school indicates that a large portion of the student base comes from low-income families.

    The question we wish to ask is as follows: name the school districts across the nation whose Child Nutrition Programs (c25) are federally funded between $30,000 and $50,000, and where the average math score for the school district's corresponding state is greater than or equal to the nation's average score of 282.

    The SQL query in 'Top5MathTarget.sql' can be used to answer this question in MySQL. To execute this process, install MySQL locally, load the attached Kaggle datasets into a MySQL schema, and run the query, which joins the separate tables on various key identifiers.
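    The actual query lives in 'Top5MathTarget.sql' and is not reproduced here; as a language-neutral illustration of the same join-and-filter logic, the Python sketch below uses invented district rows together with the column names defined further down (c25, average_scale_score).

```python
# Invented example rows; real values come from the Census and NAEP tables.
finance = [
    {"district": "District A", "state": "TX", "c25": 42_000},
    {"district": "District B", "state": "TX", "c25": 12_000},
    {"district": "District C", "state": "CA", "c25": 35_000},
]
avg_math_score = {"TX": 285, "CA": 277}  # state-level grade-8 NAEP scores

# Join each district to its state's average score, then apply both filters:
# c25 funding between $30,000 and $50,000, and state score >= 282.
matches = [
    f["district"]
    for f in finance
    if 30_000 <= f["c25"] <= 50_000 and avg_math_score[f["state"]] >= 282
]
print(matches)  # ['District A']
```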

    DATA SOURCE

    Data is sourced from the U.S. Census Bureau and The Nation's Report Card (using the NAEP Data Explorer).

    Finance: https://www.census.gov/programs-surveys/school-finances/data/tables.html

    Math Scores: https://www.nationsreportcard.gov/ndecore/xplore/NDE

    COLUMN NOTES

    All data comes from the school year 2017. Individual schools are not represented, only school districts within each state.

    FEDERAL FINANCE DATA DEFINITIONS

    t_fed_rev: Total federal revenue through the state to each school district.

    C14: Federal revenue through the state, Title I (No Child Left Behind Act).

    C25: Federal revenue through the state, Child Nutrition Act.

    Title I is a program implemented in schools to help raise academic achievement for all students. The program is available to schools where at least 40% of the students come from low-income families.

    Child Nutrition Programs ensure that children are getting the food they need to grow and learn. High federal revenue to these programs indicates that a school's students also come from low-income families.

    MATH SCORES DATA DEFINITIONS

    Note: Mathematics, Grade 8, 2017, All Students (Total)

    average_scale_score - The state's average score for eighth graders taking the NAEP math exam.

  12. GPQA Dataset

    • paperswithcode.com
    Updated Apr 13, 2024
    Cite
    David Rein; Betty Li Hou; Asa Cooper Stickland; Jackson Petty; Richard Yuanzhe Pang; Julien Dirani; Julian Michael; Samuel R. Bowman (2024). GPQA Dataset [Dataset]. https://paperswithcode.com/dataset/gpqa
    Explore at:
    Dataset updated
    Apr 13, 2024
    Authors
    David Rein; Betty Li Hou; Asa Cooper Stickland; Jackson Petty; Richard Yuanzhe Pang; Julien Dirani; Julian Michael; Samuel R. Bowman
    Description

    GPQA stands for Graduate-Level Google-Proof Q&A Benchmark: a challenging dataset designed to evaluate the capabilities of large language models (LLMs) and scalable oversight mechanisms.

    • Questions: GPQA consists of 448 multiple-choice questions meticulously crafted by domain experts in biology, physics, and chemistry, intentionally designed to be high-quality and extremely difficult.
    • Expert accuracy: even experts who hold or are pursuing PhDs in the corresponding domains achieve only 65% accuracy on these questions (74% when excluding clear mistakes identified in retrospect).
    • Google-proof: even with unrestricted access to the web, highly skilled non-expert validators reach only 34% accuracy despite spending over 30 minutes searching for answers.
    • AI systems: state-of-the-art AI systems, including the authors' strongest GPT-4-based baseline, achieve only 39% accuracy on this challenging dataset.

    The difficulty of GPQA for both skilled non-experts and cutting-edge AI systems makes it an excellent resource for conducting realistic scalable oversight experiments, which aim to explore ways for human experts to reliably obtain truthful information from AI systems that surpass human capabilities.

    In summary, GPQA serves as a valuable benchmark for assessing the robustness and limitations of language models, especially when faced with complex and nuanced questions. Its difficulty level encourages research into effective oversight methods, bridging the gap between AI and human expertise.

    References: GPQA: A Graduate-Level Google-Proof Q&A Benchmark, arXiv:2311.12022 (https://arxiv.org/abs/2311.12022); project repository: https://github.com/idavidrein/gpqa.

  13. % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (old Best Entry definition) - (Snapshot)

    • cloud.csiss.gmu.edu
    • data.wu.ac.at
    csv
    Updated Jan 3, 2020
    + more versions
    Cite
    United Kingdom (2020). % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (old Best Entry definition) - (Snapshot) [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/kpi-75
    Explore at:
    csvAvailable download formats
    Dataset updated
    Jan 3, 2020
    Dataset provided by
    United Kingdom
    License

    Open Government Licence: http://reference.data.gov.uk/id/open-government-licence

    Description

    % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (old Best Entry definition) - (Snapshot)

    *This indicator was discontinued in 2014 due to the national changes in GCSEs.

  14. Data Sheet 1_Mathematical methodology for defining a frequent attender within emergency departments

    • frontiersin.figshare.com
    pdf
    Updated Feb 11, 2025
    Cite
    Elizabeth Williams; Syaribah N. Brice; Dave Price (2025). Data Sheet 1_Mathematical methodology for defining a frequent attender within emergency departments.pdf [Dataset]. http://doi.org/10.3389/femer.2025.1462764.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    Frontiers
    Authors
    Elizabeth Williams; Syaribah N. Brice; Dave Price
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: Emergency department (ED) frequent attenders (FAs) have been the subject of discussion in many countries. This group of patients contributes to the high expenses of health services and strains capacity in the department. Studies of ED FAs aim to describe patient characteristics such as demographic and socioeconomic factors, and may explore the relationship between these factors and multiple patient visits. However, the definition used for classifying patients varies across studies: while most use frequency of attendance to define the FA, the derivation of the frequency is not clear.

    Methods: We propose a mathematical methodology to define the time interval between ED returns for classifying FAs. K-means clustering and the elbow method were used to identify suitable FA definitions. Recursive clustering on the smallest time-interval cluster created a new, smaller cluster and a formal FA definition.

    Results: Applied to a case-study dataset of approximately 336,000 ED attendances, this framework can consistently and effectively identify FAs across EDs. Based on our data, a FA is defined as a patient with three or more attendances within sequential 21-day periods.

    Conclusion: This study introduces a standardized framework for defining ED FAs, providing a consistent and effective means of identification across different EDs. The methodology can also identify patients at risk of becoming FAs, allowing targeted interventions aimed at reducing future attendances.
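    As one reading of the resulting rule (three or more attendances within a 21-day span), the sketch below flags a patient using a sliding window over sorted visit dates; the dates are invented, and the paper's exact construction uses sequential 21-day periods rather than this sliding-window approximation.

```python
from datetime import date

def is_frequent_attender(visit_dates, k=3, window_days=21):
    """Flag a patient whose k-th visit in any run of k visits falls within
    window_days of the first visit of that run (sliding-window sketch)."""
    visits = sorted(visit_dates)
    for i in range(len(visits) - k + 1):
        if (visits[i + k - 1] - visits[i]).days < window_days:
            return True
    return False

patient = [date(2024, 1, 2), date(2024, 1, 10), date(2024, 1, 20)]
print(is_frequent_attender(patient))  # True: three visits span 18 days
```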

  15. Comparing Burr, log-normal and Weibull models with the gamma model on...

    • plos.figshare.com
    xls
    Updated Jan 9, 2025
    Cite
    Nyall Jamieson; Christiana Charalambous; David M. Schultz; Ian Hall (2025). Comparing Burr, log-normal and Weibull models with the gamma model on anthrax, salmonellosis and campylobacteriosis datasets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012041.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jan 9, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nyall Jamieson; Christiana Charalambous; David M. Schultz; Ian Hall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For brevity, we denote the datasets A, S, or C for the anthrax, salmonellosis, and campylobacteriosis datasets respectively. The numbers in the dataset column indicate which dataset for a given disease is being referred to, as we have multiple incubation-period datasets for Salmonella and Campylobacter. The value provided is the difference between the AIC of a given model and that of the gamma distribution. Negative values indicate a lower AIC, which is preferable; positive values indicate a higher AIC, implying a worse model fit.
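The table's ΔAIC comparison can be reproduced in spirit with SciPy's maximum-likelihood fitting. The sample below is synthetic (gamma by construction, so gamma should fit best), and the Burr distribution is omitted for numerical simplicity; this is a sketch of the comparison procedure, not the paper's analysis.

```python
import numpy as np
from scipy import stats

# Synthetic "incubation period" sample (days), gamma by construction;
# the real anthrax/salmonellosis/campylobacteriosis data are not reproduced here.
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=3.0, size=2000)

def aic(dist, data):
    """AIC = 2k - 2*logL, with the location fixed at zero during fitting."""
    params = dist.fit(data, floc=0)
    loglik = dist.logpdf(data, *params).sum()
    k = len(params) - 1                 # loc is fixed, not a free parameter
    return 2 * k - 2 * loglik

aics = {name: aic(dist, sample)
        for name, dist in [("gamma", stats.gamma),
                           ("lognorm", stats.lognorm),
                           ("weibull", stats.weibull_min)]}

# The table reports AIC(model) - AIC(gamma); negative favours the model.
delta = {name: a - aics["gamma"] for name, a in aics.items()}
```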

  16. Do Humans Optimally Exploit Redundancy to Control Step Variability in...

    • figshare.com
    • data.niaid.nih.gov
    • +1 more
    pdf
    Updated Jun 1, 2023
    Cite
    Jonathan B. Dingwell; Joby John; Joseph P. Cusumano (2023). Do Humans Optimally Exploit Redundancy to Control Step Variability in Walking? [Dataset]. http://doi.org/10.1371/journal.pcbi.1000856
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Jonathan B. Dingwell; Joby John; Joseph P. Cusumano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It is widely accepted that humans and animals minimize energetic cost while walking. While such principles predict average behavior, they do not explain the variability observed in walking. For robust performance, walking movements must adapt at each step, not just on average. Here, we propose an analytical framework that reconciles issues of optimality, redundancy, and stochasticity. For human treadmill walking, we defined a goal function to formulate a precise mathematical definition of one possible control strategy: maintain constant speed at each stride. We recorded stride times and stride lengths from healthy subjects walking at five speeds. The specified goal function yielded a decomposition of stride-to-stride variations into new gait variables explicitly related to achieving the hypothesized strategy. Subjects exhibited greatly decreased variability for goal-relevant gait fluctuations directly related to achieving this strategy, but far greater variability for goal-irrelevant fluctuations. More importantly, humans immediately corrected goal-relevant deviations at each successive stride, while allowing goal-irrelevant deviations to persist across multiple strides. To demonstrate that this was not the only strategy people could have used to successfully accomplish the task, we created three surrogate data sets. Each tested a specific alternative hypothesis that subjects used a different strategy that made no reference to the hypothesized goal function. Humans did not adopt any of these viable alternative strategies. Finally, we developed a sequence of stochastic control models of stride-to-stride variability for walking, based on the Minimum Intervention Principle. We demonstrate that healthy humans are not precisely “optimal,” but instead consistently slightly over-correct small deviations in walking speed at each stride. 
Our results reveal a new governing principle for regulating stride-to-stride fluctuations in human walking that acts independently of, but in parallel with, minimizing energetic cost. Thus, humans exploit task redundancies to achieve robust control while minimizing effort and allowing potentially beneficial motor variability.
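The decomposition into goal-relevant and goal-irrelevant fluctuations can be sketched as a projection onto the normal and tangent directions of the goal equivalent manifold L = v·T (constant speed) in the stride-time/stride-length plane. All numbers below are illustrative, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
v_goal = 1.25   # hypothetical target treadmill speed (m/s)
T0, n = 1.1, 500

# Unit tangent and normal of the goal equivalent manifold (GEM) L = v*T.
tangent = np.array([1.0, v_goal]) / np.hypot(1.0, v_goal)
normal = np.array([-v_goal, 1.0]) / np.hypot(1.0, v_goal)

# Simulate strides scattered about the GEM, with (by construction) more
# spread along the manifold than across it -- the paper's key finding.
along = rng.normal(0, 0.05, n)    # goal-irrelevant drift along the GEM
across = rng.normal(0, 0.01, n)   # goal-relevant deviation off the GEM
pts = (np.array([T0, v_goal * T0])
       + np.outer(along, tangent) + np.outer(across, normal))

# Decompose each stride's deviation from the mean operating point.
dev = pts - pts.mean(axis=0)
delta_T = dev @ tangent   # goal-irrelevant component
delta_P = dev @ normal    # goal-relevant component

# Variability is much larger along the GEM than across it.
ratio = delta_T.std() / delta_P.std()
```

In the study, the asymmetry of this ratio, together with how quickly each component is corrected from stride to stride, distinguishes the constant-speed strategy from the surrogate alternatives.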

  17. Data Sheet 1_Fourier-mixed window attention for efficient and robust long...

    • frontiersin.figshare.com
    pdf
    Updated May 19, 2025
    Cite
    Nhat Thanh Tran; Jack Xin (2025). Data Sheet 1_Fourier-mixed window attention for efficient and robust long sequence time-series forecasting.pdf [Dataset]. http://doi.org/10.3389/fams.2025.1600136.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 19, 2025
    Dataset provided by
    Frontiers
    Authors
    Nhat Thanh Tran; Jack Xin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We study a fast local-global window-based attention method to accelerate Informer for long sequence time-series forecasting (LSTF) in a robust manner. While window attention, being local, offers considerable computational savings, it cannot capture global token information; this is compensated by a subsequent Fourier transform block. Our method, named FWin, does not rely on the query sparsity hypothesis or the empirical approximation underlying Informer's ProbSparse attention. Experiments on univariate and multivariate datasets show that FWin transformers improve the overall prediction accuracies of Informer while accelerating its inference speeds by 1.6 to 2 times. On strongly non-stationary data (power grid and dengue disease data), FWin outperforms Informer and recent SOTAs, thereby demonstrating its superior robustness. We give a mathematical definition of FWin attention and prove its equivalence to canonical full attention under the block diagonal invertibility (BDI) condition of the attention matrix. The BDI condition is verified experimentally to hold with high probability on benchmark datasets.
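The local-window-attention-plus-Fourier-mixing idea can be sketched as below. This is a minimal illustration, not the paper's exact FWin architecture: the FFT mixer here simply takes the real part of an FFT along the sequence axis (an FNet-style recipe) to restore information flow between windows.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, w):
    """Full self-attention applied independently inside non-overlapping
    windows of length w: cost O(L*w) instead of O(L^2)."""
    L, d = x.shape
    out = np.empty_like(x)
    for s in range(0, L, w):
        blk = x[s:s + w]                       # (window, dim)
        scores = blk @ blk.T / np.sqrt(d)
        out[s:s + w] = softmax(scores, axis=-1) @ blk
    return out

def fourier_mix(x):
    """Global token mixing via an FFT along the sequence axis (real part),
    compensating for window attention's lack of cross-window context."""
    return np.fft.fft(x, axis=0).real

# Sketch of one FWin-style block: local attention, then global Fourier mixing.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))   # (sequence length, model dim)
y = fourier_mix(window_attention(x, w=8))
```

The BDI equivalence result in the paper concerns when such a windowed scheme can reproduce canonical full attention; the sketch above only shows the data flow, not that proof.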

