56 datasets found
  1. GSM8K - Grade School Math 8K Q&A

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
    Explore at:
    Available download formats: zip (3418660 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    GSM8K - Grade School Math 8K Q&A

    A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

    By Huggingface Hub [source]

    About this dataset

    This Grade School Math 8K (GSM8K) training and test set is designed to help you develop and evaluate multi-step reasoning for question answering. The dataset contains three data files, socratic_test.csv, main_test.csv, and main_train.csv, each holding grade-school math questions whose solutions require multiple steps. Every file has the same two columns: question and answer. Each answer walks through the reasoning needed to reach the correct result, and with over eight thousand entries across the training and test splits, the dataset offers ample material for practicing and benchmarking multi-step reasoning.


    How to use the dataset

    This dataset provides an opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of roughly 8,000 questions and answers that simulate real-world scenarios in grade-school mathematics, covering topics such as algebra, arithmetic, and probability. Each question is paired with a single worked answer.

    The dataset consists of three files: main_train.csv and main_test.csv contain the training and test splits, while socratic_test.csv contains the test questions with their solutions broken into Socratic-style sub-questions. Each file has two columns, question and answer; each answer spells out the sequence of reasoning steps leading to the final result. These columns can be combined with text-representation models such as ELMo or BERT to explore different input formats for natural-language-processing tasks like question answering, or used as training data for models that must carry out multi-step numerical reasoning.

    To use this dataset efficiently, first familiarize yourself with its structure by reading the documentation, so that you know the content, definitions, and format of each field. Then study the examples that best suit your purpose, whether that is an education-research experiment, a marketing-analytics report, or a prediction task for an AI project. Make sure you understand the definitions of all variables before beginning analysis; a clear view of your objectives will keep the project on track from preliminary background work through to completion.
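    In the original GSM8K release each answer ends with a line of the form `#### <number>`; assuming the Kaggle CSVs preserve that convention, the final numeric answer can be pulled out of the answer text as sketched below (the sample row mirrors a well-known GSM8K training item):

```python
import re

# A sample row shaped like the GSM8K files described above (question, answer).
sample = {
    "question": "Natalia sold clips to 48 of her friends in April, and then she "
                "sold half as many clips in May. How many clips did she sell "
                "altogether in April and May?",
    "answer": "Natalia sold 48 / 2 = 24 clips in May.\n"
              "Altogether she sold 48 + 24 = 72 clips.\n"
              "#### 72",
}

def final_answer(answer_text):
    """Extract the final numeric answer that follows the '#### ' marker."""
    match = re.search(r"####\s*(-?[\d,\.]+)", answer_text)
    return match.group(1).replace(",", "") if match else None

print(final_answer(sample["answer"]))  # prints 72
```

    Comparing this extracted value against a model's own `#### ...` line is the usual way GSM8K accuracy is scored.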

    Research Ideas

    • Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.
    • Generating new grade school math questions and answers using g...
  2. MetaMath QA

    • kaggle.com
    zip
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b
    Explore at:
    Available download formats: zip (78629842 bytes)
    Dataset updated
    Nov 23, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MetaMath QA

    Mathematical Questions for Large Language Models

    By Huggingface Hub [source]

    About this dataset

    This dataset contains meta-mathematics questions and answers collected from the Mistral-7B question-answering system. The responses, types, and queries are all provided in order to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset provides users with an efficient way to investigate various aspects of question answering models and further understand how they function. Whether you are a professional or beginner, this dataset is sure to offer invaluable insights into the development of more powerful QA systems!


    How to use the dataset

    Data Dictionary

    The MetaMathQA dataset contains three columns: response, type, and query.

    • response: the response to the query given by the question answering system. (String)
    • type: the type of query provided as input to the system. (String)
    • query: the question posed to the system for which a response is required. (String)
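    A quick way to inspect these columns is to load them with pandas. The rows below are invented stand-ins in the documented three-column layout; real rows come from train.csv:

```python
import io
import pandas as pd

# Illustrative rows in the three-column layout described above.
csv_text = (
    "query,response,type\n"
    '"What is 15% of 80?","15% of 80 is 0.15 * 80 = 12. The answer is 12.",GSM_Rephrased\n'
    '"Solve 2x + 3 = 11.","2x = 8, so x = 4. The answer is 4.",MATH_AnsAug\n'
)

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # ['query', 'response', 'type']
print(df["type"].value_counts().to_dict())
```

    Grouping by the type column like this is a quick sanity check that the query categories are balanced the way you expect before training.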

    Preparing data for analysis

    Before you dive into analysis, familiarize yourself with the kinds of values present in each column, and check whether any preprocessing is needed, such as removing unwanted characters or filling in missing values, so that the data can be used without issues when training or testing your model further down your process flow.
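    A minimal pandas sketch of that kind of preprocessing (the toy rows and their defects are invented for illustration):

```python
import pandas as pd

# Toy frame with the kinds of issues mentioned above: stray characters
# and missing values.
df = pd.DataFrame({
    "query": ["  What is 2 + 2?\n", None, "Compute 3 * 7."],
    "response": ["The answer is 4.", "The answer is 10.", None],
})

# Strip surrounding whitespace/newlines from the text columns.
for col in ["query", "response"]:
    df[col] = df[col].str.strip()

# Fill missing responses with an explicit placeholder, then drop rows
# that are still unusable (no query at all).
df["response"] = df["response"].fillna("")
df = df.dropna(subset=["query"]).reset_index(drop=True)
print(len(df))  # prints 2
```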

    ##### Training Models using Mistral 7B

    Mistral 7B is an open-weight 7-billion-parameter large language model, not a tabular-ML toolkit, so the natural use of this dataset is to fine-tune or evaluate such a model on the query/response pairs. Classical baselines remain useful, however, particularly for the type column: after preprocessing, algorithms such as Support Vector Machines (SVM), logistic regression, or decision trees can be trained on text features, with GridSearchCV or RandomizedSearchCV used to tune hyperparameters. The selected models can then be validated with metrics such as accuracy, F1, precision, and recall.
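    As a concrete illustration of the classical-baseline route, the sketch below trains a TF-IDF plus logistic-regression classifier to predict a query's type, tuned with GridSearchCV. The toy queries and type labels are invented for illustration; real rows come from train.csv.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Tiny invented corpus standing in for the query/type columns.
queries = [
    "What is 15% of 80?", "Compute 12 * 9.", "What is 30% of 50?",
    "Solve x + 2 = 5.", "Solve 2x = 10.", "Compute 7 + 8.",
]
types = ["percent", "arith", "percent", "algebra", "algebra", "arith"]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Small grid; cv=2 because the toy dataset is tiny.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(queries, types)
print(grid.best_params_)
print(grid.predict(["Solve 3x = 9."]))
```

    With the full dataset you would hold out a test split and report accuracy, F1, precision, and recall from sklearn.metrics rather than relying on the cross-validation score alone.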

    ##### Testing models

    After the building phase completes successfully, the right next step is to test the models robustly against the evaluation metrics mentioned above. Use the trained model to make predictions on new test cases supplied by domain experts, run quality-assurance checks against the baseline metric scores, and update those baselines as experiments proceed. Monitoring experiments this way keeps errors induced by inexact or irrelevant responses low and their overall impact under control.

    Research Ideas

    • Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.
    • Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.
    • Optimizing search algorithms that surface relevant answer results based on types of queries

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description                                |
    |:------------|:-------------------------------------------|
    | response    | The response to the query. (String)         |
    | type        | The type of query. (String)                 |
    | query       | The question posed to the system. (String)  |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  3. Data from: Linguistic Appropriation and Meaning in Mathematical Modeling...

    • scielo.figshare.com
    jpeg
    Updated May 31, 2023
    Cite
    Bárbara Nivalda Palharini Alvim Sousa; Lourdes Maria Werle de Almeida (2023). Linguistic Appropriation and Meaning in Mathematical Modeling Activities [Dataset]. http://doi.org/10.6084/m9.figshare.11314559.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Bárbara Nivalda Palharini Alvim Sousa; Lourdes Maria Werle de Almeida
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract In this paper we turn our attention to the different language games associated with the development of Mathematical Modelling activities, and to the meanings constituted by students within these language games in relation to first-order ordinary differential equations. The research is grounded in Mathematical Modelling in Mathematics Education and takes as its philosophical basis the studies of Ludwig Wittgenstein and some of his interpreters. Considering these theoretical-philosophical elements, mathematical modelling activities were developed in an Ordinary Differential Equations course of a Mathematics Degree. Data were collected through written records, audio and video recordings, questionnaires, and interviews. The data analysis methodology considers the students' discursive practices and allowed us to construct trees of idea association. The results indicate that the constitution of meaning within modelling activities is associated with the students' linguistic appropriation of the rules and techniques configured in the specific language games identified in the Mathematical Modelling activities.

  4. Data from: Meaning of derivative in the book tasks of 1st of “Bachillerato”

    • scielo.figshare.com
    jpeg
    Updated Jun 1, 2023
    Cite
    María Fernanda Vargas; José Antonio Fernández-Plaza; Juan Francisco Ruiz-Hidalgo (2023). Meaning of derivative in the book tasks of 1st of “Bachillerato” [Dataset]. http://doi.org/10.6084/m9.figshare.14304760.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    María Fernanda Vargas; José Antonio Fernández-Plaza; Juan Francisco Ruiz-Hidalgo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract Given the importance of textbooks in the processes of teaching and learning Mathematics, this article focuses on the tasks proposed for the derivative in five textbooks of the 1st year of Bachillerato. The goal is to identify the meanings of the derivative conveyed in the textbooks through the proposed tasks. It is a quantitative study in which, by means of a cluster analysis, the tasks were grouped according to similarity. The results show that the books emphasize three meanings of the derivative: a procedural-algebraic one, an algorithmic one, and finally a conceptual-geometric one, all dominated by the symbolic representation system and presented exclusively in a mathematical context.

  5. Data from: Number and magnitude: discussing the concept of measurement by...

    • scielo.figshare.com
    jpeg
    Updated May 31, 2023
    Cite
    Fumikazu Saito (2023). Number and magnitude: discussing the concept of measurement by means of a sixteenth-century mathematical instrument [Dataset]. http://doi.org/10.6084/m9.figshare.5720041.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Fumikazu Saito
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: This paper presents some results of a study that valued manipulative conditions in the process of knowledge construction through the handling of a sixteenth-century mathematical instrument. The study was based on a problem-situation elaborated from epistemological and mathematical questions, which emerged from an interface built between the history of mathematics and teaching. Handling this instrument triggered a series of actions that led teachers to reflect on and discuss the very notions of magnitude, number and measurement. The results of the study suggest an epistemological gap between the observer who measures, the instrument that mediates the measuring, and the measured object. This gap compromises the proper understanding of measuring and of the relationship between number and magnitude in the measurement process.

  6. BIOGRID CURATED DATA FOR MATH-4 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated May 7, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-4 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/38920/table/caenorhabditis-elegans/math-4.html
    Explore at:
    Available download formats: zip
    Dataset updated
    May 7, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-4 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein MATH-4

  7. Data from: Meanwhile, in a Liquid-Modern Consumer Society: The Production of...

    • scielo.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Marco Aurélio Kistemann Jr.; Romulo Campos Lins (2023). Meanwhile, in a Liquid-Modern Consumer Society: The Production of Meanings and Decision Making of Consumers [Dataset]. http://doi.org/10.6084/m9.figshare.19985488.v1
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Marco Aurélio Kistemann Jr.; Romulo Campos Lins
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article presents the paths and results of a qualitative study concerning the production of meanings and decision-making by consumers in a liquid-modern consumer society. We based the study on theoretical assumptions from Critical Mathematics Education, the Model of Semantic Fields, and Economics, and investigated, through semi-structured interviews, how consumers behave and make decisions when faced with consumption situations. We also observed how they use mathematics in their decisions, opening other paths and reflections on consumption in the society referred to. The investigation revealed, among other results presented throughout the article, that regardless of their schooling, consumers tend to use only the four basic operations for decision-making, and to base their decisions on the installment value to the detriment of the interest rate. Finally, we highlight that mathematical and financial simulations can guide decision-making or shopping insight.

  8. Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading,...

    • www150.statcan.gc.ca
    • ouvert.canada.ca
    • +1more
    Updated Oct 18, 2022
    Cite
    Government of Canada, Statistics Canada (2022). Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading, science and mathematics assessment [Dataset]. http://doi.org/10.25318/3710022901-eng
    Explore at:
    Dataset updated
    Oct 18, 2022
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Government of Canada (http://www.gg.ca/)
    Area covered
    Canada
    Description

    Reading, science and math mean scores from the Pan-Canadian Assessment Program (PCAP), by province.

  9. HASYv2 - Symbol Recognizer

    • kaggle.com
    zip
    Updated Oct 11, 2021
    Cite
    fedesoriano (2021). HASYv2 - Symbol Recognizer [Dataset]. https://www.kaggle.com/fedesoriano/hasyv2-symbol-recognizer
    Explore at:
    Available download formats: zip (85506565 bytes)
    Dataset updated
    Oct 11, 2021
    Authors
    fedesoriano
    Description

    Context

    Publicly available datasets have helped the computer vision community compare new algorithms and develop applications. MNIST [LBBH98] in particular has been used thousands of times to train and evaluate classification models. However, even rather simple models consistently reach about 99.2% accuracy on MNIST [TF-16a], and the best models classify everything correctly except for about 20 instances, which makes meaningful statements about classifier improvements hard. Possible reasons why current models do so well on MNIST are that 1) MNIST has only 10 classes, 2) there are very few (probably no) labelling errors, 3) every class has 6,000 training samples, and 4) the feature dimensionality is comparatively low. Also, applications that need to recognize only Arabic numerals are rare. Like MNIST, HASY is of very low resolution. In contrast to MNIST, the HASYv2 dataset contains 369 classes, including Arabic numerals and Latin characters. Furthermore, HASYv2 has far fewer recordings per class than MNIST and is only in black and white, whereas MNIST is in grayscale. HASY could be used to train models for semantic segmentation of non-cursive handwritten documents such as mathematical notes or forms.

    Content

    The dataset contains the following:

    • a pickle file: HASYv2
    • a txt file: cite.txt

    The pickle file contains the 168233 observations in a dictionary form. The simplest way to use the HASYv2 dataset is to download the pickle file below (HASYv2). You can use the following lines of code to load the data:

    import pickle

    def unpickle(file):
        # Load the bytes-encoded pickle shipped with the dataset.
        with open(file, 'rb') as fo:
            data = pickle.load(fo, encoding='bytes')
        return data

    HASYv2 = unpickle("HASYv2")

    The data comes in a dictionary format; you can get the data and the labels separately by extracting the content from the dictionary:

    data = HASYv2['data']
    labels = HASYv2['labels']
    symbols = HASYv2['latex_symbol']

    Note that the shape of the data is (32 x 32 x 3 x 168233): the first and second dimensions are the height and width respectively, the third dimension corresponds to the channels, and the fourth to the observation number.
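    Most training loops expect one image per leading index rather than the observation axis last. A minimal NumPy sketch of that reordering, using a small dummy array in the (height, width, channels, observations) layout described above in place of the real data:

```python
import numpy as np

# Dummy array with the stated layout; 5 observations stand in for the
# full 168233.
data = np.zeros((32, 32, 3, 5), dtype=np.uint8)

# Move the observation axis to the front so images[i] is one 32x32x3 image.
images = np.moveaxis(data, -1, 0)
print(images.shape)  # prints (5, 32, 32, 3)
```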

    Citation

    fedesoriano. (October 2021). HASYv2 - Symbol Recognizer. Retrieved [Date Retrieved] from https://www.kaggle.com/fedesoriano/hasyv2-symbol-recognizer.

    Source

    The dataset was originally uploaded by Martin Thoma, see https://arxiv.org/abs/1701.08380.

    Thoma, M. (2017). The HASYv2 dataset. ArXiv, abs/1701.08380.

    The original paper describes the HASYv2 dataset. HASY is a publicly available, free of charge dataset of single symbols similar to MNIST. It contains 168233 instances of 369 classes. HASY contains two challenges: A classification challenge with 10 pre-defined folds for 10-fold cross-validation and a verification challenge. The HASYv2 dataset (PDF Download Available). Available from: https://arxiv.org/pdf/1701.08380.pdf [accessed Oct 11, 2021].

  10. BIOGRID CURATED DATA FOR MATH-34 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated Jul 13, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-34 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/38858/table/caenorhabditis-elegans/math-34.html
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 13, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-34 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein MATH-34

  11. % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new...

    • data.yorkopendata.org
    • ckan.york.staging.datopian.com
    • +3more
    Updated Mar 18, 2015
    Cite
    (2015). % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new First Entry definition) - (Snapshot) [Dataset]. https://data.yorkopendata.org/dataset/kpi-75a
    Explore at:
    Dataset updated
    Mar 18, 2015
    License

    Open Government Licence 2.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
    License information was derived automatically

    Description

    % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (new First Entry definition) - (Snapshot) *This indicator has been discontinued due to national changes in GCSEs in 2016.

  12. Hex Dictionary V2

    • kaggle.com
    zip
    Updated May 21, 2025
    Cite
    DigitalEuan (2025). Hex Dictionary V2 [Dataset]. https://www.kaggle.com/datasets/digitaleuan/hex-dictionary-v2
    Explore at:
    Available download formats: zip (203686 bytes)
    Dataset updated
    May 21, 2025
    Authors
    DigitalEuan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    READ ME

    Welcome to the Universal Binary Principle (UBP) Dictionary System - Version 2

    Author: Euan Craig, New Zealand 2025

    Embark on a revolutionary journey with Version 2 of the UBP Dictionary System, a cutting-edge Python notebook that redefines how words are stored, analyzed, and visualized! Built for Kaggle, this system encodes words as multidimensional hexagonal structures in custom .hexubp files, leveraging sophisticated mathematics to integrate binary toggles, resonance frequencies, spatial coordinates, and more, all rooted in the Universal Binary Principle (UBP). This is not just a dictionary—it’s a paradigm shift in linguistic representation.

    What is the UBP Dictionary System? The UBP Dictionary System transforms words into rich, vectorized representations stored in custom .hexubp files, a JSON-based format designed to encapsulate a word’s multidimensional UBP properties. Each .hexubp file represents a word as a hexagonal structure with 12 vertices, encoding:

    • Binary Toggles: 6-bit patterns capturing word characteristics.
    • Resonance Frequencies: derived from the Schumann resonance (7.83 Hz) and UBP Pi (~2.427).
    • Spatial Vectors: 6D coordinates positioning words in a conceptual “Bitfield.”
    • Cultural and Harmonic Data: contextual weights, waveforms, and harmonic properties.

    These .hexubp files are generated, managed, and visualized through an interactive Tkinter-based interface, making the system a powerful tool for exploring language through a mathematical lens.
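    The description does not publish the .hexubp schema itself, so the following is a purely illustrative JSON skeleton consistent with the fields listed above; every key name here is an assumption for illustration, not the actual format:

```python
import json

# Hypothetical sketch of one .hexubp record; the real schema is defined
# by the notebook itself and may differ entirely.
record = {
    "word": "example",
    "vertices": [{"index": i, "toggle": "010110"} for i in range(12)],  # 12 vertices
    "resonance_hz": 7.83,                                # Schumann resonance
    "spatial_vector": [0.1, 0.2, 0.3, 0.0, 0.5, 1.0],    # x, y, z, time, phase, quantum state
}

print(json.dumps(record)[:60])
```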

    Unique Mathematical Foundation The UBP Dictionary System is distinguished by its deep reliance on mathematics to model language:

    • UBP Pi (~2.427): a custom constant derived from hexagonal geometry and resonance alignment (calculated as 6/2 * cos(2π * 7.83 * 0.318309886)), serving as the system’s foundational reference.
    • Resonance Frequencies: computed using word-specific hashes modulated by UBP Pi, with validation against the Schumann resonance (7.83 Hz ± 0.078 Hz), grounding the system in physical phenomena.
    • 6D Spatial Vectors: words are positioned in a 6D Bitfield (x, y, z, time, phase, quantum state) based on toggle sums and frequency offsets, enabling spatial analysis of linguistic relationships.
    • GLR Validation: a non-corrective validation mechanism flags outliers in binary, frequency, and spatial data, ensuring mathematical integrity without compromising creativity.

    This mathematical rigor sets the system apart from traditional dictionaries, offering a framework where words are not just strings but dynamic entities with quantifiable properties. It’s a fusion of linguistics, physics, and computational theory, inviting users to rethink language as a multidimensional phenomenon.

    Comparison with Other Data Storage Mechanisms The .hexubp format is uniquely tailored for UBP’s multidimensional model. Here’s how it compares to other storage mechanisms, with metrics to highlight its strengths:

    CSV/JSON (Traditional Dictionaries):
    • Structure: flat key-value pairs (e.g., word:definition).
    • Storage: ~100 bytes per word for simple text (e.g., “and”:“conjunction”).
    • Query Speed: O(1) for lookups, but no support for vector operations.
    • Limitations: lacks multidimensional data (e.g., spatial vectors, frequencies).
    • .hexubp Advantage: stores 12 vertices with vectors (~1-2 KB per word), enabling complex analyses like spatial clustering or frequency drift detection.

    Relational Databases (SQL):
    • Structure: tabular, with columns for word, definition, etc.
    • Storage: ~200-500 bytes per word, plus index overhead.
    • Query Speed: O(log n) for indexed queries, slower for vector computations.
    • Limitations: rigid schema, inefficient for 6D vectors or dynamic vertices.
    • .hexubp Advantage: lightweight, file-based (~1-2 KB per word), with JSON flexibility for UBP’s hexagonal model; no database server required.

    Vector Databases (e.g., Word2Vec):
    • Structure: fixed-dimension vectors (e.g., 300D for semantic embeddings).
    • Storage: ~2.4 KB per word (300 floats at 8 bytes each).
    • Query Speed: O(n) for similarity searches, optimized with indexing.
    • Limitations: generic embeddings lack UBP-specific dimensions (e.g., resonance, toggles).
    • .hexubp Advantage: smaller footprint (~1-2 KB), with domain-specific dimensions tailored to UBP’s theoretical framework.

    Graph Databases:
    • Structure: nodes and edges for word relationships.
    • Storage: ~500 bytes per word, plus edge overhead.
    • Query Speed: O(k) for traversals, where k is edge count.
    • Limitations: overkill for dictionary tasks, complex setup.
    • .hexubp Advantage: self-contained hexagonal structure per word, simpler for UBP’s needs, with comparable storage (~1-2 KB).

    The .hexubp format balances storage efficiency, flexibility, and UBP-s...

  13. BIOGRID CURATED DATA FOR MATH-42 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated Jul 14, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-42 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/38934/table/caenorhabditis-elegans/math-42.html
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 14, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-42 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein MATH-42

  14. BIOGRID CURATED DATA FOR MATH-14 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated Nov 5, 2016
    Cite
    BioGRID Project (2016). BIOGRID CURATED DATA FOR MATH-14 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/47506/table/caenorhabditis-elegans/math-14.html
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 5, 2016
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-14 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein MATH-14

  15. BIOGRID CURATED DATA FOR MATH-33 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated Apr 7, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-33 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/44565/table/caenorhabditis-elegans/math-33.html
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 7, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-33 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein MATH-33

  16. HWRT database of handwritten symbols

    • zenodo.org
    • data.niaid.nih.gov
    tar
    Updated Jan 24, 2020
    Cite
    Martin Thoma; Martin Thoma (2020). HWRT database of handwritten symbols [Dataset]. http://doi.org/10.5281/zenodo.50022
    Explore at:
    tar
    Available download formats
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Martin Thoma; Martin Thoma
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The HWRT database of handwritten symbols contains on-line data of handwritten symbols such as all alphanumeric characters, arrows, Greek characters, and mathematical symbols like the integral symbol.

    The database can be downloaded in the form of bzip2-compressed tar files. Each tar file contains:

    • symbols.csv: A CSV file with the columns symbol_id, latex, training_samples, test_samples. symbol_id is an integer, the latex column contains the LaTeX code of the symbol, and the training_samples and test_samples columns contain integers giving the number of labeled samples.
    • train-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.
    • test-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.

    All CSV files use ";" as the delimiter and "'" as the quote character. The data column is given in YAML format as a list of lists of dictionaries. Each dictionary has the keys "x", "y" and "time"; (x, y) are coordinates and time is the UNIX time.
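As a quick illustration, files in this layout can be read with Python's standard csv module by overriding the default delimiter and quote character. The row below is a synthetic stand-in (real files come from the tar archives), and json is used in place of a YAML parser only because this sample is JSON-shaped; real data would call for PyYAML.

```python
import csv
import io
import json

# Synthetic stand-in for one row of train-data.csv (columns as described above).
sample = (
    "symbol_id;user_id;user_agent;data\n"
    "31;42;'Mozilla/5.0';'[[{\"x\": 12, \"y\": 30, \"time\": 1414166255}]]'\n"
)

# The files use ";" as delimiter and "'" as quotechar.
reader = csv.DictReader(io.StringIO(sample), delimiter=";", quotechar="'")
row = next(reader)

# The data field is a list of lists of {"x", "y", "time"} dictionaries.
# YAML is a superset of JSON, so json.loads handles this JSON-shaped sample;
# use yaml.safe_load (PyYAML) for the real files.
strokes = json.loads(row["data"])
print(row["symbol_id"], strokes[0][0]["x"], strokes[0][0]["y"])
```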

    About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!

  17. AI4Math: Mathematical QA Dataset

    • kaggle.com
    zip
    Updated Nov 26, 2023
    Cite
    The Devastator (2023). AI4Math: Mathematical QA Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/ai4math-mathematical-qa-dataset
    Explore at:
    zip(1206195420 bytes)
    Available download formats
    Dataset updated
    Nov 26, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    AI4Math: Mathematical QA Dataset

    9,800 Questions from IQ Tests, FunctionQA & PaperQA

    By Huggingface Hub [source]

    About this dataset

    AI4Math is a resource for researchers developing tools for mathematical question answering. It contains a total of 9,800 questions from IQ tests, FunctionQA tasks, and PaperQA presentations, each with rich annotations: the question text, a related image along with a decoded version of that image, selectable answer choices where relevant, a predetermined answer type, the precision required for accuracy measurement, and metadata that can provide additional insight into particular cases. Using these annotations, researchers can target different areas of mathematical question answering relative to their goals, whether IQ-style reasoning or natural-language function computation, while assessing progress against the recorded answers.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    • Before you get started with this dataset, familiarize yourself with the columns: question, image, decoded_image, choices, unit, precision, question_type, answer_type, metadata and query.
    • Read through the data dictionary provided so you understand which columns the dataset contains and what type of data each column holds (for example, 'question' holds the question text).
    • Once you understand what each column contains, start exploring. A visual exploration tool such as Tableau or Dataiku DSS can surface trends before any in-depth analysis or machine learning.
    • Consider a text analyzer such as the Google Natural Language API or Word2Vec to look for relationships between words used in questions and answers; this can yield insight into the data and ideas for future research.
    • Keep track of versions when working on any large dataset; multiple versions make it easy to revert mistakes without losing completed analyses or models.
    • For the modeling itself, try several approaches (supervised or unsupervised algorithms, neural networks, and so on), since some techniques work better than others for a given problem.
    • Track your results as you experiment, keeping notes on the metrics obtained during training as well as prediction.
    • Evaluate models after every cycle; stable performance across runs is often a more reliable indicator of a valid model than a single accuracy figure.
    • Once satisfied with the results, monitor performance continuously over time to check that everything still works correctly.
    • To keep up to date with the technologies being used, subscribe to the mailing lists of the software products you rely on.
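As a starting point for the exploration steps above, here is a minimal sketch of grouping questions by answer type before choosing an evaluation metric. The two rows and their values are invented for illustration; the real dataset is far larger and is distributed via Kaggle.

```python
import csv
import io

# Hypothetical two-row slice using a subset of the columns listed above.
sample = io.StringIO(
    "question,choices,answer_type,precision\n"
    "\"What is 2 + 3?\",\"['4', '5', '6']\",multiple_choice,1\n"
    "\"Estimate pi.\",,free_form,2\n"
)

rows = list(csv.DictReader(sample))

# Group question texts by answer_type: multiple-choice items can be scored
# by exact match, while free-form numeric answers need the precision column.
by_type = {}
for entry in rows:
    by_type.setdefault(entry["answer_type"], []).append(entry["question"])

print(sorted(by_type))
```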

    Research Ideas

    • Using the metadata and question columns to develop algorithms that automatically generate questions for certain topics as defined by the user.
    • Utilizing the image column to create a computer vision model for predicting and classifying similar images.
    • Analyzing the content of the choices and answer_type columns to extract underlying patterns in IQ tests, FunctionQA tasks, and PaperQA presentations.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy,...

  18. Data from: Impacts of lessons management based on Mathematics words problems...

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Maria Alice Veiga Ferreira de Souza (2023). Impacts of lessons management based on Mathematics words problems on learning [Dataset]. http://doi.org/10.6084/m9.figshare.5720452.v1
    Explore at:
    jpeg
    Available download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO: http://www.scielo.org/
    Authors
    Maria Alice Veiga Ferreira de Souza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT This article presents successes and constraints observed in lessons based on written mathematics word problems and their impact on the learning of eighth-year students in Portuguese elementary school classes. The problems were proposed by future teachers during a supervised internship at the University of Lisbon. The data emerged from excerpts of interaction/intervention between a teacher-coach and three interns regarding their lessons based on written mathematics word problems. Successes identified include associating geometric figures with their algebraic expressions and guiding explanations through direct questions on the subject; constraints include confusing mathematical concepts, written instructions with no meaning for students, and terms not properly contextualized mathematically. The research is supported by authors and researchers in the fields of problem solving, the comprehension of math problem statements, and training in/of teaching practice.

  19. Data of "A micromechanical Mean-Field Homogenization surrogate for the...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Feb 13, 2024
    Cite
    Juan Manuel Calleja Vázquez; Ling Wu; Ling Wu; Van-Dung Nguyen; Van-Dung Nguyen; Ludovic Noels; Ludovic Noels; Juan Manuel Calleja Vázquez (2024). Data of "A micromechanical Mean-Field Homogenization surrogate for the stochastic multiscale analysis of composite materials failure" [Dataset]. http://doi.org/10.5281/zenodo.7998798
    Explore at:
    zip
    Available download formats
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Juan Manuel Calleja Vázquez; Ling Wu; Ling Wu; Van-Dung Nguyen; Van-Dung Nguyen; Ludovic Noels; Ludovic Noels; Juan Manuel Calleja Vázquez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Paper
    title: "A micromechanical Mean-Field Homogenization surrogate for the stochastic multiscale analysis of composite materials failure"
    journal: International Journal for Numerical Methods in Engineering
    year: 2023
    volume: 124
    pages: 5200-5262
    doi: 10.1002/nme.7344
    authors: Calleja, Juan Manuel; Wu, Ling; Nguyen, Van-Dung; Noels, Ludovic

    If you use these data or the model, we would be grateful if you could cite the paper above.

    Software
    Requires GMSH and Python 3 with the packages numpy, matplotlib, scikit-learn (sklearn), scipy, pandas, seaborn, plus the standard-library modules os, pickle, csv and math.
    Each folder contains a README that will help the user navigate the data.

    To run the model you need the open-source code at https://gitlab.onelab.info/cm3/cm3Libraries, and you need to request access to cm3MFH as well.

    Directories

    1. Main: Contains fast and easy access to the plots presented in the paper. The README in this folder specifies which plots each script produces.
    2. 1_SVE_Generator: Contains the files needed for the generation of the SVEs, the statistical properties of the microstructure, and PLY samples for the full-field simulations, as well as the samples that were used.
    3. 2_Full_Field: Contains the data extracted from the full-field composite realizations, as well as the random SVE geometries that were used.
    4. 3_Identification: Contains the identification code that finds the effective parameters for each SVE realization, as well as the obtained identification results.
    5. 4_Generator: Contains the generated sets of parameters for the 25 and 45 micrometer squared SVEs, the codes for new data generation, the file with the generated data, and the plots of the MF-ROM random parameters and their cross-relations shown in Sections 2.5.2, 3.2.3 and 4.
    6. 5_Tests: Contains all the information concerning the tests used to verify the MF-ROM, along with the ply and experimental compression results.
    7. MFH_vs_FF: Allows the user to easily test the inverse identification process on random SVEs and verify the performance of the identified MFH parameters against their full-field counterparts.

    Plot of figures

    Figure 9 : Run "python plot_Gc.py" which can be found in folder Main/Full_Field_Energy
    Figure 10: Run "python3 PDF_HIST_Gc.py", which can be found in folder Main/Histograms
    Figure 23: Run "python3 plot.py" which can be found in folder Main/MFH_FF_Comparison
    Figure 24: Run "python3 plot.py" which can be found in folder Main/MFH_FF_Comparison
    Figure 27: Run "python3 Correlation_Graphs_25.py" which can be found in folder Main/Distributions_25_Micrometer_SVE
    Figure 29: Run "python3 PDF_HIST.py" which can be found in folder Main/Histograms
    Figure 30: To obtain the data used in this figure, run "python3 DistanceCorrelation_25.py" which can be found in folder /4_Generator
    Figure 31: To obtain the data used in this figure, run "python3 DistanceCorrelation_45.py" which can be found in folder /4_Generator
    Figure 32: Run "python3 Correlation_Graphs_25.py" which can be found in folder Main/Distributions_25_Micrometer_SVE
    Figure 33: Run "python3 Correlation_Graphs_25.py" which can be found in folder Main/Distributions_25_Micrometer_SVE
    Figure 34: Run "python3 Correlation_Graphs_25.py" which can be found in folder Main/Distributions_25_Micrometer_SVE
    Figure 36: Run "python3 plot_New.py" which can be found in folder Main/PlyTests
    Figure 46: Run "python3 plot_Test.py" which can be found in folder Main/CompressionExperiment
    Figure B3: Run "python3 MicroStrAna.py" which can be found in folder Main/MicroStructStatistics
    Figure B4: Run "python3 MicroStrAna.py" which can be found in folder Main/MicroStructStatistics
    Figure D5: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure D6: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure D7: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure D8: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure D9: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure D10: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure D11: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure D12: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure D13: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure D14: Run "python3 PDF_HIST_B.py" which can be found in folder Main/Histograms
    Figure E15: Run "python3 Correlation_Graphs_45.py" which can be found in folder Main/Distributions_45_Micrometer_SVE
    Figure E16: Run "python3 Correlation_Graphs_45.py" which can be found in folder Main/Distributions_45_Micrometer_SVE
    Figure E17: Run "python3 Correlation_Graphs_45.py" which can be found in folder Main/Distributions_45_Micrometer_SVE
    Figure F18: Run "python3 plot_Convergence_25.py" which can be found in folder Main/Convergence
    Figure F19: Run "python3 plot_Convergence_45.py" which can be found in folder Main/Convergence

  20. Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading,...

    • data.urbandatacentre.ca
    Updated Oct 19, 2025
    Cite
    (2025). Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading, science and mathematics assessment - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-051c952c-0f58-4da6-b597-d8b0821d7be7
    Explore at:
    Dataset updated
    Oct 19, 2025
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    Reading, science and math mean scores from the Pan-Canadian Assessment Program (PCAP), by province.


The dataset's main files are main_train.csv and main_test.csv: the former contains grade-school math questions and answers for training, while the latter holds the corresponding multi-step reasoning test items. Each row pairs a question with an answer that spells out the reasoning steps leading to the result. These question/answer pairs can be combined with text-analysis models such as ELMo or BERT for natural-language tasks like question answering, or used as material for building predictive models over numerical data.
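A minimal sketch of loading such a file with Python's standard csv module; the row below is invented for illustration, and the real main_train.csv must be downloaded from Kaggle.

```python
import csv
import io

# Invented example row in the question/answer layout described above.
sample = io.StringIO(
    "question,answer\n"
    "\"Ava buys 3 packs of 8 pencils. How many pencils does she have?\","
    "\"3 packs of 8 pencils is 3 * 8 = 24 pencils. The answer is 24.\"\n"
)

for row in csv.DictReader(sample):
    # Each answer spells out the intermediate reasoning steps,
    # not just the final result.
    print(row["question"])
    print(row["answer"])
```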

To use this dataset efficiently, first get familiar with its structure by reading the documentation, so you know the definition and format of every available field. Then study the examples that best suit your specific purpose, whether that is an education-research experiment, marketing-analytics insight generation, or capacity planning for an AI project. Learning the variable definitions and the tools involved before you begin keeps the preliminary work short and the research itself focused, and evaluating results at each stage keeps the project on track toward meaningful deliverables.

Research Ideas

  • Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.
  • Generating new grade school math questions and answers using g...