43 datasets found
  1. Data from: Meaning of derivative in the book tasks of 1st of “Bachillerato”

    • scielo.figshare.com
    jpeg
    Updated Jun 1, 2023
    Cite
    María Fernanda Vargas; José Antonio Fernández-Plaza; Juan Francisco Ruiz-Hidalgo (2023). Meaning of derivative in the book tasks of 1st of “Bachillerato” [Dataset]. http://doi.org/10.6084/m9.figshare.14304760.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    María Fernanda Vargas; José Antonio Fernández-Plaza; Juan Francisco Ruiz-Hidalgo
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Due to the importance of textbooks in the teaching and learning of mathematics, this article focuses on the tasks on the derivative proposed in five textbooks for the 1st year of Bachillerato. The goal is to identify the meanings of the derivative that the textbooks convey through the proposed tasks. It is a quantitative study in which the tasks were grouped by similarity by means of a cluster analysis. The results show that the books emphasize three meanings of the derivative, one procedural-algebraic, one algorithmic, and one conceptual-geometric, all of them dominated by the symbolic representation system and presented exclusively in a mathematical context.

  2. Comparative Judgement of Statements About Mathematical Definitions

    • dataverse.azure.uit.no
    • dataverse.no
    csv, txt
    Updated Sep 28, 2023
    Cite
    Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad (2023). Comparative Judgement of Statements About Mathematical Definitions [Dataset]. http://doi.org/10.18710/EOZKTR
    Explore at:
    Available download formats: txt (3623), csv (37503), csv (43566), csv (2523)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data from a comparative judgement survey of 62 working mathematics educators (ME) at Norwegian universities or university colleges and 57 working mathematicians (WM) at Norwegian universities. A total of 3607 comparisons were collected, of which 1780 were made by the ME and 1827 by the WM. The survey consisted of respondents comparing pairs of statements on mathematical definitions compiled from a literature review on mathematical definitions in the mathematics education literature. Each WM was asked to judge 40 pairs of statements with the following question: “As a researcher in mathematics, where your target group is other mathematicians, what is more important about mathematical definitions?” Each ME was asked to judge 41 pairs of statements with the following question: “For a mathematical definition in the context of teaching and learning, what is more important?” The comparative judgement was done with the No More Marking software (nomoremarking.com). The data set consists of the following files: comparisons made by ME (ME.csv), comparisons made by WM (WM.csv), and a look-up table of statement codes and statement formulations (key.csv). Each line in a comparison file represents one comparison, where the "winner" column identifies the winning statement and the "loser" column the losing statement.
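    As a first exploration of these files, the sketch below (a minimal example assuming pandas; the file paths are illustrative, and only the documented "winner"/"loser" columns are relied on) computes a simple win rate per statement code for each respondent group.

    ```python
    import pandas as pd

    # Minimal exploration sketch: win rate per statement code in each comparison file.
    # File paths are illustrative; "winner" and "loser" are the documented columns.
    me = pd.read_csv("ME.csv")   # comparisons made by mathematics educators
    wm = pd.read_csv("WM.csv")   # comparisons made by working mathematicians

    def win_rates(comparisons: pd.DataFrame) -> pd.Series:
        """Fraction of its comparisons that each statement code won."""
        wins = comparisons["winner"].value_counts()
        appearances = wins.add(comparisons["loser"].value_counts(), fill_value=0)
        return (wins / appearances).fillna(0).sort_values(ascending=False)

    print(win_rates(me).head())   # statements the mathematics educators favoured most
    print(win_rates(wm).head())   # statements the mathematicians favoured most
    ```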

  3. Data from: Linguistic Appropriation and Meaning in Mathematical Modeling...

    • scielo.figshare.com
    jpeg
    Updated May 31, 2023
    Cite
    Bárbara Nivalda Palharini Alvim Sousa; Lourdes Maria Werle de Almeida (2023). Linguistic Appropriation and Meaning in Mathematical Modeling Activities [Dataset]. http://doi.org/10.6084/m9.figshare.11314559.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Bárbara Nivalda Palharini Alvim Sousa; Lourdes Maria Werle de Almeida
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: In this paper we turn our attention to the different language games associated with the development of Mathematical Modelling activities and to the meanings students constitute within these language games in relation to first-order ordinary differential equations. The research is grounded in Mathematical Modelling in Mathematics Education and takes as its philosophical basis the studies of Ludwig Wittgenstein and some of his interpreters. Considering these theoretical-philosophical elements, mathematical modelling activities were developed in an Ordinary Differential Equations course of a Mathematics Degree program. Data were collected through written records, audio and video recordings, questionnaires, and interviews. The data analysis methodology considers the students' discursive practices and allowed us to construct trees of idea association. The results indicate that the constitution of meaning within modelling activities is associated with the students' linguistic appropriation of the rules and techniques configured in the specific language games identified in the Mathematical Modelling activities.

  4. MetaMath QA

    • kaggle.com
    zip
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b
    Explore at:
    Available download formats: zip (78629842 bytes)
    Dataset updated
    Nov 23, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MetaMath QA

    Mathematical Questions for Large Language Models

    By Huggingface Hub [source]

    About this dataset

    This dataset contains meta-mathematics questions and answers collected from the Mistral-7B question-answering system. The responses, types, and queries are all provided in order to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset provides users with an efficient way to investigate various aspects of question answering models and further understand how they function. Whether you are a professional or beginner, this dataset is sure to offer invaluable insights into the development of more powerful QA systems!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    Data Dictionary

    The MetaMathQA dataset contains three columns: response, type, and query.

    • Response: the response to the query given by the question-answering system. (String)
    • Type: the type of query provided as input to the system. (String)
    • Query: the question posed to the system for which a response is required. (String)

    Preparing data for analysis

    Before you dive into analysis, familiarize yourself with the kinds of values present in each column and check whether any preprocessing is needed, such as removing unwanted characters or filling in missing values, so the data can be used without issues when training or testing your model further down your process flow.
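    A minimal pandas sketch of that first inspection step, assuming the Kaggle train.csv file and the columns described in the data dictionary above (the path is illustrative):

    ```python
    import pandas as pd

    # First-pass inspection sketch; adjust the path to wherever the dataset is unpacked.
    df = pd.read_csv("train.csv")

    print(df.columns.tolist())        # expect columns such as response, type, query
    print(df.isna().sum())            # missing values per column
    print(df["type"].value_counts())  # distribution of query types

    # Basic cleanup: drop incomplete rows and strip stray whitespace from text columns.
    df = df.dropna()
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].str.strip()
    ```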

    ##### Training Models using Mistral 7B

    Mistral 7B is an open-weight large language model. After collecting and preprocessing the MetaMathQA data, you can use it to fine-tune or evaluate such a model on mathematical question answering. It is good practice to hold out a validation split, tune hyperparameters (for example with grid search or random search), and compare several configurations before settling on a final model, then report performance with metrics such as accuracy, F1, precision, and recall.

    ##### Testing phase

    After the building phase, test the model robustly on held-out examples using the evaluation metrics mentioned above, and compare the scores against your baseline. Re-running the evaluation whenever domain experts contribute new test cases gives a running measure of confidence and keeps errors from inexact or irrelevant responses to a minimum.
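    As one concrete (and deliberately tiny) illustration of such an evaluation, the sketch below scores hypothetical model answers by exact match against reference answers; precision, recall, and F1 can be computed the same way once predictions are cast as correct/incorrect labels.

    ```python
    # Toy evaluation sketch with hypothetical predictions: score answers by exact match.
    references = ["4", "12", "7", "x = 3"]
    predictions = ["4", "12", "9", "x = 3"]

    exact_match = sum(p == r for p, r in zip(predictions, references)) / len(references)
    print(f"exact-match accuracy: {exact_match:.2f}")  # 0.75
    ```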

    Research Ideas

    • Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.
    • Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.
    • Optimizing search algorithms that surface relevant answer results based on types of queries

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description                          |
    |:------------|:-------------------------------------|
    | response    | The response to the query. (String)  |
    | type        | The type of query. (String)          |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  5. BIOGRID CURATED DATA FOR MATH-4 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated May 7, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-4 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/38920/table/caenorhabditis-elegans/math-4.html
    Explore at:
    Available download formats: zip
    Dataset updated
    May 7, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-4 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein MATH-4

  6. Data from: Mathematical approach to the validation of surface texture form...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 3, 2020
    Cite
    Todhunter, Luke; Leach, Richard; Blateyron, Francois (2020). Mathematical approach to the validation of surface texture form removal software [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4063715
    Explore at:
    Dataset updated
    Oct 3, 2020
    Dataset provided by
    Digital Surf (http://www.digitalsurf.com/)
    University of Nottingham
    Authors
    Todhunter, Luke; Leach, Richard; Blateyron, Francois
    Description

    A new approach to the validation of surface texture form removal methods is introduced. A linear algebra technique is presented that obtains total least squares (TLS) model fits for a continuous mathematical surface definition. This model is applicable to both profile and areal form removal, and can be used for a range of form removal models, including polynomial and spherical fits. The continuous TLS method enables the creation of mathematically traceable reference pairs suitable for the assessment of form removal algorithms in surface texture analysis software. Multiple example reference pairs are presented and used to assess the performance of four tested surface texture analysis software packages. The results of each software package are compared against the mathematical reference, highlighting their strengths and weaknesses.
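    For intuition only (this is not the authors' reference implementation), a total least squares fit of a straight line to a measured profile can be obtained from an SVD of the centred points; the orthogonal residuals are then the profile with the linear form removed.

    ```python
    import numpy as np

    # Illustrative sketch: TLS (orthogonal) line fit to a profile, then form removal.
    x = np.linspace(0.0, 10.0, 200)
    z = 0.05 * x + 0.3 + 0.001 * np.random.randn(x.size)  # tilted profile plus "texture"

    pts = np.column_stack([x, z])
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[1]  # unit normal to the best-fit line (minimises orthogonal residuals)

    residuals = (pts - centroid) @ normal  # profile with the linear form removed
    print("RMS texture after TLS form removal:", np.sqrt(np.mean(residuals**2)))
    ```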

  7. BIOGRID CURATED DATA FOR MATH-48 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated May 4, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-48 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/55802/table/caenorhabditis-elegans/math-48.html
    Explore at:
    Available download formats: zip
    Dataset updated
    May 4, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-48 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: MATH (meprin-associated Traf homology) domain containing

  8. GSM8K - Grade School Math 8K Q&A

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
    Explore at:
    Available download formats: zip (3418660 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    GSM8K - Grade School Math 8K Q&A

    A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

    By Huggingface Hub [source]

    About this dataset

    This Grade School Math 8K Linguistically Diverse Training & Test Set is designed to help you develop and improve multi-step reasoning for question answering. The dataset contains three separate data files, socratic_test.csv, main_test.csv, and main_train.csv, each containing grade school math questions and answers that require multiple reasoning steps. Each file contains the same columns: question and answer. The questions are crafted to lead you through the reasoning needed to arrive at the correct answer each time, offering ample opportunity for learning through practice. With over 8 thousand entries for both training and testing purposes, this GSM8K dataset takes advanced multi-step reasoning skills to master!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset provides an opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of 8,000 questions and answers created to simulate real-world scenarios in grade school mathematics. Each question is paired with one answer, and the questions cover topics such as algebra, arithmetic, and probability.

    The dataset consists of three files: main_train.csv and main_test.csv, which hold the main training and test question/answer pairs, and socratic_test.csv. Each file has the same two columns, question and answer, and the answers walk through the intermediate reasoning steps needed to reach the result. These columns can be combined with text-representation models such as ELMo or BERT to explore different representations for natural language processing tasks such as Q&A, or to build predictive models for numerical applications.

    To use this dataset efficiently, first get familiar with its structure by reading the documentation, so you know the content, definitions, and format of each field; then study the examples that best suit your specific purpose, whether that is an education-research experiment, a marketing-analytics report, or an artificial-intelligence project.
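    A minimal loading sketch, assuming the file and column names given above (paths are illustrative):

    ```python
    import pandas as pd

    # Load the GSM8K splits described above; each file has "question" and "answer" columns.
    train = pd.read_csv("main_train.csv")
    test = pd.read_csv("main_test.csv")

    print(len(train), "training items and", len(test), "test items")
    print(train.loc[0, "question"])
    print(train.loc[0, "answer"])  # the answer walks through the intermediate reasoning steps
    ```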

    Research Ideas

    • Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.
    • Generating new grade school math questions and answers using g...
  9. Data from: Slope Conceptualizations in Mathematics Textbooks

    • scielo.figshare.com
    jpeg
    Updated Jun 1, 2023
    Cite
    Crisólogo Dolores Flores; Gerardo Ibáñez Dolores (2023). Slope Conceptualizations in Mathematics Textbooks [Dataset]. http://doi.org/10.6084/m9.figshare.14304761.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Crisólogo Dolores Flores; Gerardo Ibáñez Dolores
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: This paper reports the results of an investigation whose objective is to determine which slope conceptualizations are present in high school mathematics textbooks and which predominate. We used the Content Analysis method, where the objects of analysis are the content exposition, the worked examples, and the exercises or problems proposed in the textbooks. As a frame of reference we used the eleven slope conceptualizations identified by Stump (1999) and Moore-Russo, Conner and Rugg (2011). Our findings indicate the presence of most of the conceptualizations identified in previous research; however, there is a notable predominance of those that emerge from the analytical definition of slope, such as the parametric coefficient, the algebraic ratio, and the trigonometric conception, together with their internal application in determining parallelism or perpendicularity between lines. These conceptualizations, on the one hand, induce the idea that slope makes sense only in an intra-mathematical context and, on the other hand, favor the development of procedural knowledge to the detriment of conceptual knowledge. Understanding slope requires the creation of internal networks built from connections between intra- and extra-mathematical conceptualizations, together with the harmonious development of conceptual and procedural knowledge. Achieving understanding of concepts is essential for Mathematics Education; however, our results indicate that the textbooks used by teachers can hardly contribute to this achievement.

  10. Hex Dictionary V2

    • kaggle.com
    zip
    Updated May 21, 2025
    Cite
    DigitalEuan (2025). Hex Dictionary V2 [Dataset]. https://www.kaggle.com/datasets/digitaleuan/hex-dictionary-v2
    Explore at:
    Available download formats: zip (203686 bytes)
    Dataset updated
    May 21, 2025
    Authors
    DigitalEuan
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    READ ME

    Welcome to the Universal Binary Principle (UBP) Dictionary System - Version 2

    Author: Euan Craig, New Zealand 2025

    Embark on a revolutionary journey with Version 2 of the UBP Dictionary System, a cutting-edge Python notebook that redefines how words are stored, analyzed, and visualized! Built for Kaggle, this system encodes words as multidimensional hexagonal structures in custom .hexubp files, leveraging sophisticated mathematics to integrate binary toggles, resonance frequencies, spatial coordinates, and more, all rooted in the Universal Binary Principle (UBP). This is not just a dictionary—it’s a paradigm shift in linguistic representation.

    What is the UBP Dictionary System? The UBP Dictionary System transforms words into rich, vectorized representations stored in custom .hexubp files, a JSON-based format designed to encapsulate a word’s multidimensional UBP properties. Each .hexubp file represents a word as a hexagonal structure with 12 vertices, encoding:

    • Binary Toggles: 6-bit patterns capturing word characteristics.
    • Resonance Frequencies: Derived from the Schumann resonance (7.83 Hz) and UBP Pi (~2.427).
    • Spatial Vectors: 6D coordinates positioning words in a conceptual “Bitfield.”
    • Cultural and Harmonic Data: Contextual weights, waveforms, and harmonic properties.

    These .hexubp files are generated, managed, and visualized through an interactive Tkinter-based interface, making the system a powerful tool for exploring language through a mathematical lens.

    Unique Mathematical Foundation The UBP Dictionary System is distinguished by its deep reliance on mathematics to model language:

    • UBP Pi (~2.427): A custom constant derived from hexagonal geometry and resonance alignment (calculated as 6/2 * cos(2π * 7.83 * 0.318309886)), serving as the system’s foundational reference.
    • Resonance Frequencies: Frequencies are computed using word-specific hashes modulated by UBP Pi, with validation against the Schumann resonance (7.83 Hz ± 0.078 Hz), grounding the system in physical phenomena.
    • 6D Spatial Vectors: Words are positioned in a 6D Bitfield (x, y, z, time, phase, quantum state) based on toggle sums and frequency offsets, enabling spatial analysis of linguistic relationships.
    • GLR Validation: A non-corrective validation mechanism flags outliers in binary, frequency, and spatial data, ensuring mathematical integrity without compromising creativity.

    This mathematical rigor sets the system apart from traditional dictionaries, offering a framework where words are not just strings but dynamic entities with quantifiable properties. It’s a fusion of linguistics, physics, and computational theory, inviting users to rethink language as a multidimensional phenomenon.
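    The exact .hexubp schema is not spelled out in this description, but since the format is described as JSON-based, a record along the following lines would capture the properties listed above. All field names here are illustrative guesses, not the real schema.

    ```python
    import json

    # Purely illustrative sketch of what one .hexubp record might contain; the real
    # schema is defined by the UBP notebook itself, not by this example.
    hexubp_record = {
        "word": "example",
        "vertices": [
            {
                "toggle": "101100",                           # 6-bit binary toggle pattern
                "frequency_hz": 7.83,                         # resonance near the Schumann frequency
                "vector_6d": [0.1, 0.4, 0.2, 0.0, 0.5, 0.3],  # x, y, z, time, phase, quantum state
            },
            # ... 11 more vertices for the full 12-vertex hexagonal structure
        ],
        "cultural_weight": 0.7,
        "ubp_pi": 2.427,
    }

    with open("example.hexubp", "w") as fh:
        json.dump(hexubp_record, fh, indent=2)
    ```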

    Comparison with Other Data Storage Mechanisms The .hexubp format is uniquely tailored for UBP’s multidimensional model. Here’s how it compares to other storage mechanisms, with metrics to highlight its strengths:

    CSV/JSON (Traditional Dictionaries):
    • Structure: Flat key-value pairs (e.g., word:definition).
    • Storage: ~100 bytes per word for simple text (e.g., “and”:“conjunction”).
    • Query Speed: O(1) for lookups, but no support for vector operations.
    • Limitations: Lacks multidimensional data (e.g., spatial vectors, frequencies).
    • .hexubp Advantage: Stores 12 vertices with vectors (~1-2 KB per word), enabling complex analyses like spatial clustering or frequency drift detection.

    Relational Databases (SQL):
    • Structure: Tabular, with columns for word, definition, etc.
    • Storage: ~200-500 bytes per word, plus index overhead.
    • Query Speed: O(log n) for indexed queries, slower for vector computations.
    • Limitations: Rigid schema, inefficient for 6D vectors or dynamic vertices.
    • .hexubp Advantage: Lightweight, file-based (~1-2 KB per word), with JSON flexibility for UBP’s hexagonal model, no database server required.

    Vector Databases (e.g., Word2Vec):
    • Structure: Fixed-dimension vectors (e.g., 300D for semantic embeddings).
    • Storage: ~2.4 KB per word (300 floats at 8 bytes each).
    • Query Speed: O(n) for similarity searches, optimized with indexing.
    • Limitations: Generic embeddings lack UBP-specific dimensions (e.g., resonance, toggles).
    • .hexubp Advantage: Smaller footprint (~1-2 KB), with domain-specific dimensions tailored to UBP’s theoretical framework.

    Graph Databases:
    • Structure: Nodes and edges for word relationships.
    • Storage: ~500 bytes per word, plus edge overhead.
    • Query Speed: O(k) for traversals, where k is edge count.
    • Limitations: Overkill for dictionary tasks, complex setup.
    • .hexubp Advantage: Self-contained hexagonal structure per word, simpler for UBP’s needs, with comparable storage (~1-2 KB).

    The .hexubp format balances storage efficiency, flexibility, and UBP-s...

  11. BIOGRID CURATED DATA FOR MATH-50 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated May 5, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-50 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/38929/table/caenorhabditis-elegans/math-50.html
    Explore at:
    Available download formats: zip
    Dataset updated
    May 5, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-50 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: math-50 encodes a protein which has a meprin-associated Traf homology (MATH) domain and may be involved in apoptosis.

  12. BIOGRID CURATED DATA FOR MATH-34 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated Jul 13, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-34 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/38858/table/caenorhabditis-elegans/math-34.html
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 13, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-34 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein MATH-34

  13. StudentMathScores

    • kaggle.com
    zip
    Updated Jun 10, 2019
    Cite
    Logan Henslee (2019). StudentMathScores [Dataset]. https://www.kaggle.com/loganhenslee/studentmathscores
    Explore at:
    Available download formats: zip (333321 bytes)
    Dataset updated
    Jun 10, 2019
    Authors
    Logan Henslee
    Description

    CONTEXT

    Practice Scenario: The UIW School of Engineering wants to recruit more students into their program. They will recruit students with great math scores. Also, to increase the chances of recruitment, the department will look for students who qualify for financial aid. Students who qualify for financial aid most likely come from low socio-economic backgrounds. One way to indicate this is to look at how much federal revenue a school district receives through its state. High federal revenue for a school district indicates that a large portion of the student base comes from low-income families.

    The question we wish to ask is as follows: name the school districts across the nation whose Child Nutrition Programs (c25) are federally funded between $30,000 and $50,000, and where the average math score for the school district's corresponding state is greater than or equal to the nation's average score of 282.

    The SQL query in 'Top5MathTarget.sql' can be used to answer this question in MySQL. To execute this process, one would need to install MySQL on their local system and load the attached datasets from Kaggle into their MySQL schema. The query then joins the separate tables on various key identifiers.
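    For readers without MySQL, roughly the same question can be sketched in pandas. The file names and the state/district join columns below are assumptions; only c25 and average_scale_score are documented in this description.

    ```python
    import pandas as pd

    # Hedged pandas sketch of the described filter-and-join; file and key names are assumptions.
    finance = pd.read_csv("federal_finance.csv")  # one row per school district, incl. c25
    scores = pd.read_csv("math_scores.csv")       # one row per state, incl. average_scale_score

    merged = finance.merge(scores, on="state")    # assumed shared state identifier
    target = merged[
        merged["c25"].between(30_000, 50_000) & (merged["average_scale_score"] >= 282)
    ]
    print(target[["state", "district", "c25", "average_scale_score"]].head())
    ```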

    DATA SOURCE

    Data is sourced from the U.S. Census Bureau and The Nation's Report Card (using the NAEP Data Explorer).

    Finance: https://www.census.gov/programs-surveys/school-finances/data/tables.html

    Math Scores: https://www.nationsreportcard.gov/ndecore/xplore/NDE

    COLUMN NOTES

    All data comes from the school year 2017. Individual schools are not represented, only school districts within each state.

    FEDERAL FINANCE DATA DEFINITIONS

    t_fed_rev: Total federal revenue through the state to each school district.

    C14: Federal revenue through the state, Title I (No Child Left Behind Act).

    C25: Federal revenue through the state, Child Nutrition Act.

    Title I is a program implemented in schools to help raise academic achievement for all students. The program is available to schools where at least 40% of the students come from low-income families.

    Child Nutrition Programs ensure that children are getting the food they need to grow and learn. High federal revenue for these programs indicates a student base that also comes from low-income families.

    MATH SCORES DATA DEFINITIONS

    Note: Mathematics, Grade 8, 2017, All Students (Total)

    average_scale_score - The state's average score for eighth graders taking the NAEP math exam.

  14. HWRT database of handwritten symbols

    • zenodo.org
    • data.niaid.nih.gov
    tar
    Updated Jan 24, 2020
    Cite
    Martin Thoma (2020). HWRT database of handwritten symbols [Dataset]. http://doi.org/10.5281/zenodo.50022
    Explore at:
    Available download formats: tar
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin Thoma
    License

    Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The HWRT database of handwritten symbols contains on-line data of handwritten symbols such as all alphanumeric characters, arrows, Greek characters, and mathematical symbols like the integral symbol.

    The database can be downloaded in the form of bzip2-compressed tar files. Each tar file contains:

    • symbols.csv: A CSV file with the columns symbol_id, latex, training_samples, test_samples. The symbol id is an integer, the latex column contains the LaTeX code of the symbol, and the training_samples and test_samples columns contain integers with the number of labeled samples.
    • train-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.
    • test-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.

    All CSV files use ";" as delimiter and "'" as quotechar. The data column is given in YAML format as a list of lists of dictionaries. Each dictionary has the keys "x", "y" and "time"; (x, y) are coordinates and time is the UNIX time.

    About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!
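    A minimal reader sketch for these files, assuming one of the tar archives has already been extracted locally and that PyYAML is installed:

    ```python
    import csv

    import yaml  # PyYAML

    # Read the first record of train-data.csv using the delimiter/quotechar described above,
    # then parse its YAML-encoded "data" field into strokes of (x, y, time) points.
    with open("train-data.csv", newline="") as fh:
        reader = csv.DictReader(fh, delimiter=";", quotechar="'")
        first = next(reader)

    strokes = yaml.safe_load(first["data"])  # list of strokes; each stroke is a list of points
    for point in strokes[0][:3]:
        print(point["x"], point["y"], point["time"])
    ```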

  15. Data from: Balancing Multi-Manned Assembly Lines With Walking Workers:...

    • tandf.figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Murat Şahin; Talip Kellegöz (2023). Balancing Multi-Manned Assembly Lines With Walking Workers: Problem Definition, Mathematical Formulation, and an Electromagnetic Field Optimisation Algorithm [Dataset]. http://doi.org/10.6084/m9.figshare.7624499.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Murat Şahin; Talip Kellegöz
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Assembly lines are widely used in industrial environments that produce standardised products in high volumes. A multi-manned assembly line is a special version that allows more than one worker to operate simultaneously at the same workstation. These lines are widely used in large-sized product manufacturing since they have many advantages over simple lines. This article deals with the multi-manned assembly line balancing problem with walking workers, minimising the number of workers and the number of workstations as the first and second objectives, respectively. After the problem definition is given, a linear mixed-integer programming formulation of the problem is presented. Besides that, a metaheuristic based on the electromagnetic field optimisation algorithm has been improved: in addition to the classical electromagnetic field optimisation algorithm, a regeneration strategy has been applied to enhance diversification. A particle swarm optimisation algorithm from the assembly line balancing literature has been modified for comparison with the proposed algorithm. A group of test instances generated from many precedence diagrams was used to evaluate the performance of all solution methods. Deviations from lower-bound values for the number of workers/workstations and the number of optimal solutions obtained by these methods are used as performance criteria. The results obtained by the proposed programming formulation have also been compared with the solutions obtained by the traditional mathematical model of the multi-manned assembly line. Through the experimental results, the performance of the metaheuristic has been found very satisfactory in terms of the number of optimal solutions obtained and the deviations from lower-bound values.

  16. BIOGRID CURATED DATA FOR MATH-39 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated Aug 9, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-39 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/38924/table/caenorhabditis-elegans/math-39.html
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 9, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-39 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: MATH (meprin-associated Traf homology) domain containing

  17. Data Use in Academia Dataset

    • datacatalog.worldbank.org
    csv, utf-8
    Updated Nov 27, 2023
    Cite
    Semantic Scholar Open Research Corpus (S2ORC) (2023). Data Use in Academia Dataset [Dataset]. https://datacatalog.worldbank.org/search/dataset/0065200/data_use_in_academia_dataset
    Explore at:
    Available download formats: utf-8, csv
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Semantic Scholar Open Research Corpus (S2ORC)
    Brian William Stacy
    License

    https://datacatalog.worldbank.org/public-licenses?fragment=cc

    Description

    This dataset contains metadata (title, abstract, date of publication, field, etc) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.


    Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.


    We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and parsed PDF or latex file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF and latex file are important for extracting important information like the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the year 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries’ national statistical system: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.


    Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.


    The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.


    To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.


    The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify objects from text, utilizing the spaCy Python library. The Named Entity Recognition algorithm splits text into named entities, and NER is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.
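    A compressed sketch of the two detection approaches described above. The country list is a tiny stand-in for the full ISO 3166 name list, and spaCy's GPE entities include cities and regions as well as countries, so real use would filter them against that list.

    ```python
    import re

    import spacy

    # Approach 1: regular-expression search over country names (tiny stand-in list here).
    COUNTRY_NAMES = ["Brazil", "Kenya", "Norway", "Republic of Korea"]
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, COUNTRY_NAMES)) + r")\b")

    # Approach 2: spaCy named entity recognition (assumes this model is installed).
    nlp = spacy.load("en_core_web_sm")

    def countries_mentioned(text):
        found = set(pattern.findall(text))                                    # regex matches
        found |= {ent.text for ent in nlp(text).ents if ent.label_ == "GPE"}  # NER matches
        return found

    print(countries_mentioned("We analyse household survey data from Kenya and Brazil."))
    ```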


    The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:


    Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.

    There are two classification tasks in this exercise:

    1. Identifying whether an academic article is using data from any country

    2. Identifying from which country that data came.

    For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated, or created to produce research findings. As an example, a study that reports findings or analysis based on a survey uses data. Some clues that a study does use data include whether a survey or census is described, a statistical model is estimated, or a table of means or summary statistics is reported.

    After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]

    For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.

    We expect between 10 and 35 percent of all articles to use data.


    The median amount of time that a worker spent on an article, measured as the time between when the article was accepted for classification by the worker and when the classification was submitted, was 25.4 minutes. If human raters were used exclusively rather than machine learning tools, the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review, at a cost of $3,113,244 (assuming $3 per article, as was paid to MTurk workers).


    A model is next trained on the 3,500 labelled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. 2018). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, and Wolf 2019). We use PyTorch to train a model that classifies articles based on the labeled data. Of the 3,500 articles hand-coded by the MTurk workers, 900 are fed to the machine learning model; 900 articles were selected because of computational limitations in training the NLP model. A classification of “uses data” was assigned if the model predicted an article used data with at least 90% confidence.
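    A sketch of the confidence-threshold rule described above. The checkpoint name is a placeholder: the project fine-tuned its own DistilBERT classifier, which is not published here, so a generic DistilBERT text-classification model stands in.

    ```python
    from transformers import pipeline

    # Stand-in DistilBERT classifier; the project's own fine-tuned model is not public.
    clf = pipeline(
        "text-classification",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    abstract = "We estimate a regression model using household survey data from 2015."
    prediction = clf(abstract)[0]  # e.g. {"label": ..., "score": ...}

    # Mirror the rule above: accept the predicted label only at >= 90% confidence.
    label = prediction["label"] if prediction["score"] >= 0.90 else "unclassified"
    print(prediction, "->", label)
    ```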


    The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We consider the human raters as giving us the ground truth. This may underestimate the model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People’s Republic of Korea. If both humans and the model perform the same kind of errors, then the performance reported here will be overestimated.


    The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of

  18. Development of Mathematical Programming Model for Cable Logging System...

    • scielo.figshare.com
    jpeg
    Updated Jun 1, 2023
    Cite
    Alynne Rudek; Eduardo da Silva Lopes; Julio Eduardo Arce; Paulo Costa de Oliveira Filho (2023). Development of Mathematical Programming Model for Cable Logging System Location [Dataset]. http://doi.org/10.6084/m9.figshare.7451918.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Alynne Rudek; Eduardo da Silva Lopes; Julio Eduardo Arce; Paulo Costa de Oliveira Filho
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT: Defining the optimum points for installing a cable logging system is a problem faced by forestry planners. This study evaluated the application of a mathematical programming model for the optimal location of cable logging in wood extraction. The study was conducted in a forestry company located in Paraná State, Brazil. We collected data during timber harvesting and developed mathematical models to define the optimal location of the cable logging system considering the variables “cycle time” and “extraction distance”. The variable “cycle time” affected the definition of the optimal equipment location, resulting in a reduced number of installation points with the largest coverage area. The variable “extraction distance” negatively influenced the location, with an increased number of installation points with smaller coverage. The developed model was efficient, but needs to be improved in order to ensure greater accuracy in wood extraction over long distances.

  19. Unit process data for field crop production version 1.1

    • agdatacommons.nal.usda.gov
    xlsx
    Updated Nov 21, 2025
    Cite
    Joyce Cooper (2025). Unit process data for field crop production version 1.1 [Dataset]. http://doi.org/10.15482/USDA.ADC/1226081
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Joyce Cooper
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The release of the LCA Commons Unit Process Data: field crop production Version 1.1 includes the following updates:

    • Added metadata to reflect USDA LCA Digital Commons data submission guidance, including descriptions of the process (the reference to which the size of the inputs and outputs in the process relate; a description of the process, technical scope, and any aggregation; a definition of the technology being used and its operating conditions); temporal representativeness; geographic representativeness; allocation methods; process type (U: unit process, S: system process); treatment of missing intermediate flow data; treatment of missing flow data to or from the environment; intermediate flow data sources; mass balance; data treatment (a description of the methods and assumptions used to transform primary and secondary data into flow quantities through recalculating, reformatting, aggregation, or proxy data, and a description of data quality according to LCADC convention); sampling procedures; and review details. Also, dataset documentation and related archival publications are cited in APA format.
    • Changed intermediate flow categories and subcategories to reflect the International Standard Industrial Classification (ISIC).
    • Added “US-” to the US state abbreviations for intermediate flow locations.
    • Corrected the ISIC code for “CUTOFF domestic barge transport; average fuel” (changed to ISIC 5022: Inland freight water transport).
    • Corrected flow names as follows: "Propachlor" renamed "Atrazine"; “Bromoxynil octanoate” renamed “Bromoxynil heptanoate”; “water; plant uptake; biogenic” renamed “water; from plant uptake; biogenic”; half the instances of “Benzene, pentachloronitro-” replaced with “Etridiazole” and half with “Quintozene”; “CUTOFF phosphatic fertilizer, superphos. grades 22% & under; at point-of-sale” replaced with “CUTOFF phosphatic fertilizer, superphos. grades 22% and under; at point-of-sale”.
    • Corrected flow values for “water; from plant uptake; biogenic” and “dry matter except CNPK; from plant uptake; biogenic” in some datasets.
    • Presented data in the International Reference Life Cycle Data System (ILCD) format, allowing the parameterization of raw data and mathematical relations to be presented within the datasets and the inclusion of parameter uncertainty data. Note that ILCD-formatted data can be converted to the ecospold v1 format using the OpenLCA software.
    • Updated data quality rankings to reflect the inclusion of uncertainty data in the ILCD-formatted data.
    • Changed all parameter names to “pxxxx” to accommodate mathematical-relation character limits in OpenLCA, and adjusted select mathematical relations to recognize zero entries. The revised list of parameter names is provided in the attached documentation.

    Resources in this dataset:

    • Resource Title: Cooper-crop-production-data-parameterization-version-1.1. File Name: Cooper-crop-production-data-parameterization-version-1.1.xlsx. Resource Description: Description of parameters that define the Cooper Unit process data for field crop production version 1.1.
    • Resource Title: Cooper_Crop_Data_v1.1_ILCD. File Name: Cooper_Crop_Data_v1.1_ILCD.zip. Resource Description: .zip archive of ILCD xml files that comprise crop production unit process models. Resource Software Recommended: openLCA, url: http://www.openlca.org/
    • Resource Title: Summary of Revisions of the LCA Digital Commons Unit Process Data: field crop production for version 1.1 (August 2013). File Name: Summary of Revisions of the LCA Digital Commons Unit Process Data- field crop production, Version 1.1 (August 2013).pdf. Resource Description: Documentation of revisions to version 1 data that constitute version 1.1.

  20. BIOGRID CURATED DATA FOR MATH-42 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated Jul 14, 2024
    Cite
    BioGRID Project (2024). BIOGRID CURATED DATA FOR MATH-42 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/38934/table/caenorhabditis-elegans/math-42.html
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 14, 2024
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for MATH-42 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein MATH-42
