25 datasets found
  1. MetaMath QA

    • kaggle.com
    zip
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b
    Explore at:
    zip (78629842 bytes)
    Dataset updated
    Nov 23, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MetaMath QA

    Mathematical Questions for Large Language Models

    By Huggingface Hub [source]

    About this dataset

    This dataset contains meta-mathematics questions and answers associated with the Mistral-7B question-answering system. The responses, query types, and queries are provided to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset offers an efficient way to investigate various aspects of question-answering models and to better understand how they function. Whether you are a professional or a beginner, it is sure to offer invaluable insights into the development of more powerful QA systems!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨

    How to use the dataset

    Data Dictionary

    The MetaMathQA dataset contains three columns:

    • response: the response to the query given by the question-answering system. (String)
    • type: the type of query provided as input to the system. (String)
    • query: the question posed to the system for which a response is required. (String)

    Preparing data for analysis

    Before diving into analysis, first familiarize yourself with the kinds of values present in each column, and check whether any preprocessing is needed, such as removing unwanted characters or filling in missing values, so the data can be used without issue when training or testing your model later in your workflow.
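    As a minimal sketch of these checks, assuming pandas and a small made-up stand-in for the real train.csv (the column names come from the data dictionary above):

```python
import pandas as pd

# Made-up rows shaped like the MetaMathQA columns (query, type,
# response); on the real file you would start from
# pd.read_csv("train.csv") instead.
df = pd.DataFrame({
    "query": ["What is 2+2?", "What is 3*3?", None],
    "type": ["GSM_AnsAug", "GSM_AnsAug", "MATH_AnsAug"],
    "response": ["The answer is 4.", "The answer is 9.", "The answer is 27."],
})

# Check for missing values per column before training.
missing = df.isna().sum()

# Drop rows with missing queries and strip stray whitespace.
clean = df.dropna(subset=["query"]).copy()
clean["query"] = clean["query"].str.strip()

print(missing["query"], len(clean))  # → 1 2
```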

    Training Models using Mistral 7B

    Mistral 7B is an open-source large language model; the 'MetaMathQA' dataset can be used to fine-tune it for mathematical question answering. After collecting and preprocessing the data, you can also build classical baselines on extracted features using algorithms such as Support Vector Machines (SVM), logistic regression, or decision trees, and tune their hyperparameters with methods such as scikit-learn's GridSearchCV and RandomizedSearchCV. Once an algorithm configuration is selected, validate the resulting models with metrics such as accuracy, F1 score, precision, and recall.
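    The hyperparameter-search step can be sketched with scikit-learn. This is a generic illustration on made-up (query, type) pairs, not the dataset's actual content or the author's exact setup:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Tiny synthetic stand-in for (query, type) pairs; in practice
# these would come from the MetaMathQA CSV columns.
queries = ["What is 2+2?", "Solve x+1=3", "What is 5*5?", "Solve 2x=8"] * 5
types = ["arithmetic", "algebra", "arithmetic", "algebra"] * 5

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid-search a regularization hyperparameter with cross-validation.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(queries, types)

print(grid.best_params_["clf__C"])
```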

    Testing models

    After the model-building phase, the right way forward is to test models robustly on the evaluation metrics mentioned above. At the inference stage, make predictions with the trained model on new test cases, including ones supplied by domain experts, and run quality-assurance checks against the baseline metric scores to assess confidence in the results. Updating baseline scores as you run further experiments is the preferred methodology in AI workflows, since it keeps evaluations relevant and limits the overall impact of inexactness-induced errors.
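    A quick sketch of the evaluation metrics mentioned above, using made-up predictions rather than real model output:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative ground truth vs. predictions; in practice these
# would come from the trained model and the held-out test split.
y_true = ["algebra", "arithmetic", "algebra", "arithmetic", "algebra"]
y_pred = ["algebra", "arithmetic", "arithmetic", "arithmetic", "algebra"]

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")
prec = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")

print(round(acc, 2))  # → 0.8
```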

    Research Ideas

    • Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.
    • Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.
    • Optimizing search algorithms that surface relevant answer results based on types of queries

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------------------------------|
    | response | The response to the query. (String) |
    | type | The type of query. (String) |
    | query | The question posed to the system. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  2. GSM8K - Grade School Math 8K Q&A

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
    Explore at:
    zip (3418660 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    GSM8K - Grade School Math 8K Q&A

    A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

    By Huggingface Hub [source]

    About this dataset

    This Grade School Math 8K Linguistically Diverse Training & Test Set is designed to help you develop and improve your understanding of multi-step reasoning question answering. The dataset contains three separate data files: the socratic_test.csv, main_test.csv, and main_train.csv, each containing a set of questions and answers related to grade school math that consists of multiple steps. Each file contains the same columns: question, answer. The questions contained in this dataset are thoughtfully crafted to lead you through the reasoning journey for arriving at the correct answer each time, allowing you immense opportunities for learning through practice. With over 8 thousand entries for both training and testing purposes in this GSM8K dataset, it takes advanced multi-step reasoning skills to ace these questions! Deepen your knowledge today and master any challenge with ease using this amazing GSM8K set!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨

    How to use the dataset

    This dataset provides a unique opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of 8,000 questions and answers that have been created to simulate real-world scenarios in grade school mathematics. Each question is paired with one answer based on a comprehensive test set. The questions cover topics such as algebra, arithmetic, probability and more.

    The dataset consists of the files main_train.csv and main_test.csv (plus socratic_test.csv), each containing grade-school math questions paired with multi-step answers. Each file has two columns, question and answer; the answer column walks through the reasoning steps required to reach the final result, so a single row can be followed step by step or branched from according to the logic of each problem. These columns can be combined with text-representation models such as ELMo or BERT to explore different representation formats for natural-language-processing tasks such as Q&A, or to build predictive models for numerical applications.

    To use this dataset efficiently, first get familiar with its structure by reading the documentation, so you know the content, definitions, and format requirements of every field. Then study the examples that best suit your specific purpose, whether that is an education-research experiment, marketing-analytics insights, or predictions for an AI project. Learning the variable definitions before you continue keeps the work focused on clear objectives rather than on unprocessed raw values, and sets up a smooth path from preliminary background work to a completed knowledge-mining effort.
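    GSM8K answers conventionally end their reasoning with a "#### <number>" marker for the final result; a small helper (illustrative, not part of the dataset) can extract it:

```python
import re

# GSM8K answers contain step-by-step reasoning and end with the
# final result after a "####" marker, e.g. "... #### 72".
def extract_final_answer(answer: str) -> str:
    match = re.search(r"####\s*(-?[\d,\.]+)", answer)
    if match is None:
        raise ValueError("no final answer marker found")
    return match.group(1).replace(",", "")

example = (
    "Natalia sold 48/2 = 24 clips in May.\n"
    "Natalia sold 48+24 = 72 clips altogether.\n"
    "#### 72"
)
print(extract_final_answer(example))  # → 72
```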

    Research Ideas

    • Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.
    • Generating new grade school math questions and answers using g...
  3. Data from: MathCheck

    • huggingface.co
    Updated Jul 12, 2024
    Cite
    PremiLab-Math (2024). MathCheck [Dataset]. https://huggingface.co/datasets/PremiLab-Math/MathCheck
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jul 12, 2024
    Dataset authored and provided by
    PremiLab-Math
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Exceptional mathematical reasoning ability is one of the key features that demonstrate the power of large language models (LLMs). How to comprehensively define and evaluate the mathematical abilities of LLMs, and even reflect the user experience in real-world scenarios, has emerged as a critical issue. Current benchmarks predominantly concentrate on problem-solving capabilities, which presents a substantial risk of model overfitting and fails to accurately represent genuine mathematical… See the full description on the dataset page: https://huggingface.co/datasets/PremiLab-Math/MathCheck.

  4. Comparative Judgement of Statements About Mathematical Definitions

    • dataverse.no
    • dataverse.azure.uit.no
    csv, txt
    Updated Sep 28, 2023
    Cite
    Tore Forbregd; Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad (2023). Comparative Judgement of Statements About Mathematical Definitions [Dataset]. http://doi.org/10.18710/EOZKTR
    Explore at:
    csv (43566), csv (2523), csv (37503), txt (3623)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Tore Forbregd; Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data from a comparative judgement survey of 62 working mathematics educators (ME) at Norwegian universities or university colleges, and 57 working mathematicians (WM) at Norwegian universities. A total of 3607 comparisons were collected, of which 1780 were made by the ME and 1827 by the WM. Respondents compared pairs of statements about mathematical definitions, compiled from a literature review on mathematical definitions in the mathematics education literature. Each WM was asked to judge 40 pairs of statements with the following question: “As a researcher in mathematics, where your target group is other mathematicians, what is more important about mathematical definitions?” Each ME was asked to judge 41 pairs of statements with the following question: “For a mathematical definition in the context of teaching and learning, what is more important?” The comparative judgement was done with the No More Marking software (nomoremarking.com). The data set consists of the following files:

    • comparisons made by ME (ME.csv)
    • comparisons made by WM (WM.csv)
    • a look-up table of statement codes and statement formulations (key.csv)

    Each line in a comparison file represents one comparison: the "winner" column holds the winner and the "loser" column the loser of that comparison.
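    A minimal sketch of scoring statements from such winner/loser pairs using simple win rates (the column roles come from the description above; the statement codes and comparisons below are made up, and comparative-judgement studies typically fit a Bradley-Terry model rather than raw win rates):

```python
from collections import Counter

# Each tuple mirrors one row's ("winner", "loser") columns.
comparisons = [
    ("S1", "S2"), ("S1", "S3"), ("S2", "S3"),
    ("S1", "S2"), ("S3", "S2"),
]

wins = Counter(w for w, _ in comparisons)
appearances = (
    Counter(w for w, _ in comparisons)
    + Counter(l for _, l in comparisons)
)

# Win rate per statement as a crude quality score.
win_rate = {s: wins[s] / appearances[s] for s in appearances}
print(win_rate["S1"])  # → 1.0
```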

  5. % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (old...

    • data.yorkopendata.org
    • ckan.publishing.service.gov.uk
    • +3 more
    Updated Mar 18, 2015
    + more versions
    Cite
    (2015). % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (old Best Entry definition) - (Snapshot) [Dataset]. https://data.yorkopendata.org/dataset/kpi-75
    Explore at:
    Dataset updated
    Mar 18, 2015
    License

    Open Government Licence 2.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
    License information was derived automatically

    Description

    % of pupils achieving 5+ A*-Cs GCSE inc. English & Maths at Key Stage 4 (old Best Entry definition) - (Snapshot) *This indicator was discontinued in 2014 due to the national changes in GCSEs.

  6. PISA Performance Scores by Country

    • kaggle.com
    zip
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). PISA Performance Scores by Country [Dataset]. https://www.kaggle.com/datasets/thedevastator/pisa-performance-scores-by-country/code
    Explore at:
    zip (14656 bytes)
    Dataset updated
    Dec 6, 2023
    Authors
    The Devastator
    Description

    PISA Performance Scores by Country

    PISA Performance Scores by Country and Year

    By Dennis Kao [source]

    About this dataset

    The OECD PISA dataset provides performance scores for 15-year-old students in reading, mathematics, and science across OECD countries. The dataset covers the years 2000 to 2018.

    These performance scores are measured using the Programme for International Student Assessment (PISA), which evaluates students' abilities to apply their knowledge and skills in reading, mathematics, and science to real-life challenges.

    Reading performance is assessed based on the capacity to comprehend, use, and reflect on written texts for achieving goals, developing knowledge and potential, and participating in society.

    Mathematical performance measures a student's mathematical literacy by evaluating their ability to formulate, employ, and interpret mathematics in various contexts. This includes describing, predicting, and explaining phenomena while recognizing the role that mathematics plays in the world.

    Scientific performance examines a student's scientific literacy: the ability to use scientific knowledge to identify questions, acquire new knowledge, explain scientific phenomena, and draw evidence-based conclusions about science-related issues.

    The dataset includes information on the performance scores categorized by location (country alpha‑3 codes), indicator (reading, mathematical, or scientific performance), subject (boys/girls/total), and time of measurement (year). The mean score for each combination of these variables is provided in the Value column.

    For more detailed information on how the dataset was collected and analyzed, please refer to the original source

    How to use the dataset

    Understanding the Columns

    Before diving into the analysis, it is important to understand the meaning of each column in the dataset:

    • LOCATION: This column represents country alpha-3 codes. OAVG indicates an average across all OECD countries.

    • INDICATOR: The performance indicator being measured can be one of three options: Reading performance (PISAREAD), Mathematical performance (PISAMATH), or Scientific performance (PISASCIENCE).

    • SUBJECT: This column categorizes subjects as BOY (boys), GIRL (girls), or TOT (total). It indicates which group's scores are being considered.

    • TIME: The year in which the performance scores were measured can range from 2000 to 2018.

    • Value: The mean score of the performance indicator for a specific subject and year is provided in this column as a floating-point number.

    Getting Started with Analysis

    Here are some ideas on how you can start exploring and analyzing this dataset:

    • Comparing countries: You can use this dataset to compare educational performances between different countries over time for various subjects like reading, mathematics, and science.

    • Subject-based analysis: You can focus on studying how gender affects students' performances by filtering data based on subject ('BOY', 'GIRL') along with years or individual countries.

    • Time-based trends: Analyze trends over time by examining changes in mean scores for various indicators across years.

    • OECD vs Non-OECD Countries: Determine if there are significant differences in performance scores between OECD countries and non-OECD countries. You can filter the data by the LOCATION column to obtain separate datasets for each group and compare their mean scores.
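    The filtering ideas above can be sketched with pandas. The rows below are synthetic but shaped like the table described (LOCATION, INDICATOR, SUBJECT, TIME, Value); real PISA values differ:

```python
import pandas as pd

# Synthetic rows following the column layout described above.
df = pd.DataFrame({
    "LOCATION": ["FIN", "FIN", "OAVG", "OAVG"],
    "INDICATOR": ["PISAMATH"] * 4,
    "SUBJECT": ["TOT"] * 4,
    "TIME": [2015, 2018, 2015, 2018],
    "Value": [511.0, 507.0, 489.0, 488.0],
})

# Compare one country's maths scores against the OECD average (OAVG).
fin = df[(df["LOCATION"] == "FIN") & (df["SUBJECT"] == "TOT")]
oavg = df[df["LOCATION"] == "OAVG"]
gap = fin["Value"].mean() - oavg["Value"].mean()
print(gap)  # → 20.5
```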

    Data Visualization

    To enhance your understanding of the dataset, visuali...

  7. HWRT database of handwritten symbols

    • zenodo.org
    • data.niaid.nih.gov
    tar
    Updated Jan 24, 2020
    Cite
    Martin Thoma; Martin Thoma (2020). HWRT database of handwritten symbols [Dataset]. http://doi.org/10.5281/zenodo.50022
    Explore at:
    tar
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Martin Thoma; Martin Thoma
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The HWRT database of handwritten symbols contains on-line data of handwritten symbols such as all alphanumeric characters, arrows, Greek characters, and mathematical symbols like the integral symbol.

    The database can be downloaded in the form of bzip2-compressed tar files. Each tar file contains:

    • symbols.csv: a CSV file with the columns symbol_id, latex, training_samples, and test_samples. symbol_id is an integer, latex contains the LaTeX code of the symbol, and training_samples and test_samples contain the number of labeled samples in each split.
    • train-data.csv: a CSV file with the columns symbol_id, user_id, user_agent, and data.
    • test-data.csv: a CSV file with the columns symbol_id, user_id, user_agent, and data.

    All CSV files use ";" as the delimiter and "'" as the quote character. The data column is given in YAML format as a list of lists of dictionaries; each dictionary has the keys "x", "y", and "time", where (x, y) are coordinates and time is the UNIX time.
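    A minimal parsing sketch for this layout. The delimiter and quote character come from the description above; the sample row is made up, and flow-style records like this one also parse as JSON, though real files may need a YAML parser such as PyYAML:

```python
import csv
import io
import json

# One made-up train-data.csv row: ";" as delimiter, "'" as
# quotechar, stroke data as a list of lists of {"x","y","time"}.
raw = (
    "symbol_id;user_id;user_agent;data\n"
    "31;7;test-agent;'[[{\"x\": 50, \"y\": 60, \"time\": 1414166270}]]'\n"
)

reader = csv.DictReader(io.StringIO(raw), delimiter=";", quotechar="'")
row = next(reader)
strokes = json.loads(row["data"])  # real files: yaml.safe_load

print(strokes[0][0]["x"])  # → 50
```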

    About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!

  8. Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading,...

    • www150.statcan.gc.ca
    • ouvert.canada.ca
    • +1 more
    Updated Oct 18, 2022
    Cite
    Government of Canada, Statistics Canada (2022). Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading, science and mathematics assessment [Dataset]. http://doi.org/10.25318/3710022901-eng
    Explore at:
    Dataset updated
    Oct 18, 2022
    Dataset provided by
    Statistics Canada: https://statcan.gc.ca/en
    Government of Canada: http://www.gg.ca/
    Area covered
    Canada
    Description

    Reading, science and math mean scores from the Pan-Canadian Assessment Program (PCAP), by province.

  9. Hex Dictionary V2

    • kaggle.com
    zip
    Updated May 21, 2025
    Cite
    DigitalEuan (2025). Hex Dictionary V2 [Dataset]. https://www.kaggle.com/datasets/digitaleuan/hex-dictionary-v2
    Explore at:
    zip (203686 bytes)
    Dataset updated
    May 21, 2025
    Authors
    DigitalEuan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    READ ME

    Welcome to the Universal Binary Principle (UBP) Dictionary System - Version 2

    Author: Euan Craig, New Zealand 2025

    Embark on a revolutionary journey with Version 2 of the UBP Dictionary System, a cutting-edge Python notebook that redefines how words are stored, analyzed, and visualized! Built for Kaggle, this system encodes words as multidimensional hexagonal structures in custom .hexubp files, leveraging sophisticated mathematics to integrate binary toggles, resonance frequencies, spatial coordinates, and more, all rooted in the Universal Binary Principle (UBP). This is not just a dictionary—it’s a paradigm shift in linguistic representation.

    What is the UBP Dictionary System?

    The UBP Dictionary System transforms words into rich, vectorized representations stored in custom .hexubp files, a JSON-based format designed to encapsulate a word's multidimensional UBP properties. Each .hexubp file represents a word as a hexagonal structure with 12 vertices, encoding:

    • Binary Toggles: 6-bit patterns capturing word characteristics.
    • Resonance Frequencies: derived from the Schumann resonance (7.83 Hz) and UBP Pi (~2.427).
    • Spatial Vectors: 6D coordinates positioning words in a conceptual “Bitfield.”
    • Cultural and Harmonic Data: contextual weights, waveforms, and harmonic properties.

    These .hexubp files are generated, managed, and visualized through an interactive Tkinter-based interface, making the system a powerful tool for exploring language through a mathematical lens.
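    As an illustration of what a JSON-based record with these properties might look like (the field names below are guesses from the description, not the actual .hexubp schema):

```python
import json

# Hypothetical .hexubp-style record: 12 vertices, a 6-bit toggle
# pattern, a resonance frequency near 7.83 Hz, and a 6D vector.
record = {
    "word": "example",
    "toggles": "101101",
    "resonance_hz": 7.83,
    "vector6d": [0.1, 0.2, 0.3, 0.0, 0.5, 1.0],
    "vertices": [{"index": i, "weight": 0.0} for i in range(12)],
}

# Round-trip through JSON, since the format is described as JSON-based.
restored = json.loads(json.dumps(record))
print(len(restored["vertices"]))  # → 12
```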

    Unique Mathematical Foundation

    The UBP Dictionary System is distinguished by its deep reliance on mathematics to model language:

    • UBP Pi (~2.427): a custom constant derived from hexagonal geometry and resonance alignment (calculated as 6/2 * cos(2π * 7.83 * 0.318309886)), serving as the system's foundational reference.
    • Resonance Frequencies: computed using word-specific hashes modulated by UBP Pi, with validation against the Schumann resonance (7.83 Hz ± 0.078 Hz), grounding the system in physical phenomena.
    • 6D Spatial Vectors: words are positioned in a 6D Bitfield (x, y, z, time, phase, quantum state) based on toggle sums and frequency offsets, enabling spatial analysis of linguistic relationships.
    • GLR Validation: a non-corrective validation mechanism flags outliers in binary, frequency, and spatial data, ensuring mathematical integrity without compromising creativity.

    This mathematical rigor sets the system apart from traditional dictionaries, offering a framework where words are not just strings but dynamic entities with quantifiable properties. It’s a fusion of linguistics, physics, and computational theory, inviting users to rethink language as a multidimensional phenomenon.

    Comparison with Other Data Storage Mechanisms

    The .hexubp format is uniquely tailored for UBP's multidimensional model. Here's how it compares to other storage mechanisms, with metrics to highlight its strengths:

    CSV/JSON (traditional dictionaries):

    • Structure: flat key-value pairs (e.g., word:definition).
    • Storage: ~100 bytes per word for simple text (e.g., “and”: “conjunction”).
    • Query speed: O(1) for lookups, but no support for vector operations.
    • Limitations: lacks multidimensional data (e.g., spatial vectors, frequencies).
    • .hexubp advantage: stores 12 vertices with vectors (~1-2 KB per word), enabling complex analyses like spatial clustering or frequency-drift detection.

    Relational databases (SQL):

    • Structure: tabular, with columns for word, definition, etc.
    • Storage: ~200-500 bytes per word, plus index overhead.
    • Query speed: O(log n) for indexed queries, slower for vector computations.
    • Limitations: rigid schema, inefficient for 6D vectors or dynamic vertices.
    • .hexubp advantage: lightweight and file-based (~1-2 KB per word), with JSON flexibility for UBP's hexagonal model; no database server required.

    Vector databases (e.g., Word2Vec embeddings):

    • Structure: fixed-dimension vectors (e.g., 300D for semantic embeddings).
    • Storage: ~2.4 KB per word (300 floats at 8 bytes each).
    • Query speed: O(n) for similarity searches, optimized with indexing.
    • Limitations: generic embeddings lack UBP-specific dimensions (e.g., resonance, toggles).
    • .hexubp advantage: smaller footprint (~1-2 KB), with domain-specific dimensions tailored to UBP's theoretical framework.

    Graph databases:

    • Structure: nodes and edges for word relationships.
    • Storage: ~500 bytes per word, plus edge overhead.
    • Query speed: O(k) for traversals, where k is the edge count.
    • Limitations: overkill for dictionary tasks, complex setup.
    • .hexubp advantage: self-contained hexagonal structure per word, simpler for UBP's needs, with comparable storage (~1-2 KB).

    The .hexubp format balances storage efficiency, flexibility, and UBP-s...

  10. Unit process data for field crop production version 1.1

    • agdatacommons.nal.usda.gov
    xlsx
    Updated Nov 21, 2025
    Cite
    Joyce Cooper (2025). Unit process data for field crop production version 1.1 [Dataset]. http://doi.org/10.15482/USDA.ADC/1226081
    Explore at:
    xlsx
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Joyce Cooper
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The release of the LCA Commons Unit Process Data: field crop production Version 1.1 includes the following updates:

    • Added metadata to reflect USDA LCA Digital Commons data submission guidance, including descriptions of the process (the reference to which the size of the inputs and outputs relate; a description of the process, technical scope, and any aggregation; a definition of the technology being used and its operating conditions); temporal representativeness; geographic representativeness; allocation methods; process type (U: unit process, S: system process); treatment of missing intermediate flow data; treatment of missing flow data to or from the environment; intermediate flow data sources; mass balance; data treatment (a description of the methods and assumptions used to transform primary and secondary data into flow quantities through recalculating, reformatting, aggregation, or proxy data, and a description of data quality according to LCADC convention); sampling procedures; and review details. Dataset documentation and related archival publications are cited in APA format.
    • Changed intermediate flow categories and subcategories to reflect the International Standard Industrial Classification (ISIC).
    • Added “US-” to the US state abbreviations for intermediate flow locations.
    • Corrected the ISIC code for “CUTOFF domestic barge transport; average fuel” (changed to ISIC 5022: Inland freight water transport).
    • Corrected flow names as follows: “Propachlor” renamed “Atrazine”; “Bromoxynil octanoate” renamed “Bromoxynil heptanoate”; “water; plant uptake; biogenic” renamed “water; from plant uptake; biogenic”; half the instances of “Benzene, pentachloronitro-” replaced with “Etridiazole” and half with “Quintozene”; “CUTOFF phosphatic fertilizer, superphos. grades 22% & under; at point-of-sale” replaced with “CUTOFF phosphatic fertilizer, superphos. grades 22% and under; at point-of-sale”.
    • Corrected flow values for “water; from plant uptake; biogenic” and “dry matter except CNPK; from plant uptake; biogenic” in some datasets.
    • Presented data in the International Reference Life Cycle Data System (ILCD) format, allowing the parameterization of raw data and mathematical relations within the datasets and the inclusion of parameter uncertainty data. Note that ILCD-formatted data can be converted to the ecospold v1 format using the OpenLCA software.
    • Updated data quality rankings to reflect the inclusion of uncertainty data in the ILCD-formatted data.
    • Changed all parameter names to “pxxxx” to accommodate mathematical-relation character limits in OpenLCA, and adjusted select mathematical relations to recognize zero entries. The revised list of parameter names is provided in the attached documentation.

    Resources in this dataset:

    • Resource Title: Cooper-crop-production-data-parameterization-version-1.1. File Name: Cooper-crop-production-data-parameterization-version-1.1.xlsx. Resource Description: description of the parameters that define the Cooper unit process data for field crop production version 1.1.
    • Resource Title: Cooper_Crop_Data_v1.1_ILCD. File Name: Cooper_Crop_Data_v1.1_ILCD.zip. Resource Description: .zip archive of ILCD XML files that comprise the crop production unit process models. Recommended software: OpenLCA (http://www.openlca.org/).
    • Resource Title: Summary of Revisions of the LCA Digital Commons Unit Process Data: field crop production for version 1.1 (August 2013). File Name: Summary of Revisions of the LCA Digital Commons Unit Process Data- field crop production, Version 1.1 (August 2013).pdf. Resource Description: documentation of revisions to version 1 data that constitute version 1.1.

  11. Definition of variables in the model.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jan 25, 2024
    + more versions
    Zhang, Chao; Li, Xiahui; Li, Shuai; Wang, Zhe; Zhu, Xianming; Long, Haonan (2024). Definition of variables in the model. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001450239
    Explore at:
    Dataset updated
    Jan 25, 2024
    Authors
    Zhang, Chao; Li, Xiahui; Li, Shuai; Wang, Zhe; Zhu, Xianming; Long, Haonan
    Description

    An ultrasonic phased array defect extraction method based on adaptive region growth is proposed, aiming at problems such as difficulty in defect identification and extraction caused by noise interference and complex structure of the detected object during ultrasonic phased array detection. First, bilateral filtering and grayscale processing techniques are employed for the purpose of noise reduction and initial data processing. Following this, the maximum sound pressure within the designated focusing region serves as the seed point. An adaptive region iteration method is subsequently employed to execute automatic threshold capture and region growth. In addition, mathematical morphology is applied to extract the processed defect features. In the final stage, two sets of B-scan images depicting hole defects of varying sizes are utilized for experimental validation of the proposed algorithm’s effectiveness and applicability. The defect features extracted through this algorithm are then compared and analyzed alongside the histogram threshold method, Otsu method, K-means clustering algorithm, and a modified iterative method. The results reveal that the margin of error between the measured results and the actual defect sizes is less than 13%, representing a significant enhancement in the precision of defect feature extraction. Consequently, this method establishes a dependable foundation of data for subsequent tasks, such as defect localization and quantitative and qualitative analysis.
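    For illustration only, the region-growing core of such a method can be sketched as a seeded flood fill with a fixed intensity tolerance (a simplified stand-in for the adaptive-threshold scheme described above; the function name and tolerance are illustrative, not the authors' implementation):

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol):
    """Toy seeded region growing: start at `seed` (row, col) and absorb
    4-connected neighbours whose intensity lies within `tol` of the seed
    value. Returns a boolean mask of the grown region."""
    h, w = img.shape
    seed_val = float(img[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not mask[nr, nc]
                    and abs(float(img[nr, nc]) - seed_val) <= tol):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

    In the paper's scheme the seed is the maximum-sound-pressure pixel in the focusing region and the threshold is captured adaptively rather than fixed.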

  12. Definitions of mathematical notation used in this paper.

    • figshare.com
    xls
    Updated Oct 31, 2025
    + more versions
    Zhifeng Wang; Wanxuan Wu; Chunyan Zeng; Jialiang Shen (2025). Definitions of mathematical notation used in this paper. [Dataset]. http://doi.org/10.1371/journal.pone.0335221.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zhifeng Wang; Wanxuan Wu; Chunyan Zeng; Jialiang Shen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Definitions of mathematical notation used in this paper.

  13. Replication Data for: Optimal control in opinion dynamics models: diversity...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Jan 29, 2024
    Kozitsin, Ivan (2024). Replication Data for: Optimal control in opinion dynamics models: diversity of influence mechanisms and complex influence hierarchies [Dataset]. http://doi.org/10.7910/DVN/6D6OGG
    Explore at:
    Dataset updated
    Jan 29, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Kozitsin, Ivan
    Description

    The dataset includes replication materials (dataset files, python code, and the file "OSM.pdf" (Online Supplementary Materials text file + manual)) for the article entitled "Optimal control in opinion dynamics models: diversity of influence mechanisms and complex influence hierarchies."

  14. Band Ratio Mosaics from Airborne Hyperspectral Data at Aramo, Spain

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Dec 9, 2024
    De La Rosa Fernandez, Roberto Alejandro (2024). Band Ratio Mosaics from Airborne Hyperspectral Data at Aramo, Spain [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14193285
    Explore at:
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Beak Consultants GmbH
    Authors
    De La Rosa Fernandez, Roberto Alejandro
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metadata information

    Full Title Band Ratio Mosaics from Airborne Hyperspectral Data at Aramo, Spain

    Abstract

    This dataset comprises results from the S34I Project, derived from processing airborne hyperspectral data acquired at the Aramo pilot site in Spain. Spectral Mapping Services (SMAPS Oy) conducted the airborne data acquisition in May 2024 using the Specim AisaFENIX sensor (covering VNIR-SWIR spectral ranges) over 17 flight lines. SMAPS performed geometric correction, radiometric calibration to reflectance, and atmospheric correction of the data. Subsequent processing steps included spectral smoothing with a Savitzky-Golay filter, cloud masking, bad pixel corrections, and hull correction (continuum removal).

    Manual processing and interpretation of hyperspectral data is a challenging, time-consuming, and subjective task, necessitating automated or semi-automated approaches. Therefore, we present a semi-automated workflow for large-scale interpretation of hyperspectral data, based on a combination of state-of-the-art methodologies. This dataset results from the calculation of a series of band ratios applied to the images and their subsequent mosaicking into a TIFF file. The mosaics are delivered as georeferenced TIFF files that cover approximately 97 km² with a spatial resolution of 1.2 m per pixel. The NoData value is set to -9999, representing areas of cloud removal or missing flight lines. The projected coordinate system is UTM Zone 30 Northern Hemisphere WGS 1984, EPSG 4326.

    Hyperspectral band ratios involve applying mathematical operations (such as division, subtraction, addition, or multiplication) among the reflectance values of different spectral bands. This technique enhances subtle variations in how materials absorb and reflect light across the electromagnetic spectrum. These variations are caused by electronic transitions, vibrations of chemical bonds (including -OH, Si-O, Al-O, and others), and lattice vibrations within the material's crystal structure.

    By creating these mathematical combinations, specific absorption features are emphasized, generating unique spectral fingerprints for different materials. However, these fingerprints alone cannot definitively identify a mineral, as different minerals may share similar absorption features due to common chemical bonds or crystal structures. Spectral geologists use band ratios as a tool to highlight potential areas of interest, but they must integrate this information with other geological knowledge and analyses to accurately interpret the mineralogy of an area.

    This dataset includes nine spectral band ratios. The mathematical formulas used to calculate each ratio are provided below:

    BR1 target Carbonate / Chlorite / Epidote

    BR1 = ((C7 + C9) / (C8))

    C7= Mean of bands between 2246.6 and 2257.55 nm

    C8= Mean of bands between 2339 and 2345 nm

    C9= Mean of bands between 2400 and 2410 nm

    BR2 target Chlorite

    BR2 = ((Cl1 + Cl2) / (Cl2))

    Cl1 = Mean of bands between 2191.93 and 2197.4 nm

    Cl2 = Mean of bands between 2246.63 and 2257.55 nm

    BR3 target Clay

    BR3 = ((C1 + C2) / (C2))

    C1 = Mean of bands between 1590.32 and 1612.56 nm

    C2 = Mean of bands between 2191.93 and 2208.35 nm

    BR4 target Dolomite

    BR4 = ((C6 + C8) / (C7))

    C6= Mean of bands between 2186 and 2191 nm

    C7= Mean of bands between 2246.6 and 2257.55 nm

    C8= Mean of bands between 2339 and 2345 nm

    BR5 target Fe2

    BR5 = ((Fe2n + Fe2d) / (Fe2d))

    Fe2n = Mean of bands between 721.85 and 742.48 nm

    BR6 target Fe3

    BR6 = ((Fe3n - Fe3d) / (Fe3n + Fe3d))

    Fe3n = Mean of bands between 776.87 and 811.26 nm

    Fe3d = Mean of 3 bands around 610 nm

    BR7 target Kaolinite / clays

    BR7 = ((K1 + K2) / (K3 + K4))

    K1 = Mean of bands between 2082.27 and 2104.23 nm

    K2 = Mean of bands between 2104.23 and 2115.2 nm

    K3 = Mean of bands between 2159.07 and 2164.55 nm

    K4 = Mean of bands between 2202.88 and 2208.35 nm

    BR8 target Kaolinite2 / clays

    BR8 = ((K1_2 + K2_2) / (K2_2))

    K1_2 = Mean of bands between 2197.4 and 2219.29 nm

    K2_2 = Mean of bands between 2159.07 and 2170.03 nm

    BR9 target NDVI (Normalized Difference Vegetation Index)

    BR9 = ((NIR - Red) / (NIR + Red))

    NIR= Mean of bands between 776.87 and 811.26 nm

    Red = Mean of bands between 666.87 and 680.6 nm
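    Each ratio above reduces to averaging the bands that fall inside a wavelength window and combining the window means. A minimal NumPy sketch of BR9 (NDVI) under those ranges (the array layout, function names, and NoData handling are assumptions for illustration, not part of the delivered dataset):

```python
import numpy as np

def band_mean(cube, wavelengths, lo, hi):
    """Mean reflectance over all bands whose centre wavelength falls in
    [lo, hi] nm. cube: (rows, cols, bands); wavelengths: (bands,) in nm."""
    sel = (wavelengths >= lo) & (wavelengths <= hi)
    return cube[:, :, sel].mean(axis=2)

def br9_ndvi(cube, wavelengths, nodata=-9999.0):
    """BR9 = (NIR - Red) / (NIR + Red), using the wavelength ranges above."""
    nir = band_mean(cube, wavelengths, 776.87, 811.26)
    red = band_mean(cube, wavelengths, 666.87, 680.6)
    denom = nir + red
    out = np.full(denom.shape, nodata, dtype=float)  # NoData where undefined
    valid = denom != 0
    out[valid] = (nir[valid] - red[valid]) / denom[valid]
    return out
```

    The other eight ratios follow the same pattern with different windows and arithmetic.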

    Keywords Earth Observation, Remote Sensing, Hyperspectral Imaging, Automated Processing, Hyperspectral Data Processing, Mineral Exploration, Critical Raw Materials

    Pilot area Aramo

    Language

    English

    URL Zenodo https://zenodo.org/uploads/14193286

    Temporal reference

    Acquisition date (dd.mm.yyyy) 01.05.2024

    Upload date (dd.mm.yyyy) 20.11.2024

    Quality and validity

    Format GeoTIFF

    Spatial resolution 1.2 m

    Positional accuracy 0.5 m

    Coordinate system EPSG 4326

    Access and use constraints

    Use limitation None

    Access constraint None

    Public/Private Public

    Responsible organisation

    Responsible Party Beak Consultants GmbH

    Responsible Contact Roberto De La Rosa

    Metadata on metadata

    Contact Roberto.delarosa@beak.de

    Metadata language English

  15. augMENTOR: Simulated Student Learning Profiles and their Engagement Metrics...

    • data.niaid.nih.gov
    Updated Nov 3, 2023
    Kostakos, Panos (2023). augMENTOR: Simulated Student Learning Profiles and their Engagement Metrics in TryHackMe Platform_V1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10070024
    Explore at:
    Dataset updated
    Nov 3, 2023
    Authors
    Kostakos, Panos
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset provides simulated insights into student engagement and performance within the THM platform. It outlines mathematical representations of student learning profiles, detailing behaviors ranging from high achievers to inconsistent performers. Additionally, the dataset includes key performance indicators, offering metrics like room completion, points earned, and time spent to gauge student progress and interaction within the platform's modules. Here are definitions of the learning profiles, along with mathematical representations of their behaviors:

    • High Achiever: students who consistently perform well across all modules. Their performance P in a given module can be modelled as a normal distribution centred at a high mean: P = N(90, 5), where N is the normal distribution function, 90 is the mean, and 5 is the standard deviation.
    • Average Performer: students who typically perform at the average level across all modules: P = N(70, 10), where 70 is the mean and 10 is the standard deviation.
    • Late Bloomer: students whose performance improves as they progress through the modules: P = N(50 + i*10, 10), where i is the module index, giving an increasing trend.
    • Specialized Talent: students who have average performance in most modules but excel in a particular module (e.g., module 5): P = N(90, 5) if the module is module 5, else P = N(70, 10).
    • Inconsistent Performer: students whose performance varies significantly across modules: P = N(70, 30), where the high standard deviation of 30 reflects inconsistency.

    Note that the actual performances are bounded between 0 and 100 using max(0, min(100, performance)) to ensure valid percentages. In these formulas, the np.random.normal function simulates the variability in student performance around the mean values: the first argument is the mean, the second is the standard deviation, and the function returns a number drawn from the corresponding normal distribution. Note that the proposed method is experimental and has not been validated.
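    A minimal sketch of drawing one score from these profile formulas with NumPy (the function and profile keys are illustrative; the distributions and the 0-100 clipping follow the description above):

```python
import numpy as np

def simulate_performance(profile, module_index, rng=None):
    """Draw one performance score for a module, per the profile formulas
    above, clipped to the valid percentage range [0, 100]."""
    rng = np.random.default_rng(rng)
    if profile == "high_achiever":
        p = rng.normal(90, 5)
    elif profile == "average_performer":
        p = rng.normal(70, 10)
    elif profile == "late_bloomer":
        p = rng.normal(50 + module_index * 10, 10)
    elif profile == "specialized_talent":
        p = rng.normal(90, 5) if module_index == 5 else rng.normal(70, 10)
    elif profile == "inconsistent_performer":
        p = rng.normal(70, 30)
    else:
        raise ValueError(f"unknown profile: {profile!r}")
    return max(0.0, min(100.0, p))
```
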

    List of Key Performance Indicators (KPIs) for Student Engagement and Progress within the Platform:

    • Room Name: the unique identifier or name of a specific room (or module). Think of each room as a separate module or lesson within an educational platform, e.g., Room1, Room2.
    • Total rooms completed: the cumulative number of rooms that a student has fully completed. Completion is typically determined by meeting certain criteria, like answering all questions or achieving a certain score.
    • Rooms registered in: the number of rooms a student has registered or enrolled in. This can differ from the total number of rooms completed.
    • Ratio of questions completed per room: insight into a student's progress in a particular room. For instance, a ratio of 7/10 suggests the student has completed 7 out of 10 available questions in that room.
    • Room completed (yes/no): whether a student has fully completed a specific room, as determined by the percentage of material covered, questions answered, or a certain score achieved.
    • Room last deploy (count of days): the number of days since the last update or deployment was made to that room; it can give an idea of the student's effort.
    • Points in room used for the leaderboard (range 0-560): each room assigns points based on student performance, and these points contribute to leaderboards; a student can earn anywhere from 0 to 560 points in a particular room.
    • Last answered question in a room (27th Jan 2023): the date when a student last answered a question in a specific room, providing insight into recent activity and engagement.
    • Total points in all rooms (range 0-560): the cumulative score a student has achieved across all rooms.
    • Path percentage completed (range 0-100): the percentage of the overall learning path that the student has completed; a path can consist of multiple modules or rooms.
    • Module percentage completed (range 0-100): how much of a specific module (which can have multiple lessons or topics) a student has completed.
    • Room percentage completed (range 0-100): the percentage of a specific room completed by a student.
    • Time spent on the platform (seconds): the total time a student has spent on the entire platform.
    • Time spent on each room (seconds): the time a student has dedicated to a specific room, indicating which rooms or modules are the most time-consuming or engaging.

  16. HASYv2 - Symbol Recognizer

    • kaggle.com
    zip
    Updated Oct 11, 2021
    fedesoriano (2021). HASYv2 - Symbol Recognizer [Dataset]. https://www.kaggle.com/fedesoriano/hasyv2-symbol-recognizer
    Explore at:
    zip(85506565 bytes)Available download formats
    Dataset updated
    Oct 11, 2021
    Authors
    fedesoriano
    Description

    Context

    Publicly available datasets have helped the computer vision community compare new algorithms and develop applications. MNIST [LBBH98] in particular has been used thousands of times to train and evaluate classification models. However, even rather simple models consistently reach about 99.2% accuracy on MNIST [TF-16a], and the best models misclassify only about 20 instances, which makes meaningful statements about improvements in classifiers hard. Possible reasons why current models do so well on MNIST are that 1) MNIST has only 10 classes, 2) there are very few (probably no) labelling errors, 3) every class has 6000 training samples, and 4) the feature dimensionality is comparatively low. Moreover, applications that need to recognize only Arabic numerals are rare. Like MNIST, HASY has very low resolution. In contrast to MNIST, the HASYv2 dataset contains 369 classes, including Arabic numerals and Latin characters. Furthermore, HASYv2 has far fewer recordings per class than MNIST and is only in black and white, whereas MNIST is in grayscale. HASY could be used to train models for semantic segmentation of non-cursive handwritten documents like mathematical notes or forms.

    Content

    The dataset contains the following:

    • a pickle file: HASYv2
    • a txt file: cite.txt

    The pickle file contains the 168233 observations in a dictionary form. The simplest way to use the HASYv2 dataset is to download the pickle file below (HASYv2). You can use the following lines of code to load the data:

    import pickle

    def unpickle(path):
        # Load the HASYv2 pickle file and return its dictionary of arrays.
        with open(path, 'rb') as fo:
            return pickle.load(fo, encoding='bytes')

    HASYv2 = unpickle("HASYv2")

    The data comes in dictionary format; you can get the data, labels, and symbols separately by extracting them from the dictionary: data = HASYv2['data'], labels = HASYv2['labels'], symbols = HASYv2['latex_symbol']. Note that the shape of the data is (32 x 32 x 3 x 168233): the first and second dimensions are the height and width respectively, the third dimension corresponds to the channels, and the fourth to the observation number.
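    Since most training frameworks expect the observation axis first, a single transpose converts the documented layout; a small sketch with a stand-in array (in practice the `data` array extracted above replaces the zeros):

```python
import numpy as np

# Stand-in array with the documented layout (height, width, channels, N);
# here N = 10 instead of the full 168233 observations.
data = np.zeros((32, 32, 3, 10), dtype=np.uint8)

# Move the observation axis first -> (N, 32, 32, 3), the layout most
# training frameworks expect for image batches.
samples = np.transpose(data, (3, 0, 1, 2))
assert samples.shape == (10, 32, 32, 3)
```
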

    Citation

    fedesoriano. (October 2021). HASYv2 - Symbol Recognizer. Retrieved [Date Retrieved] from https://www.kaggle.com/fedesoriano/hasyv2-symbol-recognizer.

    Source

    The dataset was originally uploaded by Martin Thoma, see https://arxiv.org/abs/1701.08380.

    Thoma, M. (2017). The HASYv2 dataset. ArXiv, abs/1701.08380.

    The original paper describes the HASYv2 dataset: a publicly available, free-of-charge dataset of single symbols, similar to MNIST. It contains 168233 instances of 369 classes and offers two challenges: a classification challenge with 10 pre-defined folds for 10-fold cross-validation, and a verification challenge.

  17. Difference in mean accuracy of classifiers.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Diego Raphael Amancio; Cesar Henrique Comin; Dalcimar Casanova; Gonzalo Travieso; Odemir Martinez Bruno; Francisco Aparecido Rodrigues; Luciano da Fontoura Costa (2023). Difference in mean accuracy of classifiers. [Dataset]. http://doi.org/10.1371/journal.pone.0094137.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Diego Raphael Amancio; Cesar Henrique Comin; Dalcimar Casanova; Gonzalo Travieso; Odemir Martinez Bruno; Francisco Aparecido Rodrigues; Luciano da Fontoura Costa
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean difference between the accuracy of the classifier in the row and the classifier in the column. The last column shows the mean accuracy of the respective classifier across all datasets considered in our study.

  18. Data from: S1 Dataset -

    • figshare.com
    bin
    Updated Jun 21, 2024
    + more versions
    Syed Asghar Ali Shah; Tariqullah Jan; Syed Muslim Shah; Muhammad Asif Zahoor Raja; Mohammad Haseeb Zafar; Sana Ul Haq (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0304018.s001
    Explore at:
    binAvailable download formats
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Syed Asghar Ali Shah; Tariqullah Jan; Syed Muslim Shah; Muhammad Asif Zahoor Raja; Mohammad Haseeb Zafar; Sana Ul Haq
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fractional order algorithms demonstrate superior efficacy in signal processing while retaining the same implementation simplicity as traditional algorithms. The self-adjusting dual-stage fractional order least mean square algorithm, denoted LFLMS, is developed to expedite convergence and improve precision while incurring only a slight increase in computational complexity. The initial stage employs the least mean square (LMS) algorithm, followed by the fractional LMS (FLMS) approach in the second stage, which multiplies the LMS output with a replica of the steering vector (Ŕ) of the intended signal. Mathematical convergence analysis and the mathematical derivation of the proposed approach are provided. Its weight adjustment integrates the conventional integer-order gradient with a fractional-order one. Its effectiveness is gauged through the minimization of mean square error (MSE), and thorough comparisons with alternative methods are conducted across various parameters in simulations. Simulation results underscore the superior performance of LFLMS: notably, its convergence rate surpasses that of LMS by 59%, accompanied by a 49% improvement in MSE relative to LMS. It is therefore concluded that the LFLMS approach is a suitable choice for next-generation wireless networks, including the Internet of Things, 6G, radars, and satellite communication.
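    The paper's two-stage LFLMS algorithm is not reproduced here, but the classic LMS stage it builds on can be sketched as follows (tap count, step size, and function name are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def lms_filter(x, d, num_taps=4, mu=0.05):
    """Classic LMS adaptive filter: w <- w + mu * e * x_vec.
    x: input signal, d: desired signal. Returns final weights and the
    per-sample error history."""
    w = np.zeros(num_taps)
    errors = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        x_vec = x[n - num_taps + 1:n + 1][::-1]  # [x[n], x[n-1], ...]
        y = w @ x_vec                # filter output
        e = d[n] - y                 # estimation error
        w += mu * e * x_vec          # stochastic-gradient weight update
        errors[n] = e
    return w, errors
```

    The FLMS stage adds a fractional-order term to this same weight update; the LFLMS paper's contribution is cascading the two stages.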

  19. Exploring Student Achievement Trends

    • kaggle.com
    zip
    Updated Sep 29, 2025
    Saad Ali Yaseen (2025). Exploring Student Achievement Trends [Dataset]. https://www.kaggle.com/datasets/saadaliyaseen/exploring-student-achievement-trends
    Explore at:
    zip(8907 bytes)Available download formats
    Dataset updated
    Sep 29, 2025
    Authors
    Saad Ali Yaseen
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context:

    This dataset contains 1000 rows and 8 columns representing students’ demographic details and academic performance. The columns include components like gender, race/ethnicity, parental level of education, lunch type, and test preparation course, along with math, reading, and writing scores. It provides a detailed view of how background and preparation affect student performance.

    Feature Distribution:

    Gender: 518 females, 482 males

    Race/Ethnicity: Group C (319), Group D (262), Group B (190), Group E (140), Group A (89)

    Parental Education: Some college (226), Associate’s degree (222), High school (196), Some high school (179), Bachelor’s (118), Master’s (59)

    Lunch: Standard (645), Free/Reduced (355)

    Test Preparation: None (642), Completed (358)

    Math Score: Min 0, Max 100, Mean ≈ 66.1, Median 66

    Reading Score: Min 17, Max 100, Mean ≈ 69.2, Median 70

    Writing Score: Min 10, Max 100, Mean ≈ 68.1, Median 69

  20. Data from: Asphaltene Precipitation Prediction during Bitumen Recovery:...

    • acs.figshare.com
    zip
    Updated Jun 13, 2023
    Turar Yerkenov; Simin Tazikeh; Afshin Tatar; Ali Shafiei (2023). Asphaltene Precipitation Prediction during Bitumen Recovery: Experimental Approach versus Population Balance and Connectionist Models [Dataset]. http://doi.org/10.1021/acsomega.2c03249.s001
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    ACS Publications
    Authors
    Turar Yerkenov; Simin Tazikeh; Afshin Tatar; Ali Shafiei
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Deasphalting bitumen using paraffinic solvent injection is a commonly used technique to reduce both its viscosity and density and ease its flow through pipelines. Common modeling approaches for asphaltene precipitation prediction, such as the population balance model (PBM), contain complex mathematical relations and require conducting precise experiments to define initial and boundary conditions. The machine learning (ML) approach is considered a robust, fast, and reliable alternative. The main objective of this research work was to model the effect of paraffinic solvent injection on the amount of asphaltene precipitation using ML and PBM approaches. Five hundred and ninety (590) experimental data points were collected from the literature for model development. The gathered data were processed using box plots, data scaling, and data splitting; this pre-processing led to the use of 517 data points for modeling. Then, multilayer perceptron, random forest, decision tree, support vector machine, and committee machine intelligent system (CMIS) models, optimized by annealing and random search techniques, were used for modeling. Precipitant molecular weight, injection rate, API gravity, pressure, C5 asphaltene content, and temperature were determined as the most relevant features for the process. Although the results of the PBM model are precise, the AI/ML model (CMIS) is the preferred model due to its robustness, reliability, and relative accuracy. The committee machine intelligent system is the superior model among the developed smart models, with an RMSE of 1.7% for the testing dataset in predicting asphaltene precipitation during bitumen recovery.


MetaMath QA

Mathematical Questions for Large Language Models


##### Training Models using Mistral 7B

Mistral 7B is an open-source large language model. After collecting and preprocessing the MetaMathQA dataset, you can fine-tune Mistral 7B (or a comparable model) on the query/response pairs, holding out a validation split for tuning hyperparameters such as learning rate, batch size, and number of epochs. Once a configuration is selected, validate the performance of the resulting model with metrics such as accuracy, F1 score, precision, and recall.
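As a minimal sketch (the file path, split fraction, and helper name are assumptions, not part of the dataset), the train.csv described in the data dictionary can be loaded and split before a fine-tuning or evaluation run:

```python
import csv
import random

def load_split(path="train.csv", val_fraction=0.1, seed=0):
    """Read MetaMathQA rows (columns: response, type, query) and return a
    reproducible (train, validation) split as two lists of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    random.Random(seed).shuffle(rows)  # fixed seed -> reproducible split
    n_val = int(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val]
```
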

##### Testing the model

After the training phase completes, test the model robustly against the evaluation metrics mentioned above: make predictions on new test cases (for example, questions supplied by domain experts), run quality-assurance checks against the baseline metric scores, and update the baseline as experiments improve on it. Keeping this loop in the workflow helps keep the impact of errors induced by inexact answers low.

Research Ideas

  • Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.
  • Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.
  • Optimizing search algorithms that surface relevant answer results based on types of queries.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

| Column name | Description |
|:--------------|:------------------------------------|
| response | The response to the query. (String) |
| type | The type of query. (String) |
| query | The question posed to the system. (String) |

