15 datasets found
  1. MetaMath QA

    • kaggle.com
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b/suggestions?status=pending
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MetaMath QA

    Mathematical Questions for Large Language Models

    By Huggingface Hub

    About this dataset

    This dataset contains meta-mathematics questions and answers, with responses generated by the Mistral-7B question-answering system. The responses, types, and queries are all provided to help improve the performance of question-answering models while maintaining high accuracy. Its well-structured design gives users an efficient way to investigate various aspects of question-answering models and to better understand how they function. Whether you are a professional or a beginner, this dataset offers valuable insights into the development of more powerful QA systems.



    How to use the dataset

    Data Dictionary

    The MetaMathQA dataset contains three columns: response, type, and query.

    • response: the response to the query given by the question-answering system. (String)
    • type: the type of query provided as input to the system. (String)
    • query: the question posed to the system for which a response is required. (String)

    Preparing data for analysis

    Before diving into analysis, familiarize yourself with the kinds of data values present in each column, and check whether any preprocessing is needed, such as removing unwanted characters or filling in missing values, so the data can be used without issue when training or testing your model later in your workflow.
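    As a minimal sketch of these checks using only the Python standard library — the inline sample rows below are invented stand-ins for train.csv, with column names taken from the data dictionary above:

    ```python
    import csv
    import io

    # Stand-in CSV content; in practice, read train.csv from disk instead.
    sample = """response,type,query
    "The answer is 4.",GSM_AnsAug,"What is 2 + 2?"
    ,MATH_AnsAug,"Simplify (x^2 - 1)/(x - 1)."
    """

    rows = list(csv.DictReader(io.StringIO(sample)))

    # Count missing (empty) values per column.
    missing = {col: sum(1 for r in rows if not r[col].strip()) for col in rows[0]}

    # Strip stray whitespace and drop rows with no response.
    clean = [{k: v.strip() for k, v in r.items()} for r in rows if r["response"].strip()]

    print(missing)     # one missing response in the toy sample
    print(len(clean))  # rows that survive the cleaning step
    ```
    
    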

    Training Models using Mistral 7B

    Mistral 7B is an open-source large language model that can be fine-tuned on question-answering data such as this dataset. After collecting and preprocessing the data accordingly, fine-tune the model on the query/response pairs, tuning training hyperparameters against a held-out validation split. Once a configuration has been selected, validate the performance of the resulting models through metrics such as accuracy, F1 score, precision, and recall.

    Testing models

    After the building phase is complete, the right approach is to test the models robustly against the evaluation metrics mentioned above. Use the trained model to make predictions on new test cases, including cases presented by domain experts, and run quality-assurance checks against the baseline metric scores to assess confidence in the results. Updating baseline scores as experiments run is the preferred methodology for AI workflows, since it keeps the impact of inexactness-induced errors low.

    Research Ideas

    • Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.
    • Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.
    • Optimizing search algorithms that surface relevant answer results based on types of queries

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------|
    | response | The response to the query. (String) |
    | type | The type of query. (String) |
    | query | The question posed to the system. (String) |


  2. Data from: The IBEM Dataset: a large printed scientific image dataset for...

    • zenodo.org
    zip
    Updated May 25, 2023
    Cite
    Dan Anitei; Joan Andreu Sánchez; José Miguel Benedí (2023). The IBEM Dataset: a large printed scientific image dataset for indexing and searching mathematical expressions [Dataset]. http://doi.org/10.5281/zenodo.7963703
    Explore at:
    Available download formats: zip
    Dataset updated
    May 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dan Anitei; Joan Andreu Sánchez; José Miguel Benedí
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The IBEM dataset consists of 600 documents with a total number of 8272 pages, containing 29603 isolated and 137089 embedded Mathematical Expressions (MEs). The objective of the IBEM dataset is to facilitate the indexing and searching of MEs in massive collections of STEM documents. The dataset was built by parsing the LaTeX source files of documents from the KDD Cup Collection. Several experiments can be carried out with the IBEM dataset ground-truth (GT): ME detection and extraction, ME recognition, etc.

    The dataset consists of the following files:

    • “IBEM.json”: file containing the IBEM GT information. The data is firstly organized by pages, then by the type of expression (“embedded” or “displayed”), and lastly by the GT of each individual ME. For each ME we provide:
      • xy page-level coordinates, reported as relative (%) to the width/height of the page image.
      • “split” attribute indicating the number of fragments into which the ME has been split. MEs can be split over various lines, columns, or pages. The LaTeX transcript of a split ME has been exactly replicated (the entire LaTeX definition) for each fragment.
      • “latex” original transcript as extracted from the LaTeX source files of the documents. This definition can contain user-defined macros. In order to be able to compile these expressions, each page includes the preamble of the source files containing the defined macros and the packages used by the authors of the documents.
      • “latex_expand” transcript reconstructed from the output stream of the LuaLaTeX engine, in which user-defined macros have been expanded. The transcript has the same visual representation as the original, except that the LaTeX definitions are tokenized, the order of sub/superscript elements has been fixed, and matrices have been transformed to arrays.
      • “latex_norm” transcript resulting from applying an extra normalization process to the “latex_expand” expression. This normalization process includes removing font information such as slant, style, and weight.
    • “partitions/*.lst”: files containing list of pages forming the partition sets.
    • “pages/*.jpg”: individual pages extracted from the documents.
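    To illustrate how the ground truth might be traversed, here is a hedged Python sketch. The exact JSON layout and field names below are assumptions based on the description above (pages, then expression type, then per-ME fields), not the verified IBEM schema:

    ```python
    # Toy structure shaped like the description: pages -> expression type -> MEs.
    # Field names ("coords", "split", "latex", ...) are assumptions.
    ibem_like = {
        "page_0001": {
            "embedded": [
                {"coords": [10.5, 20.0, 15.2, 22.1],  # relative (%) page coords
                 "split": 1,
                 "latex": r"\mymacro{x}",
                 "latex_expand": r"x^{2}",
                 "latex_norm": r"x^{2}"}
            ],
            "displayed": []
        }
    }

    def collect_expressions(data, kind="embedded", field="latex_expand"):
        """Gather one transcript field for every ME of the given kind."""
        out = []
        for page, groups in data.items():
            for me in groups.get(kind, []):
                out.append((page, me[field]))
        return out

    exprs = collect_expressions(ibem_like)
    print(exprs)
    ```

    Collecting the “latex_expand” field, as the authors recommend for recognition tasks, avoids dealing with user-defined macros.
    
    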

    The dataset is partitioned into various sets as provided for the ICDAR 2021 Competition on Mathematical Formula Detection. The ground truth related to this competition is included in this dataset version. More information about the competition can be found in the following paper:

    D. Anitei, J.A. Sánchez, J.M. Fuentes, R. Paredes, and J.M. Benedí. ICDAR 2021 Competition on Mathematical Formula Detection. In ICDAR, pages 783–795, 2021.

    For ME recognition tasks, we recommend rendering the “latex_expand” version of the formulae in order to create standalone expressions that have the same visual representation as MEs found in the original documents (see attached python script “extract_GT.py”). Extracting MEs from the documents based on coordinates is more complex, as special care is needed to concatenate the fragments of split expressions. Baseline results for ME recognition tasks will soon be made available.

  3. Comparative Judgement of Statements About Mathematical Definitions

    • dataverse.no
    • dataverse.azure.uit.no
    csv, txt
    Updated Sep 28, 2023
    Cite
    Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad (2023). Comparative Judgement of Statements About Mathematical Definitions [Dataset]. http://doi.org/10.18710/EOZKTR
    Explore at:
    Available download formats: txt (3623), csv (2523), csv (37503), csv (43566)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Tore Forbregd; Hermund Torkildsen; Eivind Kaspersen; Trygve Solstad
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data from a comparative judgement survey of 62 working mathematics educators (MEs) at Norwegian universities or city colleges and 57 working mathematicians (WMs) at Norwegian universities. A total of 3607 comparisons were made, of which 1780 were by the MEs and 1827 by the WMs. Respondents compared pairs of statements about mathematical definitions, compiled from a literature review on mathematical definitions in the mathematics education literature. Each WM was asked to judge 40 pairs of statements with the following question: “As a researcher in mathematics, where your target group is other mathematicians, what is more important about mathematical definitions?” Each ME was asked to judge 41 pairs of statements with the following question: “For a mathematical definition in the context of teaching and learning, what is more important?” The comparative judgement was done with the No More Marking software (nomoremarking.com). The data set consists of the following files:

    • comparisons made by MEs (ME.csv)
    • comparisons made by WMs (WM.csv)
    • a look-up table of statement codes and statement formulations (key.csv)

    Each line in a comparison file represents one comparison, where the "winner" column holds the winner and the "loser" column the loser of the comparison.
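    As a small illustration of working with data in this shape, the sketch below tallies per-statement win rates from rows structured like ME.csv/WM.csv; the statement codes here are invented stand-ins:

    ```python
    from collections import Counter

    # One dict per comparison, mirroring the "winner"/"loser" columns.
    comparisons = [
        {"winner": "S01", "loser": "S02"},
        {"winner": "S01", "loser": "S03"},
        {"winner": "S03", "loser": "S02"},
    ]

    wins = Counter(c["winner"] for c in comparisons)

    # Each statement appears once per comparison it took part in.
    appearances = Counter(c["winner"] for c in comparisons)
    appearances.update(c["loser"] for c in comparisons)

    win_rate = {s: wins[s] / appearances[s] for s in sorted(appearances)}
    print(win_rate)  # {'S01': 1.0, 'S02': 0.0, 'S03': 0.5}
    ```

    Win rates like these are the usual starting point before fitting a full comparative-judgement scale.
    
    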

  4. NaturalProofs Dataset

    • paperswithcode.com
    • opendatalab.com
    • +2 more
    Updated May 28, 2025
    + more versions
    Cite
    Sean Welleck; Jiacheng Liu; Ronan Le Bras; Hannaneh Hajishirzi; Yejin Choi; Kyunghyun Cho (2025). NaturalProofs Dataset [Dataset]. https://paperswithcode.com/dataset/naturalproofs
    Explore at:
    Dataset updated
    May 28, 2025
    Authors
    Sean Welleck; Jiacheng Liu; Ronan Le Bras; Hannaneh Hajishirzi; Yejin Choi; Kyunghyun Cho
    Description

    The NaturalProofs Dataset is a large-scale dataset for studying mathematical reasoning in natural language. NaturalProofs consists of roughly 20,000 theorem statements and proofs, 12,500 definitions, and 1,000 additional pages (e.g. axioms, corollaries) derived from ProofWiki, an online compendium of mathematical proofs written by a community of contributors.

  5. Replication Data for: Mean Temperature of Yaounde (Cameroon) from 1976 to...

    • dataverse.harvard.edu
    Updated May 24, 2024
    Cite
    Martin Le Doux Mbele Bidima (2024). Replication Data for: Mean Temperature of Yaounde (Cameroon) from 1976 to 2021 [Dataset]. http://doi.org/10.7910/DVN/WQHCZV
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Martin Le Doux Mbele Bidima
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Cameroon, Yaoundé
    Description

    The dataset contains mean temperature of the city of Yaounde in Cameroon from 1976 to 2021.

  6. Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading,...

    • open.canada.ca
    • www150.statcan.gc.ca
    • +1 more
    csv, html, xml
    Updated Apr 28, 2023
    Cite
    Statistics Canada (2023). Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading, science and mathematics assessment [Dataset]. https://open.canada.ca/data/dataset/051c952c-0f58-4da6-b597-d8b0821d7be7
    Explore at:
    Available download formats: xml, csv, html
    Dataset updated
    Apr 28, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    Reading, science and math mean scores from the Pan-Canadian Assessment Program (PCAP), by province.

  7. Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading,...

    • data.urbandatacentre.ca
    • beta.data.urbandatacentre.ca
    Updated Oct 1, 2024
    Cite
    (2024). Mean scores of Grade 8 students, Pan-Canadian Assessment Program reading, science and mathematics assessment - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-051c952c-0f58-4da6-b597-d8b0821d7be7
    Explore at:
    Dataset updated
    Oct 1, 2024
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    Reading, science and math mean scores from the Pan-Canadian Assessment Program (PCAP), by province.

  8. StudentMathScores

    • kaggle.com
    Updated Jun 10, 2019
    Cite
    Logan Henslee (2019). StudentMathScores [Dataset]. https://www.kaggle.com/loganhenslee/studentmathscores/tasks
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 10, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Logan Henslee
    Description

    CONTEXT

    Practice Scenario: The UIW School of Engineering wants to recruit more students into its program. It will recruit students with great math scores. Also, to increase the chances of recruitment, the department will look for students who qualify for financial aid. Students who qualify for financial aid most likely come from low socio-economic backgrounds. One way to indicate this is to view how much federal revenue a school district receives through its state. High federal revenue for a school indicates that a large portion of the student base comes from low-income families.

    The question we wish to ask is as follows: name the school districts across the nation where the Child Nutrition Programs (C25) are federally funded between $30,000 and $50,000, and where the average math score for the school district's corresponding state is greater than or equal to the nation's average score of 282.

    The SQL query in 'Top5MathTarget.sql' can be used to answer this question in MySQL. To execute this process, install MySQL on your local system and load the attached datasets from Kaggle into your MySQL schema. The query then joins the separate tables on various key identifiers.
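    The actual query in 'Top5MathTarget.sql' is not reproduced here, but a hedged sketch of this kind of join (using SQLite in place of MySQL, with invented table names, columns beyond c25/average_scale_score, and toy rows) looks like:

    ```python
    import sqlite3

    # In-memory stand-in for the MySQL schema described above.
    # Table names (finance, scores) and the join key (state) are assumptions.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE finance (district TEXT, state TEXT, c25 INTEGER);
    CREATE TABLE scores  (state TEXT, average_scale_score REAL);
    INSERT INTO finance VALUES
      ('District A', 'TX', 42000),
      ('District B', 'TX', 12000),
      ('District C', 'OH', 35000);
    INSERT INTO scores VALUES ('TX', 283.0), ('OH', 276.0);
    """)

    # Districts with C25 funding in [30000, 50000] whose state meets
    # or beats the national average math score of 282.
    rows = con.execute("""
        SELECT f.district
        FROM finance AS f
        JOIN scores AS s ON s.state = f.state
        WHERE f.c25 BETWEEN 30000 AND 50000
          AND s.average_scale_score >= 282
    """).fetchall()

    print(rows)  # only District A satisfies both conditions
    ```
    
    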

    DATA SOURCE: Data is sourced from the U.S. Census Bureau and The Nation's Report Card (using the NAEP Data Explorer).

    Finance: https://www.census.gov/programs-surveys/school-finances/data/tables.html

    Math Scores: https://www.nationsreportcard.gov/ndecore/xplore/NDE

    COLUMN NOTES

    All data comes from the school year 2017. Individual schools are not represented, only school districts within each state.

    FEDERAL FINANCE DATA DEFINITIONS

    t_fed_rev: Total federal revenue through the state to each school district.

    C14- Federal revenue through the state- Title 1 (no child left behind act).

    C25- Federal revenue through the state- Child Nutrition Act.

    Title 1 is a program implemented in schools to help raise academic achievement for all students. The program is available to schools where at least 40% of the students come from low-income families.

    Child Nutrition Programs ensure that children get the food they need to grow and learn. High federal revenue to these programs likewise indicates that students come from low-income families.

    MATH SCORES DATA DEFINITIONS

    Note: Mathematics, Grade 8, 2017, All Students (Total)

    average_scale_score - The state's average score for eighth graders taking the NAEP math exam.

  9. MMLU Dataset

    • paperswithcode.com
    Updated Jan 5, 2025
    + more versions
    Cite
    Dan Hendrycks; Collin Burns; Steven Basart; Andy Zou; Mantas Mazeika; Dawn Song; Jacob Steinhardt (2025). MMLU Dataset [Dataset]. https://paperswithcode.com/dataset/mmlu
    Explore at:
    Dataset updated
    Jan 5, 2025
    Authors
    Dan Hendrycks; Collin Burns; Steven Basart; Andy Zou; Mantas Mazeika; Dawn Song; Jacob Steinhardt
    Description

    MMLU (Massive Multitask Language Understanding) is a new benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to how we evaluate humans. The benchmark covers 57 subjects across STEM, the humanities, the social sciences, and more. It ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem solving ability. Subjects range from traditional areas, such as mathematics and history, to more specialized areas like law and ethics. The granularity and breadth of the subjects makes the benchmark ideal for identifying a model’s blind spots.

  10. Difference in mean accuracy of classifiers.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Diego Raphael Amancio; Cesar Henrique Comin; Dalcimar Casanova; Gonzalo Travieso; Odemir Martinez Bruno; Francisco Aparecido Rodrigues; Luciano da Fontoura Costa (2023). Difference in mean accuracy of classifiers. [Dataset]. http://doi.org/10.1371/journal.pone.0094137.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Diego Raphael Amancio; Cesar Henrique Comin; Dalcimar Casanova; Gonzalo Travieso; Odemir Martinez Bruno; Francisco Aparecido Rodrigues; Luciano da Fontoura Costa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean difference between the accuracy of the classifier in a given row and the classifier in a given column. The last column shows the mean accuracy of the respective classifier over all datasets considered in our study.

  11. Data from: FRAGSTATS DATABASE: Showcasing relationships between neighborhood...

    • search.dataone.org
    • borealisdata.ca
    • +1 more
    Updated Dec 28, 2023
    Cite
    Shaker, Richard (2023). FRAGSTATS DATABASE: Showcasing relationships between neighborhood design and Wellbeing Toronto indicators [Dataset]. http://doi.org/10.5683/SP2/BNARSZ
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Shaker, Richard
    Description

    A research study across the 140 neighborhood‐landscapes (streetscapes) of Toronto was presented through three main intentions. Its foundational goal was to calculate landscape ecology metrics from the 2007 land cover dataset for the City of Toronto; for use in sustainable development planning strategies and to bolster its Wellbeing Toronto data dashboard. In doing so, 130 landscape ecology metrics were computed to serve as a foundational suite for the City of Toronto: 18 class configuration metrics across seven of the City’s eight land cover categories and four landscape diversity metrics. Metrics for agriculture were not included due to very limited neighborhood representation. The 18 class configuration metrics computed for each of the seven land cover types were: class area (CA), percentage of landscape (PLAND), patch density (PD), largest patch index (LPI), landscape shape index (LSI), mean patch area (AREA_MN), area-weighted mean patch area (AREA_AM), area‐weighted mean shape index (SHAPE_AM), area‐weighted mean patch fractal dimension (FRAC_AM), perimeter‐area fractal dimension (PAFRAC), area‐weighted core area distribution (CORE_AM), area‐weighted core area index (CAI_AM), area‐weighted mean Euclidean nearest neighbor distance (ENN_AM), clumpiness index (CLUMPY), percentage‐of‐like‐adjacency (PLADJ), patch cohesion index (COHESION), landscape division index (DIVISION), and effective mesh size (MESH). Additionally, the four landscape diversity metrics were: Patch richness density (PRD), Relative patch richness (RPR), Shannon’s diversity index (SHDI), and Shannon’s evenness index (SHEI). Note that other relationships await discovery using this free database; thus, forthcoming germane research should consider its adoption. The landscape ecology database is provided here via GIS shapefile format and can be used freely with citation.

  12. CNRS/Univ Pau & Pays Adour, Laboratoire de Mathématiques et de leurs...

    • pigma.org
    • seanoe.org
    • +2 more
    rel-canonical +2
    Updated Nov 24, 2020
    Cite
    CNRS/Univ Pau & Pays Adour, Laboratoire de Mathématiques et de leurs Applications de Pau - Fédération MIRA, UMR5142, 64600 Anglet, France; ARC Centre of Excellence for Mathematical and Statistical Frontiers at School of Mathematical Science, Queensland University of Technology, Brisbane, Australia (2020). CNRS/Univ Pau & Pays Adour, Laboratoire de Mathématiques et de leurs Applications de Pau - Fédération MIRA, UMR5142, 64600 Anglet, France [Dataset]. https://www.pigma.org/geonetwork/srv/api/records/seanoe:77179
    Explore at:
    Available download formats: www:link-1.0-http--metadata-url, rel-canonical, www:download-1.0-link--download
    Dataset updated
    Nov 24, 2020
    Dataset authored and provided by
    CNRS/Univ Pau & Pays Adour, Laboratoire de Mathématiques et de leurs Applications de Pau - Fédération MIRA, UMR5142, 64600 Anglet, France; ARC Centre of Excellence for Mathematical and Statistical Frontiers at School of Mathematical Science, Queensland University of Technology, Brisbane, Australia
    Area covered
    Description
    Data description

    This dataset has been constructed and used for scientific purposes, as described in the paper "Detecting the effects of inter-annual and seasonal changes of environmental factors on the striped red mullet population in the Bay of Biscay", authored by Kermorvant C., Caill-Milly N., Sous D., Paradinas I., Lissardy M. and Liquet B., and published in the Journal of Sea Research. This file is an extraction from the SACROIS fisheries database created by Ifremer (for more information see https://sextant.ifremer.fr/record/3e177f76-96b0-42e2-8007-62210767dc07/) and from the Copernicus database. Biochemistry comes from the product GLOBAL_ANALYSIS_FORECAST_BIO_001_028 (https://resources.marine.copernicus.eu/?option=com_csw&view=details&product_id=GLOBAL_ANALYSIS_FORECAST_BIO_001_028). Temperature and salinity come from the GLOBAL_ANALYSIS_FORECAST_PHY_001_024 product (https://resources.marine.copernicus.eu/?option=com_csw&view=details&product_id=GLOBAL_ANALYSIS_FORECAST_PHY_001_024). As fisheries landings per unit of effort are only available per ICES rectangle and by month, environmental data have been aggregated accordingly.

    Columns description

    • rectangle: the 6 ICES statistical rectangles used in the study
    • time_m: time in months, from the beginning to the end of the study
    • annee: year
    • mois: month (from 1 to 12)
    • Poids: weight of red mullet landed
    • valeur =
    • Temps_peche: fishing time
    • Nb_sequence: number of fishing sequences
    • Moy / Med / Var / StD / Quartil_1 / Quartil_3 / min / max / CV / IQR: statistical descriptors of landings by rectangle and by month
    • log_cpue: log of the Med column
    • mean_surface_s: mean surface salinity by month and by rectangle
    • median_surface_s: median surface salinity by month and by rectangle
    • mean_surface_t: mean surface temperature by month and by rectangle
    • median_surface_t: median surface temperature by month and by rectangle
    • si / zeu / po4 / pyc / o2 / nppv / no3 / nh4: mean and median concentrations by rectangle and by month
    • pc3 / pc2 / pc1: projections of the previous biochemistry variables on the first three axes of a PCA
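
    A small Python sketch of the rectangle-and-month aggregation implied by the Med and log_cpue columns; field names and values here are illustrative stand-ins, not taken from the real extraction:

    ```python
    import math
    from collections import defaultdict

    # Toy landing records: one per fishing sequence.
    records = [
        {"rectangle": "16E8", "mois": 1, "poids": 120.0},
        {"rectangle": "16E8", "mois": 1, "poids": 80.0},
        {"rectangle": "16E8", "mois": 1, "poids": 100.0},
        {"rectangle": "17E8", "mois": 1, "poids": 50.0},
    ]

    # Group landings by (rectangle, month).
    groups = defaultdict(list)
    for r in records:
        groups[(r["rectangle"], r["mois"])].append(r["poids"])

    def median(xs):
        s = sorted(xs)
        m = len(s) // 2
        return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

    # Med = median landing per group; log_cpue = log of Med, as described.
    summary = {k: {"Med": median(v), "log_cpue": math.log(median(v))}
               for k, v in groups.items()}
    print(summary[("16E8", 1)]["Med"])  # 100.0
    ```

    
    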
  13. Data from: S1 Dataset -

    • figshare.com
    bin
    Updated Jun 21, 2024
    + more versions
    Cite
    Syed Asghar Ali Shah; Tariqullah Jan; Syed Muslim Shah; Muhammad Asif Zahoor Raja; Mohammad Haseeb Zafar; Sana Ul Haq (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0304018.s001
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Syed Asghar Ali Shah; Tariqullah Jan; Syed Muslim Shah; Muhammad Asif Zahoor Raja; Mohammad Haseeb Zafar; Sana Ul Haq
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fractional-order algorithms demonstrate superior efficacy in signal processing while retaining the same level of implementation simplicity as traditional algorithms. The self-adjusting dual-stage fractional-order least mean square algorithm, denoted LFLMS, is developed to expedite convergence and improve precision while incurring only a slight increase in computational complexity. The initial stage employs the least mean square (LMS) algorithm, succeeded by the fractional LMS (FLMS) approach in the second stage. The latter multiplies the LMS output with a replica of the steering vector (Ɣ) of the intended signal. Mathematical convergence analysis and the derivation of the proposed approach are provided. Its weight adjustment integrates the conventional integer-order gradient with a fractional-order one. Its effectiveness is gauged through the minimization of mean square error (MSE), and thorough comparisons with alternative methods are conducted across various parameters in simulations. Simulation results underscore the superior performance of LFLMS: notably, the convergence rate of LFLMS surpasses that of LMS by 59%, accompanied by a 49% improvement in MSE relative to LMS. It is concluded that the LFLMS approach is a suitable choice for next-generation wireless networks, including the Internet of Things, 6G, radars, and satellite communication.
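    For orientation, the sketch below implements only the first, integer-order LMS stage on a toy system-identification task; the fractional FLMS stage and the steering-vector step are specific to the paper and are not reproduced here:

    ```python
    def lms_identify(x, d, n_taps=3, mu=0.05, iters=2000):
        """Adapt FIR weights w so that w . [x[k], x[k-1], ...] tracks d[k]."""
        w = [0.0] * n_taps
        n = len(x)
        for _ in range(iters):
            for k in range(n_taps - 1, n):
                window = x[k - n_taps + 1 : k + 1][::-1]  # most recent sample first
                y = sum(wi * xi for wi, xi in zip(w, window))
                e = d[k] - y                              # instantaneous error
                w = [wi + mu * e * xi for wi, xi in zip(w, window)]
        return w

    # Identify a known 3-tap system h = [0.5, -0.2, 0.1] from its response.
    h = [0.5, -0.2, 0.1]
    x = [((i * 37) % 11 - 5) / 5.0 for i in range(200)]   # deterministic input
    d = [sum(h[j] * x[k - j] for j in range(3)) if k >= 2 else 0.0
         for k in range(200)]
    w = lms_identify(x, d)
    print([round(wi, 3) for wi in w])  # converges toward [0.5, -0.2, 0.1]
    ```

    The FLMS stage would replace the integer-order gradient update with a fractional-order one; here only the classical update w += mu * e * x is shown.
    
    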

  14. Predicting Seabed Mud Content across the Australian Margin: Comparison of...

    • data.wu.ac.at
    • datadiscoverystudio.org
    • +1more
    pdf
    Updated Jun 24, 2017
    Cite
    Geoscience Australia (2017). Predicting Seabed Mud Content across the Australian Margin: Comparison of Statistical and Mathematical Techniques Using a Simulation Experiment [Dataset]. https://data.wu.ac.at/schema/data_gov_au/MGYzNjdiZjQtZDAzYi00Y2EyLWI2N2ItMGFjMTA0MzZmM2Yx
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 24, 2017
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this study, we conducted a simulation experiment to identify robust spatial interpolation methods using samples of seabed mud content in the Geoscience Australia Marine Samples database. Due to data noise associated with the samples, criteria were developed and applied for data quality control. Five factors that affect the accuracy of spatial interpolation were considered: 1) regions; 2) statistical methods; 3) sample densities; 4) searching neighbourhoods; and 5) sample stratification. Bathymetry, distance-to-coast and slope were used as secondary variables. Ten-fold cross-validation was used to assess the prediction accuracy, measured using mean absolute error, root mean square error, relative mean absolute error (RMAE) and relative root mean square error. The effects of these factors on the prediction accuracy were analysed using generalised linear models. The prediction accuracy depends on the method, sample density, sample stratification, search window size, data variation and the study region. No single method was always superior in all scenarios. Three sub-methods were more accurate than the control (inverse distance squared) in the north and northeast regions respectively, and 12 sub-methods in the southwest region. A combined method, random forest and ordinary kriging (RKrf), is the most robust method based on the accuracy and the visual examination of prediction maps. This method is novel, with an RMAE up to 17% less than that of the control. The RMAE of the best method is 15% lower in two regions and 30% lower in the remaining region than that of the best methods in previously published studies, further highlighting the robustness of the methods developed. The outcomes of this study can be applied to the modelling of a wide range of physical properties for improved marine biodiversity prediction. The limitations of this study are discussed, and a number of suggestions are provided for further studies.
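    The evaluation protocol (not the interpolation methods themselves) can be sketched as follows: ten-fold cross-validation reporting MAE and RMSE, here for a toy predictor that simply predicts the training-fold mean; the values are invented stand-ins for mud-content samples:

    ```python
    import math

    def ten_fold_cv(values, k=10):
        """Hold out every k-th sample in turn; score a mean-value predictor."""
        errors = []
        for i in range(k):
            test = values[i::k]                    # held-out fold
            train = [v for j, v in enumerate(values) if j % k != i]
            pred = sum(train) / len(train)         # stand-in "interpolator"
            errors.extend(pred - v for v in test)
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        return mae, rmse

    values = [float(v) for v in range(1, 101)]     # toy samples
    mae, rmse = ten_fold_cv(values)
    print(round(mae, 2), round(rmse, 2))
    ```

    A real study would swap the stand-in predictor for each interpolation sub-method and compare the resulting MAE/RMSE (and their relative variants) across methods and regions.
    
    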

    You can also purchase hard copies of Geoscience Australia data and other products at http://www.ga.gov.au/products-services/how-to-order-products/sales-centre.html
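As a concrete illustration of the evaluation design in the abstract above, the following sketch runs ten-fold cross-validation of the control interpolator (inverse distance squared) and reports MAE, RMSE and RMAE. The coordinates and mud-content values are synthetic stand-ins, not the actual Geoscience Australia samples, and this simple implementation ignores the search-neighbourhood and stratification factors studied.

```python
# Hedged sketch: ten-fold cross-validation of the control method (inverse
# distance squared), scored with MAE, RMSE and RMAE (relative MAE as a
# percentage of the mean observed value). Data here are synthetic.
import numpy as np

def ids_predict(train_xy, train_z, test_xy, power=2):
    # Inverse-distance-weighted prediction: weights fall off as 1/d**power.
    d = np.linalg.norm(test_xy[:, None, :] - train_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return (w * train_z).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(300, 2))                # sample locations
z = 50 + 0.3 * xy[:, 0] + rng.normal(0, 3, 300)        # mud content (%)

folds = np.array_split(rng.permutation(300), 10)       # ten-fold CV
abs_err, sq_err = [], []
for test_idx in folds:
    train_mask = np.ones(300, bool)
    train_mask[test_idx] = False
    pred = ids_predict(xy[train_mask], z[train_mask], xy[test_idx])
    err = pred - z[test_idx]
    abs_err.extend(np.abs(err))
    sq_err.extend(err ** 2)

mae = float(np.mean(abs_err))
rmse = float(np.sqrt(np.mean(sq_err)))
rmae = 100 * mae / float(z.mean())                     # relative MAE, percent
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  RMAE={rmae:.1f}%")
```

Comparing RMAE values like these across candidate interpolators is what the study's "sub-method versus control" comparisons amount to.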

  15. Multivariate model prediction accuracy on the test dataset (RMSE mean and standard deviation for 30 experimental runs across 4 prediction horizons)

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Rohitash Chandra; Ayush Jain; Divyanshu Singh Chauhan (2023). Multivariate model prediction accuracy on the test dataset (RMSE mean and standard deviation for 30 experimental runs across 4 prediction horizons). [Dataset]. http://doi.org/10.1371/journal.pone.0262708.t008
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Rohitash Chandra; Ayush Jain; Divyanshu Singh Chauhan
    License

    Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multivariate model prediction accuracy on the test dataset (RMSE mean and standard deviation for 30 experimental runs across 4 prediction horizons).

  16. Not seeing a result you expected?
    Learn how you can add new datasets to our index.


MetaMath QA

Mathematical Questions for Large Language Models



##### Training Models using Mistral 7B

Mistral 7B, the model that produced the responses in this dataset, is an open-weight 7-billion-parameter large language model released by Mistral AI; it is not a framework for building models from tabular (csv) data. A natural way to train on 'MetaMathQA' is supervised fine-tuning: after collecting and preprocessing the dataset, convert the query/response pairs into prompt/completion examples, fine-tune a base language model on them, and tune hyperparameters such as the learning rate, batch size and number of epochs against a held-out validation split. Once a configuration is selected, validate the model's performance with metrics suited to the task, for example exact-match accuracy on the final answers, or precision, recall and F1 scores where outputs are treated as classification decisions.
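As a concrete starting point for fine-tuning, the rows can be flattened into prompt/completion records. This is a minimal sketch: the prompt template is an assumption, not part of the dataset's documentation; the column names follow the Columns section of this page.

```python
# Hedged sketch: turn train.csv rows into prompt/completion records for
# supervised fine-tuning. The prompt template below is an assumption.
import csv

def to_sft_records(csv_path):
    records = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            records.append({
                "prompt": f"[{row['type']}] Question: {row['query']}\nAnswer:",
                "completion": " " + row["response"].strip(),
            })
    return records
```

The resulting list of dicts can be serialised to JSONL or passed to whichever fine-tuning library you use.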

##### Testing the model

After the training phase completes, the right next step is to test the model robustly on the evaluation metrics mentioned above. Run the trained model on new test cases, ideally ones supplied by domain experts, and run a quality-assurance check of the resulting scores against the baseline metrics to assess confidence in the outputs. Updating the baseline scores as experiments improve is the preferred methodology in AI workflows, since it keeps errors induced by inexact answers visible and their overall impact low.
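One concrete way to run such a quality-assurance check is a normalised exact-match score between model outputs and reference answers. This is a sketch under simple assumptions: lower-casing and whitespace collapsing are the only normalisation, which is usually insufficient for chain-of-thought responses, where a final-answer extraction step is needed first.

```python
# Hedged sketch: normalised exact-match accuracy between model outputs and
# reference answers. The normalisation rule is an assumption.
def exact_match_accuracy(predictions, references):
    """Fraction of predictions matching their reference after lower-casing
    and collapsing whitespace."""
    normalise = lambda s: " ".join(s.lower().split())
    hits = sum(normalise(p) == normalise(r)
               for p, r in zip(predictions, references))
    return hits / len(references)
```

Tracking this score across runs gives the baseline that later experiments are compared against.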

Research Ideas

  • Training natural language processing (NLP) models to better identify patterns and connections between questions, answers, and query types.
  • Studying how efficiently particular language features produce successful question-answering results for different types of queries.
  • Optimising search algorithms that surface relevant answers based on the type of query.

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

| Column name | Description                                    |
|:------------|:-----------------------------------------------|
| response    | The response to the query. (String)            |
| type        | The type of query. (String)                    |
| query       | The question posed to the system. (String)     |
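Before any training run it is worth verifying that the file actually matches this layout. A minimal sketch, assuming the file uses the three documented columns; the path passed in is up to you:

```python
# Hedged sketch: sanity-check a MetaMathQA CSV file. Confirms the three
# documented columns exist and counts empty cells per column.
import csv
from collections import Counter

def audit_csv(path):
    expected = {"response", "type", "query"}
    empty = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = expected - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        for row in reader:
            for col in expected:
                if not (row[col] or "").strip():
                    empty[col] += 1
    return dict(empty)
```

For example, `audit_csv("train.csv")` returns a dict mapping column names to their number of empty cells, so any gaps can be filled or dropped before training.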

