Mathematics database.
This dataset code generates mathematical question and answer pairs from a range of question types at roughly school-level difficulty. It is designed to test the mathematical learning and algebraic reasoning skills of learning models.
Original paper: Analysing Mathematical Reasoning Abilities of Neural Models (Saxton, Grefenstette, Hill, Kohli).
Example usage:
train_examples, val_examples = tfds.load(
    'math_dataset/arithmetic_mul',
    split=['train', 'test'],
    as_supervised=True)
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('math_dataset', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
By TIGER-Lab (From Huggingface) [source]
MathInstruct is a comprehensive and meticulously curated dataset specifically designed to facilitate the development and evaluation of models for math instruction tuning. This dataset consists of a total of 13 different math rationale datasets, out of which six have been exclusively curated for this project, ensuring a diverse range of instructional materials. The main objective behind creating this dataset is to provide researchers with an easily accessible and manageable resource that aids in enhancing the effectiveness and precision of math instruction.
One noteworthy feature of MathInstruct is its lightweight nature, which makes it convenient for researchers to use. With carefully selected columns such as source and output, users can readily identify the origin or reference material from which each math instruction was obtained, and can also refer to the expected output or solution corresponding to each specific math problem or exercise.
Overall, MathInstruct offers immense potential for refining hybrid math instruction by facilitating meticulous model development and rigorous evaluation. Researchers can leverage this diverse dataset to gain deeper insights into effective teaching methodologies while exploring innovative approaches to enhancing mathematical learning experiences.
Title: How to Use the MathInstruct Dataset for Hybrid Math Instruction Tuning
Introduction: The MathInstruct dataset is a comprehensive collection of math instruction examples, designed to assist in developing and evaluating models for math instruction tuning. This guide will provide an overview of the dataset and explain how to make effective use of it.
Understanding the Dataset Structure: The dataset consists of a file named train.csv. This CSV file contains the training data, which includes columns such as source and output. The source column records the source of the math instruction (textbook, online resource, or teacher), while the output column records the expected output or solution to a particular math problem or exercise.
Accessing the Dataset: To access the MathInstruct dataset, you can download it from Kaggle's website. Once downloaded, you can read and manipulate the data using programming languages like Python with libraries such as pandas.
Exploring the Columns:
a) Source Column: The source column provides information about where each math instruction comes from. It may include references to specific textbooks, online resources, or even teachers who provided instructional material.
b) Output Column: The output column specifies what students are expected to achieve as a result of each math instruction. It contains solutions or expected outputs for different math problems or exercises.
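For example, a minimal pandas sketch for loading and inspecting these two columns (assuming the train.csv layout described above; the file path is yours to adjust):

import pandas as pd

# Load the MathInstruct training split (file name per the description above).
df = pd.read_csv("train.csv")

# Inspect the two documented columns.
print(df[["source", "output"]].head())

# Count how many examples come from each instruction source.
print(df["source"].value_counts())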
Utilizing Source Information: By analyzing the different sources mentioned in this dataset, researchers can understand which instructional materials are more effective in teaching specific topics within mathematics. They can also identify common strategies used by teachers across multiple sources.
Analyzing Expected Outputs: Researchers can study variations in expected outputs for similar types of problems across different sources. This analysis may help identify differences in approaches across textbooks/resources and enrich our understanding of various teaching methods.
Model Development and Evaluation: Researchers can utilize this dataset to develop machine learning models that automatically assess whether a given math instruction leads to the expected output. By training models on this data, one can create automated systems that provide feedback on math problems or suggest alternative instruction sources.
Scaling the Dataset: Due to its lightweight nature, the MathInstruct dataset is easily accessible and manageable. Researchers can scale up their training data by combining it with other instructional datasets or expand it further by labeling more examples based on similar guidelines.
Conclusion: The MathInstruct dataset serves as a valuable resource for developing and evaluating models related to math instruction tuning. By analyzing the source information and expected outputs, researchers can gain insights into effective teaching methods and build automated assessment systems.
- Model development: This dataset can be used for developing and training models for math instruction...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual math proficiency from 2012 to 2023 for Range View Elementary School vs. Colorado and Weld County Reorganized School District No. Re-4.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Home-range estimation is an important application of animal tracking data that is frequently complicated by autocorrelation, sampling irregularity, and small effective sample sizes. We introduce a novel, optimal weighting method that accounts for temporal sampling bias in autocorrelated tracking data. This method corrects for irregular and missing data, such that oversampled times are downweighted and undersampled times are upweighted to minimize error in the home-range estimate. We also introduce computationally efficient algorithms that make this method feasible with large datasets. Generally speaking, there are three situations where weight optimization improves the accuracy of home-range estimates: with marine data, where the sampling schedule is highly irregular; with duty-cycled data, where the sampling schedule changes during the observation period; and when a small number of home-range crossings are observed, making the beginning and end times more independent and informative than the intermediate times. Using both simulated data and empirical examples including reef manta ray, Mongolian gazelle, and African buffalo, optimal weighting is shown to reduce the error and increase the spatial resolution of home-range estimates. With a conveniently packaged and computationally efficient software implementation, this method broadens the array of datasets with which accurate space-use assessments can be made.
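As a loose illustration of the idea (a simple quadrature-style heuristic, not the paper's optimal-weighting estimator), each fix can be weighted by the span of time it represents, so that oversampled bursts are downweighted:

import numpy as np

def time_coverage_weights(t):
    # Each location is weighted by its midpoint-to-midpoint time span,
    # so densely sampled bursts are downweighted and sparse gaps upweighted.
    # This is a heuristic sketch, not the optimal weights derived above.
    t = np.asarray(t, dtype=float)
    mid = (t[1:] + t[:-1]) / 2.0
    edges = np.concatenate(([t[0]], mid, [t[-1]]))
    w = np.diff(edges)
    return w / w.sum()

# Irregular sampling: a dense burst followed by sparse fixes (hours).
print(time_coverage_weights([0, 0.1, 0.2, 0.3, 6, 12, 24]).round(3))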
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Advanced-Math Dataset
This Advanced-Math dataset is designed to support advanced studies and research in various mathematical fields. It encompasses a wide range of topics, including:
- Calculus
- Linear Algebra
- Probability
- Machine Learning
- Deep Learning
The dataset primarily focuses on computational problems, which constitute over 80% of the content. Additionally, it includes related logical concept questions to provide a… See the full description on the dataset page: https://huggingface.co/datasets/haijian06/Advanced-Math.
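If the dataset is hosted on the Hugging Face Hub as linked above, it can presumably be loaded with the datasets library (the repository id is taken from the URL above; available splits and columns are assumptions until confirmed on the Hub):

from datasets import load_dataset

# Repository id from the dataset page linked above.
ds = load_dataset("haijian06/Advanced-Math")
print(ds)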
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper addresses the computational methods and challenges associated with prime number generation, a critical component in encryption algorithms for ensuring data security. Generating prime numbers efficiently is a critical challenge in various domains, including cryptography, number theory, and computer science. The quest for more effective prime-generation algorithms is driven by the increasing demand for secure communication and data storage, and by the need for efficient algorithms to solve complex mathematical problems. Our goal is to address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These algorithms are founded on the formulas of odd-composed numbers, allowing them to achieve remarkable performance improvements compared to existing prime number generation algorithms. Our comprehensive experimental results reveal that our proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, Sieve of Atkin, Sieve of Eratosthenes, and Sieve of Sundaram in terms of mean execution time. More notably, our algorithms exhibit the unique ability to provide prime numbers from range to range with commendable performance. This substantial enhancement in performance and adaptability can significantly impact the effectiveness of various applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, our proposed algorithms can help develop more secure and reliable communication systems, enable faster computations in number theory, and support advanced computer science and mathematics research.
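The paper's novel algorithms are not reproduced here, but one of the baselines it benchmarks against, the Sieve of Eratosthenes, is easy to sketch, together with a segmented variant for the primes-within-a-range use case:

def sieve(limit):
    # Classic Sieve of Eratosthenes: all primes <= limit.
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(is_prime[p * p :: p]))
    return [i for i, v in enumerate(is_prime) if v]

def segmented_sieve(lo, hi):
    # Primes in [lo, hi], using base primes up to sqrt(hi).
    base = sieve(int(hi ** 0.5))
    seg = bytearray([1]) * (hi - lo + 1)
    for p in base:
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi + 1, p):
            seg[m - lo] = 0
    return [lo + i for i, v in enumerate(seg) if v and lo + i > 1]

print(sieve(30))                 # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(segmented_sieve(100, 130)) # [101, 103, 107, 109, 113, 127]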
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Dataset Description: A Deep Dive into Prime Gap Distribution and Primorial Harmonics

Overview: This dataset offers a comprehensive exploration of prime gap distribution, focusing on the intriguing patterns associated with primorials and their harmonics. Primorials, the product of the first n prime numbers, play a significant role in shaping the landscape of prime gaps. By analyzing the distribution of prime gaps and their relation to primorials, we can gain deeper insights into the fundamental structure of prime numbers.

Data Structure:
* Power of 2: The base-2 exponent.
* Gap Size N: The size of the Nth prime gap following the given power of 2.

Key Features:
* Primorial Harmonics: The dataset highlights the appearance of prime gaps that are multiples of primorials, suggesting a deeper connection between these numbers and the distribution of primes.
* Large Prime Gaps: The dataset includes information on exceptionally large prime gaps, which can provide valuable clues about the underlying structure of the number line.
* Prime Number Distribution: The distribution of prime numbers within the specified range is analyzed, revealing patterns and anomalies.

Potential Applications:
* Number Theory Research:
  * Investigating the role of primorials in shaping prime gap distribution.
  * Testing conjectures related to the Riemann Hypothesis and the Twin Prime Conjecture.
  * Exploring the connection between prime gaps and other mathematical concepts, such as modular arithmetic and number theory functions.
* Machine Learning and Data Science:
  * Training machine learning models to predict prime gap sizes, incorporating primorials as features.
  * Developing algorithms to identify and analyze primorial-related patterns.
* Computational Mathematics:
  * Benchmarking computational resources and algorithms for prime number generation and factorization.
  * Developing new algorithms for efficient computation of primorials and their harmonics.

How to Use This Dataset:
* Data Exploration:
  * Visualize the distribution of prime gaps, highlighting the occurrence of primorial harmonics.
  * Analyze the frequency of different gap sizes, focusing on multiples of primorials.
  * Study the relationship between prime gap size and the corresponding power of 2, considering the influence of primorials.
* Machine Learning:
  * Incorporate features related to primorials and their harmonics into machine learning models.
  * Experiment with different feature engineering techniques and hyperparameter tuning to improve model performance.
  * Use the dataset to train models that can predict the occurrence of large prime gaps and other significant patterns.
* Number Theory Research:
  * Use the dataset to formulate and test new conjectures about the distribution of prime gaps and the role of primorials.
  * Explore the connection between prime gap distribution and other mathematical fields, such as cryptography and coding theory.

By leveraging this dataset, researchers can gain a deeper understanding of the intricate patterns and underlying structures that govern the distribution of prime numbers.
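As a small illustration of the quantities involved (standard computations, not the dataset's own pipeline; sympy is used for prime iteration), the sketch below computes primorials and the first prime gaps following a power of 2, matching the Power of 2 / Gap Size N fields described above:

from sympy import nextprime

def primorial(n):
    # Product of the first n primes, e.g. primorial(3) = 2 * 3 * 5 = 30.
    out, p = 1, 1
    for _ in range(n):
        p = nextprime(p)
        out *= p
    return out

def gaps_after_power_of_two(k, count=5):
    # First `count` prime gaps following 2**k.
    p = nextprime(2 ** k)
    gaps = []
    for _ in range(count):
        q = nextprime(p)
        gaps.append(q - p)
        p = q
    return gaps

print(primorial(4))                # 210
print(gaps_after_power_of_two(10)) # first gaps after 2**10 = 1024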
Supplement to the Prime Gap Dataset Description

Unveiling the Mysteries of Prime Gaps: The Prime Gap Dataset offers a unique opportunity to delve into the fascinating world of prime numbers. By analyzing the distribution of gaps between consecutive primes, we can uncover hidden patterns and structures that might hold the key to unlocking the secrets of the universe.

Key Features and Potential Insights:
* Visual Exploration: Immerse yourself in stunning visualizations of prime gap distributions, revealing hidden patterns and anomalies.
* Statistical Analysis: Conduct in-depth statistical analysis to identify trends, correlations, and outliers.
* Machine Learning Applications: Employ machine learning techniques to predict prime gap distributions and discover novel insights.
* Fractal Analysis: Investigate the potential fractal nature of prime number distributions, revealing self-similarity at different scales.

Potential Research Directions:
* Uncovering Hidden Patterns: Explore the distribution of prime gaps at various scales to identify emerging patterns and structures.
* Predicting Prime Gap Behavior: Develop machine learning models to predict the size and distribution of future prime gaps.
* Testing Mathematical Conjectures: Use the dataset to test conjectures related to prime number distribution, such as the Riemann Hypothesis.
* Exploring Connections to Other Fields: Investigate the relationship between prime numbers and other mathematical fields, such as chaos theory and information theory.

By delving into this rich dataset, you can contribute to the ongoing exploration of one of the most fundamental and enduring mysteries of mathematics.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Technological advances have steadily increased the detail of animal tracking datasets, yet fundamental data limitations exist for many species that cause substantial biases in home-range estimation. Specifically, the effective sample size of a range estimate is proportional to the number of observed range crossings, not the number of sampled locations. Currently, the most accurate home-range estimators condition on an autocorrelation model, for which the standard estimation frameworks are based on likelihood functions, even though these methods are known to underestimate variance—and therefore ranging area—when effective sample sizes are small. Residual maximum likelihood (REML) is a widely used method for reducing bias in maximum-likelihood (ML) variance estimation at small sample sizes. Unfortunately, we find that REML is too unstable for practical application to continuous-time movement models. When the effective sample size N is decreased to N ≤ O(10), which is common in tracking applications, REML undergoes a sudden divergence in variance estimation. To avoid this issue, while retaining REML's first-order bias correction, we derive a family of estimators that leverage REML to make a perturbative correction to ML. We also derive AIC values for REML and our estimators, including cases where model structures differ, which is not generally understood to be possible. Using both simulated data and GPS data from lowland tapir (Tapirus terrestris), we show how our perturbative estimators are more accurate than traditional ML and REML methods. Specifically, when O(5) home-range crossings are observed, REML is unreliable by orders of magnitude, ML home ranges are ~30% underestimated, and our perturbative estimators yield home ranges that are only ~10% underestimated. A parametric bootstrap can then reduce the ML and perturbative home-range underestimation to ~10% and ~3%, respectively. Home-range estimation is one of the primary reasons for collecting animal tracking data, and small effective sample sizes are a more common problem than is currently realized. The methods introduced here allow for more accurate movement-model and home-range estimation at small effective sample sizes, and thus fill an important role for animal movement analysis. Given REML's widespread use, our methods may also be useful in other contexts where effective sample sizes are small.
Description: This dataset (Version 10) contains a collection of research papers along with various attributes and metadata. It is a comprehensive and diverse dataset that can be used for a wide range of research and analysis tasks. The dataset encompasses papers from different fields of study, including computer science, mathematics, physics, and more.
Fields in the Dataset:
- id: A unique identifier for each paper.
- title: The title of the research paper.
- authors: The list of authors involved in the paper.
- venue: The journal or venue where the paper was published.
- year: The year when the paper was published.
- n_citation: The number of citations received by the paper.
- references: A list of paper IDs that are cited by the current paper.
- abstract: The abstract of the paper.
Example:
{
  "id": "013ea675-bb58-42f8-a423-f5534546b2b1",
  "title": "Prediction of consensus binding mode geometries for related chemical series of positive allosteric modulators of adenosine and muscarinic acetylcholine receptors",
  "authors": ["Leon A. Sakkal", "Kyle Z. Rajkowski", "Roger S. Armen"],
  "venue": "Journal of Computational Chemistry",
  "year": 2017,
  "n_citation": 0,
  "references": ["4f4f200c-0764-4fef-9718-b8bccf303dba", "aa699fbf-fabe-40e4-bd68-46eaf333f7b1"],
  "abstract": "This paper studies ..."
}
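Assuming the records are distributed as JSON Lines (the file name papers.jsonl is hypothetical; adjust to the actual distribution), a minimal loading sketch using the documented fields:

import json

# Hypothetical file name; one JSON record per line is assumed.
with open("papers.jsonl", encoding="utf-8") as f:
    papers = [json.loads(line) for line in f]

# Resolve the `references` IDs against the corpus itself.
by_id = {p["id"]: p for p in papers}
for p in papers[:3]:
    cited = [r for r in p.get("references", []) if r in by_id]
    print(p["year"], p["title"], "->", len(cited), "resolved references")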
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Historical Dataset of Range View Elementary School is provided by PublicSchoolReview and contains statistics on the following metrics:
- Total Students Trends Over Years (2013-2023)
- Total Classroom Teachers Trends Over Years (2013-2023)
- Distribution of Students By Grade Trends
- Student-Teacher Ratio Comparison Over Years (2013-2023)
- American Indian Student Percentage Comparison Over Years (2011-2023)
- Asian Student Percentage Comparison Over Years (2021-2022)
- Hispanic Student Percentage Comparison Over Years (2013-2023)
- Black Student Percentage Comparison Over Years (2019-2022)
- White Student Percentage Comparison Over Years (2013-2023)
- Two or More Races Student Percentage Comparison Over Years (2013-2023)
- Diversity Score Comparison Over Years (2013-2023)
- Free Lunch Eligibility Comparison Over Years (2013-2023)
- Reduced-Price Lunch Eligibility Comparison Over Years (2013-2023)
- Reading and Language Arts Proficiency Comparison Over Years (2011-2022)
- Math Proficiency Comparison Over Years (2012-2023)
- Overall School Rank Trends Over Years (2012-2023)
License: https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Large language models (LLMs) have shown impressive capabilities in solving a wide range of tasks based on human instructions. However, developing a conversational AI assistant for electronic health record (EHR) data remains challenging due to the lack of large-scale instruction-following datasets. To address this, we present MIMIC-IV-Ext-Instr, a dataset containing over 450K open-ended, instruction-following examples generated using GPT-3.5 on a HIPAA-compliant platform. Derived from the MIMIC-IV EHR database, MIMIC-IV-Ext-Instr spans a wide range of topics and is specifically designed to support instruction-tuning of general-purpose LLMs for diverse clinical applications.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Symbolic regression (SR) is an emerging branch of machine learning focused on discovering simple and interpretable mathematical expressions from data. Although a wide variety of SR methods have been developed, they often face challenges such as high computational cost, poor scalability with respect to the number of input dimensions, fragility to noise, and an inability to balance accuracy and complexity. This work introduces SyMANTIC, a novel SR algorithm that addresses these challenges. SyMANTIC efficiently identifies (potentially several) low-dimensional descriptors from a large set of candidates (from ∼10^5 to ∼10^10 or more) through a unique combination of mutual information-based feature selection, adaptive feature expansion, and recursively applied ℓ0-based sparse regression. In addition, it employs an information-theoretic measure to produce an approximate set of Pareto-optimal equations, each offering the best-found accuracy for a given complexity. Furthermore, our open-source implementation of SyMANTIC, built on the PyTorch ecosystem, facilitates easy installation and GPU acceleration. We demonstrate the effectiveness of SyMANTIC across a range of problems, including synthetic examples, scientific benchmarks, real-world material property predictions, and chaotic dynamical system identification from small datasets. Extensive comparisons show that SyMANTIC uncovers similar or more accurate models at a fraction of the cost of existing SR methods.
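Not SyMANTIC itself, but a toy sketch of the general recipe it describes: expand primitives into a candidate feature library, screen candidates by mutual information, then run a greedy ℓ0-style sparse fit (scikit-learn is used for both steps; the target function and feature names are invented):

import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=(200, 3))
y = 3.0 * x[:, 0] ** 2 / x[:, 1] + 0.01 * rng.normal(size=200)

# Candidate feature library built from simple primitives.
names = ["x0^2", "x0^2/x1", "x1*x2", "log x2", "1/x0"]
feats = np.column_stack([
    x[:, 0] ** 2,
    x[:, 0] ** 2 / x[:, 1],
    x[:, 1] * x[:, 2],
    np.log(x[:, 2]),
    1.0 / x[:, 0],
])

# Step 1: mutual-information screening keeps the most informative candidates.
mi = mutual_info_regression(feats, y, random_state=0)
keep = np.argsort(mi)[-3:]

# Step 2: greedy sparse regression (an l0-style surrogate) on the survivors.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=1).fit(feats[:, keep], y)
for i, c in zip(keep, omp.coef_):
    if abs(c) > 1e-8:
        print(f"y ≈ {c:.3f} * {names[i]}")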
Please cite the following paper when using this dataset: N. Thakur, “Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions,” Preprints, 2022, DOI: 10.20944/preprints202206.0383.v1

Abstract
The exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and use cases in assisted living, military, healthcare, firefighting, and industries. With the projected increase in the diverse uses of exoskeletons in the next few years in these application domains and beyond, it is crucial to study, interpret, and analyze user perspectives, public opinion, reviews, and feedback related to exoskeletons, for which a dataset is necessary. The Internet of Everything era of today's living, characterized by people spending more time on the Internet than ever before, holds the potential for developing such a dataset by mining relevant web behavior data from social media communications, which have increased exponentially in the last few years. Twitter, one such social media platform, is highly popular amongst all age groups, who communicate on diverse topics including but not limited to news, current events, politics, emerging technologies, family, relationships, and career opportunities, via tweets, while sharing their views, opinions, perspectives, and feedback towards the same. Therefore, this work presents a dataset of about 140,000 tweets related to exoskeletons that were mined over a period of 5 years, from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons.

Instructions
This dataset contains about 140,000 tweets related to exoskeletons that were mined over a period of 5 years, from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons. The dataset contains only tweet identifiers (Tweet IDs) due to the terms and conditions of Twitter, which permit re-distribution of Twitter data only for research purposes. The IDs need to be hydrated to be used. The process of retrieving a tweet's complete information (such as the text of the tweet, username, user ID, date and time, etc.) using its ID is known as the hydration of a tweet ID. The Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) or any similar application may be used for hydrating this dataset.

Data Description
This dataset consists of 7 .txt files. The following shows the number of Tweet IDs and the date range (of the associated tweets) in each of these files.
- Filename: Exoskeleton_TweetIDs_Set1.txt (Number of Tweet IDs – 22945, Date Range of Tweets – July 20, 2021 – May 21, 2022)
- Filename: Exoskeleton_TweetIDs_Set2.txt (Number of Tweet IDs – 19416, Date Range of Tweets – Dec 1, 2020 – July 19, 2021)
- Filename: Exoskeleton_TweetIDs_Set3.txt (Number of Tweet IDs – 16673, Date Range of Tweets – April 29, 2020 – Nov 30, 2020)
- Filename: Exoskeleton_TweetIDs_Set4.txt (Number of Tweet IDs – 16208, Date Range of Tweets – Oct 5, 2019 – Apr 28, 2020)
- Filename: Exoskeleton_TweetIDs_Set5.txt (Number of Tweet IDs – 17983, Date Range of Tweets – Feb 13, 2019 – Oct 4, 2019)
- Filename: Exoskeleton_TweetIDs_Set6.txt (Number of Tweet IDs – 34009, Date Range of Tweets – Nov 9, 2017 – Feb 12, 2019)
- Filename: Exoskeleton_TweetIDs_Set7.txt (Number of Tweet IDs – 11351, Date Range of Tweets – May 21, 2017 – Nov 8, 2017)

Here, the last date for May is May 21, as it was the most recent date at the time of data collection. The dataset will be updated to incorporate more recent tweets.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
/**
 * Algebraic Equation Dataset Generator for Hugging Face
 *
 * This script generates diverse datasets of algebraic equations with their
 * solutions, producing different valid equations each time it's run,
 * properly formatted for Hugging Face.
 */

// Utility function to generate a random integer within a range
function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Utility function to get a random non-zero integer within a range
function… See the full description on the dataset page: https://huggingface.co/datasets/BarefootMikeOfHorme/AlgebraicEquationsGenerator.
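The script above is truncated; as a rough Python analogue of what such a generator does (illustrative only, not the repository's code), one might write:

import json
import random

def random_nonzero(lo, hi):
    # Random non-zero integer in [lo, hi], mirroring the JS utility above.
    n = 0
    while n == 0:
        n = random.randint(lo, hi)
    return n

def linear_equation():
    # Generate a*x + b = c with a known integer solution x.
    a = random_nonzero(-9, 9)
    x = random.randint(-9, 9)
    b = random.randint(-9, 9)
    return {"equation": f"{a}x + {b} = {a * x + b}", "solution": x}

# Emit a few examples in a JSON Lines layout suitable for dataset uploads.
for _ in range(3):
    print(json.dumps(linear_equation()))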
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Full Description
This dataset reports the total number of unique, unduplicated students in a given grade range that have received at least one In-school Suspension (ISS), Out-of-school Suspension (OSS), or Expulsion (EXP) out of the total number of students enrolled in the Public School Information System (PSIS) as of October of the given year. This dataset is based on School Years. Elementary includes Pre-Kindergarten through grade 5. Middle School includes grade 6 through grade 8. High School includes grade 9 through grade 12.
Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA, including Point of Interest (POI) data and foot traffic. The dataset is divided into 150x150 sqm areas (geohash 7) and has over 50 variables.

- Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven to be useful for real estate investment.
- Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.
- Ideal for training ML models: The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguish this dataset is the use of anonymized and user-compliant mobile device GPS location, enriched with other alternative and public data.
- Easy to use: Our dataset is user-friendly and can be easily integrated into your current models. Also, we can deliver your data in different formats, like .csv, according to your analysis requirements.
- Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation. Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).

Answer questions like:
- What places does my target user visit in a particular area? Which are the best areas to place a new POS?
- What is the average yearly income of users in a particular area?
- What is the influx of visits that my competition receives?
- What is the volume of traffic surrounding my current POS?

This dataset is useful for getting insights from industries like:
- Retail & FMCG
- Banking, Finance, and Investment
- Car Dealerships
- Real Estate
- Convenience Stores
- Pharma and medical laboratories
- Restaurant chains and franchises
- Clothing chains and franchises

Our dataset includes more than 50 variables, such as:
- Number of pedestrians seen in the area.
- Number of vehicles seen in the area.
- Average speed of movement of the vehicles seen in the area.
- Points of Interest (POIs) (in number and type) seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others).
- Average yearly income range (anonymized and aggregated) of the devices seen in the area.

Notes to better understand this dataset:
- POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library.
- Category confidences, for example "food_drinks_tobacco_retail_confidence", indicate how confident we are in the existence of food/drink/tobacco retail locations in the area.
- We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model that was trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.

How efficient is a geohash? Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use. A geohash ranges from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash7 (150m x 150m).
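As a sketch of how cells like the geohash-7 tiles above are derived (the standard base-32 bit-interleaving algorithm, self-contained, not this vendor's code):

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=7):
    # Standard geohash: alternately bisect longitude and latitude,
    # emitting one base-32 character per 5 bits.
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    code, bits, ch, even = [], 0, 0, True
    while len(code) < precision:
        if even:  # even bits refine longitude
            mid = (lon_lo + lon_hi) / 2
            ch = ch * 2 + (lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # odd bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            ch = ch * 2 + (lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits -> one base-32 character
            code.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(code)

# A geohash-7 cell is roughly 150 m x 150 m, matching the tiling above.
print(geohash_encode(40.7580, -73.9855))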
See full Data Guide here.

Major Drainage Basin Set: Connecticut Major Drainage Basins is 1:24,000-scale, polygon and line feature data that define major drainage basin areas in Connecticut. These large basins mostly range from 70 to 2,000 square miles in size. Connecticut Major Drainage Basins includes drainage areas for all Connecticut rivers, streams, brooks, lakes, reservoirs and ponds published on 1:24,000-scale 7.5 minute topographic quadrangle maps prepared by the USGS between 1969 and 1984. Data is compiled at 1:24,000 scale (1 inch = 2,000 feet). This information is not updated.

Polygon and line features represent drainage basin areas and boundaries, respectively. Each basin area (polygon) feature is outlined by one or more major basin boundary (line) features. These data include 10 major basin area (polygon) features and 284 major basin boundary (line) features. Major basin area (polygon) attributes include major basin number and feature size in acres and square miles. The major basin number (MBAS_NO) uniquely identifies individual basins and is 1 character in length. There are 8 unique major basin numbers; examples include 1, 4, and 6. Note there are more major basin polygon features (10) than unique major basin numbers (8) because two polygon features are necessary to represent both the entire South East Coast and Hudson major basins in Connecticut.

Major basin boundary (line) attributes include a drainage divide type attribute (DIVIDE) used to cartographically represent the hierarchical drainage basin system. This divide type attribute is used to assign different line symbology to different levels of drainage divides. For example, major basin drainage divides are more pronounced and shown with a wider line symbol than regional basin drainage divides. Connecticut Major Drainage Basin polygon and line feature data are derived from the geometry and attributes of the Connecticut Drainage Basins data.
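A quick sketch of querying the documented attributes with geopandas (the file names are hypothetical; MBAS_NO and DIVIDE are the attributes described above):

import geopandas as gpd

# Hypothetical file names; adjust to the actual distribution.
basins = gpd.read_file("CT_Major_Drainage_Basins_Polygon.shp")
divides = gpd.read_file("CT_Major_Drainage_Basins_Line.shp")

# MBAS_NO uniquely identifies each major basin (8 codes across 10 polygons).
print(basins["MBAS_NO"].value_counts())

# DIVIDE encodes the divide level used to assign line symbology.
print(divides["DIVIDE"].value_counts())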
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The Low, Slow, and Small Target Detection Dataset for Digital Array Surveillance Radar (LSS-DAUR-1.0) includes a total of 154 items of Range-Doppler (RD) complex data and Track (TR) point data collected from 6 types of targets (passenger ships, speedboats, helicopters, rotary-wing UAVs, birds, fixed-wing UAVs). It can support research on detection, classification and recognition of typical maritime targets by digital array radar.

1. Data Collection Process
The data collection process mainly includes: set radar parameters → detect targets → collect echo signal data → record target information → determine the range bin where the target is located → extract target Doppler data → extract target track data.

2. Target Situation
The collected typical sea-air targets include 6 categories: passenger ships, speedboats, helicopters, rotary-wing UAVs, birds and fixed-wing UAVs.

3. Range-Doppler (RD) Complex Data
By calculating the target range, the echo data of the range bin where the target is located is intercepted. Based on the collected measured data, the Low, Slow, and Small Target RD Dataset for Digital Array Surveillance Radar is constructed, which includes 10 groups of passenger ship data, 11 groups of speedboat data, 10 groups of helicopter data, 18 groups of rotary-wing UAV data, 17 groups of bird data, and 11 groups of fixed-wing UAV data, totaling 77 groups. Each group of data includes the target's Doppler, GPS time, frame count, etc.
The naming convention for target RD data is: StartCollectionTime_DAUR_RD_TargetType_SerialNumber_TargetBatchNumber.mat. For example, in the file name "20231207093748_DAUR_RD_Passenger Ship_01_2619.mat", "20231207" is the date of data collection, "093748" is the start time of collection (09:37:48), "DAUR" denotes Digital Array Surveillance Radar, "RD" denotes Range-Doppler spectrum complex data, "Passenger Ship_01" denotes a passenger-ship target with serial number 01, and "2619" is the target track batch number.

4. Track (TR) Data
The track data within the time period of the echo data is extracted to construct the Low, Slow, and Small Target TR Dataset for Digital Array Surveillance Radar, which includes 10 groups of passenger ship data, 11 groups of speedboat data, 10 groups of helicopter data, 18 groups of rotary-wing UAV data, 17 groups of bird data, and 11 groups of fixed-wing UAV data, totaling 77 groups. Each group of data includes target range, target azimuth, elevation angle, target speed, GPS time, signal-to-noise ratio (SNR), etc. The TR data and RD data share the same time and batch number; they are different dimensions of the same target over the same time period.
The naming convention for target TR data is: StartCollectionTime_DAUR_TR_TargetType_SerialNumber_TargetBatchNumber.mat. For example, in the file name "20231207093748_DAUR_TR_Passenger Ship_01_2619.mat", "TR" denotes track data; the remaining fields are as in the RD naming convention.
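A small sketch of parsing this naming convention (illustrative; the field order follows the description above):

import re
from datetime import datetime

# StartTime_DAUR_{RD|TR}_TargetType_Serial_BatchNumber.mat
PATTERN = re.compile(
    r"(?P<ts>\d{14})_DAUR_(?P<kind>RD|TR)_(?P<target>.+)_(?P<serial>\d+)_(?P<batch>\d+)\.mat$"
)

def parse_name(fname):
    m = PATTERN.match(fname)
    if not m:
        raise ValueError(f"unexpected file name: {fname}")
    fields = m.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y%m%d%H%M%S")
    return fields

print(parse_name("20231207093748_DAUR_RD_Passenger Ship_01_2619.mat"))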
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the data for the South Range, MI population pyramid, which represents the South Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.
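The dependency ratios mentioned above follow standard demographic definitions; a quick sketch (the conventional age cutoffs 0-14 / 15-64 / 65+ are assumed, and the counts are invented):

def dependency_ratios(pop_0_14, pop_15_64, pop_65_plus):
    # Standard demographic ratios, expressed per 100 working-age persons.
    youth = 100 * pop_0_14 / pop_15_64
    old_age = 100 * pop_65_plus / pop_15_64
    total = youth + old_age
    support = pop_15_64 / pop_65_plus  # potential support ratio
    return youth, old_age, total, support

# Illustrative (made-up) counts for a small town:
print(dependency_ratios(120, 450, 130))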
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Range Population by Age. You can refer to it here.
Species’ geographic range limits interest biologists and resource managers alike; however, scientists lack strong mechanistic understanding of the factors that set geographic range limits in the field, especially for animals. There exists a clear need for detailed case studies that link mechanisms to spatial dynamics and boundaries because such mechanisms allow us to predict whether climate change is likely to change a species’ geographic range and, if so, how abundance in marginal populations compares to the core. The bagworm Thyridopteryx ephemeraeformis (Lepidoptera: Psychidae) is a major native pest of cedars, arborvitae, junipers, and other landscape trees throughout much of North America. Across dozens of bagworm populations spread over six degrees of latitude in the American Midwest, we find latitudinal declines in fecundity and egg and pupal survivorship as you proceed towards the northern range boundary. A spatial gradient of bagworm reproductive success emerges, which is associated with a progressive decline in local abundance and an increase in the risk of local population extinction near the species’ geographic range boundary. We develop a mathematical model, completely constrained by empirically estimated parameters, to explore the relative roles of reproductive asynchrony and stage-specific survivorship in generating the range limit for this species. We find that overwinter egg mortality is the biggest constraint on bagworm persistence beyond their northern range limit. Overwinter egg mortality is directly related to winter temperatures that fall below the bagworm eggs’ physiological limit. This threshold, in conjunction with latitudinal declines in fecundity and pupal survivorship, creates a non-linear response to climate extremes that sets the geographic boundary and provides a path for predicting northward range expansion under altered climate conditions. Our mechanistic modeling approach demonstrates how species’ sensitivity to climate extremes can create population tipping points not reflected in demographic responses to climate means, a distinction that is critical to successful ecological forecasting.
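As a toy illustration of the kind of stage-structured, threshold-driven dynamics the abstract describes (all parameter values below are invented, not the paper's fitted estimates):

import numpy as np

def winter_egg_survival(t_min_c, threshold_c=-20.0):
    # Toy step response: overwinter egg survival collapses when the winter
    # minimum falls below a physiological threshold (value invented).
    return 1.0 if t_min_c > threshold_c else 0.05

def simulate(fecundity, pupal_surv, t_min_series, n0=100.0):
    # Simple annual recursion: N[t+1] = N[t] * F * s_pupa * s_egg(Tmin).
    n = n0
    for t_min in t_min_series:
        n = n * fecundity * pupal_surv * winter_egg_survival(t_min)
    return n

rng = np.random.default_rng(1)
near_limit_winters = rng.normal(-18.0, 4.0, size=20)  # near-threshold climate
print(simulate(fecundity=6.0, pupal_surv=0.2, t_min_series=near_limit_winters))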