65 datasets found
  1. Mathematics Dataset

    • github.com
    • opendatalab.com
    • +1 more
    Updated Apr 3, 2019
    Cite
    DeepMind (2019). Mathematics Dataset [Dataset]. https://github.com/Wikidepia/mathematics_dataset_id
    Explore at:
    Dataset updated
    Apr 3, 2019
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Description

    This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

    Example questions

     Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
     Answer: 4
     
     Question: Calculate -841880142.544 + 411127.
     Answer: -841469015.544
     
     Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
     Answer: 54*a - 30
    

    It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length and answers to 30 characters. Note that the training data for each question type is split into "train-easy", "train-medium", and "train-hard", which allows training models via a curriculum; the data can also be mixed together uniformly across these splits to obtain the results reported in the paper. A short parsing sketch follows the category list below. Categories:

    • algebra (linear equations, polynomial roots, sequences)
    • arithmetic (pairwise operations and mixed expressions, surds)
    • calculus (differentiation)
    • comparison (closest numbers, pairwise comparisons, sorting)
    • measurement (conversion, working with time)
    • numbers (base conversion, remainders, common divisors and multiples, primality, place value, rounding numbers)
    • polynomials (addition, simplification, composition, evaluating, expansion)
    • probability (sampling without replacement)
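
    As a quick way to work with the files, here is a minimal Python sketch, assuming the plain-text layout used by the GitHub release, in which each module file alternates question lines and answer lines; the path below is illustrative.

      def read_pairs(path: str) -> list[tuple[str, str]]:
          """Return (question, answer) pairs from one module file."""
          with open(path, encoding="utf-8") as f:
              lines = [line.rstrip("\n") for line in f]
          # Questions sit on even lines; each answer follows on the next line.
          return list(zip(lines[0::2], lines[1::2]))

      pairs = read_pairs("train-easy/algebra__linear_1d.txt")  # illustrative path
      question, answer = pairs[0]
      print(question)
      print(answer)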
  2. MathInstruct Dataset: Hybrid Math Instruction

    • kaggle.com
    zip
    Updated Nov 30, 2023
    Cite
    The Devastator (2023). MathInstruct Dataset: Hybrid Math Instruction [Dataset]. https://www.kaggle.com/datasets/thedevastator/mathinstruct-dataset-hybrid-math-instruction-tun
    Explore at:
    zip (60239940 bytes). Available download formats
    Dataset updated
    Nov 30, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MathInstruct Dataset: Hybrid Math Instruction Tuning

    A curated dataset for math instruction tuning models

    By TIGER-Lab (From Huggingface) [source]

    About this dataset

    MathInstruct is a comprehensive and meticulously curated dataset specifically designed to facilitate the development and evaluation of models for math instruction tuning. This dataset consists of a total of 13 different math rationale datasets, out of which six have been exclusively curated for this project, ensuring a diverse range of instructional materials. The main objective behind creating this dataset is to provide researchers with an easily accessible and manageable resource that aids in enhancing the effectiveness and precision of math instruction.

    One noteworthy feature of MathInstruct is its lightweight nature, making it highly convenient for researchers to utilize without any hassle. With carefully selected columns such as source and output, users can readily identify the origin or reference material from which each math instruction was obtained. Additionally, they can refer to the expected output or solution corresponding to each specific math problem or exercise.

    Overall, MathInstruct offers immense potential in refining hybrid math instruction by facilitating meticulous model development and rigorous evaluation processes. Researchers can leverage this diverse dataset to gain deeper insights into effective teaching methodologies while exploring innovative approaches towards enhancing mathematical learning experiences.

    How to use the dataset

    Title: How to Use the MathInstruct Dataset for Hybrid Math Instruction Tuning

    Introduction: The MathInstruct dataset is a comprehensive collection of math instruction examples, designed to assist in developing and evaluating models for math instruction tuning. This guide will provide an overview of the dataset and explain how to make effective use of it.

    • Understanding the Dataset Structure: The dataset consists of a file named train.csv. This CSV file contains the training data, which includes various columns such as source and output. The source column represents the source of math instruction (textbook, online resource, or teacher), while the output column represents expected output or solution to a particular math problem or exercise.

    • Accessing the Dataset: To access the MathInstruct dataset, you can download it from Kaggle's website. Once downloaded, you can read and manipulate the data using a language like Python with libraries such as pandas (see the sketch after this list).

    • Exploring the Columns: a) Source Column: The source column provides information about where each math instruction comes from. It may include references to specific textbooks, online resources, or even teachers who provided instructional material. b) Output Column: The output column specifies what students are expected to achieve as a result of each math instruction. It contains solutions or expected outputs for different math problems or exercises.

    • Utilizing Source Information: By analyzing the different sources mentioned in this dataset, researchers can understand which instructional materials are more effective in teaching specific topics within mathematics. They can also identify common strategies used by teachers across multiple sources.

    • Analyzing Expected Outputs: Researchers can study variations in expected outputs for similar types of problems across different sources. This analysis may help identify differences in approaches across textbooks/resources and enrich our understanding of various teaching methods.

    • Model Development and Evaluation: Researchers can utilize this dataset to develop machine learning models that automatically assess whether a given math instruction leads to the expected output. By training models on this data, one can create automated systems that provide feedback on math problems or suggest alternative instruction sources.

    • Scaling the Dataset: Due to its lightweight nature, the MathInstruct dataset is easily accessible and manageable. Researchers can scale up their training data by combining it with other instructional datasets or expand it further by labeling more examples based on similar guidelines.
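
    A minimal pandas sketch for the "Accessing the Dataset" step above, assuming the Kaggle zip unpacks to a train.csv with the documented source and output columns:

      import pandas as pd

      df = pd.read_csv("train.csv")
      print(df.columns.tolist())                 # expect at least ['source', 'output']
      print(df["source"].value_counts().head())  # which rationale datasets dominate
      print(df.loc[0, "output"])                 # one expected solution/output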

    Conclusion: The MathInstruct dataset serves as a valuable resource for developing and evaluating models related to math instruction tuning. By analyzing the source information and expected outputs, researchers can gain insights into effective teaching methods and build automated assessment systems.

    Research Ideas

    • Model development: This dataset can be used for developing and training models for math instruction...
  3. Trends in Math Proficiency (2012-2023): Range View Elementary School vs....

    • publicschoolreview.com
    + more versions
    Cite
    Public School Review, Trends in Math Proficiency (2012-2023): Range View Elementary School vs. Colorado vs. Weld County Reorganized School District No. Re-4 [Dataset]. https://www.publicschoolreview.com/range-view-elementary-school-profile
    Explore at:
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset tracks annual math proficiency from 2012 to 2023 for Range View Elementary School vs. Colorado and Weld County Reorganized School District No. Re-4.

  4. Generation of prime numbers based on a range.

    • plos.figshare.com
    xls
    Updated Nov 15, 2024
    + more versions
    Cite
    Amal Ezz-Eldien; Mohamed Ezz; Amjad Alsirhani; Ayman Mohamed Mostafa; Abdullah Alomari; Faeiz Alserhani; Mohammed Mujib Alshahrani (2024). Generation of prime numbers based on a range. [Dataset]. http://doi.org/10.1371/journal.pone.0311782.t006
    Explore at:
    xls. Available download formats
    Dataset updated
    Nov 15, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Amal Ezz-Eldien; Mohamed Ezz; Amjad Alsirhani; Ayman Mohamed Mostafa; Abdullah Alomari; Faeiz Alserhani; Mohammed Mujib Alshahrani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper addresses the computational methods and challenges associated with prime number generation, a critical component of encryption algorithms for ensuring data security. Generating prime numbers efficiently is a critical challenge in various domains, including cryptography, number theory, and computer science. The quest for more effective prime-generation algorithms is driven by the increasing demand for secure communication and data storage, and by the need for efficient algorithms to solve complex mathematical problems. Our goal is to address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These algorithms are founded on formulas for odd composite numbers, allowing them to achieve remarkable performance improvements over existing prime number generation algorithms. Our comprehensive experimental results reveal that the proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, the Sieve of Atkin, the Sieve of Eratosthenes, and the Sieve of Sundaram in mean execution time. More notably, our algorithms exhibit the unique ability to provide prime numbers from range to range with commendable performance. This substantial enhancement in performance and adaptability can significantly impact the effectiveness of various applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, the proposed algorithms can support more secure and reliable communication systems, enable faster computations in number theory, and support advanced research in computer science and mathematics.
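
    The paper's odd-composite-based algorithms are not reproduced here, but for orientation, this is a minimal Python sketch of one of the baselines the authors compare against: a segmented Sieve of Eratosthenes that generates primes within a specified range.

      from math import isqrt

      def primes_in_range(lo: int, hi: int) -> list[int]:
          """Classical segmented sieve: all primes p with lo <= p <= hi."""
          lo = max(lo, 2)
          # 1) Sieve the base primes up to sqrt(hi).
          limit = isqrt(hi)
          base = [True] * (limit + 1)
          base[0:2] = [False, False]
          for p in range(2, isqrt(limit) + 1):
              if base[p]:
                  base[p * p :: p] = [False] * len(base[p * p :: p])
          base_primes = [p for p, is_p in enumerate(base) if is_p]
          # 2) Mark composites in the target segment [lo, hi].
          seg = [True] * (hi - lo + 1)
          for p in base_primes:
              start = max(p * p, ((lo + p - 1) // p) * p)
              for multiple in range(start, hi + 1, p):
                  seg[multiple - lo] = False
          return [lo + i for i, is_p in enumerate(seg) if is_p]

      print(primes_in_range(100, 150))  # [101, 103, 107, 109, 113, 127, 131, 137, 139, 149]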

  5. Data from: Correcting for missing and irregular data in home-range...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Jan 9, 2018
    Cite
    Christen H. Fleming; Daniel Sheldon; William F. Fagan; Peter Leimgruber; Thomas Mueller; Dejid Nandintsetseg; Michael J. Noonan; Kirk A. Olson; Edy Setyawan; Abraham Sianipar; Justin M. Calabrese (2018). Correcting for missing and irregular data in home-range estimation [Dataset]. http://doi.org/10.5061/dryad.n42h0
    Explore at:
    zip. Available download formats
    Dataset updated
    Jan 9, 2018
    Dataset provided by
    Goethe University Frankfurt
    University of Tasmania
    Smithsonian Conservation Biology Institute
    University of Maryland, College Park
    University of Massachusetts Amherst
    Conservation International Indonesia; Marine Program; Jalan Pejaten Barat 16A, Kemang Jakarta DKI Jakarta 12550 Indonesia
    Authors
    Christen H. Fleming; Daniel Sheldon; William F. Fagan; Peter Leimgruber; Thomas Mueller; Dejid Nandintsetseg; Michael J. Noonan; Kirk A. Olson; Edy Setyawan; Abraham Sianipar; Justin M. Calabrese
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Mongolia
    Description

    Home-range estimation is an important application of animal tracking data that is frequently complicated by autocorrelation, sampling irregularity, and small effective sample sizes. We introduce a novel, optimal weighting method that accounts for temporal sampling bias in autocorrelated tracking data. This method corrects for irregular and missing data, such that oversampled times are downweighted and undersampled times are upweighted to minimize error in the home-range estimate. We also introduce computationally efficient algorithms that make this method feasible with large datasets. Generally speaking, there are three situations where weight optimization improves the accuracy of home-range estimates: with marine data, where the sampling schedule is highly irregular; with duty-cycled data, where the sampling schedule changes during the observation period; and when a small number of home-range crossings are observed, making the beginning and end times more independent and informative than the intermediate times. Using both simulated data and empirical examples including reef manta ray, Mongolian gazelle, and African buffalo, optimal weighting is shown to reduce the error and increase the spatial resolution of home-range estimates. With a conveniently packaged and computationally efficient software implementation, this method broadens the array of datasets with which accurate space-use assessments can be made.

  6. AlgebraicEquationsGenerator

    • huggingface.co
    Updated Apr 12, 2025
    Cite
    Michael Shortland (2025). AlgebraicEquationsGenerator [Dataset]. http://doi.org/10.57967/hf/5121
    Explore at:
    Dataset updated
    Apr 12, 2025
    Authors
    Michael Shortland
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    /**
     * Algebraic Equation Dataset Generator for Hugging Face
     *
     * This script generates diverse datasets of algebraic equations with their
     * solutions, producing different valid equations each time it's run,
     * properly formatted for Hugging Face.
     */

    // Utility function to generate a random integer within a range
    function getRandomInt(min, max) {
      return Math.floor(Math.random() * (max - min + 1)) + min;
    }

    // Utility function to get a random non-zero integer within a range
    function… See the full description on the dataset page: https://huggingface.co/datasets/BarefootMikeOfHorme/AlgebraicEquationsGenerator.

  7. Range View Elementary School

    • publicschoolreview.com
    json, xml
    + more versions
    Cite
    Public School Review, Range View Elementary School [Dataset]. https://www.publicschoolreview.com/range-view-elementary-school-profile
    Explore at:
    xml, json. Available download formats
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2011 - Dec 31, 2025
    Description

    Historical Dataset of Range View Elementary School is provided by Public School Review and contains statistics on the following metrics:
    • Total Students Trends Over Years (2013-2023)
    • Total Classroom Teachers Trends Over Years (2013-2023)
    • Distribution of Students By Grade Trends
    • Student-Teacher Ratio Comparison Over Years (2013-2023)
    • American Indian Student Percentage Comparison Over Years (2011-2023)
    • Asian Student Percentage Comparison Over Years (2021-2022)
    • Hispanic Student Percentage Comparison Over Years (2013-2023)
    • Black Student Percentage Comparison Over Years (2019-2022)
    • White Student Percentage Comparison Over Years (2013-2023)
    • Two or More Races Student Percentage Comparison Over Years (2013-2023)
    • Diversity Score Comparison Over Years (2013-2023)
    • Free Lunch Eligibility Comparison Over Years (2013-2023)
    • Reduced-Price Lunch Eligibility Comparison Over Years (2013-2023)
    • Reading and Language Arts Proficiency Comparison Over Years (2011-2022)
    • Math Proficiency Comparison Over Years (2012-2023)
    • Overall School Rank Trends Over Years (2012-2023)

  8. Data from: Overcoming the challenge of small effective sample sizes in...

    • zenodo.org
    • data.niaid.nih.gov
    • +2 more
    zip
    Updated Jun 1, 2022
    Cite
    Christen H. Fleming; Michael J. Noonan; Emilia Patricia Medici; Justin M. Calabrese (2022). Data from: Overcoming the challenge of small effective sample sizes in home-range estimation [Dataset]. http://doi.org/10.5061/dryad.16bc7f2
    Explore at:
    zip. Available download formats
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christen H. Fleming; Michael J. Noonan; Emilia Patricia Medici; Justin M. Calabrese
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Technological advances have steadily increased the detail of animal tracking datasets, yet fundamental data limitations exist for many species that cause substantial biases in home-range estimation. Specifically, the effective sample size of a range estimate is proportional to the number of observed range crossings, not the number of sampled locations. Currently, the most accurate home-range estimators condition on an autocorrelation model, for which the standard estimation frameworks are based on likelihood functions, even though these methods are known to underestimate variance (and therefore ranging area) when effective sample sizes are small. Residual maximum likelihood (REML) is a widely used method for reducing bias in maximum-likelihood (ML) variance estimation at small sample sizes. Unfortunately, we find that REML is too unstable for practical application to continuous-time movement models. When the effective sample size N is decreased to N ≤ O(10), which is common in tracking applications, REML undergoes a sudden divergence in variance estimation. To avoid this issue, while retaining REML's first-order bias correction, we derive a family of estimators that leverage REML to make a perturbative correction to ML. We also derive AIC values for REML and our estimators, including cases where model structures differ, which is not generally understood to be possible. Using both simulated data and GPS data from lowland tapir (Tapirus terrestris), we show how our perturbative estimators are more accurate than traditional ML and REML methods. Specifically, when O(5) home-range crossings are observed, REML is unreliable by orders of magnitude, ML home ranges are ~30% underestimated, and our perturbative estimators yield home ranges that are only ~10% underestimated. A parametric bootstrap can then reduce the ML and perturbative home-range underestimation to ~10% and ~3%, respectively. Home-range estimation is one of the primary reasons for collecting animal tracking data, and small effective sample sizes are a more common problem than is currently realized. The methods introduced here allow for more accurate movement-model and home-range estimation at small effective sample sizes, and thus fill an important role for animal movement analysis. Given REML's widespread use, our methods may also be useful in other contexts where effective sample sizes are small.
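
    The continuous-time movement models themselves are beyond a short example, but the core issue the abstract describes, ML variance estimates that are biased low at small sample sizes and a REML-style first-order correction, can already be seen in the simple iid Gaussian case. A numpy illustration follows; it is an analogy, not the paper's perturbative estimators.

      import numpy as np

      rng = np.random.default_rng(0)
      n, trials, true_var = 5, 100_000, 1.0
      x = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))

      ml_var = x.var(axis=1, ddof=0).mean()    # ML: divide by n, biased low
      reml_var = x.var(axis=1, ddof=1).mean()  # REML-style: divide by n - 1

      # ML lands near (n-1)/n = 0.8 of the true variance; the corrected
      # estimator is unbiased, landing near 1.0.
      print(f"true {true_var:.3f}  ML {ml_var:.3f}  corrected {reml_var:.3f}")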

  9. Meta data and supporting documentation

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We include a description of the data sets in the meta-data, as well as sample code and results from a simulated data set.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. The R code is available online at https://github.com/warrenjl/SpGPCW.

    Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary):
    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).

  10. Simulation Data Set

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    File format: R workspace file, "Simulated_Dataset.RData".

    Metadata (including data dictionary):
    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code abstract: We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code ("Results_Summary.txt") to summarize and plot the estimated critical windows and posterior marginal inclusion probabilities.

    "CWVS_LMC.txt": This code is delivered as a .txt file containing R code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, it can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.

    "Results_Summary.txt": This code is also delivered as a .txt file containing R code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, it can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

    Required R packages:
    • For running "CWVS_LMC.txt": msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), BayesLogit (sampling from the Polya-Gamma distribution)
    • For running "Results_Summary.txt": plotrix (plotting the posterior means and credible intervals)

    Reproducibility: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. To replicate:
    • Load the "Simulated_Dataset.RData" workspace
    • Run the code contained in "CWVS_LMC.txt"
    • Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt"

    Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
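
    The documented replication route is R, but for readers working in Python, here is a hedged sketch using pyreadr (assuming it can parse this workspace; object names follow the data dictionary above):

      import pyreadr

      objects = pyreadr.read_r("Simulated_Dataset.RData")  # dict-like: name -> object
      y = objects["y"]   # binary responses (1: adverse outcome, 0: control)
      z = objects["z"]   # standardized weekly pollution exposures (n x m)
      print(len(y), getattr(z, "shape", None))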

  11. Justifying and Proving in School Mathematics: Student Conceptions and School...

    • datacatalogue.cessda.eu
    • datacatalogue.ukdataservice.ac.uk
    Updated Nov 28, 2024
    Cite
    Healy, L., University of London, Institute of Education; Hoyles, C., University of London, Institute of Education (2024). Justifying and Proving in School Mathematics: Student Conceptions and School Data, 1996 [Dataset]. http://doi.org/10.5255/UKDA-SN-4004-1
    Explore at:
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    Mathematical Sciences
    Authors
    Healy, L., University of London, Institute of Education; Hoyles, C., University of London, Institute of Education
    Time period covered
    May 1, 1996 - Jul 1, 1996
    Area covered
    England and Wales
    Variables measured
    Individuals, Institutions/organisations, National, Pupils, Teachers
    Measurement technique
    Educational measurements, questionnaire administered by fieldworker
    Description

    Abstract copyright UK Data Service and data collection copyright owner.


    In recent years there has been considerable interest in reassessing the role of mathematical proof, influenced by developments in computer technology and an increasing awareness of the role of proof in conveying and illuminating as well as verifying mathematical ideas. Research in mathematics education has shown proof to be an elusive concept for many students. This has been one influence underlying the shift away from formal methods in schools to the more process-orientated approaches now enshrined in the UK National Curriculum Using and Applying Mathematics.
    In this project a nationwide survey was conducted to ascertain the current profile of conceptions amongst 15-year-old high-attaining students of the validity of a range of modes of justification in geometry and algebra. Analysis of the survey data informed the design of two teaching experiments in these mathematical domains incorporating computer use and aiming specifically to encourage links between empirical and deductive reasoning. Case studies were constructed to evaluate the influence of these innovations on students' understanding of proving and of the role of formal mathematical proof.
    Both strands of the research contributed to the formulation of recommendations concerning the emphasis on and positioning of mathematical proof in the school curriculum.
    Main Topics:

    The study consists of two datasets:
    the larger dataset contains the responses of the sample of Year 10 students (14 or 15 years old) to a proof questionnaire, which comprised a question to ascertain students' views on the role of proof, followed by items in two domains of mathematics - arithmetic/algebra and geometry - presented in open and multiple-choice formats. In the open format, students were asked to construct one familiar and one unfamiliar proof in each domain. In the multiple-choice format, students were required to choose from a range of arguments in support of or refuting a conjecture in accordance with two criteria: which argument would be nearest to their own approach if asked to prove the given statement, and which they believed would receive the best mark.
    The smaller dataset contains responses of the respondent students' teachers to the school questionnaire. This questionnaire was designed to obtain data about the school and the mathematics teacher of the class selected to complete the proof questionnaire. These teachers also completed all the multiple-choice questions in the proof questionnaire, in order to obtain their choices of argument and to identify the proof they thought their students believed would receive the best mark. The responses are included in this dataset.

  12. Data from: Twitter Big Data as A Resource For Exoskeleton Research: A...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    + more versions
    Cite
    Thakur, Nirmalya (2023). Twitter Big Data as A Resource For Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions [Dataset]. http://doi.org/10.7910/DVN/VPPTRF
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Thakur, Nirmalya
    Description

    Please cite the following paper when using this dataset: N. Thakur, "Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions," Preprints, 2022, DOI: 10.20944/preprints202206.0383.v1

    Abstract: Exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and use cases in assisted living, military, healthcare, firefighting, and industry. With the projected increase in the diverse uses of exoskeletons in the next few years in these application domains and beyond, it is crucial to study, interpret, and analyze user perspectives, public opinion, reviews, and feedback related to exoskeletons, for which a dataset is necessary. The Internet of Everything era of today's living, characterized by people spending more time on the Internet than ever before, holds the potential for developing such a dataset by mining relevant web behavior data from social media communications, which have increased exponentially in the last few years. Twitter, one such social media platform, is highly popular amongst all age groups, who communicate on diverse topics including but not limited to news, current events, politics, emerging technologies, family, relationships, and career opportunities via tweets, while sharing their views, opinions, perspectives, and feedback towards the same. Therefore, this work presents a dataset of about 140,000 tweets related to exoskeletons that were mined over a period of 5 years, from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons.

    Instructions: The dataset contains only tweet identifiers (Tweet IDs), due to Twitter's terms and conditions, which permit redistribution of Twitter data only for research purposes. The IDs need to be hydrated to be used. The process of retrieving a tweet's complete information (such as the text of the tweet, username, user ID, date and time, etc.) using its ID is known as hydration. The Hydrator application (download: https://github.com/DocNow/hydrator/releases; step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) or any similar application may be used for hydrating this dataset.

    Data description: This dataset consists of 7 .txt files. The following lists the number of Tweet IDs and the date range (of the associated tweets) in each file.
    • Exoskeleton_TweetIDs_Set1.txt: 22945 Tweet IDs (July 20, 2021 to May 21, 2022)
    • Exoskeleton_TweetIDs_Set2.txt: 19416 Tweet IDs (Dec 1, 2020 to July 19, 2021)
    • Exoskeleton_TweetIDs_Set3.txt: 16673 Tweet IDs (April 29, 2020 to Nov 30, 2020)
    • Exoskeleton_TweetIDs_Set4.txt: 16208 Tweet IDs (Oct 5, 2019 to Apr 28, 2020)
    • Exoskeleton_TweetIDs_Set5.txt: 17983 Tweet IDs (Feb 13, 2019 to Oct 4, 2019)
    • Exoskeleton_TweetIDs_Set6.txt: 34009 Tweet IDs (Nov 9, 2017 to Feb 12, 2019)
    • Exoskeleton_TweetIDs_Set7.txt: 11351 Tweet IDs (May 21, 2017 to Nov 8, 2017)

    Here, the last date for May is May 21, as it was the most recent date at the time of data collection. The dataset will be updated soon to incorporate more recent tweets.
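
    A minimal Python sketch of gathering the Tweet IDs from the seven files before hydration; filenames follow the listing above.

      from pathlib import Path

      tweet_ids: list[str] = []
      for path in sorted(Path(".").glob("Exoskeleton_TweetIDs_Set*.txt")):
          with open(path) as f:
              tweet_ids += [line.strip() for line in f if line.strip()]

      print(len(tweet_ids))  # roughly 140,000 IDs across the 7 files
      # Pass these IDs to a hydration tool (e.g., the Hydrator app linked above)
      # to recover full tweet objects, subject to Twitter's terms of use.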

  13. Suspension Rate by Grade Range - Datasets - CTData.org

    • data.ctdata.org
    Updated Mar 16, 2016
    Cite
    (2016). Suspension Rate by Grade Range - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/suspension-rate-by-grade-range
    Explore at:
    Dataset updated
    Mar 16, 2016
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset reports the total number of unique, unduplicated students in a given grade range who have received at least one In-School Suspension (ISS), Out-of-School Suspension (OSS), or Expulsion (EXP), out of the total number of students enrolled in the Public School Information System (PSIS) as of October of the given year. This dataset is based on school years. Elementary includes Pre-Kindergarten through grade 5; Middle School includes grade 6 through grade 8; High School includes grade 9 through grade 12.

  14. USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven

    • app.mobito.io
    Cite
    USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven [Dataset]. https://app.mobito.io/data-product/usa-enriched-geospatial-framework-dataset
    Explore at:
    Area covered
    United States
    Description

    Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA, including Point of Interest (POI) data and foot traffic. The dataset is divided into 150x150 sqm areas (geohash 7) and has over 50 variables.

    • Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven to be useful for real estate investment.
    • Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.
    • Ideal for training ML models: The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguish this dataset is the use of anonymized and user-compliant mobile device GPS location data, enriched with other alternative and public data.
    • Easy to use: Our dataset is user-friendly and can be easily integrated into your current models. We can also deliver your data in different formats, like .csv, according to your analysis requirements.
    • Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation. Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).

    Answer questions like:
    • What places does my target user visit in a particular area? Which are the best areas to place a new POS?
    • What is the average yearly income of users in a particular area?
    • What is the influx of visits that my competition receives?
    • What is the volume of traffic surrounding my current POS?

    This dataset is useful for getting insights from industries like:
    • Retail & FMCG
    • Banking, Finance, and Investment
    • Car Dealerships
    • Real Estate
    • Convenience Stores
    • Pharma and medical laboratories
    • Restaurant chains and franchises
    • Clothing chains and franchises

    Our dataset includes more than 50 variables, such as:
    • Number of pedestrians seen in the area
    • Number of vehicles seen in the area
    • Average speed of movement of the vehicles seen in the area
    • Points of Interest (POIs), in number and type, seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others)
    • Average yearly income range (anonymized and aggregated) of the devices seen in the area

    Notes to better understand this dataset:
    • POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library.
    • Category confidences, for example "food_drinks_tobacco_retail_confidence", indicate how confident we are in the existence of food/drink/tobacco retail locations in the area.
    • We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.

    How efficient is a geohash? Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use. A geohash ranges from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash 7 (150m x 150m). A sketch of how a geohash-7 cell is computed follows.
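
    A minimal Python sketch of geohash encoding, showing how the dataset's 150m x 150m cells (geohash 7) follow from a latitude/longitude pair; this is a from-scratch illustration, not the vendor's tooling.

      _BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

      def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
          lat_lo, lat_hi = -90.0, 90.0
          lon_lo, lon_hi = -180.0, 180.0
          bits, bit_count, even = 0, 0, True  # even-numbered bits encode longitude
          out = []
          while len(out) < precision:
              if even:
                  mid = (lon_lo + lon_hi) / 2
                  bits = (bits << 1) | (lon >= mid)
                  lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
              else:
                  mid = (lat_lo + lat_hi) / 2
                  bits = (bits << 1) | (lat >= mid)
                  lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
              even = not even
              bit_count += 1
              if bit_count == 5:          # every 5 bits yield one base-32 character
                  out.append(_BASE32[bits])
                  bits, bit_count = 0, 0
          return "".join(out)

      print(geohash_encode(40.7580, -73.9855))  # 7-character cell around Times Square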

  15. Probabilities of Adjusted Elevation for 2080s

    • marine.usgs.gov
    Updated Jul 30, 2025
    + more versions
    Cite
    (2025). Probabilities of Adjusted Elevation for 2080s [Dataset]. https://marine.usgs.gov/coastalchangehazardsportal/ui/info/item/EXf3LkWP
    Explore at:
    Dataset updated
    Jul 30, 2025
    Description

    The U.S. Geological Survey has been forecasting sea-level rise impacts on the landscape to evaluate where coastal land will be available for future use. The purpose of this project is to develop a spatially explicit, probabilistic model of coastal response for the Northeastern U.S. to a variety of sea-level scenarios that takes into account the variable nature of the coast and provides outputs at spatial and temporal scales suitable for decision support. Model results provide predictions of adjusted land elevation ranges (AE) with respect to forecast sea-levels, a likelihood estimate of this outcome (PAE), and a probability of coastal response (CR) characterized as either static or dynamic. The predictions span the coastal zone vertically from -12 meters (m) to 10 m above mean high water (MHW). Results are produced at a horizontal resolution of 30 meters for four decades (the 2020s, 2030s, 2050s and 2080s). Adjusted elevations and their respective probabilities are generated using regional geospatial datasets of current sea-level forecasts, vertical land movement rates, and current elevation data. Coastal response type predictions incorporate adjusted elevation predictions with land cover data and expert knowledge to determine the likelihood that an area will be able to accommodate or adapt to water level increases and maintain its initial land class state or transition to a new non-submerged state (dynamic) or become submerged (static). Intended users of these data include scientific researchers, coastal planners, and natural resource management communities.

    These GIS layers provide the probability of observing the forecast of adjusted land elevation (PAE) with respect to predicted sea-level rise for the Northeastern U.S. for the 2020s, 2030s, 2050s, and 2080s. These data are based on the following inputs: sea-level rise, vertical land movement rates due to glacial isostatic adjustment, and elevation data. The output displays the highest probability among the five adjusted elevation ranges (-12 to -1, -1 to 0, 0 to 1, 1 to 5, and 5 to 10 m) to be observed for the forecast year as defined by a probabilistic framework (a Bayesian network), and should be used concurrently with the adjusted land elevation layer (AE), also available from http://woodshole.er.usgs.gov/project-pages/coastal_response/, which provides users with the forecast elevation range occurring when compared with the four other elevation ranges. These data layers primarily show the distribution of adjusted elevation range probabilities over a large spatial scale and should therefore be used qualitatively.
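
    An illustration in Python of how the PAE layer is described: per grid cell, report the most probable of the five adjusted-elevation ranges. The probabilities below are made up; the real values come from the USGS Bayesian network.

      import numpy as np

      bins = ["-12 to -1 m", "-1 to 0 m", "0 to 1 m", "1 to 5 m", "5 to 10 m"]
      p = np.array([0.05, 0.10, 0.45, 0.30, 0.10])  # hypothetical cell posterior

      best = int(np.argmax(p))
      print(f"AE = {bins[best]}, PAE = {p[best]:.2f}")  # AE = 0 to 1 m, PAE = 0.45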

  16. Prime gap frequency distribution (powers of 2)

    • kaggle.com
    zip
    Updated Mar 26, 2025
    Cite
    Erick Magyar (2025). Prime gap frequency distribution (powers of 2) [Dataset]. https://www.kaggle.com/datasets/erickmagyar/prime-gap-frequency-distribution-powers-of-2
    Explore at:
    zip (5860739 bytes). Available download formats
    Dataset updated
    Mar 26, 2025
    Authors
    Erick Magyar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Description: A Deep Dive into Prime Gap Distribution and Primorial Harmonics

    Overview: This dataset offers a comprehensive exploration of prime gap distribution, focusing on the intriguing patterns associated with primorials and their harmonics. Primorials, the product of the first n prime numbers, play a significant role in shaping the landscape of prime gaps. By analyzing the distribution of prime gaps and their relation to primorials, we can gain deeper insights into the fundamental structure of prime numbers.

    Data Structure:
    • Power of 2: The base-2 exponent.
    • Gap Size N: The size of the Nth prime gap following the given power of 2.

    Key Features:
    • Primorial Harmonics: The dataset highlights the appearance of prime gaps that are multiples of primorials, suggesting a deeper connection between these numbers and the distribution of primes.
    • Large Prime Gaps: The dataset includes information on exceptionally large prime gaps, which can provide valuable clues about the underlying structure of the number line.
    • Prime Number Distribution: The distribution of prime numbers within the specified range is analyzed, revealing patterns and anomalies.

    Potential Applications:
    • Number Theory Research: investigating the role of primorials in shaping prime gap distribution; testing conjectures related to the Riemann Hypothesis and the Twin Prime Conjecture; exploring the connection between prime gaps and other mathematical concepts, such as modular arithmetic and number-theoretic functions.
    • Machine Learning and Data Science: training machine learning models to predict prime gap sizes, incorporating primorials as features; developing algorithms to identify and analyze primorial-related patterns.
    • Computational Mathematics: benchmarking computational resources and algorithms for prime number generation and factorization; developing new algorithms for efficient computation of primorials and their harmonics.

    How to Use This Dataset:
    • Data Exploration: visualize the distribution of prime gaps, highlighting the occurrence of primorial harmonics; analyze the frequency of different gap sizes, focusing on multiples of primorials; study the relationship between prime gap size and the corresponding power of 2, considering the influence of primorials.
    • Machine Learning: incorporate features related to primorials and their harmonics into machine learning models; experiment with different feature engineering techniques and hyperparameter tuning to improve model performance; use the dataset to train models that can predict the occurrence of large prime gaps and other significant patterns.
    • Number Theory Research: use the dataset to formulate and test new conjectures about the distribution of prime gaps and the role of primorials; explore the connection between prime gap distribution and other mathematical fields, such as cryptography and coding theory.

    By leveraging this dataset, researchers can gain a deeper understanding of the intricate patterns and underlying structures that govern the distribution of prime numbers.

    Supplement: Unveiling the Mysteries of Prime Gaps

    The Prime Gap Dataset offers a unique opportunity to delve into the fascinating world of prime numbers. By analyzing the distribution of gaps between consecutive primes, we can uncover hidden patterns and structures.

    Key Features and Potential Insights:
    • Visual Exploration: visualizations of prime gap distributions, revealing hidden patterns and anomalies.
    • Statistical Analysis: in-depth statistical analysis to identify trends, correlations, and outliers.
    • Machine Learning Applications: machine learning techniques to predict prime gap distributions and discover novel insights.
    • Fractal Analysis: investigation of the potential fractal nature of prime number distributions, revealing self-similarity at different scales.

    Potential Research Directions:
    • Uncovering Hidden Patterns: explore the distribution of prime gaps at various scales to identify emerging patterns and structures.
    • Predicting Prime Gap Behavior: develop machine learning models to predict the size and distribution of future prime gaps.
    • Testing Mathematical Conjectures: use the dataset to test conjectures related to prime number distribution, such as the Riemann Hypothesis.
    • Exploring Connections to Other Fields: investigate the relationship between prime numbers and other mathematical fields, such as chaos theory and information theory.

    By delving into this rich dataset, you can contribute to the ongoing exploration of one of the most fundamental and enduring mysteries of mathematics.
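
    For readers who want to recompute gaps independently of the files, here is a minimal sympy sketch of prime gaps just above a power of 2; the actual column headers in the zip may differ from the names used here.

      from sympy import nextprime

      def prime_gaps_after_power_of_two(exponent: int, count: int = 5) -> list[int]:
          """Sizes of the first `count` prime gaps above 2**exponent."""
          p = nextprime(2 ** exponent)
          gaps = []
          for _ in range(count):
              q = nextprime(p)
              gaps.append(q - p)
              p = q
          return gaps

      print(prime_gaps_after_power_of_two(10))  # gaps just above 2**10 = 1024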

  17. LSS-DAUR-1.0: Digital Array Ubiquitous Radar Low, Slow, and Small Target...

    • scidb.cn
    Updated Nov 5, 2025
    Cite
    陈小龙; 刘佳; 汪兴海; 关键 (2025). LSS-DAUR-1.0: Digital Array Ubiquitous Radar Low, Slow, and Small Target Detection Dataset [Dataset]. http://doi.org/10.57760/sciencedb.radars.00076
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 5, 2025
    Dataset provided by
    Science Data Bank
    Authors
    陈小龙; 刘佳; 汪兴海; 关键
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The Low, Slow, and Small Target Detection Dataset for Digital Array Ubiquitous Radar (LSS-DAUR-1.0) includes a total of 154 items of Range-Doppler (RD) complex data and track (TR) point data collected from 6 types of targets (passenger ships, speedboats, helicopters, rotary-wing UAVs, birds, fixed-wing UAVs). It can support research on detection, classification, and recognition of typical maritime targets by digital array radar.

    1. Data Collection Process. The data collection process mainly includes: set radar parameters → detect targets → collect echo signal data → record target information → determine the range bin where the target is located → extract target Doppler data → extract target track data.

    2. Target Situation. The collected typical sea-air targets include 6 categories: passenger ships, speedboats, helicopters, rotary-wing UAVs, birds, and fixed-wing UAVs.

    3. Range-Doppler (RD) Complex Data. By calculating the target range, the echo data of the range bin where the target is located is intercepted. Based on the collected measured data, the RD dataset is constructed, which includes 10 groups of passenger ship data, 11 groups of speedboat data, 10 groups of helicopter data, 18 groups of rotary-wing UAV data, 17 groups of bird data, and 11 groups of fixed-wing UAV data, totaling 77 groups. Each group of data includes the target's Doppler, GPS time, frame count, etc. RD files are named: Start Collection Time_DAUR_RD_Target Type_Serial Number_Target Batch Number.mat. For example, in the file name "20231207093748_DAUR_RD_Passenger Ship_01_2619.mat", "20231207" is the date of collection, "093748" is the start time of collection (09:37:48), "DAUR" denotes the Digital Array Ubiquitous Radar, "RD" denotes Range-Doppler complex data, "Passenger Ship_01" identifies a passenger ship target with serial number 01, and "2619" is the target track batch number.

    4. Track (TR) Data. The track data within the time period of the echo data is extracted to construct the TR dataset, with the same per-category group counts as the RD dataset (77 groups in total). Each group of data includes target range, target azimuth, elevation angle, target speed, GPS time, signal-to-noise ratio (SNR), etc. The TR data and RD data share the same time and batch number; they are different dimensions of the same target over the same time period. TR files follow the same naming scheme, with "TR" denoting track data, e.g., "20231207093748_DAUR_TR_Passenger Ship_01_2619.mat".
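
    A minimal Python sketch of unpacking the documented file-naming scheme and loading one file; scipy is an assumption here (the files are MATLAB .mat).

      from pathlib import Path
      from scipy.io import loadmat

      def parse_lss_daur_name(filename: str) -> dict:
          """Split e.g. '20231207093748_DAUR_RD_Passenger Ship_01_2619.mat'
          into the fields documented above."""
          start, radar, kind, target_type, serial, batch = Path(filename).stem.split("_")
          return {
              "collected": f"{start[:8]} {start[8:10]}:{start[10:12]}:{start[12:14]}",
              "radar": radar,          # DAUR
              "kind": kind,            # RD = Range-Doppler complex data, TR = track data
              "target": target_type,   # e.g. 'Passenger Ship'
              "serial": serial,
              "batch": batch,          # target track batch number
          }

      print(parse_lss_daur_name("20231207093748_DAUR_RD_Passenger Ship_01_2619.mat"))
      # rd = loadmat("20231207093748_DAUR_RD_Passenger Ship_01_2619.mat")  # dict of arrays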

  18. Flight Number Range Decoding - Mapping of flight number ranges to cargo customers, wet-lease customers, flight types

    • datarade.ai
    .csv
    Updated Aug 19, 2025
    + more versions
    Cite
    ch-aviation (2025). Flight Number Range Decoding - Mapping of flight number ranges to cargo customers, wet-lease customers, flight types [Dataset]. https://datarade.ai/data-products/flight-number-range-decoding-mapping-of-flight-number-range-ch-aviation
    Explore at:
    .csvAvailable download formats
    Dataset updated
    Aug 19, 2025
    Dataset provided by
    ch-aviation GmbH http://www.ch-aviation.com/
    Authors
    ch-aviation
    Area covered
    Turks and Caicos Islands, Sweden, United States of America, Israel, Morocco, Mozambique, Pitcairn, Samoa, Gibraltar, Luxembourg
    Description

    For third-party ADS-B data, ch-aviation offers options to integrate it with ch-aviation data (and to identify regional partnerships, cargo customers, wet-lease customers, and flight types), a good fit for Aircraft Finance, Operators, OEMs, Charter Brokers, Technology, Insurance, and Airports.

    For existing Aireon, AirNav, Aviation Week, Cirium, FlightAware, FlightRadar24, Planefinder, Spire, or Wingbits ADS-B customers, ch-aviation offers the same integration options.

    The Flight Number Range Decoding data files allow third-party ADS-B customers to map each flight to ch-aviation's data and identify regional partnerships, cargo customers, wet-lease customers (where the operator flies under the marketing carrier's flight numbers, e.g. Aurora and Rossiya flying for Aeroflot), and flight types, without building any of the matching logic themselves: ch-aviation does all the matching.

    The data set is updated daily.

    The sample data shows decoded flight number ranges for Swiss, Alaska Airlines, and Horizon Air.
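
    As an illustration of how such a range mapping might be consumed, here is a hedged sketch; the FlightNumberRange fields and the sample ranges below are hypothetical stand-ins, not ch-aviation's actual schema or data:

     from dataclasses import dataclass

     @dataclass(frozen=True)
     class FlightNumberRange:
         airline: str   # marketing carrier code, e.g. "LX"
         start: int     # first flight number in the range (inclusive)
         end: int       # last flight number in the range (inclusive)
         meaning: str   # decoded flight type / operator

     # Hypothetical sample rows, standing in for the daily CSV.
     RANGES = [
         FlightNumberRange("AS", 2000, 2999, "operated by Horizon Air (illustrative)"),
         FlightNumberRange("LX", 1200, 1299, "wet-lease operation (illustrative)"),
     ]

     def decode_flight(airline: str, number: int) -> str | None:
         """Return the decoded meaning for a flight, or None if no range matches."""
         for r in RANGES:
             if r.airline == airline and r.start <= number <= r.end:
                 return r.meaning
         return None

     print(decode_flight("AS", 2312))  # -> "operated by Horizon Air (illustrative)"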

    Contact us to get access to ch-aviation's AWS S3 sample data bucket as well, allowing you to build proofs of concept with all of our sample data.

    The direct bucket URL for this data set is: https://eu-central-1.console.aws.amazon.com/s3/buckets/dataservices-standardised-samples?region=eu-central-1&bucketType=general&prefix=flight_number_decoding_ranges/&showversions=false

  19. Active Streets (CNNs, from DataSF, pulled nightly)

    • hub.arcgis.com
    Updated Sep 30, 2025
    + more versions
    Cite
    City and County of San Francisco (2025). Active Streets (CNNs, from DataSF, pulled nightly) [Dataset]. https://hub.arcgis.com/datasets/sfgov::active-streets-cnns-from-datasf-pulled-nightly?uiVersion=content-views
    Explore at:
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    City and County of San Francisco
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Area covered
    Description

    A. SUMMARY A list of street centerlines, including both active and retired streets. These centerlines are identified by their Centerline Network Number ("CNN").

    B. HOW THE DATASET IS CREATED This data is extracted from the Department of Public Works Basemap. Supervisor District and Analysis Neighborhood are added during the loading process. These boundaries utilize the centroid (middle) of the line to determine the district or neighborhood.

    C. UPDATE PROCESS This dataset refreshes daily, though the data may not change every day.

    D. HOW TO USE THIS DATASET Note 1: The Class Code field is used for symbolization:
    1 = Freeway
    2 = Major street/Highway
    3 = Arterial street
    4 = Collector Street
    5 = Residential Street
    6 = Freeway Ramp
    0 = Other (private streets, paper street, etc.)

    E. RELATED DATASETS Understanding street-level data

    Data pushed to ArcGIS Online on November 10, 2025 at 3:25 AM by SFGIS. Data from: https://data.sfgov.org/d/3psu-pn9h

    Description of dataset columns:

     • cnn: Centerline Network Number; the unique identifier for the dataset.
     • lf_fadd: From address number on the left side of the street (the lowest number in the address range).
     • lf_toadd: To address number on the left side of the street (the highest number in the address range).
     • rt_fadd: From address number on the right side of the street (the lowest number in the address range).
     • rt_toadd: To address number on the right side of the street (the highest number in the address range).
     • street: Street name without street type.
     • st_type: Street type (AVE, ST, BLVD, et al.).
     • f_st: The name of the street the segment intersects at its beginning.
     • t_st: The name of the street the segment intersects at its end.
     • f_node_cnn: Centerline Network Number for the node/intersection where the street segment begins.
     • t_node_cnn: Centerline Network Number for the node/intersection where the street segment ends.
     • accepted: Accepted by the City and County of San Francisco for maintenance.
     • active: Active street segment, i.e., not retired.
     • classcode: Classification code for the street segment, used for symbolization: 1 = Freeway, 2 = Major street/Highway, 3 = Arterial street, 4 = Collector Street, 5 = Residential Street, 6 = Freeway Ramp, 0 = Other (private streets, paper streets, etc.).
     • date_added: Date added to the dataset by Public Works.
     • date_altered: Date altered in the dataset by Public Works.
     • date_dropped: Date dropped from the dataset by Public Works.
     • gds_chg_id_add: The internal change transaction ID when the segment was added.
     • gds_chg_id_altered: The internal change transaction ID when the segment was altered.
     • gds_chg_id_dropped: The internal change transaction ID when the segment was dropped/retired.
     • jurisdiction: Agency with jurisdiction over the segment, if any.
     • layer: Derived from the source AutoCAD drawing; indicates the category of segment. Values: Freeways (freeways such as 80, 280, and 101); Paper (the centerline segment is present on an Assessor and/or Public Works map but is not an actual street); Paper_fwys (a paper street under or near a freeway); Paper_water (a paper street under water in the Bay); PARKS (street segment maintained by the Recreation and Park Department, e.g., in Golden Gate Park); Parks_NPS_FtMaso (street segment maintained by the National Park Service within Fort Mason); Parks_NPS_Presid (street segment maintained by the National Park Service within the Presidio); Private (street segment not maintained by the City and not on an Assessor or Public Works map); Private_parking (a private segment that is a parking lot); PSEUDO (street segment created for use in addressing); Streets (standard street centerline segment); Streets_HuntersP (standard segment within the Hunters Point Shipyard area); Streets_Pedestri (standard segment, pedestrian access only); Streets_TI (standard segment within Treasure Island); Streets_YBI (standard segment within Yerba Buena Island); UPROW (Unpaved Right of Way street centerline segment).
     • nhood: SFRealtor-defined neighborhood that the segment primarily intersects.
     • oneway: Indicates if the street segment is a one-way street; possible values are F (one way beginning at the "from" street), T (one way beginning at the "to" street), or B (traffic is legal in both directions).
     • street_gc: Street name without street type, with leading zeroes dropped from numbered streets to facilitate geocoding.
     • streetname: Full street name and street type.
     • streetname_gc: Full street name and street type, with leading zeroes dropped from numbered streets to facilitate geocoding.
     • zip_code: ZIP Code that the street segment falls in.
     • analysis_neighborhood: Current analysis neighborhood.
     • supervisor_district: Current supervisor district.
     • line: Geometry.
     • data_as_of: Timestamp the data was updated in the source system.
     • data_loaded_at: Timestamp the data was loaded to the open data portal.

    Note: If no description was provided by DataSF, the cell is left blank. See the source data for more information.
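
    As a minimal sketch of consuming these columns (assuming a local CSV export of the dataset with the column names above; the file name and value encodings are assumptions), the classcode field can be decoded using the mapping from the description:

     import csv

     # Classification codes, per the classcode description above.
     CLASS_CODES = {
         "1": "Freeway",
         "2": "Major street/Highway",
         "3": "Arterial street",
         "4": "Collector Street",
         "5": "Residential Street",
         "6": "Freeway Ramp",
         "0": "Other (private streets, paper streets, etc.)",
     }

     with open("active_streets.csv", newline="") as f:
         for row in csv.DictReader(f):
             if row.get("active", "").lower() == "true":  # value encoding assumed
                 label = CLASS_CODES.get(row.get("classcode", ""), "Unknown")
                 print(row.get("streetname"), "-", label)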

  20. Public Health Portfolio (Directly Funded Research - Programmes and Training Awards)

    • nihr.opendatasoft.com
    • nihr.aws-ec2-eu-central-1.opendatasoft.com
    csv, excel, json
    Updated Nov 4, 2025
    Cite
    (2025). Public Health Portfolio (Directly Funded Research - Programmes and Training Awards) [Dataset]. https://nihr.opendatasoft.com/explore/dataset/phof-datase/
    Explore at:
    excel, json, csvAvailable download formats
    Dataset updated
    Nov 4, 2025
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    This Public Health Portfolio (Directly Funded Research - Programmes and Training Awards) dataset contains NIHR directly funded research awards where the funding is allocated to an award holder or host organisation to carry out a specific piece of research or complete a training award. The NIHR also invests significantly in centres of excellence, collaborations, services and facilities to support research in England; collectively these form NIHR infrastructure support. NIHR infrastructure supported projects are available in the Public Health Portfolio (Infrastructure Support) dataset, which you can find here.

    NIHR directly funded research awards (Programmes and Training Awards) that were funded between January 2006 and the present extraction date are eligible for inclusion in this dataset. An agreed set of inclusion/exclusion criteria is used to categorise awards as public health awards (see below). Following inclusion in the dataset, public health awards are second-level coded to one of the four Public Health Outcomes Framework domains: (1) wider determinants, (2) health improvement, (3) health protection, and (4) healthcare and premature mortality. More information on the Public Health Outcomes Framework domains can be found here.

    This dataset is updated quarterly to include new NIHR awards categorised as public health awards. Please note that for Public Health Research Programme projects showing an Award Budget of £0.00, the project is undertaken by an on-call team (for example PHIRST, the Public Health Review Team, or the Knowledge Mobilisation Team) as part of an ongoing programme of work.

    Inclusion Criteria: The NIHR Public Health Overview project team worked with colleagues across NIHR public health research to define the inclusion criteria for NIHR public health research. NIHR directly funded research awards are categorised as public health if they are determined to be 'investigations of interventions in, or studies of, populations that are anticipated to have an effect on health or on health inequity at a population level.' This definition of public health is intentionally broad, to capture the wide range of NIHR public health research across prevention, health improvement, health protection, and healthcare services (both within and outside of NHS settings). This dataset does not reflect the NIHR's total investment in public health research; the intention is to showcase a subset of the wider NIHR public health portfolio. The dataset includes NIHR directly funded research awards categorised as public health awards, but does not include public health awards or projects funded by any of the three NIHR Research Schools or the NIHR Health Protection Research Units.

    Disclaimers: Users of this dataset should acknowledge the broad definition of public health that has been used to develop the inclusion criteria for this dataset. Please note that this dataset is currently subject to a limited data quality review; we are working to improve our data collection methodologies. Some awards may also appear in other NIHR curated datasets.

    Further Information: Further information on the individual awards shown in the dataset can be found on the NIHR's Funding & Awards website here. Further information on individual NIHR Research Programmes' decision-making processes for funding health and social care research can be found here. Further information on the NIHR's investment in public health research can be found as follows (the NIHR is one of the main funders of public health research in the UK; public health research falls within the remit of a range of NIHR Directly Funded Research (Programmes and Training Awards) and NIHR Infrastructure Support):
    • NIHR School for Public Health here.
    • NIHR Public Health Policy Research Unit here.
    • NIHR Health Protection Research Units here.
    • NIHR Public Health Research Programme Health Determinants Research Collaborations (HDRC) here.
    • NIHR Public Health Research Programme Public Health Intervention Responsive Studies Teams (PHIRST) here.
