100+ datasets found
  1. Simulation Data Set

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

    Metadata (including data dictionary)
    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code Abstract
    We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

    Description
    “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
    “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

    Optional Information (complete as necessary)
    Required R packages:
    • For running “CWVS_LMC.txt”:
      • msm: Sampling from the truncated normal distribution
      • mnormt: Sampling from the multivariate normal distribution
      • BayesLogit: Sampling from the Polya-Gamma distribution
    • For running “Results_Summary.txt”:
      • plotrix: Plotting the posterior means and credible intervals

    Instructions for Use
    Reproducibility (Mandatory)
    What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
    How to use the information:
    • Load the “Simulated_Dataset.RData” workspace
    • Run the code contained in “CWVS_LMC.txt”
    • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

    Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

    Data
    The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability
    Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.

    Description
    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
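
    The week-by-week standardization described for this dataset (subtract the weekly median, divide by the weekly IQR) can be sketched as follows. This is an illustration only, not the authors' code, which is in R: the function name, the toy values, and the choice of quartile convention (Python's "inclusive" method) are all assumptions.

```python
from statistics import median, quantiles

def standardize_by_week(exposures):
    """Standardize each week (column) as (value - median) / IQR.

    `exposures` is a list of rows, one per individual, each holding one
    exposure value per week. Returns a new list of rows of the same shape.
    """
    n_weeks = len(exposures[0])
    standardized = [[0.0] * n_weeks for _ in exposures]
    for week in range(n_weeks):
        column = [row[week] for row in exposures]
        center = median(column)
        # quantiles(..., n=4) returns the three quartile cut points.
        # The "inclusive" convention is an assumption made for this sketch.
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        for i, value in enumerate(column):
            standardized[i][week] = (value - center) / iqr
    return standardized

# Hypothetical toy data: 5 individuals, 2 weeks of exposure.
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
z = standardize_by_week(toy)
```

    After this transformation each week has a median of 0 and an IQR of 1, so the original exposure scale cannot be read back without the withheld medians and IQRs.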

  2. Data from: Simulated dataset

    • ieee-dataport.org
    Updated Apr 8, 2024
    Cite
    Nassim Ravanshad (2024). Simulated dataset [Dataset]. https://ieee-dataport.org/documents/simulated-dataset
    Dataset updated
    Apr 8, 2024
    Authors
    Nassim Ravanshad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  3. Data from: Simulated dataset

    • figshare.le.ac.uk
    zip
    Updated Feb 20, 2024
    Cite
    Rodrigo Quian Quiroga (2024). Simulated dataset [Dataset]. http://doi.org/10.25392/leicester.data.11897595.v1
    Dataset updated
    Feb 20, 2024
    Dataset provided by
    University of Leicester
    Authors
    Rodrigo Quian Quiroga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A simulated dataset that has been widely used in the evaluation of spike-sorting algorithms. Synthetic datasets are generated by adding spike waveform templates to background noise of various levels; this download contains several datasets, generated using different spike templates. Use wave_clus (see www2.le.ac.uk/centres/csn/software/wave-clus) for spike detection and sorting of this data. Wave_clus is a fast and unsupervised algorithm for spike detection and sorting, compatible with Windows, Mac, or Linux operating systems.

  4. Data and code for: Generation and applications of simulated datasets to...

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated Mar 10, 2023
    Cite
    Matthew Silk; Olivier Gimenez (2023). Data and code for: Generation and applications of simulated datasets to integrate social network and demographic analyses [Dataset]. http://doi.org/10.5061/dryad.m0cfxpp7s
    Dataset updated
    Mar 10, 2023
    Dataset provided by
    Centre d'Écologie Fonctionnelle et Évolutive
    Authors
    Matthew Silk; Olivier Gimenez
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.

    Methods
    The dataset and code stored here are for Case Studies 1 and 2 in the paper. Datasets were generated using simulations in R. Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.

  5. R Code of Simulations

    • catalog.data.gov
    • cloud.csiss.gmu.edu
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). R Code of Simulations [Dataset]. https://catalog.data.gov/dataset/r-code-of-simulations
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The sims zip file contains R code and the accompanying files needed to run it. Overall, this demonstrates that the R code used in the study is fully functional, documented, and reproducible, and that it could reproduce the simulation results from the study given sufficient computing time. The code as presented is for a single simulated dataset and, when run on that one dataset, will produce the estimates and confidence intervals from all the methods used within the study. This dataset is associated with the following publication: Nethery, R., F. Mealli, J. Sacks, and F. Dominici. Evaluation of the Health Impacts of the 1990 Clean Air Act Amendments Using Causal Inference and Machine Learning. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION. Taylor & Francis Group, London, UK, 1-12, (2020).

  6. Exponential Distribution Simulated Dataset

    • ieee-dataport.org
    Updated Jul 17, 2024
    Cite
    Gabiriele Bulivou (2024). Exponential Distribution Simulated Dataset [Dataset]. https://ieee-dataport.org/documents/exponential-distribution-simulated-dataset
    Dataset updated
    Jul 17, 2024
    Authors
    Gabiriele Bulivou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    featuring n = 5000 data points

  7. Simulated data

    • figshare.com
    txt
    Updated Feb 5, 2018
    Cite
    Vincent Guillemot (2018). Simulated data [Dataset]. http://doi.org/10.6084/m9.figshare.5854659.v1
    Dataset updated
    Feb 5, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vincent Guillemot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a simulated dataset used for test.

  8. Simulated dataset for I = 2.26 % and RR = 3

    • figshare.com
    zip
    Updated Jan 19, 2016
    + more versions
    Cite
    Aline Guttmann (2016). Simulated dataset for I = 2.26 % and RR = 3 [Dataset]. http://doi.org/10.6084/m9.figshare.1308494.v1
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare
    Authors
    Aline Guttmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of 221 datasets in R format (rda), each corresponding to 1000 simulations of one cluster with a relative risk of 3 for a base incidence of 2.26 % births per year. Each dataset is a table of 221 000 rows and 6 columns. The rows contain:
    • the coordinates (longitude and latitude) of a SU,
    • the observed number of cases,
    • the size of the at-risk population (i.e., the number of live births),
    • the expected number of cases in the specified SU assuming an inhomogeneous Poisson process for the cases distribution, and
    • an indicator for the simulation ranging from 1 to 1000.

  9. PotSim: A Large-Scale Simulated Dataset for Benchmarking AI Techniques on...

    • dataverse.harvard.edu
    Updated May 8, 2025
    Cite
    Satya Krishna Pothapragada; Rishabh Gupta; Kumar k Kumar Goel; Alina Zare; Joel Harley; Lincoln Zotarelli (2025). PotSim: A Large-Scale Simulated Dataset for Benchmarking AI Techniques on Potato Crop [Dataset]. http://doi.org/10.7910/DVN/GQMDOV
    Dataset updated
    May 8, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Satya Krishna Pothapragada; Rishabh Gupta; Kumar k Kumar Goel; Alina Zare; Joel Harley; Lincoln Zotarelli
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    FDACS (Florida Department of Agriculture and Consumer Services)
    Description

    PotSim is a large-scale simulated agricultural dataset specifically designed for AI-driven research on potato cultivation. This dataset is grounded in real-world crop management scenarios and extrapolated to approximately 4.9 million hypothetical crop management scenarios. It encompasses diverse factors including varying planting dates, fertilizer application rates and timings, irrigation strategies, and 24 years of weather data. The resulting dataset comprises over 675 million daily simulation records, offering an extensive and realistic framework for agricultural AI research.

  10. simulated datasets for evaluating polygenic detection methods

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 28, 2024
    Cite
    Tripathi, Devashish (2024). simulated datasets for evaluating polygenic detection methods [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12752104
    Dataset updated
    Oct 28, 2024
    Dataset authored and provided by
    Tripathi, Devashish
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains simulation files corresponding to every combination of demographic model (1/2/3), environment (linear/quadratic), selection duration (200/400/600/800/1000), and simulation replicate (1-20). This results in 600 simulation files, one for each unique combination of demographic model, environment, selection duration, and simulation replicate. For each individual in the genotype data file, the metadata folder contains files with the values of selective pressure (linear and quadratic environment).
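
    The count of 600 files follows from the full cross of the four factors (3 x 2 x 5 x 20). A small sketch enumerating the combinations is shown here; the variable names are illustrative and do not reflect the dataset's actual file naming.

```python
from itertools import product

# Factor levels as listed in the dataset description.
demographic_models = [1, 2, 3]
environments = ["linear", "quadratic"]
selection_durations = [200, 400, 600, 800, 1000]
replicates = range(1, 21)  # replicates 1 through 20

# Every (model, environment, duration, replicate) tuple, one per simulation file.
combinations = list(
    product(demographic_models, environments, selection_durations, replicates)
)
n_files = len(combinations)
```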

    The variant positions are 1-based, which is the default SLiM output. To compare the results with the causal loci, the user must make the positions 0-based (i.e., POS-1). The details are provided in a GitHub tutorial.
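
    The POS-1 adjustment can be sketched as below. The function names and the example positions are hypothetical, and the sketch assumes the causal loci are already 0-based, as the description implies; the dataset's own tutorial remains the reference for the actual file formats.

```python
def to_zero_based(positions_1based):
    """Shift 1-based variant positions to 0-based (POS - 1)."""
    return [pos - 1 for pos in positions_1based]

def causal_hits(variant_positions_1based, causal_loci_0based):
    """Return the 0-based variant positions that coincide with causal loci."""
    causal = set(causal_loci_0based)
    return [p for p in to_zero_based(variant_positions_1based) if p in causal]

# Hypothetical example values, not taken from the dataset.
variants = [101, 250, 1000]   # 1-based positions from the genotype file
causal_loci = [100, 999]      # 0-based causal loci
hits = causal_hits(variants, causal_loci)
```

    Without the shift, none of these example variants would match a causal locus, which is the off-by-one mismatch the description warns about.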

    Please refer to the documentation for a detailed description of the files and folder structure.

    The article describing the simulated data and its application is accepted for publication in Nucleic Acids Research (https://doi.org/10.1093/nar/gkae1027).

  11. Simulation dataset for research project Metropolis 2

    • data.4tu.nl
    zip
    Updated Mar 31, 2022
    + more versions
    Cite
    Andrei Badea; Andres Morfin Veytia; Joost Ellerbroek (2022). Simulation dataset for research project Metropolis 2 [Dataset]. http://doi.org/10.4121/19323263.v1
    Dataset updated
    Mar 31, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Andrei Badea; Andres Morfin Veytia; Joost Ellerbroek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    European Commission
    Description

    Data produced by simulating traffic scenarios using the BlueSky Open Air Traffic Simulator. The dataset was generated by applying three ATM operational concepts to urban airspace traffic scenarios: decentralised, hybrid and centralised.

    The dataset consists of logs of information gathered during the simulations.

  12. Data from: Simulated Dataset

    • data.mendeley.com
    Updated Jul 22, 2024
    Cite
    Yang Pan (2024). Simulated Dataset [Dataset]. http://doi.org/10.17632/ts6cbgw9fg.1
    Dataset updated
    Jul 22, 2024
    Authors
    Yang Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Simulated Dataset

  13. NODE simulated data

    • figshare.com
    zip
    Updated Jan 22, 2025
    Cite
    Zedong Wang (2025). NODE simulated data [Dataset]. http://doi.org/10.6084/m9.figshare.28252061.v1
    Dataset updated
    Jan 22, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Zedong Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NODE simulated dataset.

  14. Data from: Simulated Well Production Data using a Transient Well Model and a...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 17, 2023
    Cite
    AlHammad, Yousef K. (2023). Simulated Well Production Data using a Transient Well Model and a Developed Simulator [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8128888
    Dataset updated
    Nov 17, 2023
    Dataset authored and provided by
    AlHammad, Yousef K.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a simulated dataset of transient well production data. This dataset was used in my Master's thesis at King Abdullah University of Science and Technology (KAUST), and it is shared for academic use and research work.

    The dataset has 100 wells simulated at time steps of 0.2 hours for an entire year. This gives 43,800 observations per well, and a grand total of 4,380,000 observations in the entire dataset. The resulting production data are then perturbed with systemic and random gauge errors to better simulate real-world gauge readings.
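
    The observation counts can be checked with a few lines of arithmetic. The sketch assumes a 365-day year and uses integer steps per hour (5 steps of 0.2 hours) to avoid floating-point rounding; the variable names are illustrative.

```python
# Assumption: "an entire year" means 365 days.
HOURS_PER_YEAR = 365 * 24   # 8,760 hours
STEPS_PER_HOUR = 5          # one step every 0.2 hours
N_WELLS = 100

obs_per_well = HOURS_PER_YEAR * STEPS_PER_HOUR
total_obs = N_WELLS * obs_per_well
```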

    The simulator code used to generate this dataset can be found at: https://github.com/ykh-1992/TransientNodalAnalysis.jl

    The data consist of three files:
    • "wells.csv": This file details the input parameters for each simulated well.
    • "data.zip": This file houses an 850 MB "data.csv" that includes the simulated well production data.
    • "auxiliary.csv": This file includes information related to the simulation run.

  15. Simulated dataset from 'Quantifying the causal pathways contributing to...

    • datadryad.org
    • search.datacite.org
    zip
    Updated Sep 8, 2020
    Cite
    Jonathan Henshaw (2020). Simulated dataset from 'Quantifying the causal pathways contributing to natural selection' [Dataset]. http://doi.org/10.5061/dryad.j0zpc86c8
    Dataset updated
    Sep 8, 2020
    Dataset provided by
    Dryad
    Authors
    Jonathan Henshaw
    Time period covered
    2020
    Description

    The following files are included:

    • The code used to generate the dataset (written for Wolfram Mathematica version 12.1.0.0)

    • The code used to analyse the causal structure of selection in the dataset (written for R version 1.1.456).

  16. Data from: Simulated dataset

    • dataone.org
    Updated Sep 24, 2024
    Cite
    Tellaroli, Paola (2024). Simulated dataset [Dataset]. http://doi.org/10.7910/DVN/OLGPT6
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Tellaroli, Paola
    Description

    Simulated data with max variability cited in 'Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters'

  17. Simulated Dataset for Testing - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Simulated Dataset for Testing - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/simulated-dataset-for-testing
    Dataset updated
    Dec 16, 2024
    Description

    The dataset used in the paper is a simulated dataset for testing the proposed algorithms.

  18. Normal Distribution Simulated Dataset 1

    • ieee-dataport.org
    Updated Apr 25, 2022
    Cite
    Gabiriele Bulivou (2022). Normal Distribution Simulated Dataset 1 [Dataset]. https://ieee-dataport.org/documents/normal-distribution-simulated-dataset-1
    Dataset updated
    Apr 25, 2022
    Authors
    Gabiriele Bulivou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of simulated normally distributed data with n = 500 data points, mean = 80, and standard deviation = 2.
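
    A comparable sample can be drawn with Python's standard library, as sketched below. This is not the procedure used to create the published dataset, which the description does not specify; the seed and variable names are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

# Parameters stated in the dataset description.
N, MU, SIGMA = 500, 80, 2

rng = random.Random(0)  # arbitrary seed so the sketch is repeatable
sample = [rng.gauss(MU, SIGMA) for _ in range(N)]

# Sample estimates should land close to the nominal parameters.
sample_mean = mean(sample)
sample_sd = stdev(sample)
```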

  19. CMAPSS Jet Engine Simulated Data - Dataset - NASA Open Data Portal

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Oct 15, 2008
    Cite
    nasa.gov (2008). CMAPSS Jet Engine Simulated Data - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/cmapss-jet-engine-simulated-data
    Dataset updated
    Oct 15, 2008
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The data sets consist of multiple multivariate time series. Each data set is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with different degrees of initial wear and manufacturing variation, which are unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise.

    The engine is operating normally at the start of each time series, and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. Also provided is a vector of true Remaining Useful Life (RUL) values for the test data.

    The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle; each column is a different variable. The columns correspond to:
    1) unit number
    2) time, in cycles
    3) operational setting 1
    4) operational setting 2
    5) operational setting 3
    6) sensor measurement 1
    7) sensor measurement 2
    ...
    26) sensor measurement 21

    • Data Set: FD001; Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: ONE (HPC Degradation)
    • Data Set: FD002; Train trajectories: 260; Test trajectories: 259; Conditions: SIX; Fault Modes: ONE (HPC Degradation)
    • Data Set: FD003; Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: TWO (HPC Degradation, Fan Degradation)
    • Data Set: FD004; Train trajectories: 248; Test trajectories: 249; Conditions: SIX; Fault Modes: TWO (HPC Degradation, Fan Degradation)

    Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, ‘Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation’, in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver CO, Oct 2008.
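
    Because each training series runs until failure, a per-row RUL label can be derived as the unit's last observed cycle minus the current cycle. The sketch below illustrates this on space-separated rows; it assumes the last training cycle of each unit corresponds to RUL 0, and the toy rows are shortened, made-up values rather than real records with all 26 columns. Function names are illustrative.

```python
from collections import defaultdict

def parse_rows(text):
    """Parse space-separated rows into (unit, cycle, remaining_values)."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        unit, cycle = int(fields[0]), int(fields[1])
        values = [float(v) for v in fields[2:]]
        rows.append((unit, cycle, values))
    return rows

def training_rul(rows):
    """RUL per row: the unit's last observed cycle minus the current cycle."""
    last_cycle = defaultdict(int)
    for unit, cycle, _ in rows:
        last_cycle[unit] = max(last_cycle[unit], cycle)
    return [last_cycle[unit] - cycle for unit, cycle, _ in rows]

# Hypothetical toy data: columns are unit, cycle, then a few numeric values.
toy_text = """
1 1 0.0 0.0 100.0 518.67
1 2 0.0 0.0 100.0 518.70
1 3 0.0 0.0 100.0 518.90
2 1 0.0 0.0 100.0 518.67
2 2 0.0 0.0 100.0 518.80
"""
rows = parse_rows(toy_text)
rul = training_rul(rows)
```

    This labeling applies only to the training subset; for the test subset the series stop before failure, so the provided vector of true RUL values is needed instead.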

  20. Simulated dataset for Olley&Pakes

    • kaggle.com
    Updated Oct 16, 2023
    Cite
    Anton Morozov (2023). Simulated dataset for Olley&Pakes [Dataset]. https://www.kaggle.com/datasets/antmorozov/simulated-dataset-for-olley-and-pakes
    Dataset updated
    Oct 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anton Morozov
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Anton Morozov

    Released under Apache 2.0
