100+ datasets found
  1. Simulation Data Set

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    It can be accessed through the following means. File format: R workspace file; “Simulated_Dataset.RData”.

    Metadata (including data dictionary)
    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code Abstract
    We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
    We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

    Description
    “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
    “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

    Required R packages:
    • For running “CWVS_LMC.txt”:
      • msm: Sampling from the truncated normal distribution
      • mnormt: Sampling from the multivariate normal distribution
      • BayesLogit: Sampling from the Polya-Gamma distribution
    • For running “Results_Summary.txt”:
      • plotrix: Plotting the posterior means and credible intervals

    Instructions for Use: Reproducibility
    What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
    How to use the information:
    • Load the “Simulated_Dataset.RData” workspace.
    • Run the code contained in “CWVS_LMC.txt”.
    • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

    Format: Below is the replication procedure for the attached data set, for the portion of the analyses using a simulated data set.

    Data
    The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
    In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability
    Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Description
    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
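    The weekly median/IQR standardization described above is easy to sketch. The following is a minimal illustration (not the authors' code; the toy exposure matrix is random, and the helper name is ours):

    ```python
    import random
    import statistics

    def standardize_weekly(z):
        """Standardize each week (column) of an exposure matrix:
        subtract that week's median and divide by that week's IQR."""
        n, m = len(z), len(z[0])
        out = [[0.0] * m for _ in range(n)]
        for week in range(m):
            col = [row[week] for row in z]
            med = statistics.median(col)
            q = statistics.quantiles(col, n=4)   # quartiles
            iqr = q[2] - q[0]
            for i in range(n):
                out[i][week] = (z[i][week] - med) / iqr
        return out

    # Toy exposure matrix: 5 individuals x 3 weeks
    random.seed(1)
    z = [[random.gauss(10, 2) for _ in range(3)] for _ in range(5)]
    zs = standardize_weekly(z)
    # After standardization each week's median is 0 and IQR is 1
    ```

    Because the transformation is affine within each week, the standardized columns have median exactly 0 and IQR exactly 1, mirroring the protection described above: the original medians and IQRs cannot be recovered from the released values alone.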

  2. Simulation data and code

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Feb 24, 2022
    Cite
    Charlotte de Vries; E Yagmur Erten (2022). Simulation data and code [Dataset]. http://doi.org/10.6084/m9.figshare.19232535.v1
    Dataset updated
    Feb 24, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Charlotte de Vries; E Yagmur Erten
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description
    • PF_simulation_data.zip contains the simulation data used to create figure 2 of de Vries, Erten and Kokko.
    • Code_PF.zip contains the C++ code used to create the data for figure 2 (see PF_simulation_data.zip for the data files produced), as well as the R script that creates figure 2 from the data (Figure2_cloud_25.R).

    All code files were created by Pen, I., & Flatt, T. (2021). Asymmetry, division of labour and the evolution of ageing in multicellular organisms. Philosophical Transactions of the Royal Society B, 376(1823), 20190729. The C++ code is slightly adjusted to change the output. Note that the R script takes a long time to run (multiple days on our laptops) and uses a lot of swap memory; we advise running it on a server. Alternatively, you can edit the code to use fewer than the last 25 days by changing the line "ddead %>% filter(t > 4975)" to, for example, "ddead %>% filter(t > 4998)" to use the last 2 time steps only. However, note that there will be insufficient data at high ages to estimate mortality rates.
  3. Simulation Data & R scripts for: "Introducing recurrent events analyses to...

    • data.niaid.nih.gov
    • doi.org
    • +1 more
    Updated Apr 29, 2024
    Cite
    Ferry, Nicolas (2024). Simulation Data & R scripts for: "Introducing recurrent events analyses to assess species interactions based on camera trap data: a comparison with time-to-first-event approaches" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11085005
    Dataset updated
    Apr 29, 2024
    Dataset provided by
    Department of National Park Monitoring and Animal Management, Bavarian Forest National Park
    Authors
    Ferry, Nicolas
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Files descriptions:

    All csv files refer to results from the different models (PAMM, AARs, linear models, MRPPs) on each iteration of the simulation, one row per iteration. "results_perfect_detection.csv" refers to the results from the first simulation part with all the observations. "results_imperfect_detection.csv" refers to the results from the first simulation part with randomly thinned observations to mimic imperfect detection.

    • ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
    • PAMM30: p-value of the PAMM run on the 30-day survey.
    • PAMM7: p-value of the PAMM run on the 7-day survey.
    • AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
    • AAR2: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
    • Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).
    • Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
    • Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
    • MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
    • MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
    • Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
    • MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
    • MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).

    "results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations. "results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimic imperfect detection.

    • ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
    • p_pamm7_AB: p-value of the PAMM run on the 7-day survey testing for the effect of A on B.
    • p_pamm7_BA: p-value of the PAMM run on the 7-day survey testing for the effect of B on A.
    • AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
    • AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
    • AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.
    • Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).
    • Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
    • Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
    • MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
    • MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
    • Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
    • MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
    • MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).

    Scripts files description:
    • 1_Functions: R script containing the functions: the MRPP from Karanth et al. (2017), adapted here for time efficiency; the MRPP from Murphy et al. (2021), adapted here for time efficiency; a version of the ct_to_recurrent() function from the recurrent package adapted to process the simulation datasets in parallel; and the simulation() function used to simulate two species' observations with reciprocal effects on each other.
    • 2_Simulations: R script containing the parameter definitions for all iterations (for the two parts of the simulations), the simulation parallelization, and the random thinning mimicking imperfect detection.
    • 3_Approaches comparison: R script containing the fit of the different models tested on the simulated data.
    • 3_1_Real data comparison: R script containing the fit of the different models tested on the real data example from Murphy et al. 2021.
    • 4_Graphs: R script containing the code for plotting results from the simulation part and appendices.
    • 5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing the Karanth et al. (2017) and Murphy et al. (2021) code lines and the adapted versions (for time efficiency), with a comparison to verify similarity of results.
    • 5_2_Appendix - Multi-response procedure permutation difference: R script containing R code to test for differences between the MRPP approaches according to the species on which permutations are done.
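    As a minimal illustration of how such per-iteration result files might be summarized (this is not the authors' code; the file name and the subset of p-value columns follow the data dictionary above, and `rejection_rates` is our own helper), one could compute the proportion of iterations in which each test rejects at alpha = 0.05:

    ```python
    import csv
    from collections import defaultdict

    def rejection_rates(path, p_cols=("PAMM30", "PAMM7", "Harmsen_P", "Niedballa_P"),
                        alpha=0.05):
        """Share of simulation iterations (csv rows) with p-value < alpha, per method."""
        hits = defaultdict(int)
        total = 0
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                total += 1
                for col in p_cols:
                    if float(row[col]) < alpha:
                        hits[col] += 1
        return {col: hits[col] / total for col in p_cols}

    # Example (assuming a local copy of the file):
    # rates = rejection_rates("results_perfect_detection.csv")
    ```

    With the true effect present in every iteration, these proportions estimate each method's statistical power; under a null simulation they estimate the type I error rate.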

  4. Data and code for: Generation and applications of simulated datasets to...

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated Mar 10, 2023
    Cite
    Matthew Silk; Olivier Gimenez (2023). Data and code for: Generation and applications of simulated datasets to integrate social network and demographic analyses [Dataset]. http://doi.org/10.5061/dryad.m0cfxpp7s
    Dataset updated
    Mar 10, 2023
    Dataset provided by
    Centre d'Écologie Fonctionnelle et Évolutive
    Authors
    Matthew Silk; Olivier Gimenez
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.

    Methods
    The dataset and code stored here are for Case Studies 1 and 2 in the paper. Datasets were generated using simulations in R. Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.

  5. Call Centre Queue Simulation

    • kaggle.com
    zip
    Updated Sep 20, 2022
    Cite
    Donovan Bangs (2022). Call Centre Queue Simulation [Dataset]. https://www.kaggle.com/datasets/donovanbangs/call-centre-queue-simulation
    Available download formats: zip (841475 bytes)
    Dataset updated
    Sep 20, 2022
    Authors
    Donovan Bangs
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Call Centre Queue Simulation

    A simulated call centre dataset and notebook, designed to be used as a classroom / tutorial dataset for Business and Operations Analytics.

    This notebook details the creation of simulated call centre logs over the course of one year. For this dataset we are imagining a business whose lines are open from 8:00am to 6:00pm, Monday to Friday. Four agents are on duty at any given time and each call takes an average of 5 minutes to resolve.

    The call centre manager is required to meet a performance target: 90% of calls must be answered within 1 minute. Lately, performance has slipped. As the data analytics expert, you have been brought in to analyze the centre's performance and make recommendations to return it to target.

    The dataset records timestamps for when a call was placed, when it was answered, and when the call was completed. The total waiting and service times are calculated, as well as a logical for whether the call was answered within the performance standard.

    Discrete-Event Simulation

    Discrete-Event Simulation allows us to model real calling behaviour with a few simple variables.

    • Arrival Rate
    • Service Rate
    • Number of Agents
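    These three variables are all it takes to sketch an M/M/c-style queue. The toy event-driven simulation below is not the simmer code used to build this dataset; the agent count and 5-minute average handle time mirror the scenario described above, while the arrival rate (one call every 1.5 minutes on average) is an arbitrary illustrative choice:

    ```python
    import heapq
    import random

    def simulate_service_level(arrival_rate, service_rate, agents, n_calls,
                               threshold=1.0, seed=42):
        """Fraction of calls answered within `threshold` minutes in an M/M/c queue."""
        rng = random.Random(seed)
        t = 0.0
        free_at = [0.0] * agents           # time at which each agent next becomes free
        heapq.heapify(free_at)
        answered_in_time = 0
        for _ in range(n_calls):
            t += rng.expovariate(arrival_rate)    # next call placed (Poisson arrivals)
            agent_free = heapq.heappop(free_at)   # earliest-available agent (FIFO queue)
            start = max(t, agent_free)            # call answered when an agent frees up
            if start - t <= threshold:
                answered_in_time += 1
            heapq.heappush(free_at, start + rng.expovariate(service_rate))
        return answered_in_time / n_calls

    # 4 agents, a call every ~1.5 min on average, 5-min average service time
    level = simulate_service_level(arrival_rate=1/1.5, service_rate=1/5,
                                   agents=4, n_calls=10_000)
    print(f"answered within 1 min: {level:.1%}")
    ```

    The heap of agent free-times is a standard trick: because FIFO arrivals begin service in arrival order, popping the earliest-free agent gives each call's exact start time without simulating individual events.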

    The simulations in this dataset are performed using the package simmer (Ucar et al., 2019). I encourage you to visit their website for complete details and fantastic tutorials on Discrete-Event Simulation.

    Ucar I, Smeets B, Azcorra A (2019). “simmer: Discrete-Event Simulation for R.” Journal of Statistical Software, 90(2), 1–30.

    For source code and simulation details, view the cross-posted GitHub notebook and Shiny app.

  6. Engineering Simulation Data Management Software Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 29, 2025
    Cite
    Growth Market Reports (2025). Engineering Simulation Data Management Software Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/engineering-simulation-data-management-software-market
    Dataset updated
    Aug 29, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Engineering Simulation Data Management Software Market Outlook

    According to our latest research, the global Engineering Simulation Data Management Software market size reached USD 1.85 billion in 2024, reflecting a robust trajectory driven by the increasing complexity of engineering projects and the need for seamless data integration. The market is poised to grow at a CAGR of 10.3% from 2025 to 2033, with the market size expected to reach USD 4.47 billion by 2033. The primary growth factor for this market is the rising adoption of digital transformation initiatives across industries, which has significantly increased the demand for advanced data management solutions that can handle the growing volume and complexity of simulation data.
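    The headline figures are internally consistent: compounding USD 1.85 billion at 10.3% per year over the nine years from 2024 to 2033 gives roughly USD 4.47 billion, as a quick check shows:

    ```python
    # Compound-growth sanity check for the figures quoted above
    base = 1.85          # USD billion, 2024
    cagr = 0.103         # 10.3% per year
    years = 9            # 2024 -> 2033
    forecast = base * (1 + cagr) ** years
    print(f"{forecast:.2f}")  # ~4.47
    ```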

    A significant growth driver for the Engineering Simulation Data Management Software market is the escalating adoption of simulation-driven product development in industries such as automotive, aerospace, and healthcare. As product lifecycles shorten and the pressure for innovation intensifies, organizations are leveraging simulation tools to accelerate design, testing, and validation processes. This has led to a surge in the volume of simulation data, necessitating robust management platforms that can ensure data integrity, traceability, and accessibility across distributed teams. Additionally, the integration of simulation with other digital engineering tools has amplified the need for centralized data management, enabling organizations to achieve better collaboration, reduce redundancies, and maintain compliance with industry standards and regulations.

    Another critical factor propelling market growth is the increasing complexity of engineering projects. Modern engineering simulations generate massive datasets that need to be managed efficiently for effective decision-making. The proliferation of multi-physics and multi-domain simulations, coupled with the trend towards digital twins and virtual prototyping, has further intensified the need for sophisticated data management solutions. Companies are now prioritizing the deployment of Engineering Simulation Data Management Software to streamline workflows, enhance productivity, and ensure that simulation data is readily available for analytics and reporting. This trend is particularly pronounced in sectors where safety, reliability, and regulatory compliance are paramount, such as aerospace & defense and healthcare.

    The evolution of cloud computing and the shift towards cloud-based deployment models have also played a pivotal role in shaping the Engineering Simulation Data Management Software market. Cloud-based platforms offer unparalleled scalability, flexibility, and accessibility, making it easier for organizations to manage simulation data across global teams and locations. The ability to integrate with other enterprise systems, support for remote collaboration, and reduced IT overheads are some of the advantages driving the adoption of cloud-based solutions. This shift is enabling even small and medium enterprises to leverage advanced simulation data management capabilities, thereby democratizing access to cutting-edge engineering tools and fostering innovation across the value chain.

    In the realm of engineering, Systems Engineering Software plays a pivotal role in managing the intricate web of processes involved in product development. This software is essential for coordinating various engineering disciplines, ensuring that all components of a system work harmoniously together. By integrating Systems Engineering Software with Engineering Simulation Data Management Software, organizations can enhance their ability to manage complex simulations and data flows. This integration facilitates better decision-making and improves the overall efficiency of engineering projects, particularly in industries where precision and reliability are critical.

    Regionally, North America continues to dominate the Engineering Simulation Data Management Software market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The high concentration of technology-driven industries, presence of leading software vendors, and strong focus on R&D investments have contributed to the region's leadership. Europe, with its robust automotive and aerospace sectors, is also witnessing significant growth.

  7. Call Center Simulated Data

    • kaggle.com
    zip
    Updated Mar 28, 2023
    Cite
    Pablo Sebastián Campos Ortiz (2023). Call Center Simulated Data [Dataset]. https://www.kaggle.com/datasets/scss17/call-center-simulated-data
    Available download formats: zip (3098 bytes)
    Dataset updated
    Mar 28, 2023
    Authors
    Pablo Sebastián Campos Ortiz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This data set is meant to be used alongside my notebook Linear Regression Notes, which provides a guideline for applying correlation analysis and linear regression models from a statistical approach.

    A fictional call center is interested in knowing the relationship between the number of personnel and some variables that measure their performance such as average answer time, average calls per hour, and average time per call. Data were simulated to represent 200 shifts.
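    A toy version of that exercise, using only the Python standard library (requires Python 3.10+; the simulated numbers below are illustrative and not drawn from this dataset), might look like:

    ```python
    import random
    import statistics

    random.seed(7)

    # Simulate 200 shifts: more personnel -> shorter average answer time (plus noise)
    personnel = [random.randint(5, 25) for _ in range(200)]
    avg_answer_time = [60.0 - 1.8 * p + random.gauss(0, 4) for p in personnel]

    r = statistics.correlation(personnel, avg_answer_time)
    slope, intercept = statistics.linear_regression(personnel, avg_answer_time)
    print(f"r = {r:.2f}, answer_time = {intercept:.1f} + {slope:.2f} * personnel")
    ```

    With a strong built-in relationship, the estimated correlation is strongly negative and the fitted slope recovers the simulated effect of adding staff, which is exactly the kind of diagnostic the notebook walks through.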

  8. Simulation and Test Data Management Market Analysis - Size, Share, and...

    • futuremarketinsights.com
    html, pdf
    Updated Jun 3, 2025
    Cite
    Sudip Saha (2025). Simulation and Test Data Management Market Analysis - Size, Share, and Forecast 2025 to 2035 [Dataset]. https://www.futuremarketinsights.com/reports/simulation-and-test-data-management-market
    Dataset updated
    Jun 3, 2025
    Authors
    Sudip Saha
    License

    https://www.futuremarketinsights.com/privacy-policy

    Time period covered
    2025 - 2035
    Area covered
    Worldwide
    Description

    The global simulation and test data management market is expected to witness substantial growth, with its valuation projected to increase from approximately USD 905.2 million in 2025 to about USD 3.24 billion by 2035. This corresponds to a CAGR of 12.1% over the forecast period.

    Attribute                                   Value
    Industry Size (2025E)                       USD 905.2 million
    Industry Size (2035F)                       USD 3.24 billion
    CAGR (2025 to 2035)                         12.1%

    Category-wise Insights

    Segment                                     CAGR (2025 to 2035)
    Aerospace & Defense (Industry)              14.8%

    Segment                                     Value Share (2025)
    Test Data Simulation Software (Solution)    42.3%
  9. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection...

    • dataverse.harvard.edu
    • dataone.org
    Updated Jul 6, 2017
    Cite
    Cory A. Rieth; Ben D. Amsel; Randy Tran; Maia B. Cook (2017). Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation [Dataset]. http://doi.org/10.7910/DVN/6C3JR1
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Cory A. Rieth; Ben D. Amsel; Randy Tran; Maia B. Cook
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1

    Description

    User Agreement, Public Domain Dedication, and Disclaimer of Liability. By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms. The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission. In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights. Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law. When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work. This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website.

    Description
    This dataverse contains the data referenced in Rieth et al. (2017), Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems, to be presented at Applied Human Factors and Ergonomics 2017. Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files.

    Each dataframe contains 55 columns:
    • Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. The “FaultFree” datasets only contain fault 0 (i.e., normal operating conditions).
    • Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (note: the actual seeds used to generate training and testing datasets were non-overlapping).
    • Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or from 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for total durations of 25 hours and 48 hours, respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively.
    • Columns 4 to 55 contain the process variables; the column names retain the original variable names.

    Acknowledgments. This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison, under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.
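    The sample counts above follow directly from the 3-minute sampling interval: 25 hours gives 500 samples, 48 hours gives 960, and the fault-introduction points (1 h and 8 h) correspond to 20 and 160 samples into the respective runs. A quick arithmetic check:

    ```python
    # Sanity checks on the TEP sampling layout described above
    minutes_per_sample = 3
    training_hours, testing_hours = 25, 48

    training_samples = training_hours * 60 // minutes_per_sample
    testing_samples = testing_hours * 60 // minutes_per_sample
    print(training_samples, testing_samples)  # 500 960

    # Faults start 1 h into Faulty Training and 8 h into Faulty Testing
    fault_start_training = 1 * 60 // minutes_per_sample   # 20 samples in
    fault_start_testing = 8 * 60 // minutes_per_sample    # 160 samples in
    ```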

  10. Data from: Optical scattering measurements and simulation data for...

    • data.nist.gov
    • datasets.ai
    • +2more
    Updated Jan 5, 2021
    Cite
    National Institute of Standards and Technology (2021). Optical scattering measurements and simulation data for one-dimensional (1-D) patterned periodic sub-wavelength features [Dataset]. http://doi.org/10.18434/mds2-2290
    Explore at:
    Dataset updated
    Jan 5, 2021
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    License

    https://www.nist.gov/open/license

    Description

    This data set consists of both measured and simulated optical intensities scattered off periodic line arrays, with simulations based upon an average geometric model for these lines. These data were generated in order to determine the average feature sizes based on optical scattering, which is an inverse problem for which solutions to the forward problem are calculated using electromagnetic simulations after a parameterization of the feature geometry. Here, the array of features measured and modeled is periodic in one dimension (i.e., a line grating) with a nominal line width of 100 nm placed at 300 nm intervals, or pitch = 300 nm; the short-hand label for the features is "L100P300." The entirety of the modeled data is included: over two thousand simulations that are indexed using a top, middle, and bottom linewidth as floating parameters. Two subsets of these data, featuring differing sampling strategies, are also provided. This data set also contains angle-resolved optical measurements with uncertainties for nine arrays which differ in their dimensions due to lithographic variations using a focus/exposure matrix, as identified in a previous publication (https://doi.org/10.1117/12.777131). We have previously reported line widths determined from these measurements based upon non-linear regression to compare theory to experiment. This data set is also intended to foster machine learning approaches to solving such inverse problems. Data are formatted for direct use in the "Model-Based Optical Metrology in R: MoR" software, which is also available from data.nist.gov (https://doi.org/10.18434/T4/1426859). Note: Certain commercial materials are identified in this dataset in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials are necessarily the best available for the purpose.
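    The library-search formulation of this inverse problem can be illustrated with a toy nearest-curve match: pick the simulated intensity curve with the smallest sum of squared residuals against the measurement. The interface and numbers below are hypothetical; the real workflow uses the MoR software and its file formats.

```python
def best_match(measured, library):
    """Return the parameter key of the simulated curve closest to the
    measurement in a least-squares sense. `library` maps a parameter
    tuple (e.g. top/middle/bottom linewidth in nm) to a simulated
    intensity list. Hypothetical interface for illustration only."""
    def sse(sim):
        return sum((m - s) ** 2 for m, s in zip(measured, sim))
    return min(library, key=lambda k: sse(library[k]))

# Toy usage with made-up curves:
lib = {
    (98.0, 100.0, 102.0): [0.10, 0.20, 0.30],
    (100.0, 101.0, 103.0): [0.12, 0.22, 0.33],
}
print(best_match([0.11, 0.21, 0.32], lib))
```

    A brute-force search like this is exactly what a machine-learning surrogate or the non-linear regression mentioned above tries to improve upon.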

  11. Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories

    • catalog.data.gov
    • data.virginia.gov
    • +2more
    Updated Sep 30, 2025
    Cite
    Federal Highway Administration (2025). Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories [Dataset]. https://catalog.data.gov/dataset/third-generation-simulation-data-tgsim-i-90-i-94-moving-trajectories
    Explore at:
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Federal Highway Administration (https://highways.dot.gov/)
    Area covered
    Interstate 90, Interstate 94
    Description

    The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The northbound lanes are shown visually from left to right in I90_94_moving_lane1.png through I90_94_moving_lane6.png. This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. 
    This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4 km long segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter would return to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle to ensure continuous data collection. The segment was selected to study mandatory and discretionary lane changing and last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day. As part of this dataset, the following files were provided: I90_94_moving_final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data at every 0.1 second. Vehicle size (small or large), width, length, and whether the vehicle was one of the automated test vehicles ("yes" or "no") are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the conversion factor 1 pixel = 0.3 meters. I90_94_moving_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound lanes) for each run X. I-90-moving-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines for each Run X. The "x" and "y" columns represent the horizontal and vertical locations in the reference image, respectively. The "ramp" columns define the type of roadway segment (0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments).
In total, the centerline files define six northbound lanes. Annotation on Regions.zip, which includes images that visually map lanes (I90_9

  12. Data from: ISIMIP2b Simulation Data from the Local Lakes Sector

    • data.isimip.org
    • researchdata.edu.au
    Updated Jun 10, 2022
    Cite
    Rafael Marcé; Donald Pierson; Daniel Mercado-Bettin; Wim Thiery; Sebastiano Piccolroaz; Bronwyn Woodward; Richard Iestyn Woolway; Zeli Tan; Georgiy Kirillin; Tom Shatwell; Raoul-Marie Couture; Marianne Côté; Damien Bouffard; Carl Love Mikael Råman Vinnå; Martin Schmid; Jacob Schewe (2022). ISIMIP2b Simulation Data from the Local Lakes Sector [Dataset]. http://doi.org/10.48364/ISIMIP.563533
    Explore at:
    Dataset updated
    Jun 10, 2022
    Dataset provided by
    ISIMIP Repository
    Authors
    Rafael Marcé; Donald Pierson; Daniel Mercado-Bettin; Wim Thiery; Sebastiano Piccolroaz; Bronwyn Woodward; Richard Iestyn Woolway; Zeli Tan; Georgiy Kirillin; Tom Shatwell; Raoul-Marie Couture; Marianne Côté; Damien Bouffard; Carl Love Mikael Råman Vinnå; Martin Schmid; Jacob Schewe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains ISIMIP2b (https://www.isimip.org, Frieler et al. 2017) simulation data from six local lake models: air2water4par/air2water6par (Piccolroaz et al. 2013, 2014, 2015, 2016, 2017, 2020), ALBM (Tan et al. 2015, 2016, 2018), FLake-IGB (Kirillin et al. 2011), MyLake (https://github.com/biogeochemistry/MyLake_public, Saloranta et al. 2007, Kiuru et al. 2019, Markelov et al. 2019), Simstrat (https://github.com/Eawag-AppliedSystemAnalysis/Simstrat, Goudsmit et al. 2003, Gaudard et al. 2019).

  13. Simulation data.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Aug 30, 2022
    Cite
    Hofinger, Gesine; Köster, Gerta; Rahn, Simon; Gödel, Marion (2022). Simulation data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000239732
    Explore at:
    Dataset updated
    Aug 30, 2022
    Authors
    Hofinger, Gesine; Köster, Gerta; Rahn, Simon; Gödel, Marion
    Description

    The data sets contain the parameters and configurations for the simulation defined in the scenario file as well as the simulation outputs for all numerical experiments presented in this contribution. In addition, the scripts for the evaluation of the results are provided. (ZIP)

  14. Slivisu: A visual analytics tool to validate simulation models against...

    • dataservices.gfz-potsdam.de
    Updated 2018
    Cite
    Andrea Unger; Daniela Rabe; Volker Klemann; Daniel Eggert; Doris Dransch; Andrea Unger; Daniela Rabe; Doris Dransch (2018). Slivisu: A visual analytics tool to validate simulation models against collected data [Dataset]. http://doi.org/10.5880/gfz.1.5.2018.007
    Explore at:
    Dataset updated
    2018
    Dataset provided by
    DataCite (https://www.datacite.org/)
    GFZ Data Services
    Authors
    Andrea Unger; Daniela Rabe; Volker Klemann; Daniel Eggert; Doris Dransch; Andrea Unger; Daniela Rabe; Doris Dransch
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    The validation of a simulation model is a crucial task in model development. It involves the comparison of simulation data to observation data and the identification of suitable model parameters. SLIVISU is a Visual Analytics framework that enables geoscientists to perform these tasks for observation data that are sparse and uncertain. Primarily, SLIVISU was designed to evaluate sea level indicators, which are geological or archaeological samples supporting the reconstruction of former sea level over the last ten thousand years; these are compiled in a PostgreSQL database system. At the same time, the software aims to support the validation of numerical sea-level reconstructions against these data by means of visual analytics.

  15. Simulation data: Experiments in Globalisation, Food Security and Land Use...

    • find.data.gov.scot
    txt, zip
    Updated Nov 6, 2014
    Cite
    University of Edinburgh. School of Geosciences (2014). Simulation data: Experiments in Globalisation, Food Security and Land Use Decision Making [Dataset]. http://doi.org/10.7488/ds/164
    Explore at:
    txt (0.0197 MB), zip (2765.824 MB)
    Dataset updated
    Nov 6, 2014
    Dataset provided by
    University of Edinburgh. School of Geosciences
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This zip file contains all simulation data produced for the publication Calum Brown, Dave Murray-Rust, Jasper van Vliet, Shah Jamal Alam, Peter H Verburg, Mark D Rounsevell (2014) Experiments in globalisation, food security and land use decision making. Once unzipped, the folder has the following structure: every sub-experiment (regionalisation) of the 19 experiments described in the text has its own folder. Within each, every one of the 30 simulations of each sub-experiment has a further folder, and within each of these, csv files give the coordinates of modelled cells, the capital values for each cell, the agent type owning each cell, the individual agent's competitiveness on that cell, and the quantities of services produced. There is one csv file for each timestep of the simulation. All analyses in the above publication were based on these data. For any enquiries, please contact Calum Brown: calum.brown@ed.ac.uk.

  16. Reverse Osmosis Simulation Data

    • data.openei.org
    • osti.gov
    • +1more
    archive, website
    Updated Apr 22, 2024
    Cite
    Sreejith Nadakkal Appukuttan; Hariswaran Sitaraman; Hilary Egan; Sreejith Nadakkal Appukuttan; Hariswaran Sitaraman; Hilary Egan (2024). Reverse Osmosis Simulation Data [Dataset]. http://doi.org/10.7481/2478402
    Explore at:
    website, archive
    Dataset updated
    Apr 22, 2024
    Dataset provided by
    National Renewable Energy Lab - NREL
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
    Open Energy Data Initiative (OEDI)
    Authors
    Sreejith Nadakkal Appukuttan; Hariswaran Sitaraman; Hilary Egan; Sreejith Nadakkal Appukuttan; Hariswaran Sitaraman; Hilary Egan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of computational fluid dynamics (CFD) output for various spacer configurations in a feed-water channel in reverse osmosis (RO) applications. Feed-water channels transport brine solution to the RO membrane surfaces. The spacers embedded in the channels help improve membrane performance by disrupting the concentration boundary layer growth on membrane surfaces. Refer to the "Related Work" resource below for more details. This dataset considers a feed-water channel of length 150mm. The inlet brine velocity and concentration are fixed at 0.1m/s and 100kg/m3 respectively. The diameter of the cylindrical spacers is fixed as 0.3mm and six varying inter-spacer distances of 0.75mm, 1mm, 1.5mm, 2mm, 2.5mm, and 3mm are simulated. The dataset comprising the steady, spatial fields of solute concentration, velocity, and density near each spacer is placed in the folder corresponding to the spacer configuration considered. We run two sets of CFD simulations and include the outputs from both sets for each configuration: (1) with a coarser mesh, producing low-resolution (LR) data of spatial resolution 20x20, and (2) with a finer mesh, producing high-resolution (HR) data of spatial resolution 100x100. These data points can be treated as images with the quantities of interest as their channels and can be used to train machine learning models to learn a mapping from the LR images as inputs to the HR images as outputs.
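    Because the LR and HR fields cover the same domain at 20x20 versus 100x100 resolution, a trivial super-resolution baseline is nearest-neighbour upsampling by a factor of 5; any learned LR-to-HR model should improve on it. A minimal sketch, with plain Python lists standing in for one image channel:

```python
def upsample_nearest(lr, factor=5):
    """Nearest-neighbour upsampling of a 2-D grid (list of lists).
    With factor=5 this maps a 20x20 LR field onto the 100x100 HR grid."""
    return [[row[j // factor] for j in range(len(row) * factor)]
            for row in lr for _ in range(factor)]

# Toy 2x2 -> 4x4 example:
lr = [[1, 2], [3, 4]]
hr = upsample_nearest(lr, factor=2)
print(hr)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

    Comparing a trained model's output against this baseline (e.g. by per-pixel error on the HR concentration field) gives a quick check that the model is learning more than interpolation.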

  17. LAMMPS Simulation Data of Alchemical Processes

    • nist.gov
    • data.nist.gov
    • +1more
    Updated Nov 18, 2024
    Cite
    National Institute of Standards and Technology (2024). LAMMPS Simulation Data of Alchemical Processes [Dataset]. http://doi.org/10.18434/mds2-3637
    Explore at:
    Dataset updated
    Nov 18, 2024
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    License

    https://www.nist.gov/open/license

    Description

    This dataset includes four alchemical processes, with data files generated with the Python package generate_alchemical_lammps (DOI from MIDAS) and the resulting output to be used for the calculation of free energies using the Multistate Bennett Acceptance Ratio (MBAR), BAR, or Thermodynamic Integration (TI). These input files are only applicable to LAMMPS versions after April 2024. The four cases can be separated into two systems: benzene solvated in water, and a Lennard-Jones (LJ) dimer in solvent. The four cases are: 1) benzene 1: in the NPT ensemble, scale the charges of benzene atoms from full values to zero over six steps. 2) benzene 2: in the NPT ensemble, scale the van der Waals potential between benzene and water from full values to zero over sixteen steps. 3) benzene 3: in the NVT ensemble with benzene in vacuum, scale the charges of benzene's atoms from zero to full values over six steps. 4) lj_dimer: in the NPT ensemble, change the cross-interaction energy parameter between solvent and dimer from a value of 1 to 2.

  18. FDTD simulation data for Optical Diffraction Tomography

    • figshare.com
    hdf
    Updated May 30, 2023
    Cite
    Paul Müller; Mirjam Schürmann; Jochen Guck (2023). FDTD simulation data for Optical Diffraction Tomography [Dataset]. http://doi.org/10.6084/m9.figshare.19111055.v1
    Explore at:
    hdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Paul Müller; Mirjam Schürmann; Jochen Guck
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains 2D and 3D complex field sinograms for optical diffraction tomography, created with finite-difference time-domain simulations using Meep (http://ab-initio.mit.edu/wiki/index.php?title=Meep). The entire complex electric field data from the simulations would have amounted to >2 TB, which is not easy to share and also contains a lot of redundant and uninteresting (for tomography at least) information. Most of these data were used in the ODTbrain paper (https://dx.doi.org/10.1186/s12859-015-0764-0).

    Data Structure. Each dataset is an HDF5 file (https://www.hdfgroup.org) that contains the simulation structure (the cell phantom), a background field (simulation without phantom), and a field for each rotational position of the phantom (sinogram). The fields are slices through the complex electrical field in the original simulation volume behind the phantom at the end of the simulation (supposedly steady state). The slice position is written as the HDF5 attribute “extraction focus distance [px]”. The slice position is important for the reconstruction, because the fields must be numerically refocused to the center of the simulation volume before reconstruction. The perfectly matched layer (PML) has already been cropped from the fields. Alongside each field, the source code of the Meep simulation and the standard output of the compiled simulation are stored. You can also find the simulation templates in the ODTbrain repository at https://github.com/RI-imaging/ODTbrain/tree/master/misc. I recommend exploring the files using HDFView (https://www.hdfgroup.org/downloads/hdfview/).

    Naming Scheme. I adopted the naming scheme of the original simulations.
    - The first part of the file name determines the dimension of the simulation. The larger “phantom_3d” files contain the 3D simulation sinograms.
    - “A” is the total number of angles for which simulations were performed.
    - “R” is the resolution (number of pixels per wavelength).
    - “T” is the total number of simulation steps performed.
    - “Nmed” is the refractive index (RI) of the medium surrounding the cell phantom.
    - “Ncyt” is the RI of the phantom’s cytoplasm.
    - “Nnuc” is the RI of the phantom’s nucleus.
    - “Nleo” is the RI of the phantom’s nucleolus.
    The final part of the file name indicates to which type of study the simulation belongs:
    - “angles”: varying the total number of acquisition angles
    - “step-count”: varying the total number of time steps
    - “refractive-index”: varying the internal RI values of the cell phantom
    - “size”: varying the size of the phantom
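    A token scheme like this lends itself to simple parsing. The sketch below uses a hypothetical file name, so verify the exact separators and extension against the actual files before reusing the pattern:

```python
import re

# Pull the A/R/T/Nmed/Ncyt/Nnuc/Nleo tokens out of a file name following
# the naming scheme described above.
TOKEN = re.compile(r"(A|R|T|Nmed|Ncyt|Nnuc|Nleo)([0-9.]+)")

def parse_name(name: str) -> dict:
    """Map each token key in the file name to its numeric value."""
    return {key: float(val) for key, val in TOKEN.findall(name)}

# Hypothetical example name for illustration:
params = parse_name("phantom_2d_A180_R13_T15000_Nmed1.333_Ncyt1.365_angles.h5")
print(params["A"], params["Nmed"])
```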

    Getting Started. I added two Python scripts, “recon_2d.py” and “recon_3d.py” (tested with Python 3.9 on Ubuntu 22.04), that will allow you to obtain RI reconstructions from the 2D and 3D sinograms. For this to work, you will have to install the Python libraries imported in those scripts. Note that for the 3D data you can also use the graphical tool CellReel (https://github.com/RI-imaging/CellReel).

  19. Laurel and Hardy 2 - simulation data (.mat file)

    • fairdomhub.org
    application/matlab
    Updated Mar 6, 2017
    Cite
    Daniel Seaton (2017). Laurel and Hardy 2 - simulation data (.mat file) [Dataset]. https://fairdomhub.org/data_files/1608
    Explore at:
    application/matlab (336 KB)
    Dataset updated
    Mar 6, 2017
    Authors
    Daniel Seaton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Simulation data for Laurel and Hardy 2, in MATLAB binary format.

  20. Data from: ISIMIP2a Simulation Data from Water (global) Sector

    • dataon.kisti.re.kr
    • dataservices.gfz-potsdam.de
    Updated Jan 1, 2017
    Cite
    Gosling, Simon; Müller Schmied, Hannes; Betts, Richard; Chang, Jinfeng; Ciais, Philippe; Dankers, Rutger; Döll, Petra; Eisner, Stephanie; Flörke, Martina; Gerten, Dieter; Grillakis, Manolis; Hanasaki, Naota; Hagemann, Stefan; Huang, Maoyi; Huang, Zhongwei; Jerez, Sonia; Kim, Hyungjun; Koutroulis, Aristeidis; Leng, Guoyong; Liu, Xingcai; Masaki, Yoshimitsu; Montavez, Pedro; Morfopoulos, Catherine; Oki, Taikan; Papadimitriou, Lamprini; Pokhrel, Yadu; Portmann, Felix T.; Orth, Rene; Ostberg, Sebastian; Satoh, Yusuke; Seneviratne, Sonia; Sommer, Philipp; Stacke, Tobias; Tang, Qiuhong; Tsanis, Ioannis; Wada, Yoshihide; Zhou, Tian; Büchner, Matthias; Schewe, Jacob; Zhao, Fang (2017). ISIMIP2a Simulation Data from Water (global) Sector [Dataset]. https://dataon.kisti.re.kr/search/93aef4247c45f0395db8dc09ce507455
    Explore at:
    Dataset updated
    Jan 1, 2017
    Authors
    Gosling, Simon; Müller Schmied, Hannes; Betts, Richard; Chang, Jinfeng; Ciais, Philippe; Dankers, Rutger; Döll, Petra; Eisner, Stephanie; Flörke, Martina; Gerten, Dieter; Grillakis, Manolis; Hanasaki, Naota; Hagemann, Stefan; Huang, Maoyi; Huang, Zhongwei; Jerez, Sonia; Kim, Hyungjun; Koutroulis, Aristeidis; Leng, Guoyong; Liu, Xingcai; Masaki, Yoshimitsu; Montavez, Pedro; Morfopoulos, Catherine; Oki, Taikan; Papadimitriou, Lamprini; Pokhrel, Yadu; Portmann, Felix T.; Orth, Rene; Ostberg, Sebastian; Satoh, Yusuke; Seneviratne, Sonia; Sommer, Philipp; Stacke, Tobias; Tang, Qiuhong; Tsanis, Ioannis; Wada, Yoshihide; Zhou, Tian; Büchner, Matthias; Schewe, Jacob; Zhao, Fang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) provides a framework for the collation of a set of consistent, multi-sector, multi-scale climate-impact simulations, based on scientifically and politically-relevant historical and future scenarios. This framework serves as a basis for robust projections of climate impacts, as well as facilitating model evaluation and improvement, allowing for improved estimates of the biophysical and socio-economic impacts of climate change at different levels of global warming. It also provides a unique opportunity to consider interactions between climate change impacts across sectors. ISIMIP2a is the second ISIMIP simulation round, focusing on historical simulations (1971-2010 approx.) of climate impacts on agriculture, fisheries, permafrost, biomes, regional and global water and forests. This may serve as a basis for model evaluation and improvement, allowing for improved estimates of the biophysical and socio-economic impacts of climate change at different levels of global warming. The focus topic for ISIMIP2a is model evaluation and validation, in particular with respect to the representation of impacts of extreme weather events and climate variability. During this phase, four common global observational climate data sets were provided across all impact models and sectors. In addition, appropriate observational data sets of impacts for each sector were collected, against which the models can be benchmarked. Access to the input data for the impact models is provided through a central ISIMIP archive (see https://www.isimip.org/gettingstarted/#input-data-bias-correction). 
    This entry refers to the ISIMIP2a simulation data from global hydrology models: CLM4, DBH, H08, JULES_W1, JULES_B1, LPJmL, MATSIRO, MPI-HM, ORCHIDEE, PCR-GLOBWB, SWBM, VIC, WaterGAP2. The ISIMIP2a water (global) outputs are based on simulations from 13 global hydrology models (see listing) according to the ISIMIP2a protocol (https://www.isimip.org/protocol/#isimip2a). The models simulate hydrological processes and dynamics (part of the models also considering human water abstractions and reservoir regulation) based on climate and physio-geographical information. A more detailed description of the models and model-specific amendments of the protocol are available here: https://www.isimip.org/impactmodels/.

Cite
U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set

Simulation Data Set

Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description

These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”. Metadata (including data dictionary) • y: Vector of binary responses (1: adverse outcome, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate) Code Abstract We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. 
We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities. Description “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript). Optional Information (complete as necessary) Required R packages: • For running “CWVS_LMC.txt”: • msm: Sampling from the truncated normal distribution • mnormt: Sampling from the multivariate normal distribution • BayesLogit: Sampling from the Polya-Gamma distribution • For running “Results_Summary.txt”: • plotrix: Plotting the posterior means and credible intervals Instructions for Use Reproducibility (Mandatory) What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. How to use the information: • Load the “Simulated_Dataset.RData” workspace • Run the code contained in “CWVS_LMC.txt” • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”. Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set: Data The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. 
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
