These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting off the median exposure amount for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, per the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code Abstract: We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code (“Results_Summary.txt”) to summarize and plot the estimated critical windows and posterior marginal inclusion probabilities.

Description:
• “CWVS_LMC.txt”: This code is delivered as a .txt file containing R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the code in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
• “Results_Summary.txt”: This code is also delivered as a .txt file containing R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information. Required R packages:
• For running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals

Instructions for Use / Reproducibility (Mandatory). What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the simulated datasets generated under setting E4 of the presented simulation study. How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”

Format: Below is the replication procedure for the attached dataset for the portion of the analyses using a simulated dataset. Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description/Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting off the median exposure amount for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
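The per-week standardization described above (subtract the weekly median, divide by the weekly IQR) can be sketched in a few lines. This is an illustrative stand-in, not the authors' code: the function names and the tiny example matrix are invented, and the released workspace already contains pre-standardized exposures.

```python
# Hypothetical sketch of the per-week exposure standardization described
# above: subtract each week's median, divide by its interquartile range.
# All names and data here are illustrative.

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of a pre-sorted list."""
    idx = q * (len(sorted_vals) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def standardize_by_week(z):
    """z: one row per individual, one column per pregnancy week."""
    n_weeks = len(z[0])
    out = [row[:] for row in z]
    for w in range(n_weeks):
        col = sorted(row[w] for row in z)
        med = quantile(col, 0.5)
        iqr = quantile(col, 0.75) - quantile(col, 0.25)
        for i, row in enumerate(z):
            out[i][w] = (row[w] - med) / iqr
    return out

# Four individuals, two weeks of (made-up) exposures
exposures = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
std = standardize_by_week(exposures)
```

Because the medians and IQRs themselves are withheld, the transformation cannot be inverted to recover raw exposure levels, which is the privacy property the description relies on.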
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data sets contain the parameters and configurations for the simulation defined in the scenario file, as well as the simulation outputs for all numerical experiments presented in this contribution. In addition, the scripts for the evaluation of the results are provided. (ZIP)
This data includes four alchemical processes, with data files generated with the Python package generate_alchemical_lammps (DOI from MIDAS) and the resulting output, to be used for the calculation of free energies using the Multi-state Bennett Acceptance Ratio (MBAR), BAR, or Thermodynamic Integration (TI). These input files are only applicable for LAMMPS versions after April 2024. The four cases can be separated into two systems: benzene solvated in water, and a Lennard-Jones (LJ) dimer in solvent. The four cases are:
1) benzene 1: In the NPT ensemble, scale the charges of benzene atoms from full values to zero over six steps.
2) benzene 2: In the NPT ensemble, scale the van der Waals potential between benzene and water from full values to zero over sixteen steps.
3) benzene 3: In the NVT ensemble with benzene in vacuum, scale the charges of benzene's atoms from zero to full values over six steps.
4) lj_dimer: In the NPT ensemble, change the cross-interaction energy parameter between solvent and dimer from a value of 1 to 2.
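As a generic illustration of the TI route mentioned above (not the packaged workflow, which runs through LAMMPS and generate_alchemical_lammps), a free-energy difference is obtained by integrating the ensemble average of dU/dλ across the coupling steps. The λ grid and averages below are fictitious placeholders:

```python
# Generic sketch of thermodynamic integration (TI): integrate <dU/dlambda>
# over the coupling parameter with the trapezoidal rule. The six-point
# lambda grid and the averages are made up for illustration; real values
# would come from the LAMMPS outputs in this dataset (or go to MBAR/BAR).

def trapezoid(xs, ys):
    """Trapezoidal-rule integral of y(x) over an increasing grid xs."""
    total = 0.0
    for i in range(len(xs) - 1):
        total += 0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
    return total

lambdas = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]    # six coupling steps
dudl    = [5.0, 4.1, 3.0, 2.2, 1.4, 0.9]    # fictitious <dU/dlambda> values
delta_F = trapezoid(lambdas, dudl)           # free-energy change estimate
```

MBAR and BAR instead reweight energy differences between neighboring λ states, which is why the dataset stores the per-step output files rather than a single integrated number.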
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The Robotic Arm Simulation Dataset captures the performance and dynamics of a robotic arm tasked with moving various objects within a simulated environment. It includes variables such as object_id, object_position_x, object_position_y, target_position_x, target_position_y, action_taken, action_success, timestamp, distance_to_target, and object_type. Each entry records an action performed by the arm, indicating whether the action was successful. This dataset is essential for analyzing the effectiveness of robotic movements and improving the algorithms governing robotic actions, making it valuable for research in robotics and automation technologies.
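For example, the distance_to_target field can be derived from the positional fields named above. A minimal sketch, with field names following the description but a sample record and success tolerance that are invented:

```python
import math

# Minimal sketch: derive distance_to_target from the positional fields
# listed above. The sample record and the 0.1-unit success tolerance are
# invented for illustration.

def distance_to_target(record):
    return math.hypot(record["target_position_x"] - record["object_position_x"],
                      record["target_position_y"] - record["object_position_y"])

entry = {"object_id": 7,
         "object_position_x": 1.0, "object_position_y": 2.0,
         "target_position_x": 4.0, "target_position_y": 6.0}
d = distance_to_target(entry)            # Euclidean distance in the plane
reached = d < 0.1                        # hypothetical success criterion
```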
The dataset used in the paper is a simulation dataset, which includes the speed and acceleration profiles of vehicles in different scenarios.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
A simulated call centre dataset and notebook, designed to be used as a classroom / tutorial dataset for Business and Operations Analytics.
This notebook details the creation of simulated call centre logs over the course of one year. For this dataset we are imagining a business whose lines are open from 8:00am to 6:00pm, Monday to Friday. Four agents are on duty at any given time and each call takes an average of 5 minutes to resolve.
The call centre manager is required to meet a performance target: 90% of calls must be answered within 1 minute. Lately, the performance has slipped. As the data analytics expert, you have been brought in to analyze their performance and make recommendations to return the centre back to its target.
The dataset records timestamps for when a call was placed, when it was answered, and when the call was completed. The total waiting and service times are calculated, as well as a logical for whether the call was answered within the performance standard.
Discrete-Event Simulation allows us to model real calling behaviour with a few simple variables.
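A stdlib-only Python sketch of the same idea follows; the published dataset itself was generated in R with simmer, not with this code, and the arrival rate below is an illustrative stand-in for the scenario described above (4 agents, calls averaging 5 minutes, a 1-minute answer target):

```python
import heapq, random

# Stdlib sketch of the call-centre model described above: 4 agents, Poisson
# arrivals, exponential service averaging 5 minutes, 1-minute answer target.
# The 1.5-minute mean interarrival time is an assumption for illustration.

def simulate(n_calls, n_agents=4, mean_interarrival=1.5, mean_service=5.0, seed=42):
    rng = random.Random(seed)
    free_at = [0.0] * n_agents            # when each agent next becomes free
    heapq.heapify(free_at)
    t, waits = 0.0, []
    for _ in range(n_calls):
        t += rng.expovariate(1.0 / mean_interarrival)   # call placed
        start = max(t, free_at[0])        # answered by the earliest-free agent
        waits.append(start - t)
        heapq.heapreplace(free_at, start + rng.expovariate(1.0 / mean_service))
    return waits

waits = simulate(5000)
within_target = sum(w <= 1.0 for w in waits) / len(waits)  # share answered in 1 min
```

With these stand-in rates the utilisation is about 0.83, which is exactly the kind of heavily loaded system where the 90%-within-one-minute target starts to slip.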
The simulations in this dataset are performed using the package simmer (Ucar et al., 2019). I encourage you to visit their website for complete details and fantastic tutorials on Discrete-Event Simulation.
Ucar I, Smeets B, Azcorra A (2019). “simmer: Discrete-Event Simulation for R.” Journal of Statistical Software, 90(2), 1–30.
For source code and simulation details, view the cross-posted GitHub notebook and Shiny app.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1
User Agreement, Public Domain Dedication, and Disclaimer of Liability. By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms. The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission. In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights. Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law. When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work. This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website.

Description: This dataverse contains the data referenced in Rieth et al. (2017), "Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems," presented at Applied Human Factors and Ergonomics 2017. Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files. Each dataframe contains 55 columns. Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP.
The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions). Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping). Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for a total duration of 25 hours and 48 hours respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively. Columns 4 to 55 contain the process variables; the column names retain the original variable names. Acknowledgments. This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.
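The sampling arithmetic above is easy to sanity-check in a few lines; nothing here assumes anything beyond the stated 3-minute interval, run durations, and fault-introduction times:

```python
# Sanity check of the column description above: 3-minute sampling over the
# 25 h training runs and 48 h testing runs reproduces the stated per-run
# row counts, and the fault onsets map to these sample indices.

SAMPLE_MINUTES = 3
train_samples  = 25 * 60 // SAMPLE_MINUTES   # rows per training simulationRun
test_samples   = 48 * 60 // SAMPLE_MINUTES   # rows per testing simulationRun
fault_on_train = 1 * 60 // SAMPLE_MINUTES    # fault after 1 h (training)
fault_on_test  = 8 * 60 // SAMPLE_MINUTES    # fault after 8 h (testing)
```

So in the Faulty Training data the fault is active from sample 21 onward, and in Faulty Testing from sample 161 onward.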
Summary of simulation data.
The main dataset is a 232 MB file of trajectory data (I395-final.csv) that contains position, speed, and acceleration data for non-automated passenger cars, trucks, buses, and automated vehicles on an expressway within an urban environment. Supporting files include an aerial reference image (I395_ref_image.png) and a list of polygon boundaries (I395_boundaries.csv) and associated images (I395_lane-1, I395_lane-2, …, I395_lane-6) stored in a folder titled “Annotation on Regions.zip” to map physical roadway segments to the numerical lane IDs referenced in the trajectory dataset. In the boundary file, columns “x1” to “x5” represent the horizontal pixel values in the reference image, with “x1” being the leftmost boundary line and “x5” being the rightmost boundary line, while the column "y" represents corresponding vertical pixel values. The origin point of the reference image is located at the top left corner. The dataset defines five lanes with five boundaries. Lane -6 corresponds to the area to the left of “x1”. Lane -5 corresponds to the area between “x1” and “x2”, and so forth to the rightmost lane, which is defined by the area to the right of “x5” (Lane -2). Lane -1 refers to vehicles that go onto the shoulder of the merging lane (Lane -2), which are manually separated by watching the videos. This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which was one of the six collected as part of the TGSIM project, contains data collected from six 4K cameras mounted on tripods, positioned on three overpasses along I-395 in Washington, D.C.
The cameras captured distinct segments of the highway, and their combined overlapping and non-overlapping footage resulted in a continuous trajectory for the entire section covering 0.5 km. This section covers a major weaving/mandatory lane-changing area between L'Enfant Plaza and 4th Street SW, with three lanes in the eastbound direction and a major on-ramp on the left side. In addition to the on-ramp, the section covers an off-ramp on the right side. The expressway includes one diverging lane at the beginning of the section on the right side and one merging lane in the middle of the section on the left side. For the purposes of data extraction, the shoulder of the merging lane is also considered a travel lane since some vehicles illegally use it as an extended on-ramp to pass other drivers (see I395_ref_image.png for details). The cameras captured continuous footage during the morning rush hour (8:30 AM-10:30 AM ET) on a sunny day. During this period, vehicles equipped with SAE Level 2 automation were deployed to travel through the designated section to capture the impact of SAE Level 2-equipped vehicles on adjacent vehicles and their behavior in congested areas, particularly in complex merging sections. These vehicles are indicated in the dataset. As part of this dataset, the following files were provided: I395-final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data at every 0.1 second. Vehicle type, width, and length are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the conversion factor 1 pixel = 0.3 meters. I395_ref_image.png is the aerial reference image that defines the geographic region and the associated roadway segments. I395_boundaries.csv contains the coordinates that define the roadway segments (n=X). The columns "x1" to "x5" represent the horizontal pixel values in the reference image.
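The boundary-to-region lookup described above reduces to counting how many of the five boundary lines lie to the left of a vehicle's x pixel. A sketch, with invented boundary values; the mapping from region index to the dataset's negative lane IDs should be checked against I395_boundaries.csv, since the boundary x values vary with the vertical coordinate y:

```python
import bisect

# Sketch of locating which region of the reference image an x pixel falls
# into, given boundary lines x1..x5 as described above (region 0 is left of
# x1, region 5 is right of x5). The pixel values are invented; real ones
# come from I395_boundaries.csv at the vehicle's y coordinate.

def region_index(x_pixel, boundaries):
    """0 = left of x1, 1 = between x1 and x2, ..., 5 = right of x5."""
    return bisect.bisect_left(boundaries, x_pixel)

bounds = [100, 220, 340, 460, 580]   # hypothetical x1..x5 pixel values
```

For example, a vehicle at x = 150 px falls in the region between x1 and x2, which the dataset description labels Lane -5.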
ppak10/simulation dataset hosted on Hugging Face and contributed by the HF Datasets community
## Overview
Rescue Simulation is a dataset for object detection tasks - it contains Pets annotations for 2,600 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
SPECIAL NOTE: C-MAPSS and C-MAPSS40K ARE CURRENTLY UNAVAILABLE FOR DOWNLOAD. Glenn Research Center management is reviewing the availability requirements for these software packages. We are working with Center management to get the review completed and issues resolved in a timely manner. We will post updates on this website when the issues are resolved. We apologize for any inconvenience. Please contact Jonathan Litt, jonathan.s.litt@nasa.gov, if you have any questions in the meantime.

Subject Area: Engine Health

Description: This data set was generated with the C-MAPSS simulator. C-MAPSS stands for 'Commercial Modular Aero-Propulsion System Simulation' and it is a tool for the simulation of realistic large commercial turbofan engine data. Each flight is a combination of a series of flight conditions with a reasonable linear transition period to allow the engine to change from one flight condition to the next. The flight conditions are arranged to cover a typical ascent from sea level to 35K ft and descent back down to sea level. The fault was injected at a given time in one of the flights and persists throughout the remaining flights, effectively increasing the age of the engine. The intent is to identify which flight and when in the flight the fault occurred.

How Data Was Acquired: The data provided is from a high-fidelity system-level engine simulation designed to simulate nominal and faulted engine degradation over a series of flights. The simulated data was created with a MATLAB Simulink tool called C-MAPSS.

Sample Rates and Parameter Description: The flights are full flight recordings sampled at 1 Hz and consist of 30 engine and flight-condition parameters. Each flight contains 7 unique flight conditions for an approximately 90-minute flight, including ascent to cruise at 35K ft and descent back to sea level. The parameters for each flight are the flight conditions, health indicators, temperature measurements, and pressure measurements.
Faults/Anomalies: Faults arose from the inlet engine fan, the low pressure compressor, the high pressure compressor, the high pressure turbine and the low pressure turbine.
The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The northbound lanes are shown visually from left to right in I90_94_moving_lane1.png through I90_94_moving_lane6.png. This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647.
This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4 km long segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter would return to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle to ensure continuous data collection. The segment was selected to study mandatory and discretionary lane changing and last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day. As part of this dataset, the following files were provided: I90_94_moving_final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data at every 0.1 second. Vehicle size (small or large), width, length, and whether the vehicle was one of the automated test vehicles ("yes" or "no") are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the conversion factor 1 pixel = 0.3 meters. I90_94_moving_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound lanes) for each run X. I-90-moving-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines for each Run X. The "x" and "y" columns represent the horizontal and vertical locations in the reference image, respectively. The "ramp" columns define the type of roadway segment (0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments).
In total, the centerline files define six northbound lanes. Annotation on Regions.zip, which includes images that visually map lanes (I90_9
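The ramp indicator coding quoted above lends itself to a small lookup when parsing the centerline files. A sketch; the sample row and the "ramp<laneID>" column names are hypothetical stand-ins, so check the actual headers against the TGSIM centerline data dictionary:

```python
# Decode the per-lane road-section indicator used in the centerline files,
# following the coding quoted above. The sample row and column names are
# invented for illustration.

RAMP_TYPES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}

def describe_sections(row):
    """row: mapping of per-lane ramp columns to indicator codes."""
    return {col: RAMP_TYPES[code] for col, code in row.items()}

sample = {"ramp1": 0, "ramp2": 1, "ramp3": 3}   # hypothetical column names
labels = describe_sections(sample)
```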
Simantha is a discrete event simulation package written in Python that is designed to model the behavior of discrete manufacturing systems. Specifically, it focuses on asynchronous production lines with finite buffers. It also provides functionality for modeling the degradation and maintenance of machines in these systems. Classes for five basic manufacturing objects are included: source, machine, buffer, sink, and maintainer. These objects can be defined by the user and configured in different ways to model various real-world manufacturing systems. The object classes are also designed to be extensible so that they can be used to model more complex processes. In addition to modeling the behavior of existing systems, Simantha is also intended for use with simulation-based optimization and planning applications. For instance, users may be interested in evaluating alternative maintenance policies for a particular system. Estimating the expected system performance under each candidate policy will require a large number of simulation replications when the system is subject to a high degree of stochasticity. Simantha therefore supports parallel simulation replications to make this procedure more efficient. GitHub repository: https://github.com/usnistgov/simantha
The reliable long-distance transmission of electromagnetic wave signals within goaf is fundamental for the implementation of wireless monitoring and early warning systems for goaf-related disasters. This paper establishes an experimental platform for electromagnetic wave signal transmission within goaf and develops a propagation model for electromagnetic waves in the porous media of goaf. The transmission characteristics of electromagnetic waves at various frequencies within the porous media environment of goaf are investigated through experimental and numerical simulation approaches. The results indicate that the received signal intensity of electromagnetic waves across different frequency bands diminishes with increasing propagation distance in the lossy environment of the goaf. Initially, the decay follows a logarithmic pattern, whereas, at later stages, the attenuation exhibits a gradual and smooth decrease. As the frequency increases, the initial attenuation amplitude of electromagnetic wave intensity rises; however, subsequent attenuation is largely unaffected by frequency, with the later attenuation rate being proportional to porosity. Electromagnetic waves at a frequency of 700 MHz exhibit a low attenuation coefficient under both experimental and simulated conditions, demonstrating superior stability and reliability. This frequency significantly enhances the overall performance of the communication system and is suitable for use as the operational frequency band in wireless sensor networks.
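The logarithmic early-stage decay reported above has the same shape as the familiar log-distance path-loss form, in which received power falls off with 10·n·log10(d/d0). A generic sketch; the reference power, path-loss exponent, and distances below are invented, not values fitted to the goaf experiments:

```python
import math

# Generic log-distance path-loss sketch echoing the logarithmic early-stage
# attenuation described above. All parameter values are invented for
# illustration, not fitted to the paper's measurements.

def received_power_dbm(d, p0_dbm=-30.0, d0=1.0, n=2.8):
    """Received power (dBm) at distance d meters under a log-distance model."""
    return p0_dbm - 10.0 * n * math.log10(d / d0)

p10 = received_power_dbm(10.0)   # weaker than the 1 m reference level
```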
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Saved simulation data of a passive tracer (a concentration field) that is advected-diffused by a turbulent flow in two dimensions. It is a single ensemble member (there were 50 in total). All the information needed to produce a time series and plots of the concentration field is saved.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Raw data used to generate the figures and information presented in the paper "Countering Protection Rackets using Legal and Social Approaches: An Agent-Based Test".
According to our latest research, the global Engineering Simulation Data Management Software market size reached USD 1.85 billion in 2024, reflecting a robust trajectory driven by the increasing complexity of engineering projects and the need for seamless data integration. The market is poised to grow at a CAGR of 10.3% from 2025 to 2033, with the forecasted market size anticipated to touch USD 4.47 billion by 2033. The primary growth factor for this market is the rising adoption of digital transformation initiatives across industries, which has significantly increased the demand for advanced data management solutions that can handle the growing volume and complexity of simulation data.
A significant growth driver for the Engineering Simulation Data Management Software market is the escalating adoption of simulation-driven product development in industries such as automotive, aerospace, and healthcare. As product lifecycles shorten and the pressure for innovation intensifies, organizations are leveraging simulation tools to accelerate design, testing, and validation processes. This has led to a surge in the volume of simulation data, necessitating robust management platforms that can ensure data integrity, traceability, and accessibility across distributed teams. Additionally, the integration of simulation with other digital engineering tools has amplified the need for centralized data management, enabling organizations to achieve better collaboration, reduce redundancies, and maintain compliance with industry standards and regulations.
Another critical factor propelling market growth is the increasing complexity of engineering projects. Modern engineering simulations generate massive datasets that need to be managed efficiently for effective decision-making. The proliferation of multi-physics and multi-domain simulations, coupled with the trend towards digital twins and virtual prototyping, has further intensified the need for sophisticated data management solutions. Companies are now prioritizing the deployment of Engineering Simulation Data Management Software to streamline workflows, enhance productivity, and ensure that simulation data is readily available for analytics and reporting. This trend is particularly pronounced in sectors where safety, reliability, and regulatory compliance are paramount, such as aerospace & defense and healthcare.
The evolution of cloud computing and the shift towards cloud-based deployment models have also played a pivotal role in shaping the Engineering Simulation Data Management Software market. Cloud-based platforms offer unparalleled scalability, flexibility, and accessibility, making it easier for organizations to manage simulation data across global teams and locations. The ability to integrate with other enterprise systems, support for remote collaboration, and reduced IT overheads are some of the advantages driving the adoption of cloud-based solutions. This shift is enabling even small and medium enterprises to leverage advanced simulation data management capabilities, thereby democratizing access to cutting-edge engineering tools and fostering innovation across the value chain.
In the realm of engineering, Systems Engineering Software plays a pivotal role in managing the intricate web of processes involved in product development. This software is essential for coordinating various engineering disciplines, ensuring that all components of a system work harmoniously together. By integrating Systems Engineering Software with Engineering Simulation Data Management Software, organizations can enhance their ability to manage complex simulations and data flows. This integration facilitates better decision-making and improves the overall efficiency of engineering projects, particularly in industries where precision and reliability are critical.
Regionally, North America continues to dominate the Engineering Simulation Data Management Software market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The high concentration of technology-driven industries, presence of leading software vendors, and strong focus on R&D investments have contributed to the region's leadership. Europe, with its robust automotive and aerospace sectors, is also witnessing significant growth, wh
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains ISIMIP2b (https://www.isimip.org, Frieler et al. 2017) simulation data from six local lake models: air2water4par/air2water6par (Piccolroaz et al. 2013, 2014, 2015, 2016, 2017, 2020), ALBM (Tan et al. 2015, 2016, 2018), FLake-IGB (Kirillin et al. 2011), MyLake (https://github.com/biogeochemistry/MyLake_public, Saloranta et al. 2007, Kiuru et al. 2019, Markelov et al. 2019), and Simstrat (https://github.com/Eawag-AppliedSystemAnalysis/Simstrat, Goudsmit et al. 2003, Gaudard et al. 2019).
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description and Permissions: These are simulated data without any identifying information or informative birth-level covariates. The pollution exposures are standardized within each week by subtracting the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs themselves are not given, which further protects the identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang (2019). Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30.
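The weekly standardization described above (subtract each week's median exposure, divide by that week's IQR) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' R code; the function name and the example values are hypothetical.

```python
import statistics

def standardize_exposures(z):
    """Column-wise standardization of an n-pregnancies x m-weeks
    exposure matrix (list of lists): for each week, subtract that
    week's median and divide by its interquartile range (IQR)."""
    n_weeks = len(z[0])
    out = [row[:] for row in z]
    for w in range(n_weeks):
        col = [row[w] for row in z]
        med = statistics.median(col)
        q1, _, q3 = statistics.quantiles(col, n=4)  # quartile cut points
        iqr = q3 - q1
        for i in range(len(z)):
            out[i][w] = (z[i][w] - med) / iqr
    return out
```

Because the median and IQR are computed and applied per week, releasing only the standardized matrix (as in the provided dataset) does not reveal the week-specific medians or IQRs used in the transformation.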