Facebook
TwitterThese are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”. Metadata (including data dictionary) • y: Vector of binary responses (1: adverse outcome, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate) Code Abstract We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities. Description “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript). Optional Information (complete as necessary) Required R packages: • For running “CWVS_LMC.txt”: • msm: Sampling from the truncated normal distribution • mnormt: Sampling from the multivariate normal distribution • BayesLogit: Sampling from the Polya-Gamma distribution • For running “Results_Summary.txt”: • plotrix: Plotting the posterior means and credible intervals Instructions for Use Reproducibility (Mandatory) What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. How to use the information: • Load the “Simulated_Dataset.RData” workspace • Run the code contained in “CWVS_LMC.txt” • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”. Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set: Data The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publically available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents a general framework for simulating plot data in multi-environment field trials with one or more traits. The framework is embedded within the R package FieldSimR, whose core function generates plot errors that capture global field trend, local plot variation, and extraneous variation at a user-defined ratio. FieldSimR’s capacity to simulate realistic plot data makes it a flexible and powerful tool for a wide range of improvement processes in plant breeding, such as the optimisation of experimental designs and statistical analyses of multi-environment field trials. FieldSimR provides crucial functionality that is currently missing in other software for simulating plant breeding programmes and is available on CRAN. The paper includes an example simulation of field trials that evaluate 100 maize hybrids for two traits in three environments. To demonstrate FieldSimR’s value as an optimisation tool, the simulated data set is then used to compare several popular spatial models for their ability to accurately predict the hybrids’ genetic values and reliably estimate the variance parameters of interest. FieldSimR has broader applications to simulating data in other agricultural trials, such as glasshouse experiments.
Facebook
TwitterR code for simulating species data points for power calculations shown in Fig 3
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A simulated call centre dataset and notebook, designed to be used as a classroom / tutorial dataset for Business and Operations Analytics.
This notebook details the creation of simulated call centre logs over the course of one year. For this dataset we are imagining a business whose lines are open from 8:00am to 6:00pm, Monday to Friday. Four agents are on duty at any given time and each call takes an average of 5 minutes to resolve.
The call centre manager is required to meet a performance target: 90% of calls must be answered within 1 minute. Lately, the performance has slipped. As the data analytics expert, you have been brought in to analyze their performance and make recommendations to return the centre back to its target.
The dataset records timestamps for when a call was placed, when it was answered, and when the call was completed. The total waiting and service times are calculated, as well as a logical for whether the call was answered within the performance standard.
Discrete-Event Simulation allows us to model real calling behaviour with a few simple variables.
The simulations in this dataset are performed using the package simmer (Ucar et al., 2019). I encourage you to visit their website for complete details and fantastic tutorials on Discrete-Event Simulation.
Ucar I, Smeets B, Azcorra A (2019). “simmer: Discrete-Event Simulation for R.” Journal of Statistical Software, 90(2), 1–30.
For source code and simulation details, view the cross-posted GitHub notebook and Shiny app.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example computer code (R script) and associated data to run the Greater Glider simulation example in the manuscript.
Facebook
TwitterFile List SynthesiseSpectra2DBetaDetection1000.r -- r code for generating 2D data and determining variance spectra repeatedly from subsamples (n = 1000) Description The SynthesiseSpectra2DBetaDetection1000.r program uses r code to recover spectral exponents from 2D simulated data using repeated hierarchical analysis of variance over a range of nested spatial scales. The method for generating the simulated data using inverse Fourier transforms of 1/f β weighted random phases and numbers, and the returned β-values are given in Appendix B.
Facebook
TwitterThis dataset served as the input for the METS-R simulator. Data include the historical and predicted demand, cache of transit scheduling results, cache of candidate paths for routing, and link-level average speed and corresponding standard deviations.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SimRFlow is a high-throughput physiologically based pharmacokinetic (PBPK) modelling tool which uses Certara’s Simcyp® simulator. The workflow is comprised of three main modules: 1) a Data Collection module for automated curation of physicochemical (from ChEMBL and the Norman Suspect List databases) and experimental data (i.e.: clearance, plasma-protein binding, and blood-to-plasma ratio, from httk-R package databases), 2) a Simulation module which activates the Simcyp® simulator and runs Monte Carlo simulations on virtual subjects using the curated data, and 3) a Data Visualisation module for understanding the simulated compound-specific profiles and predictions. SimRFlow has three administration routes (oral, intravenous, dermal) and allows users to change some simulation parameters including the number of subjects, simulation duration, and dosing. Users are only expected to provide a file of the compounds they wish to simulate, and in return the workflow provides summary statistics, concentration-time profiles of various tissue types, and a database file (containing in-depth results) for each simulated compound. This is presented within a guided and easy-to-use R Shiny interface which provides many plotting options for the visualisation of concentration-time profiles, parameter distributions, trends between the different parameters, as well as comparison of predicted parameters across all batch-simulated compounds. The in-built R functions can be assembled in user-customised scripts which allows for the modification of the workflow for different purposes. SimRFlow proves to be a time-efficient tool for simulating a large number of compounds without any manual curation of physicochemical or experimental data necessary to run Simcyp® simulations.
Facebook
TwitterA participatory live-coding lesson on simulating data and performing randomization tests on ecological data in R.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ecolisim_script.R, example R script to simulate expression profiles and the inference of network from the simulated profiles using BC3NET. (R 2 kb)
Facebook
Twitterhttps://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
This is the R package containing the WALRUS model code. WALRUS is a rainfall-runoff model for catchments with shallow groundwater.
This is version 1.11.
Facebook
TwitterPhylogenetic comparative biology has progressed considerably in recent years. One of the most important developments has been the application of likelihood-based methods to fit alternative models for trait evolution in a phylogenetic tree with branch lengths proportional to time. An important example of this type of method is O’Meara et al.’s (2006) “noncensored” test for variation in the evolutionary rate for a continuously valued character trait through time or across the branches of a phylogenetic tree. According to this method, we first hypothesize evolutionary rate regimes on the tree (called “painting” in Butler and King, 2004); and then we fit an evolutionary model, specifically the popular Brownian model, in which the instantaneous variance of the Brownian random diffusion process has different values in different parts of the phylogeny. The authors suggest that to test a hypothesis that the state of a discrete character influenced the rate of a continuous character, one could u...
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R code for creating the introduced visualizations and simulating the demonstration data.
Facebook
TwitterSummary This data release contains postprocessed model output from simulations of hypothetical rapid motion of landslides, subsequent wave generation, and wave propagation. A tsunami wave was generated with rapid motion of unstable material into Barry Arm Fjord, this wave then propagated through Prince William Sound, including into Passage Canal east of Whittier. Here we consider only the largest wave generating scenario presented by Barnhart and others (2021a, 2021b) and use a simulation setup similar to that work. The results presented here are not identical to those presented in Barnhart and others (2021a, 2021b) because the results presented here use an expanded dataset of topography and bathymetry. Model Description The simulation used the D-Claw model (George and Iverson, 2014; Iverson and George, 2014). D-Claw is a single layer model which simulates the coupled evolution of fluid and solid material while satisfying mass and momentum conservation. It is capable of simulating motion of landslide material, interaction of that material with water, tsunami generation, and wave propagation. Because the mobile material in D-Claw may be water, landslide material, or a mixture between the two, we will use the term "wave height” to refer to the altitude of the mobile material surface, regardless of its composition. Considered Scenario and Model Implementation We present results from a single scenario (Table 1). This scenario used the landslide source characteristics of the larger, contractive, more mobile scenario C-689 from Barnhart and others (2021a, 2021b) which generated the largest wave. In this scenario, three landslides on an unstable slope northwest of the northern portion Barry Arm fjord concurrently move into the fjord, generating a tsunami wave. The D-Claw model supports adaptive mesh refine¬ment, and like Barnhart and others (2021a, 2021b) we used a computational grid cell size of 50 m around the landslide and along the wave propagation path. Our implementation differs from this prior work in that we permitted grid refinement to a finer resolution of 1 m as the tsunami wave approached and inundated Whittier, Alaska. In the portions of the domain where no wave propagated, the cell size was permitted to remain at a coarse resolution of 1,000 m. The spatial extent of the simulation domain (shown in Figure 1) is smaller than what was considered by Barnhart and others (2021a, 2021b) and does not cover all of Prince William Sound. Instead, it is limited to the region between Whittier and Barry Arm. Additionally, the duration of simulated time was reduced relative to that presented in Barnhart and others (2021a, 2021b). Simulations ran for 35 minutes of simulated time, reflecting the arrival of the largest wave within the two hours of total simulated time presented by Barnhart and others (2021a, their Figures 7 and 8). Description of Inundation The simulation results show very little inundation at Whittier, Alaska. In the simulations, the parking area to the north of the City of Whittier Campground was inundated by less than 1 m of water. Along the sea wall defining and protecting the harbor, along Camp Road to the west of town and at the airstrip, simulated water levels rose to ~2 m above the MHHW level. At the harbor, the simulated wave did not propagate inland of the sea wall. Similarly, the simulated wave did not reach the elevation of Camp Road or the waiting area at the east portal of the Anton Anderson Memorial Tunnel. Finally, the simulated wave did not inundate the airstrip. Reference frame The horizontal reference frame for all files is North American Datum of 1983 (NAD 83) Universal Transverse Mercator (UTM) Zone 6 N (European Petroleum Survey Group Code 26906). The vertical reference frame is mean higher high water (MHHW) at Whittier, Alaska (NOAA Station 9454949). At this station, mean higher high water is defined as 3.395 m above the North American Vertical Datum of 1988. Elevation, altitude, and height, as used in this data release, refer to distance above the MHHW vertical datum. Topographic and Bathymetric Data Sources This work relied on integrating multiple topographic and bathymetric data sources. Where original data sources were not provided in the reference frame used in this work, datasets were reprojected to NAD 83 UTM Zone 6 N and translated to MHHW. In Barry Arm fjord north of Port Wells, we used a digital terrain model derived from subaerial light detection and ranging (lidar) data collected on June 26, 2020 (Daanen and others, 2021) and submarine bathymetric data collected between August 12 and 23 (NOAA, 2020). These data were combined at 5 m horizontal resolution. In Passage Canal west of 148.5º W—including at Whittier, Alaska—we use a 1 m topobathymetric dataset described by Haeussler and others (2013). This dataset combines a digital terrain model derived from lidar data collected between October 21 and 25, 2012 (Hubbard and others, 2013) with submarine multibeam data collected in 2011 (Haeussler and others, 2013), and digitized National Ocean Service (NOS) smooth sheet bathymetry for Survey H-10655 (NOAA, 1995). Elsewhere in the domain, we used the 8/3 arc-second dataset for Prince William Sound (NOAA, 2009a) and the 8/15 arc-second dataset for Whittier and Passage Canal (NOAA, 2009b). These data were projected into UTM coordinates at a resolution of 50 m for the Prince William Sound dataset and 10 m for the Whittier Dataset. These topographic and bathymetric data sources differ from those used by Barnhart and others (2021a, 2021b) in the addition of the 1 m resolution dataset used near Whittier, Alaska and in Passage Canal (green polygon in Figure 1). All other data sources were used by Barnhart and others (2021a, 2021b). Results Herein, we provide two model result files of spatially distributed model output in GeoTiff format and one polyline shapefile for scenario C-689 (Table 1). The shapefile delineates the boundary between model grid cells that were, and were not, inundated during the simulation. The two GeoTiff files contain the following variables: maximum wave height and the maximum inundation depth for model grid cells which started dry and were inundated by water or landslide material at some point in the simulation. The results are presented only in the region surrounding Whittier, Alaska and at a 1 m spatial resolution. The extent of the region surrounding Whittier, Alaska where results are provided is given in the east-west direction by eastings 405500—409700 and in the north-south direction by northings 6738600—6740500. The inundation extent is also shown in Figure 2. In addition, we provide time series from three numerical gages located between the junction of Passage Canal and Port Wells and the Whitter, Alaska harbor. Inundation extent The file “C689_1m_inund_extent.shp” is an ESRI Shapefile containing a polyline demarcating the boundary between areas which were inundated and areas which were not inundated. It was constructed by delineating the boundary between where “inundated_depth.tif” was greater than zero and where it was less than zero. Inundated depth The file “inundated_depth_meters.tif” contains the maximum inundation depth for grid cells which started dry but were inundated at some point in the simulation. The maximum inundation depth was calculated by analyzing model output at 15 second increments and identifying the maximum inundation depth over all output timesteps. Model grid cells which were never inundated or which started as ocean are indicated with “no data”. The file "maximum_wave_height_meters.tif" contains the maximum wave height. The wave height is given in meters relative to the vertical reference frame datum. The maximum wave height was calculated by analyzing model output at 15 second increments and identifying the maximum wave height over all output timesteps. Model grid cells which were never inundated by water or landslide material are indicated with "no data". The maximum wave height reflects the sum of the inundation depth and the grid cell elevation. Note that in grid cells which were initially dry but were inundated later, this value does not reflect the inundation depth. Wave height time series The file "whittier_passage_canal_gages.csv" contains simulated wave height time series for three locations where numerical gages were placed in the simulation (Figure 3). Latitude, longitude, easting, and northing coordinates for each of the three numerical gage locations are provided in Table 2. The file "gages.csv" contains four columns: 1. The first column "scenario" contains a string representing the scenario. In this case only one scenario was considered: "C689”. 2. The second column "gage_id" contains an integer referring to the gage ID number (1, 2, or 3) 3. The third column "time_seconds" contains an integer indicating the simulation time in seconds. 4. The fourth column "waveheight_meters" contains a floating-point number indicating the simulated wave height in meters above a reference datum. References Cited Barnhart, K.R., Jones, R.P., George, D.L., Coe, J.A., and Staley, D.M., 2021a, Preliminary assessment of the wave generating potential from landslides at Barry Arm, Prince William Sound, Alaska: U.S. Geological Survey Open-File Report 2021–1071, 28 p., accessed July 22, 2021 at https://doi.org/10.3133/ ofr20211071. Barnhart, K.R., Jones, R.P., George, D.L., Coe, J.A., Staley, D.A., 2021b, Select model results from simulations of hypothetical rapid failures of landslides into Barry Arm Fjord, Prince William Sound, Alaska: U.S. Geological Survey data release, accessed July 22, 2021 at https://doi.org/10.5066/P9XVJDNP. Daanen, R.P., Wolken, G.J., Wikstrom Jones, K., and Herbst, A.M., 2021, High resolution lidar-derived elevation data for Barry Arm landslide, southcentral Alaska, June 26, 2020: Alaska Division of Geological & Geophysical Surveys Raw Data File 2021–3, 9 p., accessed June 17,
Facebook
TwitterSSP (simulation-based sampling protocol) is an R package that uses simulations of ecological data and dissimilarity-based multivariate standard error (MultSE) as an estimator of precision to evaluate the adequacy of different sampling efforts for studies that will test hypothesis using permutational multivariate analysis of variance. The procedure consists in simulating several extensive data matrixes that mimic some of the relevant ecological features of the community of interest using a pilot data set. For each simulated data, several sampling efforts are repeatedly executed and MultSE calculated. The mean value, 0.025 and 0.975 quantiles of MultSE for each sampling effort across all simulated data are then estimated and standardized regarding the lowest sampling effort. The optimal sampling effort is identified as that in which the increase in sampling effort does not improve the highest MultSE beyond a threshold value (e.g. 2.5 %). The performance of SSP was validated using real dat...
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes the source computer code and supporting data files for the predator-prey simulation model (parameterized for summer flounder, Paralichthys dentatus) developed to investigate bottom-up effects defined to be temporal pulses in prey abundance on predator growth, production, and fisheries management. The model is age-structured and spatially explicit to accommodate ontogenetic dietary changes and seasonal migrations, respectively. Three general prey groups were modeled and assumed to be small crustaceans, forage fish, and larger fish prey. The code was written in R by Andre Buchheister.
The dataset includes:
Source computer code (core simulation): PredPreySim.r
Source computer code (graphing results): PredPreySim_Graphing.r
Data file (stock recruitment): Stock_Recruitment.csv
Metadata file (simulation model parameter descriptions): Parameter_Categories.csv
Data file (summer flounder growth): LW_Age.csv
See http://www.vims.edu/research/departments/fisheries/programs/multispecies_fisheries_research/index.php for more information about growth data.
Related Publications:
Buchheister, A., M.J. Wilberg, T.J. Miller, and R.J. Latour. In press. Simulating bottom-up effects on predator productivity and consequences for the rebuilding timeline of a depleted population. Ecological Modelling.
Facebook
TwitterIn intraspecific agonistic interactions, it is expected that traits that are more important in determining the winning chances should exhibit greater differences between winners and losers than traits that are less important. However, several of the traits used to determine the winning chances are correlated. When these traits vary in their importance to win, it becomes hard (if not impossible) to disentangle an effect due to trait correlation from the true effect of each trait on winning. To test the impact of trait correlation on the relative importance of each trait on winning chances, we developed an individual-based simulation model that investigates how different values of trait correlation and the relative importance of each trait impact the expression of trait differences between winners and losers. The simulation was made in R and generates traits according to a normal distribution. Traits are correlated with each other through the function rnorm_multi. In each iteration, the v..., , , #### The stochastic model of agonistic interactions regarding code1.R
This is an individual-based simulation model, in which we establish scenarios combining different levels of correlation between attributes and their relative importance in determining Fighting Capacity. This model was used to investigate emerging patterns of differences between winners and losers for each attribute involved in the fight.
The correlated attributes used to exemplify the system were body size and weapon size.
These packages were used:
fauxdplyrtidyversebootsciplotggplot2ggthemesdevtoolsThe main function is called simulation.
Its parameters are:
cor.weapon.body: the value of the correlation between the traits weapon and bodyweapon.imp: the relative importance of the weapon size trait for the chance of winning a contestbody.imp: the relative importance of the body size...,
Facebook
TwitterThe GOES-R PLT Fly’s Eye GLM Simulator (FEGS) dataset consists of lightning flash, lightning pulse, and radiance data collected by the FEGS flown aboard a NASA ER-2 high-altitude aircraft during the GOES-R Post Launch Test (PLT) airborne science field campaign. The GOES-R PLT airborne science field campaign took place between March 21 and May 17, 2017 in support of the post-launch product validation of the Advanced Baseline Imager (ABI) and the Geostationary Lightning Mapper (GLM). These data files are available in ASCII format with browse imagery available in PNG format.
Facebook
Twittertree_constant_07-15-12R file for running the 'clade-constant' simulations as described in the paper.ntaxa_constant_07-01-12R script for the 'size-constant' simulations described in the paper.100extant_07-28-12R script file for simulations of extant only taxa.ratesVsResolved_anag_08-18-12R script for simulations with varying sampling and differentiation rates under a pure-anagenesis model.propDur_r0.01_09-15-12R script simulating clades under various models of differentiation and estimating the proportion of taxa with an observable, sampled duration.modeComparison_propDur_workspace_09-16-12An R workspace file containing all simulation data needed for plotting the figures where multiple models of differentiation are compared (Fig. 3-6, 8). Can be read into R with function load().ratesVresolved_08-16-12An R workspace file containing all simulation data needed for plotting the results of the simulations comparing sampling and differentiation rates (i.e. Fig. 7). Can be read into R with func...
Facebook
TwitterThese are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”. Metadata (including data dictionary) • y: Vector of binary responses (1: adverse outcome, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate) Code Abstract We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities. Description “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript). Optional Information (complete as necessary) Required R packages: • For running “CWVS_LMC.txt”: • msm: Sampling from the truncated normal distribution • mnormt: Sampling from the multivariate normal distribution • BayesLogit: Sampling from the Polya-Gamma distribution • For running “Results_Summary.txt”: • plotrix: Plotting the posterior means and credible intervals Instructions for Use Reproducibility (Mandatory) What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. How to use the information: • Load the “Simulated_Dataset.RData” workspace • Run the code contained in “CWVS_LMC.txt” • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”. Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set: Data The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publically available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).