100+ datasets found
  1. Simulation Data Set

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. The pollution exposures are standardized on each week by subtracting the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given, which further protects the identifiability of the spatial locations used in the analysis.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). The dataset contains information about human research subjects, and because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means:

    File format: R workspace file, "Simulated_Dataset.RData".

    Metadata (including data dictionary):
    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code Abstract: We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

    "CWVS_LMC.txt": This code is delivered as a .txt file containing R code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, the code can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.

    "Results_Summary.txt": This code is also delivered as a .txt file containing R code. Once "CWVS_LMC.txt" has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

    Required R packages:
    • For running "CWVS_LMC.txt":
      • msm: Sampling from the truncated normal distribution
      • mnormt: Sampling from the multivariate normal distribution
      • BayesLogit: Sampling from the Polya-Gamma distribution
    • For running "Results_Summary.txt":
      • plotrix: Plotting the posterior means and credible intervals

    Instructions for Use — Reproducibility:
    What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 of the presented simulation study.
    How to use the information:
    • Load the "Simulated_Dataset.RData" workspace
    • Run the code contained in "CWVS_LMC.txt"
    • Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt"

    Below is the replication procedure for the portion of the analyses using the simulated dataset:

    Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
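The per-week standardization described above (subtract the weekly median, divide by the weekly IQR) can be sketched in Python. The function name and list-of-rows layout are illustrative assumptions, not part of the released R code, and the distributed exposures are already standardized:

```python
import statistics

def standardize_weekly(exposures):
    """Per-week standardization: subtract the weekly median and divide
    by the weekly interquartile range (IQR).
    `exposures`: list of rows (one per individual), each a list of
    weekly average exposures. Returns the standardized matrix."""
    n_weeks = len(exposures[0])
    out = [row[:] for row in exposures]
    for w in range(n_weeks):
        col = [row[w] for row in exposures]
        med = statistics.median(col)
        q1, _, q3 = statistics.quantiles(col, n=4)  # quartile cut points
        iqr = q3 - q1
        for i, row in enumerate(exposures):
            out[i][w] = (row[w] - med) / iqr
    return out
```

Withholding the per-week medians and IQRs makes this transformation non-invertible, which is what protects the spatial identifiability of the original records.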

  2. Slivisu: A visual analytics tool to validate simulation models against...

    • dataservices.gfz-potsdam.de
    Updated 2018
    Cite
    Andrea Unger; Daniela Rabe; Volker Klemann; Daniel Eggert; Doris Dransch (2018). Slivisu: A visual analytics tool to validate simulation models against collected data [Dataset]. http://doi.org/10.5880/gfz.1.5.2018.007
    Dataset provided by
    DataCite (https://www.datacite.org/)
    GFZ Data Services
    Authors
    Andrea Unger; Daniela Rabe; Volker Klemann; Daniel Eggert; Doris Dransch
    License

    GNU GPL 3.0 (https://www.gnu.org/licenses/gpl-3.0.html)

    Description

    The validation of a simulation model is a crucial task in model development. It involves the comparison of simulation data to observation data and the identification of suitable model parameters. SLIVISU is a Visual Analytics framework that enables geoscientists to perform these tasks for observation data that are sparse and uncertain. Primarily, SLIVISU was designed to evaluate sea level indicators, which are geological or archaeological samples supporting the reconstruction of former sea level over the last ten thousand years, compiled in a PostgreSQL database system. At the same time, the software aims at supporting the validation of numerical sea-level reconstructions against these data by means of visual analytics.

  3. Laurel and Hardy 1 mean data and simulations

    • fairdomhub.org
    xlsx
    Updated Apr 8, 2022
    + more versions
    Cite
    Daniel Seaton; Yin Hoon Chew; Virginie Mengin (2022). Laurel and Hardy 1 mean data and simulations [Dataset]. https://fairdomhub.org/data_files/5002
    Format: xlsx (859 KB)
    Authors
    Daniel Seaton; Yin Hoon Chew; Virginie Mengin
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0)

    Description

    Excel spreadsheet with data and simulations used to prepare figures for publication; see the Metadata sheet for conditions. Data: fresh (not dry) rosette leaf biomass, measured in samples of 5 plants each on multiple days, as mean and SD; simulation outputs from FMv2 for Col wild-type plants, lsf1, and two simulations for prr7prr9 where the mutation affects only starch degradation or both starch degradation and malate/fumarate store mobilisation.

    Starch levels in carbon units (not C6), measured on days 27-28, mean and SD; simulations as above. Malate and fumarate levels in carbon units (not C4), measured on days 27-28, mean and SD; simulations as above. Many simulation outputs from FMv2 runs in the conditions above, from the Matlab output file.

  4. Simulation data and code for "Optimal Rejection-Free Path Sampling"

    • data-staging.niaid.nih.gov
    • zenodo.org
    Updated Mar 25, 2025
    Cite
    Lazzeri, Gianmarco (2025). Simulation data and code for "Optimal Rejection-Free Path Sampling" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14922167
    Dataset provided by
    Goethe University Frankfurt
    Authors
    Lazzeri, Gianmarco
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the main data of the paper "Optimal Rejection-Free Path Sampling," and the source code for generating/appending the independent RFPS-AIMMD and AIMMD runs.

    Due to size constraints, the data has been split into separate repositories. The following repositories contain the trajectory files generated by the runs:

    all the WQ runs: 10.5281/zenodo.14830317
    chignolin, fps0: 10.5281/zenodo.14826023
    chignolin, fps1: 10.5281/zenodo.14830200
    chignolin, fps2: 10.5281/zenodo.14830224
    chignolin, tps0: 10.5281/zenodo.14830251
    chignolin, tps1: 10.5281/zenodo.14830270
    chignolin, tps2: 10.5281/zenodo.14830280

    The trajectory files are not required for running the main analysis, as all necessary information for machine learning and path reweighting is contained in the "PathEnsemble" object files stored in this repository. However, these trajectories are essential for projecting the path ensemble estimate onto an arbitrary set of collective variables.

    To reconstruct the full dataset, please merge all the data folders you find in the supplemental repositories.

    Data structure and content

    analysis (code for analyzing the data and generating the figures of the paper)
    |- figures.ipynb (Jupyter notebook for the analysis)
    |- figures (the figures created by the Jupyter notebook)
    |- ...

    data (all the AIMMD and reference runs, plus general info about the simulated systems)
    |- chignolin
       |- *.py (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm; see the "src" folder below)
       |- run.gro (full system positions in the native conformation)
       |- mol.pdb (only the peptide positions in the native conformation)
       |- topol.top (the system's topology for the GROMACS MD engine)
       |- charmmm22star.ff (force field parameter files)
       |- run.mdp (GROMACS MD parameters when appending a simulation)
       |- randomvelocities.mdp (GROMACS MD parameters when initializing a simulation with random velocities)
       |- signature.npy, r0.npy (parameters for defining the fraction of native contacts involved in the folded/unfolded states definition; used by the params.py function "states_function")
       |- dmax.npy, dmin.npy (parameters for defining the feature representation of the AIMMD NN model; used by the params.py function "descriptors_function")
       |- equilibrium (reference long equilibrium trajectory files; only the peptide positions are saved!)
          |- run0.xtc, ..., run3.xtc
       |- validation
          |- validation.xtc (the validation SPs all together in an XTC file)
          |- validation.npy (for each SP, collects the cumulative shooting results after 10 two-way shooting simulations)
       |- fps0 (the first AIMMD-RFPS independent run)
          |- equilibriumA (the free simulations around A, already processed in PathEnsemble files)
             |- traj000001.h5
             |- traj000001.tpr (for running the simulation; in that case, please retrieve all the trajectory files in the right supplemental repository first)
             |- traj000001.cpt (for appending the simulation; in that case, please retrieve all the trajectory files in the right supplemental repository first)
             |- traj000002.h5 (in case of re-initialization)
             |- ...
          |- equilibriumB (the free simulations around B, ...)
             |- ...
          |- shots0
             |- chain.h5 (the path sampling chain)
             |- pool.h5 (the selection pool, containing the frames from which shooting points are currently selected)
          |- params.py (file containing the states and descriptors definitions, the NN fit function, and the AIMMD run hyperparameters; it can be modified to allow for RFPS-AIMMD or original-algorithm AIMMD runs)
          |- initial.trr (the initial transition for path sampling)
          |- manager.log (reports info about the run)
          |- network.h5 (NN weights of the model at different path sampling steps)
       |- fps1, fps2 (the other RFPS-AIMMD runs)
       |- tps0 (the first AIMMD-TPS, or "standard" AIMMD, run)
          |- ...
          |- shots0
             |- ...
             |- chain_weights.npy (weights of the trials in TPS; only the trials with non-zero weight have been accepted)
       |- tps1, tps2 (the other AIMMD runs, with TPS for the shooting simulations)
    |- wq (Wolfe-Quapp 2D system)
       |- *.py (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm)
       |- run.gro (dummy gro file produced for compatibility reasons)
       |- integrator.py (custom MD engine)
       |- equilibrium (reference long simulation)
          |- transition000001.xtc (extracted from the reference long simulation)
          |- transition000002.xtc
          |- ...
          |- transitions.h5 (PathEnsemble file with all the transitions)
       |- reference
          |- grid_X.npy, grid_Y.npy (X, Y grid for 2D plots)
          |- grid_V.npy (PES projected on the grid)
          |- grid_committor_relaxation.npy (true committor on the grid solved with the relaxation method on the backward Kolmogorov equation; the code for doing this is in utils.py)
          |- grid_boltzmann_distribution.npy (Boltzmann distribution on the grid)
          |- pe.h5 (equilibrium distribution processed as a PathEnsemble file)
          |- tpe.h5 (TPE distribution processed as a PathEnsemble file)
          |- ...
       |- uniform_tps (reference TPS run with uniform SP selection)
          |- chain.h5 (PathEnsemble file containing all the accepted paths with their correct weight)
       |- fps0, ..., fps9 (the independent AIMMD-RFPS runs)
          |- ...
       |- tps0, ..., tps9 (the independent AIMMD-TPS, or "standard" AIMMD, runs)

    src (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm)
    |- generate.py (on a Workstation: initializes the processes; on an HPC cluster: creates the sh file for submitting a job)
    |- slurm_options.py (to customize and use in case of running on HPC)
    |- manager.py (controls SP selection; reweights the paths)
    |- shooter.py (performs path sampling simulations)
    |- equilibrium.py (performs free simulations)
    |- pathensemble.py (code of the PathEnsemble class)
    |- utils.py (auxiliary functions for data production and analysis)

    Running/appending AIMMD runs

    • To initialize a new RFPS-AIMMD (or AIMMD) run for the systems of this paper:
    1. Create a "run directory" folder (same depth as "fps0")

    2. Copy "initial.trr" and "params.py" from another AIMMD run folder. It is possible to change "params.py" to customize the run.

    3. (On a Workstation) call:

    python generate.py

    where nsteps is the final number of path sampling steps for the run, n the number of independent path sampling chains, nA the number of independent free simulators around A, and nB that of free simulators around B.

    4. (On an HPC cluster) call instead:

    python generate.py -s slurm_options.py
    sbatch ._job.sh

    • To append to an existing RFPS-AIMMD or AIMMD run
    1. Merge the supplemental repository with the trajectory files into this one.

    2. Just call again (on a Workstation)

    python generate.py

    or (on a HPC cluster)

    sbatch ._job.sh

    after updating the "nsteps" parameters.

    • To run enhanced sampling for a new system: please keep the data structure as close as possible to the original. Different names for the files can generate incompatibilities. We are currently trying to make it easier.

    Reproducing the analysis

    Run the analysis/figures.ipynb notebook. Some groups of cells have to be run multiple times after changing the parameters in the preamble.

  5. Data from: Confronting Large-Eddy Simulations with Stereo Camera Data by...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Mar 10, 2025
    + more versions
    Cite
    Burchart, Yannick; Pospichal, Bernhard; Neggers, Roel (2025). Confronting Large-Eddy Simulations with Stereo Camera Data by means of reconstructed hemispheric Cloud Size Distributions [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14000561
    Dataset provided by
    University of Cologne
    Authors
    Burchart, Yannick; Pospichal, Bernhard; Neggers, Roel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset to produce the results of the publication: "Confronting Large-Eddy Simulations with Stereo Camera Data by means of reconstructed hemispheric Cloud Size Distributions". This dataset supports the findings presented in the publication and includes comprehensive resources for replicating its analysis and visualization.

    The dataset encompasses:

    Dutch Atmospheric Large-Eddy Simulation (DALES) Data

    Configuration files

    Selected simulation output data

    Image Data

    Rendered stereo camera images from the DALES output

    Actual stereo camera images

    Cloud masks generated from these images

    Camera-Based Reconstructions

    Reconstructed cloud fields from the rendered camera images

    Reconstructed cloud fields from the actual camera images

    Derived Cloud Metrics

    Cloud base areas, cloud base heights, and cloud cover from the camera-based reconstructions

    Observational Data

    Radiosondes, Ceilometer, and Cloudnet measurements

    Cloud cover from radiation measurements

    Mixed layer height from the Doppler lidar

    Reproduction Scripts

    Scripts to reproduce the analysis and figures

    This research was supported by the U.S. Department of Energy's Atmospheric System Research, an Office of Science Biological and Environmental Research program, under grant DE-SC0022126 and by the German Research Foundation (DFG) under project number 430226822 (https://gepris.dfg.de/gepris/projekt/430226822). The Gauss Centre for Supercomputing e.V. (https://www.gauss-centre.eu/) is acknowledged for providing computing time on the Gauss Centre for Supercomputing (GCS) supercomputer JUWELS at the Jülich Supercomputing Centre (JSC) under the projects RCONGM and VIRTUALLAB. JOYCE data were provided by the Institute for Geophysics and Meteorology of the University of Cologne. JOYCE is a collaborative research platform between University of Cologne and Forschungszentrum Jülich within the European research infrastructure ACTRIS. We acknowledge ACTRIS and the Finnish Meteorological Institute for providing Cloudnet data which is available for download from https://cloudnet.fmi.fi. We acknowledge ECMWF for providing IFS model data.

  6. Gateway for Co-Simulation using ns-3

    • data.nist.gov
    • catalog.data.gov
    Updated Feb 18, 2025
    + more versions
    Cite
    Thomas Roth (2025). Gateway for Co-Simulation using ns-3 [Dataset]. http://doi.org/10.18434/mds2-3738
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Thomas Roth
    License

    https://www.nist.gov/open/license

    Description

    The Internet of Things (IoT) comprises networks of physical, computational, and human components that coordinate to fulfill time-sensitive functions in a shared operating environment. Development and testing of IoT systems often utilize modeling and simulation, whether to analyze potential performance gains of new technologies or to develop robust digital twins to support future operations and maintenance. However, the complexity and scale of IoT mean that individual simulators are often inadequate to simulate the real-world dynamics of such systems, and simulators must be combined with other software or hardware. The National Institute of Standards and Technology (NIST) has developed a software module that extends the ns-3 network simulator with a new capability to communicate with external software and hardware at runtime. This software facilitates the development of co-simulations in which ns-3 models can synchronize and exchange data with external processes to build higher-fidelity simulations. The software is open-source and available on the NIST GitHub.

  7. Monthly mean climate data from a transient simulation with the Whole...

    • data-search.nerc.ac.uk
    • catalogue.ceda.ac.uk
    Updated Aug 20, 2021
    Cite
    (2021). Monthly mean climate data from a transient simulation with the Whole Atmosphere Community Climate Model eXtension (WACCM-X) from 2015 to 2070 [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?orgName=NERC%20EDS%20Centre%20for%20Environmental%20Data%20Analysis
    Description

    This dataset comprises monthly mean data from a global, transient simulation with the Whole Atmosphere Community Climate Model eXtension (WACCM-X) from 2015 to 2070. WACCM-X is a global atmosphere model covering altitudes from the surface up to ~500 km, i.e., including the troposphere, stratosphere, mesosphere and thermosphere. WACCM-X version 2.0 (Liu et al., 2018) was used, part of the Community Earth System Model (CESM) release 2.1.0 (http://www.cesm.ucar.edu/models/cesm2) made available by the National Center for Atmospheric Research. The model was run in free-running mode with a horizontal resolution of 1.9 degrees latitude by 2.5 degrees longitude (giving 96 latitude points and 144 longitude points) and 126 vertical levels. Further description of the model and simulation setup is provided by Cnossen (2022) and references therein.

    A large number of variables are included on standard monthly mean output files on the model grid, while selected variables are also offered interpolated to a constant height grid or vertically integrated in height (details below). Zonal mean and global mean output files are included as well.

    The data are provided in NetCDF format and file names have the following structure: f.e210.FXHIST.f19_f19.h1a.cam.h0.[YYYY]-[MM][DFT].nc, where [YYYY] gives the year with 4 digits, [MM] gives the month (2 digits) and [DFT] specifies the data file type. The following data file types are included:
    1) Monthly mean output on the full grid for the full set of variables; [DFT] = ''
    2) Zonal mean monthly mean output for the full set of variables; [DFT] = _zm
    3) Global mean monthly mean output for the full set of variables; [DFT] = _gm
    4) Height-interpolated/-integrated output on the full grid for selected variables; [DFT] = _ht

    A cos(latitude) weighting was used when calculating the global means. Data were interpolated to a set of constant heights (61 levels in total) using the Z3GM variable (for variables output on midpoints, with 'lev' as the vertical coordinate) or the Z3GMI variable (for variables output on interfaces, with 'ilev' as the vertical coordinate) stored on the original output files (type 1 above). Interpolation was done separately for each longitude, latitude and time. Mass density (DEN [g/cm3]) was calculated from the M_dens, N2_vmr, O2, and O variables on the original data files before interpolation to constant height levels.

    The Joule heating power QJ [W/m3] was calculated as Q_J = sigma_P * B^2 * ((u_i - u_n)^2 + (v_i - v_n)^2 + (w_i - w_n)^2), with sigma_P = Pedersen conductivity [S], B = geomagnetic field strength [T], u_i, v_i, and w_i = zonal, meridional, and vertical ion velocities [m/s], and u_n, v_n, and w_n = neutral wind velocities [m/s]. QJ was integrated vertically in height (using a 2.5 km height grid spacing rather than the 61 levels on output file type 4) to give the JHH variable on the type 4 data files. The QJOULE variable also given is the Joule heating rate [K/s] at each of the 61 height levels.

    All data are provided as monthly mean files with one time record per file, giving 672 files for each data file type for the period 2015-2070 (56 years).

    References: Cnossen, I. (2022), A realistic projection of climate change in the upper atmosphere into the 21st century, in preparation. Liu, H.-L., C.G. Bardeen, B.T. Foster, et al. (2018), Development and validation of the Whole Atmosphere Community Climate Model with thermosphere and ionosphere extension (WACCM-X 2.0), Journal of Advances in Modeling Earth Systems, 10(2), 381-402, doi:10.1002/2017ms001232.
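The cos(latitude) weighting used for the global means can be sketched in Python. The function name and the list-of-rows data layout are illustrative assumptions, not the dataset's actual processing code:

```python
import math

def global_mean(field, lats_deg):
    """Area-weighted (cos-latitude) global mean of a lat/lon field.
    `field`: list of rows, one per latitude, each a list of values
    along longitude; `lats_deg`: corresponding latitudes in degrees."""
    num = 0.0
    den = 0.0
    for row, lat in zip(field, lats_deg):
        w = math.cos(math.radians(lat))  # grid cells shrink toward the poles
        zonal_mean = sum(row) / len(row)
        num += w * zonal_mean
        den += w
    return num / den
```

Without the cos(latitude) factor, high-latitude grid points (which cover less surface area) would be over-weighted in the global average.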

  8. Supplementary Simulation Data for "Computational and experimental assessment...

    • zenodo.org
    application/gzip
    Updated Apr 22, 2025
    Cite
    César Ramírez-Sarmiento (2025). Supplementary Simulation Data for "Computational and experimental assessment of key interdomain residues controlling the fold-switch of RfaH" [Dataset]. http://doi.org/10.5281/zenodo.15265405
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    César Ramírez-Sarmiento
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional simulation data for "Computational and experimental assessment of key interdomain residues controlling the fold-switch of RfaH"

    Content:

    'AA-SBM': Contains a total of 4 folders, which require the use of SMOG2 and GROMACS v4.5.4 with added Gaussian contact potentials, available at the SMOG server (https://smog-server.org).

    • 'pdb': Includes the SMOG2-ready PDB file of full-length RfaH generated using MODELLER, based on PDBs 2OUG and 5OND, and further minimized in explicit solvent using GROMACS v4.5.3 and the Amber ff99SB-ILDN force field.
    • 'smog': Contains the SMOG2-generated files for MD simulations using All-Atom Structure-Based Models (SBMs).
    • 'simulate': Includes simple scripts for running simulations at several temperatures on an HPC cluster
    • 'analysis': Contains simple scripts for concatenation of energies and trajectories, extraction of the potential energy of the system and the number of native contacts (Q) for each simulation run, and a script for running the weighted histogram analysis method based on the java WHAM.jar available with SMOG2.

    'ColabFold': Contains a total of 2 folders with results from protein structure predictions using ColabFold v1.5.5 (https://colabfold.com).

    • 'structures': Contains 8 .tar.gz files with the predicted structures of several E. coli RfaH variants (WT, E48A, F126A, I129A, E136A, R138A, S139A, L142A, L143A, I146A, N147A, V154) under different conditions. Each folder within the .tar.gz file contains a total of 600 predicted structures for all variants (50 predicted structures per variant), as well as the results of analyzing these structures based on RMSD using k-means clustering and based on TM-score using hierarchical clustering.
      • r3_s10_nodrop: 5 model parameters, 3 recycles, 10 seeds, no dropouts
      • r3_s10_nodrop_MSA: 5 model parameters, 3 recycles, 10 seeds, no dropouts, using the same MSA as RfaH WT for all RfaH variants
      • r3_s10_drop: 5 model parameters, 3 recycles, 10 seeds, with dropouts
      • r3_s10_drop_MSA: 5 model parameters, 3 recycles, 10 seeds, with dropouts, using the same MSA as RfaH WT for all RfaH variants
      • r12_s10_nodrop: 5 model parameters, 12 recycles, 10 seeds, no dropouts
      • r12_s10_nodrop_MSA: 5 model parameters, 12 recycles, 10 seeds, no dropouts, using the same MSA as RfaH WT for all RfaH variants
      • r12_s10_drop: 5 model parameters, 12 recycles, 10 seeds, with dropouts
      • r12_s10_drop_MSA: 5 model parameters, 12 recycles, 10 seeds, with dropouts, using the same MSA as RfaH WT for all RfaH variants
    • 'analysis': Contains 2 Jupyter Notebooks for usage in Google Colab to perform
      • k-means clustering analysis of the structures obtained by ColabFold based on the RMSD of residues 126-131
      • Hierarchical clustering analysis of the structures obtained by ColabFold based on the TM-score against the best predicted structure (rank 1) for each variant.
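As a rough illustration of the k-means step described above (grouping predicted structures by the RMSD of residues 126-131), a minimal one-dimensional k-means might look like the following. This is not the implementation in the dataset's notebooks; the function and data are hypothetical:

```python
import random

def kmeans_1d(values, k=2, iters=50, seed=0):
    """Minimal 1-D k-means, e.g. for splitting per-structure RMSD values
    into conformational clusters. Illustrative only."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # pick k distinct starting centers
    for _ in range(iters):
        # assign each value to its nearest center
        clusters = [[] for _ in range(k)]
        for v in values:
            i = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[i].append(v)
        # move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters
```

With RMSD computed against a single reference fold, two well-separated clusters would correspond to structures predicted in the two alternative RfaH conformations.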
  9. Monthly mean climate data from a transient simulation with the Whole...

    • catalogue.ceda.ac.uk
    • data-search.nerc.ac.uk
    Updated Oct 13, 2020
    + more versions
    Cite
    Ingrid Cnossen (2020). Monthly mean climate data from a transient simulation with the Whole Atmosphere Community Climate Model eXtension (WACCM-X) from 1950 to 2015 [Dataset]. https://catalogue.ceda.ac.uk/uuid/dc91f5e39ae34fd883af81dfdbaf659c
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Ingrid Cnossen
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Time period covered
    Jan 1, 1950 - Dec 31, 2015
    Area covered
    Earth
    Variables measured
    atmosphere_hybrid_sigma_pressure_coordinate
    Description

    This dataset comprises monthly mean data from a global, transient simulation with the Whole Atmosphere Community Climate Model eXtension (WACCM-X) from 1950 to 2015. WACCM-X is a global atmosphere model covering altitudes from the surface up to ~500 km, i.e. including the troposphere, stratosphere, mesosphere and thermosphere.

    WACCM-X version 2.0 (Liu et al., 2018) was used, part of the Community Earth System Model (CESM) release 2.1.0 made available by the US National Center for Atmospheric Research. The model was run in free-running mode with a horizontal resolution of 1.9° latitude by 2.5° longitude (giving 96 latitude points and 144 longitude points) and 126 vertical levels. Further description of the model and simulation setup is provided by Cnossen (2020) and references therein. A large number of variables are included on standard monthly mean output files on the model grid, while selected variables are also offered interpolated to a constant height grid or vertically integrated in height (details below). Zonal mean and global mean output files are included as well.

    The following data file types are included:
    1) Monthly mean output on the full grid for the full set of variables; [DFT] = ''
    2) Zonal mean monthly mean output for the full set of variables; [DFT] = _zm
    3) Global mean monthly mean output for the full set of variables; [DFT] = _gm
    4) Height-interpolated/-integrated output on the full grid for selected variables; [DFT] = _ht

    A cos(latitude) weighting was used when calculating the global means.
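As a minimal sketch of this weighting (the function name and the 96 × 144 grid shape are illustrative, based only on the resolution quoted above):

```python
import numpy as np

def global_mean(field, lat_deg):
    """cos(latitude)-weighted global mean of a (lat, lon) field."""
    w = np.cos(np.deg2rad(lat_deg))   # one weight per latitude row
    zonal = field.mean(axis=1)        # average over longitude first
    return float(np.sum(w * zonal) / np.sum(w))
```

For a spatially uniform field the weighted mean recovers the constant value, while for a field concentrated at the poles the cos(latitude) weights correctly downweight the shrinking grid-cell area.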

    Data were interpolated to a set of constant heights (61 levels in total) using the Z3GM variable (for variables output on midpoints, with "lev" as the vertical coordinate) or the Z3GMI variable (for variables output on interfaces, with "ilev" as the vertical coordinate) stored on the original output files (type 1 above). Interpolation was done separately for each longitude, latitude and time.
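The per-column interpolation can be sketched as follows (a simplified illustration; the function name is hypothetical, and the assumption that the level index runs from model top to surface is stated in a comment):

```python
import numpy as np

def to_constant_heights(var, z3, heights):
    """Interpolate var(lev, lat, lon) to fixed heights, column by column,
    using a geopotential-height field z3(lev, lat, lon) such as Z3GM."""
    nlev, nlat, nlon = var.shape
    out = np.full((len(heights), nlat, nlon), np.nan)
    for j in range(nlat):
        for i in range(nlon):
            # np.interp needs ascending abscissae; assume levels run
            # top-down (highest altitude first), so flip the level axis
            out[:, j, i] = np.interp(heights,
                                     z3[::-1, j, i], var[::-1, j, i],
                                     left=np.nan, right=np.nan)
    return out
```

Points above or below the column are left as NaN rather than extrapolated.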

    Mass density (DEN [g/cm3]) was calculated from the M_dens, N2_vmr, O2, and O variables on the original data files before interpolation to constant height levels.

    The Joule heating power QJ [W/m3] was calculated as Q_J = σ_P B^2 [(u_i - u_n)^2 + (v_i - v_n)^2 + (w_i - w_n)^2], with σ_P = Pedersen conductivity [S/m], B = geomagnetic field strength [T], u_i, v_i, and w_i = zonal, meridional, and vertical ion velocities [m/s], and u_n, v_n, and w_n = the corresponding neutral wind velocities [m/s]. QJ was integrated vertically in height (using a 2.5 km height grid spacing rather than the 61 levels on output file type 4) to give the JHH variable on the type 4 data files. The QJOULE variable, also given, is the Joule heating rate [K/s] at each of the 61 height levels.
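A sketch of this calculation (hypothetical function names; inputs are assumed to be arrays already interpolated to the uniform 2.5 km height grid):

```python
import numpy as np

def joule_heating_power(sigma_p, B, ui, vi, wi, un, vn, wn):
    """Q_J [W/m^3] = sigma_P * B^2 * ((ui-un)^2 + (vi-vn)^2 + (wi-wn)^2)."""
    return sigma_p * B**2 * ((ui - un)**2 + (vi - vn)**2 + (wi - wn)**2)

def integrate_column(qj, dz=2500.0):
    """Height-integrate Q_J [W/m^3] on a uniform 2.5 km grid -> JHH [W/m^2].

    Assumes axis 0 of qj is the height axis."""
    return np.sum(qj, axis=0) * dz
```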

    All data are provided as monthly mean files with one time record per file, giving 792 files for each data file type for the period 1950-2015 (66 years).

  10. Data for HVRM_model

    • figshare.com
    zip
    Updated Feb 11, 2025
    Cite
    Yuewen Jiang; Guorui Sun (2025). Data for HVRM_model [Dataset]. http://doi.org/10.6084/m9.figshare.28386995.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yuewen Jiang; Guorui Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The scene modeling and simulation data is included in Data.zip.

  11. Data from: A generic gust definition and detection method based on...

    • data.uni-hannover.de
    • search.datacite.org
    zip
    Updated Jan 20, 2022
    + more versions
    Cite
    AG PALM (2022). A generic gust definition and detection method based on wavelet-analysis [Dataset]. https://data.uni-hannover.de/dataset/a-generic-gust-definition-and-detection-method-based-on-wavelet-analysis
    Explore at:
    zip (available download formats)
    Dataset updated
    Jan 20, 2022
    Dataset authored and provided by
    AG PALM
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This dataset is associated with the paper Knoop et al. (2019) titled "A generic gust definition and detection method based on wavelet-analysis" published in "Advances in Science and Research (ASR)" within the Special Issue: 18th EMS Annual Meeting: European Conference for Applied Meteorology and Climatology 2018. It contains the data and analysis software required to recreate all figures in the publication.

  12. Ocean surface drifter and drifter simulation data

    • dtechtive.com
    • find.data.gov.scot
    txt
    Updated Mar 15, 2023
    Cite
    School of Mathematics. Maxwell Institute for Mathematical Sciences (2023). Ocean surface drifter and drifter simulation data [Dataset]. http://doi.org/10.7488/ds/3821
    Explore at:
    txt (0.0008 MB), txt (0.0166 MB) (available download formats)
    Dataset updated
    Mar 15, 2023
    Dataset provided by
    School of Mathematics. Maxwell Institute for Mathematical Sciences
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Many practical problems in fluid dynamics demand an empirical approach, where statistics estimated from data inform understanding and modelling. In this context, data-driven probabilistic modelling offers an elegant alternative to ad hoc estimation procedures. Probabilistic models are useful as emulators, but also offer an attractive means of estimating particular statistics of interest. In this paradigm one can rely on probabilistic scoring rules for model comparison and validation. Stochastic neural networks provide a particularly rich class of probabilistic models, which, when paired with modern optimisation algorithms and GPUs, can be remarkably efficient. We demonstrate this approach by learning the single-particle transition density of ocean surface drifters from observations using a mixture density network. This provides a comprehensive description of drifter dynamics, from which we derive maps of various single-particle statistics. Our model also offers a means of simulating drifter trajectories as a discrete-time Markov process. A drifter release simulation using our model shows the emergence of concentrated clusters in the subtropical gyres, in agreement with previous studies on the formation of garbage patches. The dataset is intended to accompany the code repository archived at doi.org/10.5281/zenodo.7737161. Both relate to the paper Brolly, M.T. (in submission), 'Inferring ocean transport statistics with probabilistic neural networks'.
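The trajectory simulation described here can be sketched as a discrete-time Markov process. The Gaussian step below is a toy stand-in for the learned transition density (the paper samples displacements from a mixture density network instead):

```python
import numpy as np

def gaussian_step(pos, rng):
    """Toy stand-in for the learned transition density: a small isotropic
    Gaussian displacement from the current position."""
    return pos + rng.normal(0.0, 0.1, size=2)

def simulate_trajectory(x0, n_steps, step_fn, rng):
    """Drifter trajectory as a discrete-time Markov process: each new
    position depends only on the current one."""
    traj = np.empty((n_steps + 1, 2))
    traj[0] = x0
    for k in range(n_steps):
        traj[k + 1] = step_fn(traj[k], rng)
    return traj
```

Swapping `gaussian_step` for a sampler drawn from a fitted mixture density network would reproduce the release experiments sketched above.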

  13. Data from: Gradient Boosted Machine Learning Model to Predict H2, CH4, and...

    • figshare.com
    zip
    Updated Jul 18, 2023
    Cite
    Tom Bailey; Adam Jackson; Razvan-Antonio Berbece; Kejun Wu; Nicole Hondow; Elaine Martin (2023). Gradient Boosted Machine Learning Model to Predict H2, CH4, and CO2 Uptake in Metal–Organic Frameworks Using Experimental Data [Dataset]. http://doi.org/10.1021/acs.jcim.3c00135.s002
    Explore at:
    zip (available download formats)
    Dataset updated
    Jul 18, 2023
    Dataset provided by
    ACS Publications
    Authors
    Tom Bailey; Adam Jackson; Razvan-Antonio Berbece; Kejun Wu; Nicole Hondow; Elaine Martin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Predictive screening of metal–organic framework (MOF) materials for their gas uptake properties has been previously limited by using data from a range of simulated sources, meaning the final predictions are dependent on the performance of these original models. In this work, experimental gas uptake data has been used to create a Gradient Boosted Tree model for the prediction of H2, CH4, and CO2 uptake over a range of temperatures and pressures in MOF materials. The descriptors used in this database were obtained from the literature, with no computational modeling needed. This model was repeated 10 times, showing an average R2 of 0.86 and a mean absolute error (MAE) of ±2.88 wt % across the runs. This model will provide gas uptake predictions for a range of gases, temperatures, and pressures as a one-stop solution, with the data provided being based on previous experimental observations in the literature, rather than simulations, which may differ from their real-world results. The objective of this work is to create a machine learning model for the inference of gas uptake in MOFs. The basis of model development is experimental as opposed to simulated data to realize its applications by practitioners. The real-world nature of this research materializes in a focus on the application of algorithms as opposed to the detailed assessment of the algorithms.
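A hedged sketch of this kind of model using scikit-learn's gradient boosting; the synthetic descriptors and targets below are purely illustrative, not the experimental MOF data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for experimental descriptors (e.g. surface area,
# pore volume, temperature, pressure) -> uptake in wt %; illustrative only.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(500, 4))
y = 10 * X[:, 0] + 5 * X[:, 1] * X[:, 3] + rng.normal(0, 0.5, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
r2, mae = r2_score(y_te, pred), mean_absolute_error(y_te, pred)
```

The paper's reported figures (mean R² of 0.86, MAE of ±2.88 wt % over 10 repeats) come from repeating such a fit with different splits and averaging the metrics.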

  14. Supplementary materials including additional simulation data and R code

    • scidb.cn
    Updated Aug 22, 2024
    Cite
    Hongmei LIN; Yuanyuan TANG; Xiaorui WANG; Jianming ZHU; Yanlin TANG; Tiejun TONG (2024). Supplementary materials including additional simulation data and R code [Dataset]. http://doi.org/10.57760/sciencedb.j00207.00014
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 22, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Hongmei LIN; Yuanyuan TANG; Xiaorui WANG; Jianming ZHU; Yanlin TANG; Tiejun TONG
    License

    CC0 1.0 (https://api.github.com/licenses/cc0-1.0)

    Description

    The supplementary materials present simulation data, including biases and mean squared errors of estimators, the type I error rate and power curves of the rank score test, and the estimated mean lengths and empirical coverage probabilities of confidence intervals in various cases. The R code is included.

  15. Entity name, meaning and corresponding production unit.

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Hanwen Liu; Xiaobing Liu; Lin Lin; Sardar M. N. Islam; Yuqing Xu (2023). Entity name, meaning and corresponding production unit. [Dataset]. http://doi.org/10.1371/journal.pone.0239685.t007
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hanwen Liu; Xiaobing Liu; Lin Lin; Sardar M. N. Islam; Yuqing Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Entity name, meaning and corresponding production unit.

  16. Mean results of the simulation.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 4, 2016
    Cite
    Huang, Hai-Hui; Liang, Yong; Liu, Xiao-Ying (2016). Mean results of the simulation. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001534316
    Explore at:
    Dataset updated
    May 4, 2016
    Authors
    Huang, Hai-Hui; Liang, Yong; Liu, Xiao-Ying
    Description

    The best performance among all the methods is shown in bold.

  17. Mean (sd) of the empirical link probabilities for the simulation data.

    • plos.figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Hanxuan Yang; Wei Xiong; Xueliang Zhang; Kai Wang; Maozai Tian (2023). Mean (sd) of the empirical link probabilities for the simulation data. [Dataset]. http://doi.org/10.1371/journal.pone.0253873.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hanxuan Yang; Wei Xiong; Xueliang Zhang; Kai Wang; Maozai Tian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean (sd) of the empirical link probabilities for the simulation data.

  18. Data from: Fluvial Egg Drift Simulator (FluEgg) Results for 240 Simulations...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Nov 12, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Fluvial Egg Drift Simulator (FluEgg) Results for 240 Simulations of Bighead Carp Egg and Larval Drift in the Illinois River [Dataset]. https://catalog.data.gov/dataset/fluvial-egg-drift-simulator-fluegg-results-for-240-simulations-of-bighead-carp-egg-and-lar
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Illinois River, Illinois
    Description

    The Fluvial Egg Drift Simulator (FluEgg) estimates bighead, silver, and grass carp egg and larval drift in rivers using species-specific egg developmental data combined with user-supplied hydraulic inputs (Garcia and others, 2013; Domanski, 2020). This data release contains results from 240 FluEgg 4.1.0 simulations of bighead carp eggs in the Illinois River under steady flow conditions. The data release also contains the hydraulic inputs used in the FluEgg simulations and a KML file of the centerline that represents the model domain. FluEgg simulations were run for all combinations of four spawning locations, six water temperatures, and ten steady flow conditions. Each simulation included 5,000 bighead carp eggs, which develop and eventually hatch into larvae. The simulations end when the larvae reach the gas bladder inflation stage. The four spawning locations were just downstream of the lock and dam structures at Marseilles, Starved Rock, Peoria, and LaGrange. For each of these spawning locations, the eggs were assumed to have been spawned at the water surface and at the midpoint of the channel. The six water temperatures were 18, 20, 22, 24, 26, and 28 degrees Celsius. The ten steady flow conditions ranged from half the annual mean flow to the 500-year peak flow and are discussed in more detail below. Note that in the streamwise coordinate system used by FluEgg, the streamwise coordinate of the Mississippi River confluence is 396,639 meters. Any drift distances greater than this value should be excluded from any further analysis of this data.

    The hydraulic inputs for the FluEgg simulations were generated using a one-dimensional steady Hydrologic Engineering Center-River Analysis System (HEC-RAS) 5.0.7 model for the Illinois River between Marseilles Lock and Dam and the Mississippi River confluence near Grafton, Illinois (HEC-RAS, 2019). The HEC-RAS model was developed by combining four individual HEC-RAS models obtained from the U.S. Army Corps of Engineers Rock Island District (U.S. Army Corps of Engineers Rock Island District, 2004). The model was run for the following ten flow profiles: half the annual mean flow, annual mean flow, annual mean flood, 2-year peak flow, 5-year peak flow, 10-year peak flow, 25-year peak flow, 50-year peak flow, 100-year peak flow, and 500-year peak flow. The flow rates for each of the profiles were obtained for the following U.S. Geological Survey (USGS) streamgaging stations from USGS StreamStats: 5543500 Illinois River at Marseilles, Illinois; 5558300 Illinois River at Henry, Illinois; 5560000 Illinois River at Peoria, Illinois; 5568500 Illinois River at Kingston Mines, Illinois; 5570500 Illinois River near Havana, Illinois; 5585500 Illinois River at Meredosia, Illinois; 5586100 Illinois River at Valley City, Illinois (Soong and others, 2004; Granato and others, 2017).

    References:
    Garcia, T., Jackson, P.R., Murphy, E.A., Valocchi, A.J., and Garcia, M.H., 2013, Development of a Fluvial Egg Drift Simulator to evaluate the transport and dispersion of Asian carp eggs in rivers: Ecological Modelling, v. 263, p. 211–222, https://doi.org/10.1016/j.ecolmodel.2013.05.005.
    Granato, G.E., Ries, K.G., III, and Steeves, P.A., 2017, Compilation of streamflow statistics calculated from daily mean streamflow data collected during water years 1901–2015 for selected U.S. Geological Survey streamgages: U.S. Geological Survey Open-File Report 2017–1108, 17 p., https://doi.org/10.3133/ofr20171108.
    Domanski, M.M., and Berutti, M.C., 2020, FluEgg: U.S. Geological Survey software release, https://doi.org/10.5066/P93UCQR2.
    Hydrologic Engineering Center-River Analysis System (HEC-RAS), 2019, accessed August 20, 2020, at http://www.hec.usace.army.mil/software/hec-ras/.
    Soong, D.T., Ishii, A.L., Sharpe, J.B., and Avery, C.F., 2004, Estimating flood-peak discharge magnitudes and frequencies for rural streams in Illinois: U.S. Geological Survey Scientific Investigations Report 2004–5103, 147 p., https://doi.org/10.3133/sir20045103.
    U.S. Army Corps of Engineers Rock Island District, 2004, Upper Mississippi River System Flow Frequency Study, Hydrology and Hydraulics, Appendix C, Illinois River, accessed August 20, 2020, at https://www.mvr.usace.army.mil/Portals/48/docs/FRM/UpperMissFlowFreq/App.%20C%20Rock%20Island%20Dist.%20Illinois%20River%20Hydrology_Hydraulics.pdf.
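The 240-run design (4 spawning locations × 6 temperatures × 10 flow profiles) can be enumerated directly; the labels below are paraphrased from the description, not taken from the data files:

```python
from itertools import product

spawn_sites = ["Marseilles", "Starved Rock", "Peoria", "LaGrange"]
temps_c = [18, 20, 22, 24, 26, 28]
flows = ["0.5x annual mean", "annual mean", "annual mean flood",
         "2-yr peak", "5-yr peak", "10-yr peak", "25-yr peak",
         "50-yr peak", "100-yr peak", "500-yr peak"]

# One FluEgg simulation per (site, temperature, flow) combination.
runs = list(product(spawn_sites, temps_c, flows))
```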

  19. Data and statistics of a direct numerical simulation of adverse pressure...

    • entrepot.recherche.data.gouv.fr
    avi, mkv, nc, pdf +2
    Updated Oct 17, 2024
    + more versions
    Cite
    LAVAL Jean-Philippe; LAVAL Jean-Philippe (2024). Data and statistics of a direct numerical simulation of adverse pressure gradient turbulent boundary layer [Dataset]. http://doi.org/10.57745/FMZ9HP
    Explore at:
    nc(33592456192), nc(183750471), text/x-python(5402), zip(585921), nc(29140), avi(88888284), pdf(3015083), nc(41948), mkv(131819512), nc(268780), nc(23532054496), nc(23532054954), nc(17649045648) (available download formats)
    Dataset updated
    Oct 17, 2024
    Dataset provided by
    Recherche Data Gouv
    Authors
    LAVAL Jean-Philippe; LAVAL Jean-Philippe
    License

    Custom licence: https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.57745/FMZ9HP

    Dataset funded by
    GENCI
    Description

    Although it is a widespread phenomenon in nature, turbulence in fluids (gases, liquids) is still very poorly understood. One area of research involves analyzing data from academic flow simulations. To make progress, the scientific community needs a large amount of reliable data in various configurations. Turbulent flows near solid flat or curved walls are very interesting examples. The database is composed of the 3D raw data (velocity, pressure, time derivative of velocity) and statistics (mean, Reynolds stresses, length scales) of a direct numerical simulation of a moderate adverse pressure gradient (decelerating) turbulent boundary layer on a flat plate at Reynolds numbers up to Reθ = 8000.

  20. Input Data for Model Archive: Two-dimensional flow simulations of the...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Input Data for Model Archive: Two-dimensional flow simulations of the Sacramento River near Glenn, California [Dataset]. https://catalog.data.gov/dataset/input-data-for-model-archive-two-dimensional-flow-simulations-of-the-sacramento-river-near
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Sacramento River, California, Glenn
    Description

    This model archive contains the data and software application necessary to simulate two-dimensional hydraulic parameters along a 1.6-kilometer study reach of the Sacramento River near Glenn, California. The iRIC modeling system and the NAYS2DH solver were used to simulate three river flows (90, 191, and 255 cubic meters per second) and provide spatially distributed depths, velocities, and water-surface elevations along the study reach. The archive is split into child items to help distinguish the individual components of the archive and make downloading of large files more manageable. The first child item in the archive is the hydraulic model software application. The second child item includes the topographic data used to construct the model grid as well as field measurements of water-surface elevation and depth-averaged velocity used to calibrate the hydraulic roughness parameter. The third child item provides output from the NAYS2DH model using various Manning's n roughness values. A comparison of the root mean square errors between the model simulation and field measurements is included for each roughness parameter. The fourth child item includes model output for the three river flows that were simulated to support the manuscript.

Cite
U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set

Simulation Data Set

Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description

These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

File format: R workspace file; "Simulated_Dataset.RData".

Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code Abstract: We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description: "CWVS_LMC.txt" is delivered to the user as a .txt file containing R statistical software code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, the code in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. "Results_Summary.txt" is also delivered as a .txt file containing R code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Required R packages:
• For running "CWVS_LMC.txt": msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), BayesLogit (sampling from the Polya-Gamma distribution)
• For running "Results_Summary.txt": plotrix (plotting the posterior means and credible intervals)

Instructions for Use and Reproducibility: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. To do so:
• Load the "Simulated_Dataset.RData" workspace
• Run the code contained in "CWVS_LMC.txt"
• Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt"

Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
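The per-week median/IQR standardization described for the exposure matrix z can be sketched as follows (an illustrative implementation, not the authors' R code):

```python
import numpy as np

def standardize_by_week(z):
    """Per-week standardization of an (individuals x weeks) exposure matrix:
    subtract the weekly median, then divide by the weekly IQR."""
    med = np.median(z, axis=0)
    q75, q25 = np.percentile(z, [75, 25], axis=0)
    return (z - med) / (q75 - q25)
```

After this transform each week's exposures have median 0 and IQR 1, which is what allows the medians and IQRs themselves to be withheld from the released data.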
