These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

It can be accessed through the following means: File format: R workspace file, “Simulated_Dataset.RData”.

Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description
“CWVS_LMC.txt”: This code is delivered to the user as a .txt file containing R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the code can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered as a .txt file containing R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information
Required R packages:
• For running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals

Instructions for Use
Reproducibility
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set.

Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
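The intended workflow is the R one described above. For users who prefer to inspect the workspace from Python, a minimal sketch using the pyreadr package is shown below; pyreadr represents basic R vectors and matrices as pandas DataFrames. This alternative route is a suggestion, not part of the original materials.

import pyreadr

# Load the R workspace; returns a dict-like mapping of object names to DataFrames.
objects = pyreadr.read_r("Simulated_Dataset.RData")
y = objects["y"]   # binary responses (1: adverse outcome, 0: control)
x = objects["x"]   # covariate matrix, one row per simulated individual
z = objects["z"]   # standardized weekly pollution exposures
print(y.shape, x.shape, z.shape)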
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The mean number of components used for simulated data.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The performance of statistical methods is frequently evaluated by means of simulation studies. In the case of network meta-analysis of binary data, however, available data-generating models are restricted to either the inclusion of two-armed trials or the fixed-effect model. Based on data generation in the pairwise case, we propose a framework for the simulation of random-effects network meta-analyses including multi-arm trials with binary outcome. The only one of the common data-generating models that is directly applicable to a random-effects network setting relies on strongly restrictive assumptions. To overcome these limitations, we modify this approach and derive a related simulation procedure using odds ratios as the effect measure. The performance of this procedure is evaluated with synthetic data and in an empirical example.
The Internet of Things (IoT) is comprised of networks of physical, computational, and human components that coordinate to fulfill time-sensitive functions in a shared operating environment. Development and testing of IoT systems often utilize modeling and simulation, whether to analyze potential performance gains of new technologies or to develop robust digital twins to support future operations and maintenance. However, the complexity and scale of IoT mean that individual simulators are often inadequate to simulate the real-world dynamics of such systems, and simulators must be combined with other software or hardware. The National Institute of Standards and Technology (NIST) has developed a software module that extends the ns-3 network simulator with a new capability to communicate with external software and hardware at runtime. This software facilitates the development of co-simulations where ns-3 models can synchronize and exchange data with external processes to develop higher-fidelity simulations. The software is open-source and available on the NIST GitHub.
1000 simulated data sets stored in a list of R dataframes used in support of Reisetter et al. (submitted) 'Mixture model normalization for non-targeted gas chromatography / mass spectrometry metabolomics data'. These are results after normalization using mean centering as described in Reisetter et al.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains simulated data that is meant to represent sensitive finance data.
# porftolio_id = unique identifier for portfolio (discrete)
# date = unique date/time (discrete)
# ticker = company stock ticker (discrete)
# price = stock price (USD) (continuous, increasing mean, sd equals 5)
# shares = number of shares held (count)
# revenue = revenue in billion (USD) (continuous)
# operating_income = operating income in billion (USD) (continuous)
# profit = profit in billion (USD) (continuous)
# total_assets = total assets in billion (USD) (continuous)
# total_equity = total equity in billion (USD) (continuous)
# industry = 'Basic Materials', 'Consumer Goods', 'Consumer Services', 'Financials', 'Health Care', 'Industrials', 'Oil and Gas', 'Technology', 'Telecom', 'Utilities'
# country = Correlates of War code (discrete)
# intl = international or domestic company (dichotomous)
# ceo_salary = salary of CEO in million (USD) (continuous)
# no_employees = number of employees: 'lt 500', '500 - 1,000', '1,000 - 10,000', '10,000plus'
# founded = year founded (discrete)
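As an illustration of working with this data dictionary, the sketch below loads the table with pandas; the file name "simulated_finance.csv" and the CSV format are assumptions for the example, since the description does not state how the data are packaged.

import pandas as pd

# Categorical fields per the data dictionary; the date column is parsed as datetime.
df = pd.read_csv(
    "simulated_finance.csv",  # hypothetical file name
    parse_dates=["date"],
    dtype={"ticker": "category", "industry": "category",
           "country": "category", "no_employees": "category"},
)
print(df.dtypes)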
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Predictive screening of metal–organic framework (MOF) materials for their gas uptake properties has previously been limited by using data from a range of simulated sources, meaning the final predictions depend on the performance of these original models. In this work, experimental gas uptake data have been used to create a Gradient Boosted Tree model for the prediction of H2, CH4, and CO2 uptake over a range of temperatures and pressures in MOF materials. The descriptors used in this database were obtained from the literature, with no computational modeling needed. The model fitting was repeated 10 times, showing an average R2 of 0.86 and a mean absolute error (MAE) of ±2.88 wt % across the runs. This model provides gas uptake predictions for a range of gases, temperatures, and pressures as a one-stop solution, with the underlying data based on previous experimental observations in the literature rather than simulations, which may differ from real-world results. The objective of this work is to create a machine learning model for the inference of gas uptake in MOFs, built on experimental rather than simulated data so that practitioners can apply it directly. Accordingly, the focus is on the application of algorithms rather than their detailed assessment.
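The description names the model family but not an implementation. The sketch below shows the general approach with scikit-learn's GradientBoostingRegressor, using a hypothetical file of literature descriptors and measured uptake; the file and column names are placeholders, not the authors' artifacts.

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

data = pd.read_csv("mof_uptake.csv")          # hypothetical descriptors + uptake table
X = data.drop(columns=["uptake_wt_pct"])      # literature-derived descriptors
y = data["uptake_wt_pct"]                     # measured gas uptake (wt %)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"R2 = {r2_score(y_te, pred):.2f}, MAE = {mean_absolute_error(y_te, pred):.2f} wt %")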
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This data was generated based on simulations of the patient-ventilator interaction in research by A. van Diepen et al. [1]. The data contains the airway pressure, flow, and volume waveforms including the labeling of patient and ventilator timings resulting from the simulations. In total, the data contains 1405 simulation runs. Subsequently, the simulated data was used in the development of patient-ventilator asynchrony detection and the evaluation of inspiratory effort estimation, in research by T.H.G.F. Bakkes et al. and A. van Diepen et al., respectively [2, 3].
More details on the contents of the files can be found in the 'Read me' file.
Changes 19-07-2024:
Added muscle pressure to 'waveforms.zip'
Added 'pmus' description to 'Read me.txt'
[1] A. van Diepen et al., A model-based approach to generating annotated pressure support waveforms, DOI: https://doi.org/10.1007/s10877-022-00822-4
[2] T.H.G.F. Bakkes et al., Automated detection and classification of patient-ventilator asynchrony by means of machine learning and simulated data, DOI: https://doi.org/10.1016/j.cmpb.2022.107333
[3] A. van Diepen et al., Evaluation of the accuracy of established patient inspiratory effort estimation methods during mechanical support ventilation, DOI: https://doi.org/10.1016/j.heliyon.2023.e13610
Each data set consists of multiple multivariate time series and is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance; these settings are also included in the data. The data is contaminated with sensor noise.

The engine is operating normally at the start of each time series and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. A vector of true Remaining Useful Life (RUL) values for the test data is also provided.

The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle; each column is a different variable. The columns correspond to:
1) unit number
2) time, in cycles
3) operational setting 1
4) operational setting 2
5) operational setting 3
6) sensor measurement 1
7) sensor measurement 2
...
26) sensor measurement 21

Data Set: FD001. Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: ONE (HPC Degradation)
Data Set: FD002. Train trajectories: 260; Test trajectories: 259; Conditions: SIX; Fault Modes: ONE (HPC Degradation)
Data Set: FD003. Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: TWO (HPC Degradation, Fan Degradation)
Data Set: FD004. Train trajectories: 248; Test trajectories: 249; Conditions: SIX; Fault Modes: TWO (HPC Degradation, Fan Degradation)

Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, ‘Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation’, in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver, CO, Oct 2008.
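Given the fixed 26-column, space-separated layout described above, one plausible way to read a training file with pandas is sketched below; the file name "train_FD001.txt" follows the usual naming of this data set's releases but should be verified against the actual archive.

import pandas as pd

# Columns: unit number, cycle, 3 operational settings, 21 sensor measurements.
cols = (["unit", "cycle"]
        + [f"op_setting_{i}" for i in range(1, 4)]
        + [f"sensor_{i}" for i in range(1, 22)])
train = pd.read_csv("train_FD001.txt", sep=r"\s+", header=None, names=cols)

# In the training set, failure occurs at each unit's last recorded cycle, so the
# remaining useful life at any row is the number of cycles left until that cycle.
rul = train.groupby("unit")["cycle"].transform("max") - train["cycle"]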
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The Python script has an option to either generate the data or just plot it. Because the software is still closed source, it should only be used for plotting. Therefore, the data needed to generate the plot are also included, and the script has run_simulations set to false.
This is a scan of mean time to failure, mean time to repair, and requested job size (# of GPUs) for a GB200 system. For these, the values are:
allocation_sizes = [72, 68, 64, 60, 56, 52, 48, 44, 40] # gpus
mttf = [100, 200, 400, 800, 1600, 3200] # days
mttr = [1, 2, 4, 8, 14] # days
resulting in 270 simulations.
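The 270 simulations correspond to the full cross-product of the three parameter lists (9 allocation sizes x 6 MTTF values x 5 MTTR values), which the stated count implies. A sketch of building that grid, assuming a simple factorial sweep:

from itertools import product

allocation_sizes = [72, 68, 64, 60, 56, 52, 48, 44, 40]  # gpus
mttf = [100, 200, 400, 800, 1600, 3200]                  # days
mttr = [1, 2, 4, 8, 14]                                  # days

grid = list(product(allocation_sizes, mttf, mttr))       # one tuple per simulation
assert len(grid) == 270                                  # 9 * 6 * 5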
Requirements to run the plotting script:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
These datasets are related to the Data Article entitled: “A benchmark dataset for the retail multiskilled personnel planning under uncertain demand”, submitted to the Data Science Journal. This data article describes datasets from a home improvement retailer located in Santiago, Chile. The datasets were developed to solve a multiskilled personnel assignment problem (MPAP) under uncertain demand. Notably, these datasets were used in the published article "Multiskilled personnel assignment problem under uncertain demand: A benchmarking analysis" authored by Henao et al. (2022). Moreover, the datasets were also used in the published articles authored by Henao et al. (2016) and Henao et al. (2019) to solve MPAPs.
The datasets include real and simulated data. Regarding the real dataset, it includes information about the store size, number of employees, employment-contract characteristics, mean value of weekly hours demand in each department, and cost parameters. Regarding the simulated datasets, they include information about the random parameter of weekly hours demand in each store department. The simulated data are presented in 18 text files classified by: (i) Sample type (in-sample or out-of-sample). (ii) Truncation-type method (zero-truncated or percentile-truncated). (iii) Coefficient of variation (5, 10, 20, 30, 40, 50%).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
General Description
This process describes the management of customer orders within a company, comprising both the registration and payment of incoming orders and the process of packing and shipping these orders. For these tasks, our company deploys staff in its sales, warehousing, and shipment departments.
This is an artificial event log in the OCEL 2.0 standard, simulated using CPN Tools. Both the CPN and the SQLite file can be downloaded. The simulation is an extension of the order management log in the former OCEL standard.
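Recent releases of the pm4py library include readers for OCEL 2.0 SQLite files; the following minimal sketch assumes such a release, and the file name is a placeholder for the downloaded SQLite export.

import pm4py

# Load the OCEL 2.0 event log from its SQLite export (file name is a placeholder).
ocel = pm4py.read_ocel2_sqlite("order-management.sqlite")
# The OCEL object exposes events, objects, and relations as DataFrames.
print(len(ocel.events), len(ocel.objects))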
Process Overview
At our company, customers place orders (place order) for different products in varying amounts. Each product type has a price and a weight. In the current market situation, there is inflation that irregularly leads to price increases. These price rises have a negative impact on customers’ purchasing power, i.e., on order volumes.
When a customer places an order, this order is assigned to an employee of our company’s sales department. To foster customer satisfaction, our company has a single-face-to-customer policy: per customer, there is one primary sales representative who ought to render all services related to that customer. If that first representative is unavailable, a second sales representative should take care of the order. Should this employee also be unavailable, the order has to be managed by another employee. The tasks of sales employees comprise registration (confirm order) as well as payment processing (payment reminder, pay order).
In parallel to this, the shipment of goods is prepared. For this, the stock of our company is checked by an employee of the warehousing department for the availability of the ordered items. If necessary, the warehouser reorders the item (item out of stock, reorder item). Items ready for shipment are collected (pick item) for placement into packages that are addressed to single customers. A package’s contents may relate to multiple orders, and an order’s items may be distributed over multiple packages.
After all items allocated to a package have been picked, the package is compiled by a warehousing employee (create package). Later on, this package is picked up by a shipment employee for transport (send package). According to another policy, a warehousing employee should provide assistance to the shipment employee in loading the package. However, oftentimes shippers act contrary to that policy and load packages alone or together with a second shipment employee.
Finally, the package is shipped. Deliveries may fail repeatedly (failed delivery) until successful delivery (package delivered).
The figure below depicts the process in a simplified manner, using an informal process notation to describe the control-flow and the involved object types. A formal description is given along with the artifacts in the next section.
Further information can be found at: https://www.ocel-standard.org/event-logs/simulations/order-management/
General Properties
An overview of log properties is given below.
Event Types: 11
Object Types: 6
Events: 21008
Objects: 10840
Control-Flow Behavior
The behavior of the log is described by a corresponding object-centric Petri net; individual object types also exhibit behavior that can be described by simpler Petri nets. Petri nets are provided for each object type (orders, customers, items, employees, packages, products) as well as for the full object-centric Petri net.
Object Relationships
The company pursues the "one-face-to-the-customer" policy, in which every customer has a dedicated sales representative as well as a deputy (secondary representative). These relationships are described in the log.
Source Object Type   Target Object Type   Qualifier
employees            customers            primarySalesRep
employees            customers            secondarySalesRep
Additionally, object-to-object relations can emerge at executions of specific activities:
Activity         Source Object Type   Target Object Type   Qualifier
create package   package              employee             packed by
send package     package              employee             forwarded by
send package     package              employee             shipped by
Simulation Model
The CPN used to create this event log can also be downloaded. To obtain simulated data, extract the linked ZIP file and play out the CPN therein, e.g., by using CPN Tools.
The play-out produces CSV files according to the schema of OCEL 2.0. The provided Jupyter notebook can be used to convert these files to an SQLite dump; an illustrative sketch of such a conversion is given below.
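A conversion of this kind is straightforward with pandas and sqlite3. The sketch below illustrates the idea only; the CSV and table names are placeholders, not those used by the provided notebook.

import sqlite3
import pandas as pd

con = sqlite3.connect("ocel2_export.sqlite")
for name in ("event", "object", "event_object"):  # hypothetical play-out CSV names
    pd.read_csv(f"{name}.csv").to_sql(name, con, if_exists="replace", index=False)
con.close()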
For a technical documentation of the simulation model, please open the attached CPN with CPN Tools and see the annotations therein.
Acknowledgements
Funded under the Excellence Strategy of the Federal Government and the Länder. We also thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset includes outputs simulated in NICAM, targeting extremely intense tropical cyclones (TCs): BOLAVEN (2023), MAWAR (2023), NANMADOL (2022), RAI (2021), and SURIGAE (2021). Files in "TCcenter" include the center location, the center sea level pressure, and its 12-hourly mean value at each time step, for the observation data and for each simulation. Directories in "NEW" and "OLD" include the output variables of each simulation. These outputs are azimuthally averaged around the TC center.
This dataset provides simulated sea surface height (SSH) in a format similar to the future SWOT Level 2 (L2) SSH data from KaRIn. The simulated data were generated by the "ECCO LLC4320" global ocean simulation. ECCO, which means "Estimating the Circulation and Climate of the Ocean", is a data assimilation and model (and the international consortium of scientists who maintains it) based on the MIT general circulation model (MITgcm) that assimilates and constrains observational data from numerous sources to estimate the ocean state. The model operates on the Lat-Lon-Cap (LLC) grid with a nominal horizontal resolution of 1/48-degrees (when approximated over the entire model domain, corresponding to ~2-km cell size at the equator). SSH data produced by ECCO LLC4320 were rendered from the native output format into the format prescribed in the SWOT L2 SSH PDD to aid ongoing data product development and to benefit future users of data produced during operational phases of the SWOT mission.
Introduction
Preservation and management of semi-arid ecosystems requires understanding of the processes involved in soil erosion and their interaction with the plant community. Rainfall simulations on natural plots provide an effective way of obtaining a large amount of erosion data under controlled conditions in a short period of time. This dataset contains hydrological (rainfall, runoff, flow velocity), erosion (sediment concentration and rate), vegetation (plant cover), and other supplementary information from 272 rainfall simulation experiments conducted at 23 rangeland locations in Arizona and Nevada between 2002 and 2013. The dataset advances our understanding of the basic hydrological and biological processes that drive soil erosion on arid rangelands. It can be used to quantify runoff, infiltration, and erosion rates on a variety of ecological sites in the Southwestern USA. The inclusion of wildfire and brush treatment locations, combined with long-term observations, makes it important for studying vegetation recovery, ecological transitions, and the effect of management. It is also a valuable resource for erosion model parameterization and validation.

Instrumentation
Rainfall was generated by a portable, computer-controlled, variable-intensity simulator (Walnut Gulch Rainfall Simulator). The WGRS can deliver rainfall rates between 13 and 178 mm/h with a variability coefficient of 11% across a 2 by 6.1 m area. The estimated kinetic energy of simulated rainfall was 204 kJ/ha/mm, and drop size ranged from 0.288 to 7.2 mm. A detailed description and the design of the simulator are available in Stone and Paige (2003). Prior to each field season the simulator was calibrated over a range of intensities using a set of 56 rain gages. During the experiments, windbreaks were set up around the simulator to minimize the effect of wind on rain distribution. On some of the plots, in addition to the rainfall-only treatment, run-on flow was applied at the top edge of the plot. The purpose of the run-on water application was to simulate hydrological processes that occur on longer slopes (>6 m), where the upper portion of the slope contributes runoff onto the lower portion.

Runoff rate from the plot was measured using a calibrated V-shaped supercritical flume equipped with a depth gage. Overland flow velocity on the plots was measured using electrolyte and fluorescent dye solution. Dye moving from the application point at 3.2 m distance to the outlet was timed with a stopwatch. Electrolyte transport in the flow was measured by resistivity sensors embedded in the edge of the outlet flume. Maximum flow velocity was defined as the velocity of the leading edge of the solution and was determined from the beginning of the electrolyte breakthrough curve and verified by visual observation (dye). Mean flow velocity was calculated using the mean travel time obtained from the electrolyte solution breakthrough curve using the moment equation. Soil loss from the plots was determined from runoff samples collected during each run. The sampling interval was variable and aimed to represent the rising and falling limbs of the hydrograph, any changes in runoff rate, and steady-state conditions. This resulted in approximately 30 to 50 samples per simulation. Shortly before every simulation, plot surface and vegetative cover were measured on a 400-point grid using a laser and line-point intercept procedure (Herrick et al., 2005). Vegetative cover was classified as forbs, grass, and shrub. Surface cover was characterized as rock, litter, plant basal area, and bare soil.
These four metrics were further classified as protected (located under plant canopy) and unprotected (not covered by the canopy). In addition, plant canopy and basal area gaps were measured on the plots over three lengthwise and six crosswise transects.

Experimental procedure
Four to eight replicated 6.1 m by 2 m rainfall simulation plots were established on each site. The plots were bound by sheet metal borders hammered into the ground on three sides. On the downslope side, a collection trough was installed to channel runoff into the measuring flume. If a site was revisited, repeat simulations were always conducted on the same long-term plots. The experimental procedure was as follows. First, the plot was subjected to 45 min of 65 mm/h intensity simulated rainfall (dry run) intended to create an initial saturated condition that could be replicated across all sites. This was followed by a 45-minute pause and a second simulation with varying intensity (wet run). During wet runs, two modes of water application were used: rainfall or run-on. Rainfall wet runs typically consisted of a series of application rates (65, 100, 125, 150, and 180 mm/h) that were increased after runoff had reached steady state for at least five minutes. Runoff samples were collected on the rising and falling limbs of the hydrograph and during each steady state (a minimum of 3 samples). Overland flow velocities were measured during each steady state as previously described. When used, run-on wet runs followed the same procedure as rainfall runs, except water application rates varied between 100 and 300 mm/h. In approximately 20% of simulation experiments, the wet run was followed by another simulation (wet2 run) after a 45 min pause. Wet2 runs were similar to wet runs and also consisted of a series of varying-intensity rainfall and/or run-on inputs.

Resulting Data
The dataset contains hydrological, erosion, vegetation, and ecological data from 272 rainfall simulation experiments conducted on 12 sq. m plots at 23 rangeland locations in Arizona and Nevada. The experiments were conducted between 2002 and 2013, with some locations being revisited multiple times.

Resources in this dataset:
• Appendix A. Data dictionary. File: Data dictionary.csv. Explanation of terms and units. Recommended software: Microsoft Excel (https://products.office.com/en-us/excel) or Microsoft Access (https://products.office.com/en-us/access).
• Appendix B. Lists of sites and general information. File: Rainfall Simulation Sites Summary.xlsx. A list of rainfall simulation sites and individual plots, their coordinates, topographic, soil, ecological, and vegetation characteristics, and dates of simulation experiments; the sites are grouped by common geographic area. Recommended software: Microsoft Excel.
• Appendix C. Rainfall simulations. File: Rainfall simulation.csv. Rainfall, runoff, sediment, and flow velocity data from the rainfall simulation experiments; see Appendix C (revised) for data with errors corrected (11/27/2017). Recommended software: Microsoft Access or Microsoft Excel.
• Appendix C. Rainfall simulations (revised). File: Rainfall simulation (R11272017).csv. Rainfall, runoff, sediment, and flow velocity data (updated 11/27/2017). Recommended software: Microsoft Access.
• Appendix D. Ground and vegetation cover. File: Plot Ground and Vegetation Cover.csv. Ground (rock, litter, basal, bare soil) cover, foliar cover, and basal gap on plots immediately prior to simulation experiments. Recommended software: Microsoft Access or Microsoft Excel.
• Appendix E. Simulation sites map. File: Rainfall Simulator Sites Map.zip. Map of rainfall simulation sites with embedded images in Google Earth. Recommended software: Google Earth (https://www.google.com/earth/).
• Appendix F. Site pictures. File: Site photos.zip. Pictures of rainfall simulation sites and plots.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Interoperability in systems-of-systems is a difficult problem due to the abundance of data standards and formats. Current approaches to interoperability rely on hand-made adapters or methods using ontological metadata. This dataset was created to facilitate research on data-driven interoperability solutions. The data comes from a simulation of a building heating system, and the messages sent within control systems-of-systems. For more information see attached data documentation.
The data come in two semicolon-separated (;) CSV files, training.csv and test.csv. The train/test split is not random; the training data come from the first 80% of simulated timesteps, and the test data are the last 20%. There is no specific validation dataset; validation data should instead be randomly selected from the training data. The simulation runs for as many time steps as there are outside temperature values available. The original SMHI data only samples once every hour, which we linearly interpolate to get one temperature sample every ten seconds. The data saved at each time step consist of 34 JSON messages (four per room and two temperature readings from the outside), 9 temperature values (one per room and one outside), 8 setpoint values, and 8 actuator outputs. The data associated with each of those 34 JSON messages are stored as a single row in the tables. This means that much data is duplicated, a choice made to make the data easier to use.
The simulation data is not meant to be opened and analyzed in spreadsheet software; it is meant for training machine learning models. It is recommended to open the data with the pandas library for Python, available at https://pypi.org/project/pandas/.
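Following that recommendation, a minimal pandas loading sketch; the 10% validation fraction is a choice for illustration, not something prescribed by the dataset.

import pandas as pd

train = pd.read_csv("training.csv", sep=";")   # first 80% of simulated timesteps
test = pd.read_csv("test.csv", sep=";")        # last 20% of simulated timesteps

# No validation split is shipped; sample one from the training data as described.
val = train.sample(frac=0.1, random_state=0)
train = train.drop(val.index)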
The data file with temperatures (smhi-july-23-29-2018.csv) acts as input for the thermodynamic building simulation found on GitHub, where it is used to get the outside temperature and corresponding timestamps. Temperature data for Luleå, Summer 2018, were downloaded from SMHI.
This dataset provides simulated sea surface height (SSH) in a format similar to the future SWOT Level 2 (L2) altimetry data stream from the Poseidon 3C nadir altimeter. The simulated data were generated by the "ECCO LLC4320" global ocean simulation. ECCO, which means "Estimating the Circulation and Climate of the Ocean", is a data assimilation and model (and the international consortium of scientists who maintains it) based on the MIT general circulation model (MITgcm) that assimilates and constrains observational data from numerous sources to estimate the ocean state. The model operates on the Lat-Lon-Cap (LLC) grid with a nominal horizontal resolution of 1/48-degrees (when approximated over the entire model domain, corresponding to ~2-km cell size at the equator). SSH data produced by ECCO LLC4320 were rendered from the native output format into the format prescribed in the SWOT L2 SSH PDD to aid ongoing data product development and to benefit future users of data produced during operational phases of the SWOT mission.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The data are for the article "Lorenz Energy Cycle: Another Way to Understand the Atmospheric Circulation on Tidally Locked Terrestrial Planets". The data are simulated by the general circulation model ExoCAM. The horizontal resolution is 4°x5°.
Note: ST: in standard coordinates; TL: in the tidally locked coordinates.
This repository contains data to reproduce the analysis presented in the paper: B. Mourguiart, B. Liquet, K. Mengersen, T. Couturier, J. Mansons, Y. Braud, A. Besnard. A new method to explicitly estimate the shift of optimum along gradients in multispecies studies. Journal of Biogeography (in press). The paper introduces a new formulation of a Bayesian hierarchical linear model that explicitly estimates optimum shifts for multiple species having symmetrical response curves. This new formulation, called Explicit Hierarchical Model of Optimum Shifts (EHMOS), is compared to a mean comparison method and a Bayesian generalized linear mixed model (GLMM) using simulated and real datasets. Fitting the models to the simulated data took several days; here we provide the models' outputs needed to reproduce the results presented in the paper without re-running the models.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
This global dataset contains monthly, seasonal, and annual statistics (mean and maximum) of significant wave height (Hs) that were produced using a statistical modelling approach and the 6-hourly sea level pressure (SLP) fields simulated by 20 global climate models participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5) for the historical period 1950-2005 and for two different future emission scenarios (RCP4.5 and RCP8.5). Reference: Wang et al. 2014, Changes in global ocean wave heights as projected using multimodel CMIP5 simulations, Geophysical Research Letters, 41, 1026-1034, doi:10.1002/2013GL058650.