These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Optional Information (complete as necessary)
Required R packages:
• For running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”
Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
Description Permissions:
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
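The median/IQR standardization described above is simple to reproduce. A minimal sketch (in Python for illustration; the archive's own code is R, and the function name and example values below are not from the dataset):

```python
from statistics import median, quantiles

def standardize_week(exposures):
    """Standardize one week's exposures by subtracting the weekly median
    and dividing by the interquartile range (IQR), as described for the
    NC birth records application."""
    med = median(exposures)
    q1, _, q3 = quantiles(exposures, n=4)  # quartiles (exclusive method)
    return [(x - med) / (q3 - q1) for x in exposures]

# Example: the median maps to 0; other values are in IQR units.
standardize_week([1, 2, 3, 4, 5])
```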
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A simulated call centre dataset and notebook, designed to be used as a classroom / tutorial dataset for Business and Operations Analytics.
This notebook details the creation of simulated call centre logs over the course of one year. For this dataset we are imagining a business whose lines are open from 8:00am to 6:00pm, Monday to Friday. Four agents are on duty at any given time and each call takes an average of 5 minutes to resolve.
The call centre manager is required to meet a performance target: 90% of calls must be answered within 1 minute. Lately, performance has slipped. As the data analytics expert, you have been brought in to analyze the centre's performance and make recommendations to return it to its target.
The dataset records timestamps for when a call was placed, when it was answered, and when it was completed. The total waiting and service times are calculated, as well as a logical flag indicating whether the call was answered within the performance standard.
Discrete-Event Simulation allows us to model real calling behaviour with a few simple variables.
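As an illustration of that idea (not the simmer implementation), here is a toy discrete-event sketch in Python of the scenario above: four agents, 5-minute average calls, and the 1-minute answer target. The arrival rate and the exponential distributions are illustrative assumptions.

```python
import heapq
import random

def simulate_calls(n_calls=1000, n_agents=4, mean_service=5.0,
                   mean_interarrival=1.0, seed=42):
    """Toy discrete-event model of the call centre: calls arrive with
    exponential interarrival times, wait for the first of `n_agents`
    agents to free up, and are served for an exponentially distributed
    time (mean 5 minutes). Returns the fraction of calls answered
    within the 1-minute standard."""
    rng = random.Random(seed)
    t = 0.0
    free_at = [0.0] * n_agents       # time each agent next becomes free
    heapq.heapify(free_at)
    on_time = 0
    for _ in range(n_calls):
        t += rng.expovariate(1.0 / mean_interarrival)   # call placed
        earliest = heapq.heappop(free_at)               # first available agent
        answered = max(t, earliest)                     # call answered
        if answered - t <= 1.0:                         # within 1 minute?
            on_time += 1
        heapq.heappush(free_at, answered + rng.expovariate(1.0 / mean_service))
    return on_time / n_calls
```

Varying `mean_interarrival` shows how quickly the 90% target degrades as call volume grows, which is exactly the kind of question the dataset is designed to explore.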
The simulations in this dataset are performed using the package simmer (Ucar et al., 2019). I encourage you to visit their website for complete details and fantastic tutorials on Discrete-Event Simulation.
Ucar I, Smeets B, Azcorra A (2019). “simmer: Discrete-Event Simulation for R.” Journal of Statistical Software, 90(2), 1–30.
For source code and simulation details, view the cross-posted GitHub notebook and Shiny app.
The sdiazlor/data-drift-simulation-dataset dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
The data sets contain the parameters and configurations for the simulation defined in the scenario file as well as the simulation outputs for all numerical experiments presented in this contribution. In addition, the scripts for the evaluation of the results are provided. (ZIP)
https://spdx.org/licenses/CC0-1.0.html
Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.
Methods
The dataset and code stored here are for Case Studies 1 and 2 in the paper. Datasets were generated using simulations in R.
Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.
The main dataset is a 232 MB file of trajectory data (I395-final.csv) that contains position, speed, and acceleration data for non-automated passenger cars, trucks, buses, and automated vehicles on an expressway within an urban environment. Supporting files include an aerial reference image (I395_ref_image.png) and a list of polygon boundaries (I395_boundaries.csv) and associated images (I395_lane-1, I395_lane-2, …, I395_lane-6) stored in a folder titled “Annotation on Regions.zip” to map physical roadway segments to the numerical lane IDs referenced in the trajectory dataset. In the boundary file, columns “x1” to “x5” represent the horizontal pixel values in the reference image, with “x1” being the leftmost boundary line and “x5” being the rightmost boundary line, while the column "y" represents corresponding vertical pixel values. The origin point of the reference image is located at the top left corner. The dataset defines five lanes with five boundaries. Lane -6 corresponds to the area to the left of “x1”. Lane -5 corresponds to the area between “x1” and “x2”, and so forth to the rightmost lane, which is defined by the area to the right of “x5” (Lane -2). Lane -1 refers to vehicles that go onto the shoulder of the merging lane (Lane -2), which are manually separated by watching the videos. This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which was one of the six collected as part of the TGSIM project, contains data collected from six 4K cameras mounted on tripods, positioned on three overpasses along I-395 in Washington, D.C.
The cameras captured distinct segments of the highway, and their combined overlapping and non-overlapping footage resulted in a continuous trajectory for the entire section covering 0.5 km. This section covers a major weaving/mandatory lane-changing segment between L'Enfant Plaza and 4th Street SW, with three lanes in the eastbound direction and a major on-ramp on the left side. In addition to the on-ramp, the section covers an off-ramp on the right side. The expressway includes one diverging lane at the beginning of the section on the right side and one merging lane in the middle of the section on the left side. For the purposes of data extraction, the shoulder of the merging lane is also considered a travel lane since some vehicles illegally use it as an extended on-ramp to pass other drivers (see I395_ref_image.png for details). The cameras captured continuous footage during the morning rush hour (8:30 AM-10:30 AM ET) on a sunny day. During this period, vehicles equipped with SAE Level 2 automation were deployed to travel through the designated section to capture the impact of SAE Level 2-equipped vehicles on adjacent vehicles and their behavior in congested areas, particularly in complex merging sections. These vehicles are indicated in the dataset. As part of this dataset, the following files were provided: I395-final.csv contains the numerical data to be used for analysis that includes vehicle level trajectory data at every 0.1 second. Vehicle type, width, and length are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the following conversion factor: 1 pixel = 0.3 meters. I395_ref_image.png is the aerial reference image that defines the geographic region and the associated roadway segments. I395_boundaries.csv contains the coordinates that define the roadway segments (n=X). The columns "x1" to "x5" represent the horizontal pixel values in the reference image.
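Mapping a vehicle's horizontal pixel value to a boundary interval is a simple sorted-interval lookup. The sketch below (an assumption about usage, not project code) returns a region index from 0 (left of "x1") to 5 (right of "x5"); translating that index into the negative lane IDs follows the description above, with Lane -1 separated manually.

```python
from bisect import bisect_left

def boundary_region(x, boundaries):
    """Return which boundary interval a horizontal pixel value falls in.
    `boundaries` is the row's [x1, ..., x5], sorted left to right.
    0 = left of x1, 1 = between x1 and x2, ..., 5 = right of x5."""
    return bisect_left(boundaries, x)
```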
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset simulates hospital inpatient admissions, modeling patient conditions, departmental assignments, care severity, and discharge outcomes. It follows a contextual generation flow, meaning later values like severity, length of stay, and readmission are dependent on earlier values such as condition type and age. Each row represents a unique patient encounter with story-driven logic embedded into the data.
The inspiration came from real-world Electronic Health Record (EHR) systems where every patient record is shaped by context: diagnosis, age, treatment environment, and outcomes. The goal was to create a synthetic dataset that:
This dataset consists of computational fluid dynamics (CFD) output for various spacer configurations in a feed-water channel in reverse osmosis (RO) applications. Feed-water channels transport brine solution to the RO membrane surfaces. The spacers embedded in the channels help improve membrane performance by disrupting the concentration boundary layer growth on membrane surfaces. Refer to the "Related Work" resource below for more details. This dataset considers a feed-water channel of length 150 mm. The inlet brine velocity and concentration are fixed at 0.1 m/s and 100 kg/m3, respectively. The diameter of the cylindrical spacers is fixed at 0.3 mm and six varying inter-spacer distances of 0.75 mm, 1 mm, 1.5 mm, 2 mm, 2.5 mm, and 3 mm are simulated. The dataset comprising the steady, spatial fields of solute concentration, velocity, and density near each spacer is placed in the folder corresponding to the spacer configuration considered. We run two sets of CFD simulations and include the outputs from both sets for each configuration: (1) with a coarser mesh, producing low-resolution (LR) data of spatial resolution 20x20, and (2) with a finer mesh, producing high-resolution (HR) data of spatial resolution 100x100. These data points can be treated as images with the quantities of interest as their channels and can be used to train machine learning models to learn a mapping from the LR images as inputs to the HR images as outputs.
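A common baseline for this LR-to-HR mapping task is plain nearest-neighbour upsampling of the 20x20 field onto the 100x100 grid (factor 5); a learned super-resolution model is then judged against it. A pure-Python sketch, not part of the dataset's tooling:

```python
def upsample_nearest(lr, factor=5):
    """Nearest-neighbour upsampling of a 2D field stored as nested
    lists: each low-resolution cell is repeated factor x factor times,
    e.g. mapping a 20x20 LR field onto the 100x100 HR grid."""
    return [[lr[i // factor][j // factor]
             for j in range(len(lr[0]) * factor)]
            for i in range(len(lr) * factor)]
```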
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1
User Agreement, Public Domain Dedication, and Disclaimer of Liability. By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms. The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission. In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights. Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law. When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work. This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website. Description This dataverse contains the data referenced in Rieth et al. (2017). Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems. To be presented at Applied Human Factors and Ergonomics 2017. Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files. Each dataframe contains 55 columns: Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. 
The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions). Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping). Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for total durations of 25 hours (Training) and 48 hours (Testing), respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively. Columns 4 to 55 contain the process variables; the column names retain the original variable names. Acknowledgments. This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.
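The sample counts follow directly from the stated sampling scheme, as a quick arithmetic check shows. The fault-onset sample indices below are derived from the 1-hour and 8-hour figures, not stated in the description:

```python
# TEP variables sampled every 3 minutes:
samples_per_hour = 60 // 3                       # 20 samples/hour
assert samples_per_hour * 25 == 500              # Training runs: 25 h
assert samples_per_hour * 48 == 960              # Testing runs: 48 h

# Faults introduced 1 h (Faulty Training) and 8 h (Faulty Testing) in,
# so normal operation ends after 20 and 160 samples respectively:
fault_start_training = samples_per_hour * 1 + 1  # first faulty sample: 21
fault_start_testing = samples_per_hour * 8 + 1   # first faulty sample: 161
```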
This data includes four alchemical processes with data files generated with the python package generate_alchemical_lammps (DOI from MIDAS) and the resulting output to be used for the calculation of free energies using the Multi-state Bennett Acceptance Ratio (MBAR), BAR, or Thermodynamic Integration (TI). These input files are only applicable for LAMMPS versions after April 2024. The four cases can be separated into two systems: benzene solvated in water, and a Lennard-Jones (LJ) dimer in solvent. These four cases are:
1) benzene 1: In the NPT ensemble, scale the charges of benzene atoms from full values to zero over six steps.
2) benzene 2: In the NPT ensemble, scale the van der Waals potential between benzene and water from full values to zero over sixteen steps.
3) benzene 3: In the NVT ensemble with benzene in vacuum, scale the charges of benzene's atoms from zero to full values over six steps.
4) lj_dimer: In the NPT ensemble, change the cross interaction energy parameter between solvent and dimer from a value of 1 to 2.
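Of the three named estimators, TI is the simplest to sketch: the free-energy difference is the integral of the ensemble-averaged dU/dlambda over the coupling parameter, here approximated by the trapezoidal rule. This is a generic illustration, not the generate_alchemical_lammps interface, and the lambda schedule in the usage example is made up:

```python
def ti_free_energy(lambdas, dudl_means):
    """Thermodynamic Integration estimate of a free-energy difference:
    trapezoidal integration of <dU/dlambda> over the lambda schedule
    (e.g. the six or sixteen scaling steps of the cases above)."""
    dA = 0.0
    for i in range(len(lambdas) - 1):
        dA += 0.5 * (dudl_means[i] + dudl_means[i + 1]) \
              * (lambdas[i + 1] - lambdas[i])
    return dA
```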
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File descriptions:
All csv files refer to results from the different models (PAMM, AARs, linear models, MRPPs) on each iteration of the simulation; one row corresponds to one iteration.
"results_perfect_detection.csv" refers to the results from the first simulation part with all the observations.
"results_imperfect_detection.csv" refers to the results from the first simulation part with randomly thinned observations to mimic imperfect detection.
ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
PAMM30: p-value of the PAMM run on the 30-day survey.
PAMM7: p-value of the PAMM run on the 7-day survey.
AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
AAR2: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).
Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
"results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations.
"results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimic imperfect detection.
ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
p_pamm7_AB: p-value of the PAMM run on the 7-day survey testing for the effect of A on B.
p_pamm7_BA: p-value of the PAMM run on the 7-day survey testing for the effect of B on A.
AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.
Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).
Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
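With one row per iteration, a typical summary is the share of iterations in which a given approach detects an effect (its p-value column falls below a threshold). A hypothetical sketch in Python (the archive's analysis code is R; the two-row CSV snippet is made up, but the column name comes from the dictionary above):

```python
import csv
from io import StringIO

def rejection_rate(csv_text, column, alpha=0.05):
    """Fraction of simulation iterations whose p-value in `column`
    falls below `alpha`, i.e. the approach detects an effect."""
    rows = list(csv.DictReader(StringIO(csv_text)))
    hits = sum(1 for r in rows if float(r[column]) < alpha)
    return hits / len(rows)

# Example with a made-up two-iteration file:
rejection_rate("PAMM30\n0.01\n0.20\n", "PAMM30")  # 1 of 2 below 0.05
```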
Script files description:
1_Functions: R script containing the functions:
- MRPP from Karanth et al. (2017), adapted here for time efficiency.
- MRPP from Murphy et al. (2021), adapted here for time efficiency.
- Version of the ct_to_recurrent() function from the recurrent package, adapted to process the simulation datasets in parallel.
- The simulation() function used to simulate two species' observations with reciprocal effects on each other.
2_Simulations: R script containing the parameter definitions for all iterations (for the two parts of the simulations), the simulation parallelization, and the random thinning mimicking imperfect detection.
3_Approaches comparison: R script containing the fit of the different models tested on the simulated data.
3_1_Real data comparison: R script containing the fit of the different models tested on the real data example from Murphy et al. (2021).
4_Graphs: R script containing the code for plotting results from the simulation part and appendices.
5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing the Karanth et al. (2017) and Murphy et al. (2021) code, the version adapted here for time efficiency, and a comparison to verify similarity of results.
5_2_Appendix - Multi-response procedure permutation difference: R script containing R code to test for differences between the MRPP approaches according to the species on which permutations are done.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains simulated flow data based on the Navier-Stokes equations, designed specifically for training Physics-Informed Neural Networks (PINNs) to model fluid dynamics in a 2D channel. It includes 10,000 rows of data, split evenly between 5,000 rows of laminar flow and 5,000 rows of turbulent flow. Each row represents a point in a spatial-temporal grid with velocity components (u, v), pressure (p), and their spatial and temporal derivatives, ensuring all velocity values are non-zero for robust machine learning applications.
The dataset is generated from a simplified simulation:
Laminar Flow: Based on the analytical Poiseuille flow solution with added noise to avoid zero transverse velocities.
Turbulent Flow: Perturbed Poiseuille flow evolved with a basic Navier-Stokes solver and random noise to mimic turbulence.
This dataset is ideal for researchers and data scientists working on PINNs, offering a balance between size (compact at 10,000 rows) and representativeness of fluid dynamics phenomena.
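The laminar half is straightforward to regenerate from the analytical profile. A minimal Python sketch, where the channel half-width, peak velocity, and noise scale are illustrative assumptions rather than the dataset's actual parameters:

```python
import random

def poiseuille_row(y, h=1.0, u_max=1.0, noise=1e-3, rng=None):
    """One laminar-flow sample: the analytical plane-Poiseuille profile
    u(y) = u_max * (1 - (y/h)**2) for a channel with walls at y = +/-h,
    plus small Gaussian noise on the transverse velocity v so that no
    velocity value is exactly zero, as the dataset description requires."""
    rng = rng or random.Random(0)
    u = u_max * (1.0 - (y / h) ** 2)
    v = rng.gauss(0.0, noise)
    return {"y": y, "u": u, "v": v}
```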
https://www.futuremarketinsights.com/privacy-policy
The global simulation and test data management market is expected to witness substantial growth, with its valuation projected to increase from approximately USD 905.2 million in 2025 to about USD 3.24 billion by 2035. This corresponds to a CAGR of 12.1% over the forecast period.
| Attribute | Value |
|---|---|
| Industry Size (2025E) | USD 905.2 million |
| Industry Size (2035F) | USD 3.24 billion |
| CAGR (2025 to 2035) | 12.1% |
Category-wise Insights
| Segment | CAGR (2025 to 2035) |
|---|---|
| Aerospace & Defense (Industry) | 14.8% |
| Segment | Value Share (2025) |
|---|---|
| Test Data Simulation Software (Solution) | 42.3% |
The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The northbound lanes are shown visually from left to right in I90_94_moving_lane1.png through I90_94_moving_lane6.png. This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647.
This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4 km long segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter would return to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle to ensure continuous data collection. The segment was selected to study mandatory and discretionary lane changing and last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day. As part of this dataset, the following files were provided: I90_94_moving_final.csv contains the numerical data to be used for analysis that includes vehicle level trajectory data at every 0.1 second. Vehicle size (small or large), width, length, and whether the vehicle was one of the automated test vehicles ("yes" or "no") are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the following conversion factor: 1 pixel = 0.3 meters. I90_94_moving_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound lanes) for each run X. I-90-moving-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines for each Run X. The "x" and "y" columns represent the horizontal and vertical locations in the reference image, respectively. The "ramp" columns define the type of roadway segment (0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments).
In total, the centerline files define six northbound lanes. Annotation on Regions.zip includes images (I90_94_moving_lane1.png through I90_94_moving_lane6.png) that visually map the lanes to the numerical lane IDs referenced in the trajectory dataset.
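As a rough illustration of the conventions above, the sketch below applies the stated 1 pixel = 0.3 m factor and decodes the ramp indicator. The function names are illustrative assumptions, and note that the published CSV already stores measurements in meters.

```python
# Sketch of the unit conversion and ramp-indicator coding described above.
# Function names are illustrative; the published trajectory CSV already
# stores width, length, and location in meters.

PIXELS_TO_METERS = 0.3  # stated factor: 1 pixel = 0.3 meters

RAMP_TYPES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}

def pixels_to_meters(value_px):
    """Convert a raw pixel measurement (width, length, position) to meters."""
    return value_px * PIXELS_TO_METERS

def ramp_label(code):
    """Decode the centerline 'ramp' indicator into its section type."""
    return RAMP_TYPES[code]

vehicle_length_m = pixels_to_meters(16)  # a 16-pixel vehicle is 4.8 m long
section = ramp_label(3)                  # "weaving segment"
```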
This is one of the computational fluid dynamics (CFD) simulations. The parameters for the test are in the info.txt file.
https://creativecommons.org/publicdomain/zero/1.0/
This data set is intended to be used along with my notebook Linear Regression Notes, which provides a guideline for applying correlation analysis and linear regression models from a statistical approach.
A fictional call center is interested in knowing the relationship between the number of personnel and some variables that measure their performance such as average answer time, average calls per hour, and average time per call. Data were simulated to represent 200 shifts.
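As a minimal sketch of the kind of analysis the notebook covers, the following simulates shift-level data and fits a simple linear regression by ordinary least squares. The variable ranges and data-generating process are assumptions for illustration, not the actual dataset.

```python
# Simulate 200 shifts of call-center data and regress average answer time
# on number of personnel. The true slope of -2.5 and the noise level are
# made-up values for illustration only.
import random

random.seed(42)
n_shifts = 200
personnel = [random.randint(5, 30) for _ in range(n_shifts)]
# More personnel -> shorter average answer time, plus Gaussian noise.
answer_time = [120 - 2.5 * p + random.gauss(0, 5) for p in personnel]

def ols(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = ols(personnel, answer_time)
# slope should land close to the true value of -2.5
```

With 200 shifts and moderate noise, the fitted slope recovers the simulated relationship closely, which is the point the notebook makes about sample size and estimation precision.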
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
I have only explored more possibilities in the mathematical calculations and design.
SPECIAL NOTE: C-MAPSS and C-MAPSS40K ARE CURRENTLY UNAVAILABLE FOR DOWNLOAD. Glenn Research Center management is reviewing the availability requirements for these software packages. We are working with Center management to get the review completed and issues resolved in a timely manner. We will post updates on this website when the issues are resolved. We apologize for any inconvenience. Please contact Jonathan Litt, jonathan.s.litt@nasa.gov, if you have any questions in the meantime.

Subject Area: Engine Health

Description: This data set was generated with the C-MAPSS simulator. C-MAPSS stands for 'Commercial Modular Aero-Propulsion System Simulation'; it is a tool for simulating realistic large commercial turbofan engine data. Each flight is a combination of a series of flight conditions with a reasonable linear transition period to allow the engine to change from one flight condition to the next. The flight conditions are arranged to cover a typical ascent from sea level to 35K ft and descent back down to sea level. A fault is injected at a given time in one of the flights and persists throughout the remaining flights, effectively increasing the age of the engine. The intent is to identify which flight, and when in that flight, the fault occurred.

How Data Was Acquired: The data provided are from a high-fidelity, system-level engine simulation designed to simulate nominal and faulted engine degradation over a series of flights. The simulated data were created with a MATLAB Simulink tool called C-MAPSS.

Sample Rates and Parameter Description: The flights are full flight recordings sampled at 1 Hz and consist of 30 engine and flight-condition parameters. Each flight contains 7 unique flight conditions for an approximately 90 min flight, including ascent to cruise at 35K ft and descent back to sea level. The parameters for each flight are the flight conditions, health indicators, temperature measurements, and pressure measurements.
Faults/Anomalies: Faults arose from the inlet engine fan, the low pressure compressor, the high pressure compressor, the high pressure turbine and the low pressure turbine.
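The detection task described above, identifying the flight in which the persistent fault first appears, can be illustrated with a simple baseline-deviation check. The synthetic health-indicator values and the threshold below are assumptions for illustration, not part of the C-MAPSS data or its reference method.

```python
# Sketch: flag the first flight whose mean health indicator deviates from
# an early-flight baseline, mimicking the "which flight did the fault
# occur in" question. All numbers here are synthetic.
def first_shifted_flight(flight_means, baseline_count=5, threshold=3.0):
    """Return the index of the first flight whose mean deviates from the
    baseline mean by more than `threshold` baseline standard deviations,
    or None if no such flight exists."""
    baseline = flight_means[:baseline_count]
    mu = sum(baseline) / len(baseline)
    var = sum((m - mu) ** 2 for m in baseline) / (len(baseline) - 1)
    sd = var ** 0.5 or 1e-9  # guard against a perfectly flat baseline
    for i, m in enumerate(flight_means):
        if abs(m - mu) > threshold * sd:
            return i
    return None

# Synthetic example: the indicator drops starting at flight index 12.
means = [1.00, 1.01, 0.99, 1.00, 1.02, 1.00, 0.99, 1.01, 1.00, 0.98,
         1.00, 0.99, 0.85, 0.84, 0.83]
fault_flight = first_shifted_flight(means)  # 12
```

A real analysis of the 1 Hz recordings would work within flights as well (to locate when in the flight the shift occurs), but the per-flight summary above captures the basic idea.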
https://creativecommons.org/publicdomain/zero/1.0/
The dataset, named employee_data.csv, contains simulated data of 400 employees working in various IT-related positions. The data includes details about each employee's gender, years of experience, position, and salary. The dataset aims to reflect realistic distributions and variations within the IT industry, particularly how salaries tend to increase with experience and the specific job role. This dataset was generated using the Faker library in Python, which allows for the creation of realistic fake data for various applications.
1) ID: A unique identifier for each employee (1 to 400).
2) Gender: The gender of the employee. The values are either 'M' (Male) or 'F' (Female).
3) Experience (Years): The number of years of professional experience the employee has, ranging from 0 to 20 years.
4) Position: The job title of the employee. The positions included in the dataset are:
   - IT Manager
   - Software Engineer
   - Network Administrator
   - Systems Administrator
   - Database Administrator (DBA)
   - Web Developer
   - IT Support Specialist
   - Systems Analyst
   - IT Security Analyst
   - DevOps Engineer
   - Cloud Solutions Architect
5) Salary: The annual salary of the employee in USD. The salary is generated to reflect realistic compensation within the IT industry and increases with both the position and years of experience.
Sample Data:
| ID | Gender | Experience (Years) | Position | Salary |
|---|---|---|---|---|
| 1 | M | 5 | Software Engineer | 84,000 |
| 2 | F | 10 | IT Manager | 135,000 |
| 3 | M | 7 | Network Administrator | 85,000 |
| 4 | F | 15 | Cloud Solutions Architect | 147,000 |
| 5 | M | 2 | Web Developer | 60,000 |
Applications
The dataset can be used for various purposes.
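A dataset with this structure could be generated along the following lines. The original used the Faker library in Python; this minimal sketch uses only the standard library, and the base salaries, growth rate, and noise range are assumed values chosen to mimic the described pattern of salary rising with position and experience.

```python
# Illustrative generator for an employee_data.csv-style file. Base salaries
# and the 3%-per-year growth rate are assumptions, not the actual
# generation parameters behind the published dataset.
import csv
import random

random.seed(0)

BASE_SALARY = {  # assumed entry-level salaries in USD
    "IT Support Specialist": 45_000,
    "Web Developer": 55_000,
    "Software Engineer": 70_000,
    "IT Manager": 95_000,
    "Cloud Solutions Architect": 100_000,
}

def make_employee(emp_id):
    position = random.choice(list(BASE_SALARY))
    experience = random.randint(0, 20)
    # Salary grows roughly 3% per year of experience, with mild noise.
    salary = round(BASE_SALARY[position] * (1 + 0.03 * experience)
                   * random.uniform(0.95, 1.05))
    return {
        "ID": emp_id,
        "Gender": random.choice(["M", "F"]),
        "Experience (Years)": experience,
        "Position": position,
        "Salary": salary,
    }

rows = [make_employee(i) for i in range(1, 401)]
with open("employee_data_sample.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```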
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. The pollution exposures were standardized on each week by subtracting the weekly median and dividing by the weekly interquartile range (IQR), which further protects the identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30, (2019).
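The per-week standardization applied to the pollution exposures (subtract each week's median, divide by its IQR) can be sketched as follows. The quantile interpolation and the toy exposure matrix are illustrative assumptions; the code accompanying the manuscript is in R, not Python.

```python
# Standardize each week's exposures: (value - weekly median) / weekly IQR.
# The linear quantile interpolation below is one common convention, assumed
# here for illustration.
def week_standardize(column):
    s = sorted(column)
    n = len(s)
    def quantile(q):
        idx = q * (n - 1)
        lo = int(idx)
        hi = min(lo + 1, n - 1)
        frac = idx - lo
        return s[lo] * (1 - frac) + s[hi] * frac
    med = quantile(0.5)
    iqr = quantile(0.75) - quantile(0.25)
    return [(v - med) / iqr for v in column]

# exposures[i][w]: exposure for simulated individual i in pregnancy week w
exposures = [[3.0, 5.0], [1.0, 9.0], [2.0, 7.0], [4.0, 11.0]]
weeks = list(zip(*exposures))                         # one tuple per week
z = [week_standardize(list(week)) for week in weeks]  # standardized, per week
```

After standardization, each week's values are centered at its median and scaled by its IQR, so the released z matrix carries no week-level medians or IQRs, consistent with the identifiability protection described above.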