These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Optional Information (complete as necessary)
Required R packages:
• For running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”
Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
Description Permissions:
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
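The median/IQR standardization described above is simple to reproduce. A minimal sketch (in Python for illustration; the archive's own code is R, and the function name and example values below are not from the dataset):

```python
from statistics import median, quantiles

def standardize_week(exposures):
    """Standardize one week's exposures by subtracting the weekly median
    and dividing by the interquartile range (IQR), as described for the
    NC birth records application."""
    med = median(exposures)
    q1, _, q3 = quantiles(exposures, n=4)  # quartiles (exclusive method)
    return [(x - med) / (q3 - q1) for x in exposures]

# Example: the median maps to 0; other values are in IQR units.
standardize_week([1, 2, 3, 4, 5])
```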
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A simulated call centre dataset and notebook, designed to be used as a classroom / tutorial dataset for Business and Operations Analytics.
This notebook details the creation of simulated call centre logs over the course of one year. For this dataset we are imagining a business whose lines are open from 8:00am to 6:00pm, Monday to Friday. Four agents are on duty at any given time and each call takes an average of 5 minutes to resolve.
The call centre manager is required to meet a performance target: 90% of calls must be answered within 1 minute. Lately, performance has slipped. As the data analytics expert, you have been brought in to analyze the centre's performance and make recommendations to return it to its target.
The dataset records timestamps for when a call was placed, when it was answered, and when it was completed. The total waiting and service times are calculated, as well as a logical flag indicating whether the call was answered within the performance standard.
Discrete-Event Simulation allows us to model real calling behaviour with a few simple variables.
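As an illustration of that idea (not the simmer implementation), here is a toy discrete-event sketch in Python of the scenario above: four agents, 5-minute average calls, and the 1-minute answer target. The arrival rate and the exponential distributions are illustrative assumptions.

```python
import heapq
import random

def simulate_calls(n_calls=1000, n_agents=4, mean_service=5.0,
                   mean_interarrival=1.0, seed=42):
    """Toy discrete-event model of the call centre: calls arrive with
    exponential interarrival times, wait for the first of `n_agents`
    agents to free up, and are served for an exponentially distributed
    time (mean 5 minutes). Returns the fraction of calls answered
    within the 1-minute standard."""
    rng = random.Random(seed)
    t = 0.0
    free_at = [0.0] * n_agents       # time each agent next becomes free
    heapq.heapify(free_at)
    on_time = 0
    for _ in range(n_calls):
        t += rng.expovariate(1.0 / mean_interarrival)   # call placed
        earliest = heapq.heappop(free_at)               # first available agent
        answered = max(t, earliest)                     # call answered
        if answered - t <= 1.0:                         # within 1 minute?
            on_time += 1
        heapq.heappush(free_at, answered + rng.expovariate(1.0 / mean_service))
    return on_time / n_calls
```

Varying `mean_interarrival` shows how quickly the 90% target degrades as call volume grows, which is exactly the kind of question the dataset is designed to explore.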
The simulations in this dataset are performed using the package simmer (Ucar et al., 2019). I encourage you to visit their website for complete details and fantastic tutorials on Discrete-Event Simulation.
Ucar I, Smeets B, Azcorra A (2019). “simmer: Discrete-Event Simulation for R.” Journal of Statistical Software, 90(2), 1–30.
For source code and simulation details, view the cross-posted GitHub notebook and Shiny app.
The sdiazlor/data-drift-simulation-dataset dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
The data sets contain the parameters and configurations for the simulation defined in the scenario file as well as the simulation outputs for all numerical experiments presented in this contribution. In addition, the scripts for the evaluation of the results are provided. (ZIP)
https://spdx.org/licenses/CC0-1.0.html
Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.
Methods
The dataset and code stored here are for Case Studies 1 and 2 in the paper. Datasets were generated using simulations in R.
Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.
The main dataset is a 232 MB file of trajectory data (I395-final.csv) that contains position, speed, and acceleration data for non-automated passenger cars, trucks, buses, and automated vehicles on an expressway within an urban environment. Supporting files include an aerial reference image (I395_ref_image.png) and a list of polygon boundaries (I395_boundaries.csv) and associated images (I395_lane-1, I395_lane-2, …, I395_lane-6) stored in a folder titled “Annotation on Regions.zip” to map physical roadway segments to the numerical lane IDs referenced in the trajectory dataset. In the boundary file, columns “x1” to “x5” represent the horizontal pixel values in the reference image, with “x1” being the leftmost boundary line and “x5” being the rightmost boundary line, while the column "y" represents corresponding vertical pixel values. The origin point of the reference image is located at the top left corner. The dataset defines five lanes with five boundaries. Lane -6 corresponds to the area to the left of “x1”. Lane -5 corresponds to the area between “x1” and “x2”, and so forth to the rightmost lane, which is defined by the area to the right of “x5” (Lane -2). Lane -1 refers to vehicles that go onto the shoulder of the merging lane (Lane -2), which are manually separated by watching the videos. This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which was one of the six collected as part of the TGSIM project, contains data collected from six 4K cameras mounted on tripods, positioned on three overpasses along I-395 in Washington, D.C.
The cameras captured distinct segments of the highway, and their combined overlapping and non-overlapping footage resulted in a continuous trajectory for the entire section covering 0.5 km. This section covers a major weaving/mandatory lane-changing segment between L'Enfant Plaza and 4th Street SW, with three lanes in the eastbound direction and a major on-ramp on the left side. In addition to the on-ramp, the section covers an off-ramp on the right side. The expressway includes one diverging lane at the beginning of the section on the right side and one merging lane in the middle of the section on the left side. For the purposes of data extraction, the shoulder of the merging lane is also considered a travel lane since some vehicles illegally use it as an extended on-ramp to pass other drivers (see I395_ref_image.png for details). The cameras captured continuous footage during the morning rush hour (8:30 AM-10:30 AM ET) on a sunny day. During this period, vehicles equipped with SAE Level 2 automation were deployed to travel through the designated section to capture the impact of SAE Level 2-equipped vehicles on adjacent vehicles and their behavior in congested areas, particularly in complex merging sections. These vehicles are indicated in the dataset. As part of this dataset, the following files were provided: I395-final.csv contains the numerical data to be used for analysis that includes vehicle level trajectory data at every 0.1 second. Vehicle type, width, and length are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the following conversion factor: 1 pixel = 0.3 meters. I395_ref_image.png is the aerial reference image that defines the geographic region and the associated roadway segments. I395_boundaries.csv contains the coordinates that define the roadway segments (n=X). The columns "x1" to "x5" represent the horizontal pixel values in the reference image.
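Mapping a vehicle's horizontal pixel value to a boundary interval is a simple sorted-interval lookup. The sketch below (an assumption about usage, not project code) returns a region index from 0 (left of "x1") to 5 (right of "x5"); translating that index into the negative lane IDs follows the description above, with Lane -1 separated manually.

```python
from bisect import bisect_left

def boundary_region(x, boundaries):
    """Return which boundary interval a horizontal pixel value falls in.
    `boundaries` is the row's [x1, ..., x5], sorted left to right.
    0 = left of x1, 1 = between x1 and x2, ..., 5 = right of x5."""
    return bisect_left(boundaries, x)
```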
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset simulates hospital inpatient admissions, modeling patient conditions, departmental assignments, care severity, and discharge outcomes. It follows a contextual generation flow, meaning later values like severity, length of stay, and readmission are dependent on earlier values such as condition type and age. Each row represents a unique patient encounter with story-driven logic embedded into the data.
The inspiration came from real-world Electronic Health Record (EHR) systems where every patient record is shaped by context: diagnosis, age, treatment environment, and outcomes. The goal was to create a synthetic dataset that:
This dataset consists of computational fluid dynamics (CFD) output for various spacer configurations in a feed-water channel in reverse osmosis (RO) applications. Feed-water channels transport brine solution to the RO membrane surfaces. The spacers embedded in the channels help improve membrane performance by disrupting the concentration boundary layer growth on membrane surfaces. Refer to the "Related Work" resource below for more details. This dataset considers a feed-water channel of length 150 mm. The inlet brine velocity and concentration are fixed at 0.1 m/s and 100 kg/m3, respectively. The diameter of the cylindrical spacers is fixed at 0.3 mm and six varying inter-spacer distances of 0.75 mm, 1 mm, 1.5 mm, 2 mm, 2.5 mm, and 3 mm are simulated. The dataset comprising the steady, spatial fields of solute concentration, velocity, and density near each spacer is placed in the folder corresponding to the spacer configuration considered. We run two sets of CFD simulations and include the outputs from both sets for each configuration: (1) with a coarser mesh, producing low-resolution (LR) data of spatial resolution 20x20, and (2) with a finer mesh, producing high-resolution (HR) data of spatial resolution 100x100. These data points can be treated as images with the quantities of interest as their channels and can be used to train machine learning models to learn a mapping from the LR images as inputs to the HR images as outputs.
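A common baseline for this LR-to-HR mapping task is plain nearest-neighbour upsampling of the 20x20 field onto the 100x100 grid (factor 5); a learned super-resolution model is then judged against it. A pure-Python sketch, not part of the dataset's tooling:

```python
def upsample_nearest(lr, factor=5):
    """Nearest-neighbour upsampling of a 2D field stored as nested
    lists: each low-resolution cell is repeated factor x factor times,
    e.g. mapping a 20x20 LR field onto the 100x100 HR grid."""
    return [[lr[i // factor][j // factor]
             for j in range(len(lr[0]) * factor)]
            for i in range(len(lr) * factor)]
```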
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1
User Agreement, Public Domain Dedication, and Disclaimer of Liability. By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms. The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission. In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights. Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law. When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work. This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website. Description This dataverse contains the data referenced in Rieth et al. (2017). Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems. To be presented at Applied Human Factors and Ergonomics 2017. Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files. Each dataframe contains 55 columns: Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. 
The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions). Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping). Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for total durations of 25 hours (Training) and 48 hours (Testing), respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively. Columns 4 to 55 contain the process variables; the column names retain the original variable names. Acknowledgments. This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.
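The sample counts follow directly from the stated sampling scheme, as a quick arithmetic check shows. The fault-onset sample indices below are derived from the 1-hour and 8-hour figures, not stated in the description:

```python
# TEP variables sampled every 3 minutes:
samples_per_hour = 60 // 3                       # 20 samples/hour
assert samples_per_hour * 25 == 500              # Training runs: 25 h
assert samples_per_hour * 48 == 960              # Testing runs: 48 h

# Faults introduced 1 h (Faulty Training) and 8 h (Faulty Testing) in,
# so normal operation ends after 20 and 160 samples respectively:
fault_start_training = samples_per_hour * 1 + 1  # first faulty sample: 21
fault_start_testing = samples_per_hour * 8 + 1   # first faulty sample: 161
```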
This data includes four alchemical processes with data files generated with the python package generate_alchemical_lammps (DOI from MIDAS) and the resulting output to be used for the calculation of free energies using the Multi-state Bennett Acceptance Ratio (MBAR), BAR, or Thermodynamic Integration (TI). These input files are only applicable for LAMMPS versions after April 2024. The four cases can be separated into two systems: benzene solvated in water, and a Lennard-Jones (LJ) dimer in solvent. These four cases are:
1) benzene 1: In the NPT ensemble, scale the charges of benzene atoms from full values to zero over six steps.
2) benzene 2: In the NPT ensemble, scale the van der Waals potential between benzene and water from full values to zero over sixteen steps.
3) benzene 3: In the NVT ensemble with benzene in vacuum, scale the charges of benzene's atoms from zero to full values over six steps.
4) lj_dimer: In the NPT ensemble, change the cross interaction energy parameter between solvent and dimer from a value of 1 to 2.
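Of the three named estimators, TI is the simplest to sketch: the free-energy difference is the integral of the ensemble-averaged dU/dlambda over the coupling parameter, here approximated by the trapezoidal rule. This is a generic illustration, not the generate_alchemical_lammps interface, and the lambda schedule in the usage example is made up:

```python
def ti_free_energy(lambdas, dudl_means):
    """Thermodynamic Integration estimate of a free-energy difference:
    trapezoidal integration of <dU/dlambda> over the lambda schedule
    (e.g. the six or sixteen scaling steps of the cases above)."""
    dA = 0.0
    for i in range(len(lambdas) - 1):
        dA += 0.5 * (dudl_means[i] + dudl_means[i + 1]) \
              * (lambdas[i + 1] - lambdas[i])
    return dA
```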
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File descriptions:
All csv files refer to results from the different models (PAMM, AARs, linear models, MRPPs) on each iteration of the simulation; one row corresponds to one iteration.
"results_perfect_detection.csv" refers to the results from the first simulation part with all the observations.
"results_imperfect_detection.csv" refers to the results from the first simulation part with randomly thinned observations to mimic imperfect detection.
ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
PAMM30: p-value of the PAMM run on the 30-day survey.
PAMM7: p-value of the PAMM run on the 7-day survey.
AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
AAR2: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).
Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
"results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations.
"results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimic imperfect detection.
ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
p_pamm7_AB: p-value of the PAMM run on the 7-day survey testing for the effect of A on B.
p_pamm7_BA: p-value of the PAMM run on the 7-day survey testing for the effect of B on A.
AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.
Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).
Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
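With one row per iteration, a typical summary is the share of iterations in which a given approach detects an effect (its p-value column falls below a threshold). A hypothetical sketch in Python (the archive's analysis code is R; the two-row CSV snippet is made up, but the column name comes from the dictionary above):

```python
import csv
from io import StringIO

def rejection_rate(csv_text, column, alpha=0.05):
    """Fraction of simulation iterations whose p-value in `column`
    falls below `alpha`, i.e. the approach detects an effect."""
    rows = list(csv.DictReader(StringIO(csv_text)))
    hits = sum(1 for r in rows if float(r[column]) < alpha)
    return hits / len(rows)

# Example with a made-up two-iteration file:
rejection_rate("PAMM30\n0.01\n0.20\n", "PAMM30")  # 1 of 2 below 0.05
```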
Script files description:
1_Functions: R script containing the functions:
- MRPP from Karanth et al. (2017), adapted here for time efficiency.
- MRPP from Murphy et al. (2021), adapted here for time efficiency.
- Version of the ct_to_recurrent() function from the recurrent package, adapted to process the simulation datasets in parallel.
- The simulation() function used to simulate two species' observations with reciprocal effects on each other.
2_Simulations: R script containing the parameter definitions for all iterations (for the two parts of the simulations), the simulation parallelization, and the random thinning mimicking imperfect detection.
3_Approaches comparison: R script containing the fit of the different models tested on the simulated data.
3_1_Real data comparison: R script containing the fit of the different models tested on the real data example from Murphy et al. (2021).
4_Graphs: R script containing the code for plotting results from the simulation part and appendices.
5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing the Karanth et al. (2017) and Murphy et al. (2021) code, the version adapted here for time efficiency, and a comparison to verify similarity of results.
5_2_Appendix - Multi-response procedure permutation difference: R script containing R code to test for differences between the MRPP approaches according to the species on which permutations are done.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains simulated flow data based on the Navier-Stokes equations, designed specifically for training Physics-Informed Neural Networks (PINNs) to model fluid dynamics in a 2D channel. It includes 10,000 rows of data, split evenly between 5,000 rows of laminar flow and 5,000 rows of turbulent flow. Each row represents a point in a spatial-temporal grid with velocity components (u, v), pressure (p), and their spatial and temporal derivatives, ensuring all velocity values are non-zero for robust machine learning applications.
The dataset is generated from a simplified simulation:
Laminar Flow: Based on the analytical Poiseuille flow solution with added noise to avoid zero transverse velocities.
Turbulent Flow: Perturbed Poiseuille flow evolved with a basic Navier-Stokes solver and random noise to mimic turbulence.
This dataset is ideal for researchers and data scientists working on PINNs, offering a balance between size (compact at 10,000 rows) and representativeness of fluid dynamics phenomena.
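The laminar half is straightforward to regenerate from the analytical profile. A minimal Python sketch, where the channel half-width, peak velocity, and noise scale are illustrative assumptions rather than the dataset's actual parameters:

```python
import random

def poiseuille_row(y, h=1.0, u_max=1.0, noise=1e-3, rng=None):
    """One laminar-flow sample: the analytical plane-Poiseuille profile
    u(y) = u_max * (1 - (y/h)**2) for a channel with walls at y = +/-h,
    plus small Gaussian noise on the transverse velocity v so that no
    velocity value is exactly zero, as the dataset description requires."""
    rng = rng or random.Random(0)
    u = u_max * (1.0 - (y / h) ** 2)
    v = rng.gauss(0.0, noise)
    return {"y": y, "u": u, "v": v}
```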
https://www.futuremarketinsights.com/privacy-policy
The global simulation and test data management market is expected to witness substantial growth, with its valuation projected to increase from approximately USD 905.2 million in 2025 to about USD 3.24 billion by 2035. This corresponds to a CAGR of 12.1% over the forecast period.
| Attribute | Value |
|---|---|
| Industry Size (2025E) | USD 905.2 million |
| Industry Size (2035F) | USD 3.24 billion |
| CAGR (2025 to 2035) | 12.1% |
Category-wise Insights
| Segment | CAGR (2025 to 2035) |
|---|---|
| Aerospace & Defense (Industry) | 14.8% |
| Segment | Value Share (2025) |
|---|---|
| Test Data Simulation Software (Solution) | 42.3% |
The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The northbound lanes are shown visually from left to right in I90_94_moving_lane1.png through I90_94_moving_lane6.png. This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647.
This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4 km long segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter would return to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle to ensure continuous data collection. The segment was selected to study mandatory and discretionary lane changing and last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day. As part of this dataset, the following files were provided: I90_94_moving_final.csv contains the numerical data to be used for analysis that includes vehicle level trajectory data at every 0.1 second. Vehicle size (small or large), width, length, and whether the vehicle was one of the automated test vehicles ("yes" or "no") are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the following conversion factor: 1 pixel = 0.3 meters. I90_94_moving_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound lanes) for each run X. I-90-moving-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines for each Run X. The "x" and "y" columns represent the horizontal and vertical locations in the reference image, respectively. The "ramp" columns define the type of roadway segment (0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments).
In total, the centerline files define six northbound lanes. Annotation on Regions.zip includes images (I90_94_moving_lane1.png through I90_94_moving_lane6.png) that visually map the lanes to the numerical lane IDs referenced in the trajectory dataset.
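As a rough illustration of the conventions above, the sketch below applies the stated 1 pixel = 0.3 m factor and decodes the ramp indicator. The function names are illustrative assumptions, and note that the published CSV already stores measurements in meters.

```python
# Sketch of the unit conversion and ramp-indicator coding described above.
# Function names are illustrative; the published trajectory CSV already
# stores width, length, and location in meters.

PIXELS_TO_METERS = 0.3  # stated factor: 1 pixel = 0.3 meters

RAMP_TYPES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}

def pixels_to_meters(value_px):
    """Convert a raw pixel measurement (width, length, position) to meters."""
    return value_px * PIXELS_TO_METERS

def ramp_label(code):
    """Decode the centerline 'ramp' indicator into its section type."""
    return RAMP_TYPES[code]

vehicle_length_m = pixels_to_meters(16)  # a 16-pixel vehicle is 4.8 m long
section = ramp_label(3)                  # "weaving segment"
```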
This is one of the computational fluid dynamics (CFD) simulations. The parameters for the test are in the info.txt file.
https://creativecommons.org/publicdomain/zero/1.0/
This data set is intended to be used along with my notebook Linear Regression Notes, which provides a guideline for applying correlation analysis and linear regression models from a statistical approach.
A fictional call center is interested in knowing the relationship between the number of personnel and some variables that measure their performance such as average answer time, average calls per hour, and average time per call. Data were simulated to represent 200 shifts.
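As a minimal sketch of the kind of analysis the notebook covers, the following simulates shift-level data and fits a simple linear regression by ordinary least squares. The variable ranges and data-generating process are assumptions for illustration, not the actual dataset.

```python
# Simulate 200 shifts of call-center data and regress average answer time
# on number of personnel. The true slope of -2.5 and the noise level are
# made-up values for illustration only.
import random

random.seed(42)
n_shifts = 200
personnel = [random.randint(5, 30) for _ in range(n_shifts)]
# More personnel -> shorter average answer time, plus Gaussian noise.
answer_time = [120 - 2.5 * p + random.gauss(0, 5) for p in personnel]

def ols(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = ols(personnel, answer_time)
# slope should land close to the true value of -2.5
```

With 200 shifts and moderate noise, the fitted slope recovers the simulated relationship closely, which is the point the notebook makes about sample size and estimation precision.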
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
I have only explored more possibilities in the mathematical calculations and design.
SPECIAL NOTE: C-MAPSS and C-MAPSS40K ARE CURRENTLY UNAVAILABLE FOR DOWNLOAD. Glenn Research Center management is reviewing the availability requirements for these software packages. We are working with Center management to get the review completed and issues resolved in a timely manner. We will post updates on this website when the issues are resolved. We apologize for any inconvenience. Please contact Jonathan Litt, jonathan.s.litt@nasa.gov, if you have any questions in the meantime.

Subject Area: Engine Health

Description: This data set was generated with the C-MAPSS simulator. C-MAPSS stands for 'Commercial Modular Aero-Propulsion System Simulation'; it is a tool for simulating realistic large commercial turbofan engine data. Each flight is a combination of a series of flight conditions with a reasonable linear transition period to allow the engine to change from one flight condition to the next. The flight conditions are arranged to cover a typical ascent from sea level to 35K ft and descent back down to sea level. A fault is injected at a given time in one of the flights and persists throughout the remaining flights, effectively increasing the age of the engine. The intent is to identify which flight, and when in that flight, the fault occurred.

How Data Was Acquired: The data provided are from a high-fidelity, system-level engine simulation designed to simulate nominal and faulted engine degradation over a series of flights. The simulated data were created with a MATLAB Simulink tool called C-MAPSS.

Sample Rates and Parameter Description: The flights are full flight recordings sampled at 1 Hz and consist of 30 engine and flight-condition parameters. Each flight contains 7 unique flight conditions for an approximately 90 min flight, including ascent to cruise at 35K ft and descent back to sea level. The parameters for each flight are the flight conditions, health indicators, temperature measurements, and pressure measurements.
Faults/Anomalies: Faults arose from the inlet engine fan, the low pressure compressor, the high pressure compressor, the high pressure turbine and the low pressure turbine.
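The detection task described above, identifying the flight in which the persistent fault first appears, can be illustrated with a simple baseline-deviation check. The synthetic health-indicator values and the threshold below are assumptions for illustration, not part of the C-MAPSS data or its reference method.

```python
# Sketch: flag the first flight whose mean health indicator deviates from
# an early-flight baseline, mimicking the "which flight did the fault
# occur in" question. All numbers here are synthetic.
def first_shifted_flight(flight_means, baseline_count=5, threshold=3.0):
    """Return the index of the first flight whose mean deviates from the
    baseline mean by more than `threshold` baseline standard deviations,
    or None if no such flight exists."""
    baseline = flight_means[:baseline_count]
    mu = sum(baseline) / len(baseline)
    var = sum((m - mu) ** 2 for m in baseline) / (len(baseline) - 1)
    sd = var ** 0.5 or 1e-9  # guard against a perfectly flat baseline
    for i, m in enumerate(flight_means):
        if abs(m - mu) > threshold * sd:
            return i
    return None

# Synthetic example: the indicator drops starting at flight index 12.
means = [1.00, 1.01, 0.99, 1.00, 1.02, 1.00, 0.99, 1.01, 1.00, 0.98,
         1.00, 0.99, 0.85, 0.84, 0.83]
fault_flight = first_shifted_flight(means)  # 12
```

A real analysis of the 1 Hz recordings would work within flights as well (to locate when in the flight the shift occurs), but the per-flight summary above captures the basic idea.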
https://creativecommons.org/publicdomain/zero/1.0/
The dataset, named employee_data.csv, contains simulated data of 400 employees working in various IT-related positions. The data includes details about each employee's gender, years of experience, position, and salary. The dataset aims to reflect realistic distributions and variations within the IT industry, particularly how salaries tend to increase with experience and the specific job role. This dataset was generated using the Faker library in Python, which allows for the creation of realistic fake data for various applications.
1) ID: A unique identifier for each employee (1 to 400).
2) Gender: The gender of the employee. The values are either 'M' (Male) or 'F' (Female).
3) Experience (Years): The number of years of professional experience the employee has, ranging from 0 to 20 years.
4) Position: The job title of the employee. The positions included in the dataset are:
   - IT Manager
   - Software Engineer
   - Network Administrator
   - Systems Administrator
   - Database Administrator (DBA)
   - Web Developer
   - IT Support Specialist
   - Systems Analyst
   - IT Security Analyst
   - DevOps Engineer
   - Cloud Solutions Architect
5) Salary: The annual salary of the employee in USD. The salary is generated to reflect realistic compensation within the IT industry and increases with both the position and years of experience.
Sample Data:
| ID | Gender | Experience (Years) | Position | Salary |
|---|---|---|---|---|
| 1 | M | 5 | Software Engineer | 84,000 |
| 2 | F | 10 | IT Manager | 135,000 |
| 3 | M | 7 | Network Administrator | 85,000 |
| 4 | F | 15 | Cloud Solutions Architect | 147,000 |
| 5 | M | 2 | Web Developer | 60,000 |
Applications
The dataset can be used for various purposes.
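A dataset with this structure could be generated along the following lines. The original used the Faker library in Python; this minimal sketch uses only the standard library, and the base salaries, growth rate, and noise range are assumed values chosen to mimic the described pattern of salary rising with position and experience.

```python
# Illustrative generator for an employee_data.csv-style file. Base salaries
# and the 3%-per-year growth rate are assumptions, not the actual
# generation parameters behind the published dataset.
import csv
import random

random.seed(0)

BASE_SALARY = {  # assumed entry-level salaries in USD
    "IT Support Specialist": 45_000,
    "Web Developer": 55_000,
    "Software Engineer": 70_000,
    "IT Manager": 95_000,
    "Cloud Solutions Architect": 100_000,
}

def make_employee(emp_id):
    position = random.choice(list(BASE_SALARY))
    experience = random.randint(0, 20)
    # Salary grows roughly 3% per year of experience, with mild noise.
    salary = round(BASE_SALARY[position] * (1 + 0.03 * experience)
                   * random.uniform(0.95, 1.05))
    return {
        "ID": emp_id,
        "Gender": random.choice(["M", "F"]),
        "Experience (Years)": experience,
        "Position": position,
        "Salary": salary,
    }

rows = [make_employee(i) for i in range(1, 401)]
with open("employee_data_sample.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```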
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. The pollution exposures were standardized on each week by subtracting the weekly median and dividing by the weekly interquartile range (IQR), which further protects the identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30, (2019).
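The per-week standardization applied to the pollution exposures (subtract each week's median, divide by its IQR) can be sketched as follows. The quantile interpolation and the toy exposure matrix are illustrative assumptions; the code accompanying the manuscript is in R, not Python.

```python
# Standardize each week's exposures: (value - weekly median) / weekly IQR.
# The linear quantile interpolation below is one common convention, assumed
# here for illustration.
def week_standardize(column):
    s = sorted(column)
    n = len(s)
    def quantile(q):
        idx = q * (n - 1)
        lo = int(idx)
        hi = min(lo + 1, n - 1)
        frac = idx - lo
        return s[lo] * (1 - frac) + s[hi] * frac
    med = quantile(0.5)
    iqr = quantile(0.75) - quantile(0.25)
    return [(v - med) / iqr for v in column]

# exposures[i][w]: exposure for simulated individual i in pregnancy week w
exposures = [[3.0, 5.0], [1.0, 9.0], [2.0, 7.0], [4.0, 11.0]]
weeks = list(zip(*exposures))                         # one tuple per week
z = [week_standardize(list(week)) for week in weeks]  # standardized, per week
```

After standardization, each week's values are centered at its median and scaled by its IQR, so the released z matrix carries no week-level medians or IQRs, consistent with the identifiability protection described above.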