These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting off the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given, which further protects the identifiability of the spatial locations used in the analysis.
This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.
It can be accessed through the following means. File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract: We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code (“Results_Summary.txt”) to summarize and plot the estimated critical windows and posterior marginal inclusion probabilities.
Description: “CWVS_LMC.txt”: This code is delivered to the user as a .txt file containing R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the code in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. “Results_Summary.txt”: This code is also delivered as a .txt file containing R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Required R packages:
• For running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals
Instructions for Use / Reproducibility: The data and code can be used to identify/estimate critical windows from one of the simulated datasets generated under setting E4 of the presented simulation study. How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”
Below is the replication procedure for the portion of the analyses using a simulated dataset.
Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
Description/Permissions: These are simulated data without any identifying information or informative birth-level covariates. The pollution exposures are standardized in each week by subtracting off the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given, which further protects the identifiability of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
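The weekly standardization described above (subtract each week's median, divide by that week's IQR) can be sketched as follows. The array shapes and NumPy usage are illustrative, not the code shipped with the dataset:

```python
import numpy as np

def standardize_exposures(z_raw):
    """Standardize each week's exposures by that week's median and IQR,
    as described for the simulated CWVS dataset (columns = weeks)."""
    med = np.median(z_raw, axis=0)                       # per-week median
    q75, q25 = np.percentile(z_raw, [75, 25], axis=0)
    iqr = q75 - q25                                      # per-week interquartile range
    return (z_raw - med) / iqr

# Example: 5 simulated individuals, 3 weeks of exposure
rng = np.random.default_rng(1)
z = standardize_exposures(rng.normal(10.0, 2.0, size=(5, 3)))
```

After this transformation each week (column) has median 0 and IQR 1, matching the description of the provided `z` matrix.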
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A simulated call centre dataset and notebook, designed to be used as a classroom / tutorial dataset for Business and Operations Analytics.
This notebook details the creation of simulated call centre logs over the course of one year. For this dataset we are imagining a business whose lines are open from 8:00am to 6:00pm, Monday to Friday. Four agents are on duty at any given time and each call takes an average of 5 minutes to resolve.
The call centre manager is required to meet a performance target: 90% of calls must be answered within 1 minute. Lately, the performance has slipped. As the data analytics expert, you have been brought in to analyze their performance and make recommendations to return the centre back to its target.
The dataset records timestamps for when a call was placed, when it was answered, and when the call was completed. The total waiting and service times are calculated, as well as a logical for whether the call was answered within the performance standard.
Discrete-Event Simulation allows us to model real calling behaviour with a few simple variables.
The simulations in this dataset are performed using the package simmer (Ucar et al., 2019). I encourage you to visit their website for complete details and fantastic tutorials on Discrete-Event Simulation.
Ucar I, Smeets B, Azcorra A (2019). “simmer: Discrete-Event Simulation for R.” Journal of Statistical Software, 90(2), 1–30.
For source code and simulation details, view the cross-posted GitHub notebook and Shiny app.
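The dataset itself was built with simmer in R; as a language-agnostic illustration of the same setup (4 agents, exponential service times averaging 5 minutes, a 90%-within-1-minute target), below is a minimal pure-Python queue simulation. The arrival rate is an assumption for the example, not the value used to generate the dataset:

```python
import heapq
import random

def simulate_day(n_calls=500, n_agents=4, mean_service=5.0,
                 mean_interarrival=1.4, target_wait=1.0, seed=42):
    """Simulate a FIFO call queue with n_agents servers (times in minutes).

    Returns the fraction of calls answered within target_wait minutes,
    i.e. the service level the call centre manager is measured against."""
    rng = random.Random(seed)
    t = 0.0
    free_at = [0.0] * n_agents          # when each agent next becomes free
    heapq.heapify(free_at)
    answered_in_time = 0
    for _ in range(n_calls):
        t += rng.expovariate(1.0 / mean_interarrival)   # next call arrives
        start = max(t, free_at[0])                      # earliest available agent
        if start - t <= target_wait:
            answered_in_time += 1
        heapq.heapreplace(free_at, start + rng.expovariate(1.0 / mean_service))
    return answered_in_time / n_calls

service_level = simulate_day()
```

Varying `mean_interarrival` or `n_agents` reproduces the kind of what-if analysis the tutorial dataset is designed for.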
GNU LGPL 3.0: http://www.gnu.org/licenses/lgpl-3.0.html
"Some call it a marvel of technology. Some call it a fad. Self-driving cars are constantly making the headlines. These vehicles, designed to carry passengers from point A to B without a human manoeuvre, are promised to bring greater mobility, reduce street congestion and fuel consumption, and create safer roads."
The data set contains images captured by the simulation using three cameras: front (mounted above the windshield), right (showing the view from the right side), and left (showing the view from the left side).
It also contains data about the brake (whether it was used or not), throttle, speed, and steering angle.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global crash data simulator market size reached USD 1.42 billion in 2024, reflecting a robust demand for advanced simulation technologies across industries. The market is projected to grow at a CAGR of 8.7% from 2025 to 2033, reaching a forecasted value of USD 3.01 billion by 2033. This impressive growth is primarily driven by the increasing emphasis on safety standards and regulatory compliance in sectors such as automotive, aerospace, and defense, as well as the rapid integration of digital technologies into crash analysis processes. The ongoing advancements in simulation software and hardware, coupled with the rising need for cost-effective and accurate crash testing solutions, are further propelling the expansion of the crash data simulator market worldwide.
A significant growth factor for the crash data simulator market is the automotive industry's relentless pursuit of enhanced vehicle safety. With the advent of autonomous vehicles, electric mobility, and stringent crashworthiness regulations, automotive manufacturers and OEMs are increasingly relying on simulation tools to optimize design and validate safety features before physical prototyping. Crash data simulators enable engineers to model diverse crash scenarios, analyze occupant safety, and predict structural deformations with high accuracy. This not only accelerates the product development lifecycle but also reduces costs associated with physical crash tests. As global road safety initiatives intensify and consumer awareness regarding vehicle safety rises, the adoption of crash data simulators in automotive R&D is expected to surge, fueling market growth throughout the forecast period.
Another critical driver is the technological evolution in simulation software and hardware. Modern crash data simulators leverage advanced computational techniques, such as finite element analysis (FEA), machine learning, and high-performance computing (HPC), to deliver detailed insights into crash dynamics. The integration of cloud-based platforms and digital twins further enhances the scalability and flexibility of simulation environments, allowing for real-time collaboration and data sharing among stakeholders. These technological advancements not only improve the accuracy and reliability of crash simulations but also enable organizations to conduct virtual testing for a broader range of scenarios, including those that are difficult or costly to replicate physically. As industries increasingly embrace digital transformation, the demand for sophisticated and user-friendly crash data simulation solutions is poised to escalate.
The expansion of crash data simulator applications beyond automotive, particularly in aerospace, defense, and railways, is also contributing to market growth. In aerospace and defense, crash data simulators are utilized to assess the structural integrity of aircraft and military vehicles under various impact conditions, ensuring compliance with rigorous safety standards. Similarly, the railway sector employs simulation tools to evaluate crashworthiness and passenger safety in train collisions and derailments. The versatility of crash data simulators in addressing diverse safety challenges across multiple industries underscores their growing significance as essential tools for risk assessment and regulatory compliance. As emerging economies invest in transportation infrastructure and safety modernization, the adoption of crash data simulators is anticipated to rise across new verticals.
Regionally, North America continues to dominate the crash data simulator market, attributed to the presence of leading automotive and aerospace manufacturers, stringent safety regulations, and significant investments in R&D. However, the Asia Pacific region is witnessing the fastest growth, driven by rapid industrialization, expanding automotive production, and increasing focus on transportation safety. Europe also maintains a strong market position, supported by robust regulatory frameworks and technological innovation. The Middle East & Africa and Latin America are gradually emerging as promising markets, as governments and industries in these regions prioritize safety and infrastructure development. Overall, the global crash data simulator market is characterized by dynamic regional trends and a growing emphasis on digital simulation as a cornerstone of safety engineering.
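The headline figures above are mutually consistent: compounding USD 1.42 billion at 8.7% per year over the nine years from 2024 to 2033 reproduces the forecast value. A quick check:

```python
def project(value, cagr, years):
    """Compound a starting value (here in USD billions) at a constant annual rate."""
    return value * (1 + cagr) ** years

# USD 1.42B in 2024 grown at 8.7% over 9 years (2024 -> 2033)
forecast = project(1.42, 0.087, 9)  # approximately 3.01
```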
Simantha is a discrete-event simulation package written in Python that is designed to model the behavior of discrete manufacturing systems. Specifically, it focuses on asynchronous production lines with finite buffers. It also provides functionality for modeling the degradation and maintenance of machines in these systems. Classes for five basic manufacturing objects are included: source, machine, buffer, sink, and maintainer. These objects can be defined by the user and configured in different ways to model various real-world manufacturing systems. The object classes are also designed to be extensible so that they can be used to model more complex processes.
In addition to modeling the behavior of existing systems, Simantha is also intended for use with simulation-based optimization and planning applications. For instance, users may be interested in evaluating alternative maintenance policies for a particular system. Estimating the expected system performance under each candidate policy requires a large number of simulation replications when the system is subject to a high degree of stochasticity. Simantha therefore supports parallel simulation replications to make this procedure more efficient.
GitHub repository: https://github.com/usnistgov/simantha
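To make the source -> machine -> buffer -> sink structure concrete, here is a toy two-machine line with a finite buffer and random failures/repairs in plain Python. This is a sketch of the kind of system Simantha models, not Simantha's actual API (see the GitHub repository for that):

```python
import random

def simulate_line(steps=10_000, buffer_capacity=5,
                  p_fail=0.01, p_repair=0.1, seed=0):
    """Toy two-machine production line with a finite intermediate buffer.

    Machine 1 pulls from an unlimited source and pushes to the buffer;
    machine 2 pulls from the buffer and pushes to the sink. Each machine
    can randomly fail and later be repaired (the maintainer's role).
    Returns throughput in parts per time step."""
    rng = random.Random(seed)
    buffer = 0
    up = [True, True]        # up/down state of each machine
    produced = 0             # parts reaching the sink
    for _ in range(steps):
        # degradation and maintenance
        for i in (0, 1):
            if up[i] and rng.random() < p_fail:
                up[i] = False            # machine breaks down
            elif not up[i] and rng.random() < p_repair:
                up[i] = True             # maintainer restores it
        # machine 2: pull a part from the buffer to the sink
        if up[1] and buffer > 0:
            buffer -= 1
            produced += 1
        # machine 1: push a part to the buffer unless blocked
        if up[0] and buffer < buffer_capacity:
            buffer += 1
    return produced / steps

throughput = simulate_line()
```

Running many replications of a function like this, under different `p_repair` settings, is the maintenance-policy comparison described above.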
According to our latest research, the global Crash Data Simulator market size reached USD 1.28 billion in 2024, reflecting a robust and expanding industry. The market is expected to grow at a CAGR of 11.7% from 2025 to 2033, with the projected market size anticipated to reach USD 3.16 billion by 2033. This growth is primarily driven by the rising demand for advanced safety measures, stringent regulatory environments, and the increasing integration of simulation technologies across various industries. The adoption of crash data simulators is accelerating as organizations seek to enhance product safety, reduce development cycles, and comply with evolving global standards.
One of the primary growth factors for the Crash Data Simulator market is the automotive industry’s relentless pursuit of vehicle safety and innovation. As automotive manufacturers face mounting pressure to meet rigorous crashworthiness standards and consumer expectations, they are increasingly leveraging crash data simulation tools to optimize vehicle design and validate safety features. These simulators enable comprehensive virtual testing, reducing the need for costly and time-consuming physical crash tests. Furthermore, the rise of electric and autonomous vehicles has intensified the necessity for sophisticated simulation environments, as these vehicles present unique safety challenges and require extensive validation before market release. The integration of artificial intelligence and machine learning into crash data simulators further enhances their predictive accuracy, creating additional impetus for market expansion.
The aerospace and defense sectors are also significant contributors to the growth of the Crash Data Simulator market. In aerospace, the demand for lightweight materials and complex structural designs necessitates advanced simulation tools to assess crashworthiness and survivability under extreme conditions. Defense organizations utilize crash data simulators to evaluate the safety of military vehicles, aircraft, and personnel equipment, ensuring compliance with strict regulatory and operational requirements. The increasing frequency of joint research initiatives between government agencies and private enterprises is fostering technological advancements in simulation software and hardware, further propelling market growth. Additionally, the adoption of crash data simulators in industrial and research and development applications is expanding, as organizations across sectors recognize the value of predictive analytics in product development and quality assurance.
Another critical growth driver is the proliferation of cloud-based deployment models and scalable simulation services. As organizations strive for flexibility, collaboration, and cost efficiency, cloud-based crash data simulators are gaining traction. These platforms enable remote access, real-time data sharing, and integration with other enterprise systems, facilitating seamless workflow management and cross-functional collaboration. The shift towards cloud-based solutions is particularly pronounced among research institutes and testing laboratories, which benefit from reduced infrastructure costs and the ability to rapidly scale simulation capabilities based on project requirements. Moreover, advancements in high-performance computing and big data analytics are empowering organizations to conduct more complex simulations, analyze vast datasets, and derive actionable insights to enhance safety and performance outcomes.
From a regional perspective, North America continues to dominate the Crash Data Simulator market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The United States, in particular, is a hub for automotive innovation, aerospace engineering, and regulatory oversight, driving substantial investments in crash simulation technologies. Europe’s strong emphasis on vehicle safety standards and sustainability initiatives is fueling demand for advanced simulators, while Asia Pacific is emerging as a high-growth region due to rapid industrialization, expanding automotive manufacturing, and increasing government initiatives to enhance transportation safety. Latin America and the Middle East & Africa are also witnessing steady adoption, supported by growing investments in infrastructure and regulatory reforms aimed at improving safety standards across industries.
MFSim (Multi-Fidelity Simulator) is a pluggable framework for creating an air traffic flow simulator at multiple levels of fidelity. The framework is designed so that low-fidelity simulations of the entire US airspace complete very quickly (on the order of seconds), while higher-fidelity plugins can be added to run higher-fidelity simulations in selected regions of the airspace concurrently with the low-fidelity simulation of the full airspace.
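A pluggable multi-fidelity design of this kind typically dispatches each flight to a per-region model, falling back to the fast low-fidelity model everywhere else. The sketch below illustrates that pattern; the class names, region codes, and step sizes are invented for illustration and are not MFSim's actual API:

```python
from dataclasses import dataclass

@dataclass
class Flight:
    ident: str
    region: str
    position: float  # fraction of route completed, 0..1

class LowFidelityModel:
    """Coarse model: advance every flight by a fixed fraction per step."""
    def step(self, flight):
        flight.position = min(1.0, flight.position + 0.10)

class HighFidelityModel(LowFidelityModel):
    """Finer model for a selected region: smaller per-step advance,
    standing in for detailed trajectory dynamics."""
    def step(self, flight):
        flight.position = min(1.0, flight.position + 0.05)

class Simulator:
    """Dispatch each flight to its region's plugin if one is registered,
    otherwise to the default low-fidelity model."""
    def __init__(self, plugins):
        self.default = LowFidelityModel()
        self.plugins = plugins
    def step(self, flights):
        for f in flights:
            self.plugins.get(f.region, self.default).step(f)

# Register a higher-fidelity plugin only for one (hypothetical) region
sim = Simulator({"ZNY": HighFidelityModel()})
flights = [Flight("AAL1", "ZNY", 0.0), Flight("UAL2", "ZLA", 0.0)]
sim.step(flights)
```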
SPECIAL NOTE: C-MAPSS and C-MAPSS40K ARE CURRENTLY UNAVAILABLE FOR DOWNLOAD. Glenn Research Center management is reviewing the availability requirements for these software packages. We are working with Center management to get the review completed and issues resolved in a timely manner. We will post updates on this website when the issues are resolved. We apologize for any inconvenience. Please contact Jonathan Litt, jonathan.s.litt@nasa.gov, if you have any questions in the meantime.
Subject Area: Engine Health
Description: This data set was generated with the C-MAPSS simulator. C-MAPSS stands for 'Commercial Modular Aero-Propulsion System Simulation' and it is a tool for simulating realistic large commercial turbofan engine data. Each flight is a combination of a series of flight conditions with a reasonable linear transition period to allow the engine to change from one flight condition to the next. The flight conditions are arranged to cover a typical ascent from sea level to 35K ft and descent back to sea level. A fault is injected at a given time in one of the flights and persists throughout the remaining flights, effectively increasing the age of the engine. The intent is to identify in which flight, and when in that flight, the fault occurred.
How Data Was Acquired: The data provided come from a high-fidelity system-level engine simulation designed to simulate nominal and faulted engine degradation over a series of flights. The simulated data were created with a Matlab Simulink tool called C-MAPSS.
Sample Rates and Parameter Description: The flights are full flight recordings sampled at 1 Hz and consist of 30 engine and flight-condition parameters. Each flight contains 7 unique flight conditions for an approximately 90 min flight, including ascent to cruise at 35K ft and descent back to sea level. The parameters for each flight are the flight conditions, health indicators, temperature measurements, and pressure measurements.
Faults/Anomalies: Faults arose from the inlet engine fan, the low pressure compressor, the high pressure compressor, the high pressure turbine and the low pressure turbine.
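Since the task is to localize when a persistent fault appears within a flight, a simple baseline detector can be sketched against synthetic 1 Hz data. The window length, threshold, and injected shift below are illustrative choices, not NASA's evaluation method:

```python
import numpy as np

def flag_fault_onset(series, baseline_len=300, win=60, threshold=3.0):
    """Return the first index where the rolling mean (over `win` samples)
    of a parameter drifts more than `threshold` baseline standard
    deviations from the start-of-flight baseline, or None if it never does."""
    mu = series[:baseline_len].mean()
    sigma = series[:baseline_len].std() + 1e-12
    smooth = np.convolve(series, np.ones(win) / win, mode="valid")
    hits = np.flatnonzero(np.abs(smooth - mu) / sigma > threshold)
    return int(hits[0]) if hits.size else None

# Synthetic ~90 min flight at 1 Hz with a persistent shift injected at t = 3000 s
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5400)
x[3000:] += 8.0
onset = flag_fault_onset(x)
```

The rolling mean suppresses the 1 Hz measurement noise so an abrupt, persistent shift stands out within about one window length of the true injection time.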
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.
Methods: The dataset and code stored here are for Case Studies 1 and 2 in the paper. Datasets were generated using simulations in R.
Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.
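The core mechanism in these simulations, network position influencing survival on the link scale, can be stated schematically outside R as well. The package itself is R, and the coefficients below are invented for illustration:

```python
import math
import random

def simulate_survival(degrees, beta0=0.5, beta=0.2, seed=7):
    """Simulate one survival interval in which an individual's network
    position (here, degree) raises its survival probability on the
    logit scale: P(survive) = logistic(beta0 + beta * degree)."""
    rng = random.Random(seed)
    probs = [1.0 / (1.0 + math.exp(-(beta0 + beta * d))) for d in degrees]
    return [rng.random() < p for p in probs]

# Four individuals with increasing network degree
alive = simulate_survival([0, 2, 5, 10])
```

Fitting a CJS model to many replicates of data generated this way, with the network measures partially observed, is the kind of experiment the case studies run.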
sdiazlor/data-drift-simulation-dataset: a dataset hosted on Hugging Face and contributed by the HF Datasets community.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Files descriptions:
All CSV files contain results from the different models (PAMM, AARs, linear models, MRPPs) for each iteration of the simulation, one row per iteration. "results_perfect_detection.csv" contains the results from the first simulation part with all observations. "results_imperfect_detection.csv" contains the results from the first simulation part with observations randomly thinned to mimic imperfect detection.
Columns:
- ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
- PAMM30: p-value of the PAMM run on the 30-day survey.
- PAMM7: p-value of the PAMM run on the 7-day survey.
- AAR1: Avoidance-Attraction Ratio computed as AB/BA.
- AAR2: Avoidance-Attraction Ratio computed as BAB/BB.
- Harmsen_P: p-value from the linear model with the Species1*Species2 interaction (Harmsen et al. 2009).
- Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
- Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) within the randomized median distribution, permuting on species A (Karanth et al. 2017).
- MurphyAB_permA: rank of the observed AB interval duration median within the randomized median distribution, permuting on species A (Murphy et al. 2021).
- MurphyBA_permA: rank of the observed BA interval duration median within the randomized median distribution, permuting on species A (Murphy et al. 2021).
- Karanth_permB: as Karanth_permA, permuting on species B (Karanth et al. 2017).
- MurphyAB_permB: as MurphyAB_permA, permuting on species B (Murphy et al. 2021).
- MurphyBA_permB: as MurphyBA_permA, permuting on species B (Murphy et al. 2021).
"results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations."results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimick imperfect detection.ID_run: identified of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of A on B.p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of B on A.AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).Karanth_permA: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). Karanth_permB: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021). 
MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
Scripts files description:
- 1_Functions: R script containing the functions: the MRPP from Karanth et al. (2017), adapted here for time efficiency; the MRPP from Murphy et al. (2021), adapted here for time efficiency; a version of the ct_to_recurrent() function from the recurrent package adapted to run parallelized on the simulation datasets; and the simulation() function used to simulate observations of two species with reciprocal effects on each other.
- 2_Simulations: R script containing the parameter definitions for all iterations (for the two parts of the simulations), the simulation parallelization, and the random thinning mimicking imperfect detection.
- 3_Approaches comparison: R script containing the fit of the different models tested on the simulated data.
- 3_1_Real data comparison: R script containing the fit of the different models tested on the real data example from Murphy et al. (2021).
- 4_Graphs: R script containing the code for plotting results from the simulation part and appendices.
- 5_1_Appendix - Check for similarity between codes for Karanth et al. (2017) method: R script containing the Karanth et al. (2017) and Murphy et al. (2021) code lines alongside the version adapted for time efficiency, with a comparison to verify similarity of results.
- 5_2_Appendix - Multi-response procedure permutation difference: R script testing for differences between the MRPP approaches according to the species on which permutations are done.
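A typical use of these one-row-per-iteration files is to compute each method's rejection rate across iterations. Below is a sketch on a hypothetical miniature of one CSV (the real files have many more rows and the full column set listed above):

```python
import csv
import io

# Hypothetical miniature of "results_perfect_detection.csv"
raw = """ID_run,PAMM30,PAMM7,Niedballa_P
N100_Se1,0.003,0.010,0.20
N100_Se2,0.040,0.210,0.04
N100_Se3,0.300,0.002,0.60
"""

def rejection_rates(text, alpha=0.05):
    """Proportion of iterations in which each method's p-value falls
    below alpha: the basic power / type-I-error summary for these files."""
    rows = list(csv.DictReader(io.StringIO(text)))
    methods = [c for c in rows[0] if c != "ID_run"]
    return {m: sum(float(r[m]) < alpha for r in rows) / len(rows)
            for m in methods}

rates = rejection_rates(raw)
```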
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data within consist of compressed output files in the form of edgelists (.edgelist.gz) and nodelists (.aux.parquet) from large citation network simulations using an agent-based model. The code and instructions are available at: https://github.com/illinois-or-research-analytics/SASCA. In addition, we provide a distribution of citation frequencies drawn from a random sample of PubMed journal articles (pooled_50k_pubmed_unique.csv) and a table of recencies: the frequency with which citations are made to the previous year, the year before that, and so on (recency_probs_percent_stahl_filled.csv). A manuscript describing the SASCA simulator has been submitted for review and will be referenced in a future version of this data repository if it is accepted. The prefixes sj and er refer to the real-world and Erdos-Renyi random graphs, respectively, that were used to initiate simulations. These 'seed' networks are available from the GitHub site referenced above.
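Assuming the .edgelist.gz files hold one whitespace-separated "citing cited" pair per line (the usual edgelist convention; check the GitHub instructions above for the exact layout), they can be read as:

```python
import gzip
import io

def read_edgelist(path_or_fileobj):
    """Load a citation edgelist: one (citing, cited) pair per line."""
    edges = []
    with gzip.open(path_or_fileobj, "rt") as fh:
        for line in fh:
            if line.strip():
                src, dst = line.split()[:2]
                edges.append((src, dst))
    return edges

# Round-trip on an in-memory example instead of a real .edgelist.gz file
buf = io.BytesIO()
with gzip.open(buf, "wt") as fh:
    fh.write("p1 p2\np3 p1\n")
buf.seek(0)
edges = read_edgelist(buf)
```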
https://www.futuremarketinsights.com/privacy-policy
The global simulation and test data management market is expected to witness substantial growth, with its valuation projected to increase from approximately USD 905.2 million in 2025 to about USD 3.24 billion by 2035. This corresponds to a CAGR of 12.1% over the forecast period.
| Attributes | Description |
|---|---|
| Industry Size (2025E) | USD 905.2 million |
| Industry Size (2035F) | USD 3.24 billion |
| CAGR (2025 to 2035) | 12.1% |
Category-wise Insights
| Segment | CAGR (2025 to 2035) |
|---|---|
| Aerospace & Defense (Industry) | 14.8% |

| Segment | Value Share (2025) |
|---|---|
| Test Data Simulation Software (Solution) | 42.3% |
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Farahnaz Amini
Released under CC0: Public Domain
The goal of this project is to develop a simulation that collects both visual information and grasp information about different objects using a multi-fingered hand. These sources of data can be used in the future to learn integrated object-action grasp representations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A simulated dataset that has been widely used in the evaluation of spike-sorting algorithms. Synthetic datasets are generated by adding spike waveform templates to background noise of various levels; this download contains several datasets, generated using different spike templates. Use wave_clus (see www2.le.ac.uk/centres/csn/software/wave-clus) for spike detection and sorting of this data. Wave_clus is a fast, unsupervised algorithm for spike detection and sorting that is compatible with Windows, Mac, and Linux operating systems.
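The generation recipe described above (spike waveform templates added to background noise) can be sketched in a few lines of Python. Everything here — the sampling rate, template shape, noise level, and spike count — is illustrative and not the parameters used to produce the actual download:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 24000                      # samples per second (illustrative)
noise_sd = 0.1                  # background noise level (illustrative)

# Toy spike waveform template: a 64-sample Gaussian bump
t = np.arange(64)
template = np.exp(-0.5 * ((t - 20) / 4.0) ** 2)

# One second of background noise, with the template added at random offsets
signal = rng.normal(0.0, noise_sd, fs)
spike_starts = rng.choice(fs - len(template), size=20, replace=False)
for s in spike_starts:
    signal[s:s + len(template)] += template
```

Varying `noise_sd` while reusing the same templates reproduces the "various noise levels" idea behind the different datasets in this download.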
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a simulated dataset of transient well production data. This dataset was used in my Masters thesis at King Abdullah University of Science and Technology (KAUST), and it is shared for academic use and research work.
The dataset has 100 wells simulated at time steps of 0.2 hours for an entire year. This gives 43,800 observations per well and a grand total of 4,380,000 observations in the entire dataset. The resulting production data are then perturbed with systematic and random gauge errors to better simulate real-world gauge readings.
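The stated observation counts follow directly from the time step and horizon, which a quick check confirms:

```python
hours_per_year = 365 * 24                    # 8,760 hours
obs_per_well = round(hours_per_year / 0.2)   # one observation every 0.2 hours
total_obs = obs_per_well * 100               # 100 simulated wells
# obs_per_well is 43,800 and total_obs is 4,380,000, matching the description
```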
The simulator code used to generate this dataset can be found at: https://github.com/ykh-1992/TransientNodalAnalysis.jl
The data consist of three files:
- "wells.csv": details the input parameters for each simulated well.
- "data.zip": houses an 850 MB "data.csv" that includes the simulated well production data.
- "auxiliary.csv": includes information related to the simulation run.
The main dataset is a 232 MB file of trajectory data (I395-final.csv) that contains position, speed, and acceleration data for non-automated passenger cars, trucks, buses, and automated vehicles on an expressway within an urban environment. Supporting files include an aerial reference image (I395_ref_image.png), a list of polygon boundaries (I395_boundaries.csv), and associated images (I395_lane-1, I395_lane-2, …, I395_lane-6) stored in a folder titled “Annotation on Regions.zip” that map physical roadway segments to the numerical lane IDs referenced in the trajectory dataset.

In the boundary file, columns “x1” to “x5” represent the horizontal pixel values in the reference image, with “x1” being the leftmost boundary line and “x5” being the rightmost, while the column “y” represents the corresponding vertical pixel values. The origin point of the reference image is located at the top left corner. The dataset defines five lanes with five boundaries: Lane -6 corresponds to the area to the left of “x1”, Lane -5 corresponds to the area between “x1” and “x2”, and so forth to the rightmost lane, which is defined by the area to the right of “x5” (Lane -2). Lane -1 refers to vehicles that go onto the shoulder of the merging lane (Lane -2); these were manually separated by watching the videos.

This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, one of the six collected as part of the TGSIM project, contains data collected from six 4K cameras mounted on tripods, positioned on three overpasses along I-395 in Washington, D.C.
The cameras captured distinct segments of the highway, and their combined overlapping and non-overlapping footage resulted in a continuous trajectory for the entire 0.5 km section. This section covers a major weaving/mandatory lane-changing area between L'Enfant Plaza and 4th Street SW, with three lanes in the eastbound direction and a major on-ramp on the left side. In addition to the on-ramp, the section covers an off-ramp on the right side. The expressway includes one diverging lane at the beginning of the section on the right side and one merging lane in the middle of the section on the left side. For the purposes of data extraction, the shoulder of the merging lane is also considered a travel lane, since some vehicles illegally use it as an extended on-ramp to pass other drivers (see I395_ref_image.png for details).

The cameras captured continuous footage during the morning rush hour (8:30 AM-10:30 AM ET) on a sunny day. During this period, vehicles equipped with SAE Level 2 automation were deployed to travel through the designated section to capture the impact of SAE Level 2-equipped vehicles on adjacent vehicles and their behavior in congested areas, particularly in complex merging sections. These vehicles are indicated in the dataset.

As part of this dataset, the following files were provided:
- I395-final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data at every 0.1 second. Vehicle type, width, and length are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using a conversion factor of 1 pixel = 0.3 meters.
- I395_ref_image.png is the aerial reference image that defines the geographic region and the associated roadway segments.
- I395_boundaries.csv contains the coordinates that define the roadway segments (n=X). The columns "x1" to "x5" represent the horizontal pixel values in the reference image.
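Assigning a trajectory point to one of the regions defined by the boundary columns reduces to counting how many boundary lines lie to its left. This is a hedged sketch, not code shipped with the dataset: the boundary values used in any real run would come from I395_boundaries.csv, and the region-to-lane-ID mapping follows the description above (Lane -1, the shoulder, is labeled manually from video rather than from these boundaries):

```python
from bisect import bisect_right

def region_index(x_pixel, boundaries):
    """Count the boundary lines (x1..x5) at or to the left of x_pixel:
    0 means left of x1 (Lane -6 in the description), 5 means right of
    x5 (the rightmost lane). Lane -1 (shoulder use) is not derivable
    from these boundaries."""
    return bisect_right(sorted(boundaries), x_pixel)

PIXELS_TO_METERS = 0.3  # conversion factor stated for this dataset

def to_meters(pixels):
    """Convert a pixel distance to meters using the stated factor."""
    return pixels * PIXELS_TO_METERS
```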
CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/
This data set is intended to be used along with my notebook Linear Regression Notes, which provides a guideline for applying correlation analysis and linear regression models from a statistical approach.
A fictional call center is interested in knowing the relationship between the number of personnel and some variables that measure their performance such as average answer time, average calls per hour, and average time per call. Data were simulated to represent 200 shifts.
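As a minimal illustration of the intended analysis, the sketch below simulates one such relationship and fits it; the variable names and the data-generating numbers are hypothetical, not the actual dataset columns:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shift-level data: more personnel -> shorter answer times
personnel = rng.integers(5, 50, size=200).astype(float)
answer_time = 120.0 - 1.5 * personnel + rng.normal(0.0, 5.0, size=200)

r = np.corrcoef(personnel, answer_time)[0, 1]             # Pearson correlation
slope, intercept = np.polyfit(personnel, answer_time, 1)  # simple OLS line
```

With the simulated slope of -1.5, the fitted line and the strongly negative correlation recover the planted relationship, which is the kind of check the notebook walks through.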
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description
“CWVS_LMC.txt”: This code is delivered to the user as a .txt file containing R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the code in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered as a .txt file containing R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information
Required R packages:
• For running “CWVS_LMC.txt”:
  • msm: Sampling from the truncated normal distribution
  • mnormt: Sampling from the multivariate normal distribution
  • BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
  • plotrix: Plotting the posterior means and credible intervals

Instructions for Use
Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).