These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.
This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.
It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
“CWVS_LMC.txt”: This code is delivered to the user as a .txt file containing R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the code can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered as a .txt file containing R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Required R packages:
• For running “CWVS_LMC.txt”:
• msm: sampling from the truncated normal distribution
• mnormt: sampling from the multivariate normal distribution
• BayesLogit: sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: plotting the posterior means and credible intervals
Instructions for Use / Reproducibility
What can be reproduced: the data and code can be used to identify/estimate critical windows from one of the simulated datasets generated under setting E4 of the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace.
• Run the code contained in “CWVS_LMC.txt”.
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
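The weekly standardization described above (subtract the weekly median, divide by the weekly IQR) can be sketched as follows. This is an illustrative Python version, not the authors' R code, and quartile conventions differ slightly between R's defaults and Python's statistics module:

```python
from statistics import median, quantiles

def standardize_weekly(exposures):
    """Standardize a list of per-week exposure lists: for each week,
    subtract the weekly median and divide by the weekly IQR."""
    out = []
    for week in exposures:  # one list of exposures per week
        med = median(week)
        q1, _, q3 = quantiles(week, n=4)  # quartiles (exclusive method)
        iqr = q3 - q1
        out.append([(x - med) / iqr for x in week])
    return out

# Toy example: one week of exposures across five individuals
weeks = [[1.0, 2.0, 3.0, 4.0, 5.0]]
print(standardize_weekly(weeks)[0])
```

Because only the standardized values are distributed (the weekly medians and IQRs are withheld), the transformation cannot be inverted to recover raw exposures.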
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This repository contains the data used to generate figures and tables in the unpublished paper "Connecting algorithmic fairness and fair outcomes in a sociotechnical simulation case study of AI-assisted healthcare". In this work, we present a simulation-based approach to explore how statistical definitions of algorithmic fairness translate to fairness in long-term outcomes, using AI-assisted breast cancer screening as a case example. We evaluate four fairness criteria and their impact on mortality rates and socioeconomic disparities, while also considering how radiologists’ reliance on AI and patients’ access to healthcare affect outcomes. Our results highlight how algorithmic fairness does not directly translate into fair and equitable outcomes, underscoring the importance of integrating sociotechnical perspectives in order to gain a holistic understanding of fairness in AI.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Results of the "Effects of initial values and convergence criterion in the two-parameter logistic model when estimating the latent distribution in BILOG-MG 3" simulation study, published in PLOS ONE.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The zip file includes the computer simulation code and datasets required for reproducing the study: Evaluation of statistical methods used in the analysis of interrupted time series studies: a simulation study.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.
Methods: The dataset and code stored here are for Case Studies 1 and 2 in the paper. Datasets were generated using simulations in R. Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.
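The core idea of a social network effect on survival can be made concrete with a minimal sketch: an individual's survival probability is a logistic function of a network trait such as degree. This is a hypothetical Python illustration (function name and coefficients are invented for exposition, not the genNetDem API):

```python
import math

def survival_prob(degree, intercept=1.0, beta=0.3):
    """Logistic survival probability increasing with network degree
    (hypothetical coefficients, not genNetDem defaults)."""
    return 1.0 / (1.0 + math.exp(-(intercept + beta * degree)))

# Better-connected individuals survive with higher probability
print(survival_prob(0), survival_prob(5))
```

When the sampled network under-measures degree, the fitted beta is attenuated, which is the downward bias the case studies document.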
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The datasets containing simulation performance results from the current study, in addition to the code to replicate the simulation study in its entirety, are available here. See the README file for a description of the Stata do-files, R-script files, tips to run the code, and the performance result dataset dictionaries.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Results of a simulation study on the effect of smoothing algorithms on the performance of parallel analysis in factor analysis of ordered categorical items.
This dataset contains files used in this Monte Carlo simulation study comparing the performance of five statistical models for adjusting design floods for current conditions at sites with known trends. These files include (i) the observed annual peak-flow series in the conterminous US used to inform ranges of known moments and trends used in the simulation experiment, (ii) the 3,000 combinations of Monte Carlo experiment parameters (including sample moments, trends, distribution types, and record lengths), (iii) the 5,000 100-year time series of random uniform variates used as annual non-exceedance probabilities in the generation of synthetic annual peak-flow series, (iv) the simulated and true (known) quantiles associated with the 10% and 1% annual exceedance probabilities conditioned on the last years of the synthetic annual peak-flow series generated through the experiment. This dataset also contains a model archive with the R statistical software code used to execute the study along with a document describing the contents of the archive and providing instructions for reproducing results.
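Items (iii) and (iv) rely on inverse-transform sampling: a uniform variate, treated as an annual non-exceedance probability, is passed through the inverse CDF of the chosen peak-flow distribution. A minimal Python sketch, assuming a log-normal distribution with invented parameters (the study's actual distribution types and moments are in the archive):

```python
import math
from statistics import NormalDist

def lognormal_peak(u, mu_log=8.0, sigma_log=0.5):
    """Map a uniform non-exceedance probability u in (0, 1) to a
    synthetic annual peak flow under an assumed log-normal distribution."""
    z = NormalDist(mu=mu_log, sigma=sigma_log).inv_cdf(u)
    return math.exp(z)

# u = 0.5 recovers the median peak flow, exp(mu_log)
print(lognormal_peak(0.5))
```

Repeating this for each of the 100 uniform variates in a series yields one synthetic annual peak-flow record with known (true) quantiles for comparison against the fitted ones.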
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Contains the simulated data and Stata code used to produce the results for the manuscript titled "Evaluating methods of outlier detection when benchmarking clinical registry data – a simulation study", accepted for publication in the Health Services and Outcomes Research Methodology Journal.
data_files.zip (code to generate all files in "do_files\simstudy1_preparation.do"):
raw_data - the .dta files produced from running the user-written hiersim command (https://doi.org/10.26180/24480889.v1)
summary_data - the .dta files produced from summarising the results across each unique simulated scenario and method combination (performance measure average and 95% Monte Carlo confidence intervals)
parameter_check - the .dta files produced from summarising the simulated data parameters across each unique simulated scenario (performance measure average and 95% Monte Carlo confidence intervals)
do_files.zip:
simstudy1_preparation.do - the code to run the simulations (using the hiersim command, available at https://doi.org/10.26180/24480889.v1) and create summary datasets (performance measures and parameter checks)
simstudy1_manuscript.do - the code to produce the figures included in the main manuscript
simstudy1_supplementary.do - the code to produce the table and figures included in the manuscript supplementary material
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the entirety of the files generated during the large Monte Carlo simulation study on fit statistics in multilevel factor analysis.
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
This archive contains the simulated collections, their diagnosis data, and the estimates of accuracy. For the full code and description, please refer to https://github.com/julian-urbano/irj2015-reliability
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
This data set is intended to be used along with my notebook Linear Regression Notes, which provides a guideline for applying correlation analysis and linear regression models from a statistical approach.
A fictional call center is interested in knowing the relationship between the number of personnel and some variables that measure their performance such as average answer time, average calls per hour, and average time per call. Data were simulated to represent 200 shifts.
Background: Many randomized trials involve measuring a continuous outcome, such as pain, body weight or blood pressure, at baseline and after treatment. In this paper, I compare four possibilities for how such trials can be analyzed: post-treatment; change between baseline and post-treatment; percentage change between baseline and post-treatment; and analysis of covariance (ANCOVA) with baseline score as a covariate. The statistical power of each method was determined for a hypothetical randomized trial under a range of correlations between baseline and post-treatment scores.
Results: ANCOVA has the highest statistical power. Change from baseline has acceptable power when correlation between baseline and post-treatment scores is high; when correlation is low, analyzing only post-treatment scores has reasonable power. Percentage change from baseline has the lowest statistical power and was highly sensitive to changes in variance. Theoretical considerations suggest that percentage change from baseline will also fail to protect from bias in the case of baseline imbalance and will lead to an excess of trials with non-normally distributed outcome data.
Conclusions: Percentage change from baseline should not be used in statistical analysis. Trialists wishing to report this statistic should use another method, such as ANCOVA, and convert the results to a percentage change by using mean baseline scores.
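The power ranking reported above follows from the error variance each analysis leaves behind. Under the standard assumption of equal baseline and post-treatment variance sigma^2 with correlation rho, post-treatment-only analysis leaves sigma^2, change from baseline leaves 2(1 - rho) sigma^2, and ANCOVA leaves (1 - rho^2) sigma^2, which is never larger than either. A quick numeric check (a sketch, not the paper's simulation code):

```python
def error_variance(rho, sigma2=1.0):
    """Residual error variance left by each analysis of a baseline/post
    design with equal variances and baseline/post correlation rho."""
    return {
        "post_only": sigma2,
        "change_score": 2 * (1 - rho) * sigma2,
        "ancova": (1 - rho ** 2) * sigma2,
    }

for rho in (0.2, 0.8):
    print(rho, error_variance(rho))
```

The crossover at rho = 0.5 reproduces the qualitative finding: change scores beat post-only analysis when correlation is high, while ANCOVA dominates throughout.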
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This repository contains the R code for the data generation and analysis for the paper:
Zhou, Z., Li, D., Huh, D., Xie, M., & Mun, E. Y. (2024). A Simulation Study of the Performance of Statistical Models for Count Outcomes with Excessive Zeros. Statistics in Medicine. https://doi.org/10.1002/sim.10198
Abstract
Background: Outcome measures that are count variables with excessive zeros are common in health behaviors research. Examples include the number of standard drinks consumed or alcohol-related problems experienced over time. There is a lack of empirical data about the relative performance of prevailing statistical models for assessing the efficacy of interventions when outcomes are zero-inflated, particularly compared with recently developed marginalized count regression approaches for such data. Methods: The current simulation study examined five commonly used approaches for analyzing count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three prevailing count distribution-based models (i.e., Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the overall effects on the population mean while adjusting for zero-inflation. Motivated by alcohol misuse prevention trials, extensive simulations were conducted to evaluate and compare the statistical power and Type I error rate of candidate statistical models and approaches across data conditions that varied in sample size (N = 100 to 500), zero rate (0.2 to 0.8), and intervention effect size. Results: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in more false positive results than expected. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with the outcome on the raw scale, the negative binomial model, and the ZIP model. The performance of the linear model with a log-transformed outcome variable was unsatisfactory. When only one of the effects on the zero (vs. non-zero) part and the count part existed, the ZIP model had the highest statistical power.
Conclusions: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. This MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.
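The zero-inflation at issue can be made concrete. Under a ZIP model with structural-zero probability pi and Poisson rate lambda, P(Y = 0) = pi + (1 - pi) * exp(-lambda), and the marginal (population) mean that MZIP models directly is E[Y] = (1 - pi) * lambda. A short Python sketch of these two quantities (the pi and lambda values are illustrative):

```python
import math

def zip_zero_prob(pi, lam):
    """P(Y = 0) under a zero-inflated Poisson: structural zeros
    plus chance zeros from the Poisson component."""
    return pi + (1 - pi) * math.exp(-lam)

def zip_mean(pi, lam):
    """Marginal mean E[Y] = (1 - pi) * lambda."""
    return (1 - pi) * lam

# e.g. 40% structural zeros and a Poisson rate of 2 events
print(zip_zero_prob(0.4, 2.0), zip_mean(0.4, 2.0))
```

Because the zero probability exceeds the plain-Poisson value exp(-lambda) whenever pi > 0, a Poisson fit understates the variance, which is consistent with its inflated Type I error rate above.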
License: etalab-2.0, https://spdx.org/licenses/etalab-2.0.html
Numerical simulation data needed to reproduce the figures presented in the article "Rise and fall of a multicomponent droplet in a surrounding fluid: Simulation study of a bumpy path".
Dataset for the figures in the article "Simulation Study of mmWave 5G-enabled Medical Extended Reality (MXR)".
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
A simulation study was carried out that took into account the item properties of extremeness (difficulty, location) and consistency. The background idea is that a scale should be defined by a minimum of five items. In addition, the averaged bias and sampling error of the five items were also inspected. Files included in the dataset: Data LINEAL: the items are analysed based on linear factor analysis; Data GRADED: the items are analysed based on non-linear factor analysis.
This report describes a simulation study exploring weighted-precision and weighted-likelihood models (R, Stan and JAGS software) to recover unbiased estimates of population sizes from weighted survey data.
File List
• AllFunctions.R (MD5: d3120c7372ab36b802dd9f0c01138f1a): Functions needed to fit the models presented in the paper.
• AllFits.R (MD5: f7caa2c394d6c710195c1bca762d1851): R code for fitting the models to the data set of great tits.
• Simulations.R (MD5: d3a8a230b93a10bc74420f6694b6e8b1): R code for performing the simulation study presented in Appendix B.
• PMdata.csv (MD5: 6fa6abb6c8240ba8eb0100c8031182ae): The data set of already marked female great tits.
• PUdata.csv (MD5: 19f72c2afdc3736b87c8956dd9c94a3e): The data set of previously unmarked female great tits.
• Effort.csv (MD5: 95b7da2637fd9b08b9818832e23cb7f5): Data on sampling effort.
• WithBreeding.dll (MD5: ffe47405123db8b78289af44f4e050d3): The log-likelihood in compiled C code.
• C.zip (MD5: 9452e29327a86ce7bc2ab968bdd720dc): The original C code containing the log-likelihood.
Description
All of the functions needed to fit the models presented in the paper are in AllFunctions.R. The log-likelihood function is evaluated in C via R and is contained in the WithBreeding.dll file. The original C file is in the folder C.zip. The analysis of the great tits data set (PMdata.csv is the data set of already marked birds and PUdata.csv is the data set of previously unmarked birds) presented in the paper was performed using the functions in AllFits.R, while the simulation study presented in Appendix B was performed using Simulations.R. Information on capture and resight effort (number of sites visited, etc.) is in Effort.csv.
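The listed MD5 checksums can be verified after download; a minimal sketch in Python (run it in the directory holding the files; only two of the eight entries are shown):

```python
import hashlib

def md5_of(path):
    """Compute the MD5 hex digest of a file, reading in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

expected = {
    "AllFunctions.R": "d3120c7372ab36b802dd9f0c01138f1a",
    "AllFits.R": "f7caa2c394d6c710195c1bca762d1851",
}
# After downloading, compare, e.g.:
#   md5_of("AllFunctions.R") == expected["AllFunctions.R"]
```

A mismatch indicates a corrupted or altered download, so the files should be re-fetched before running any of the analysis code.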
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
The performance of statistical methods is frequently evaluated by means of simulation studies. In the case of network meta-analysis of binary data, however, available data-generating models are restricted to either the inclusion of two-armed trials only or the fixed-effect model. Based on data generation in the pairwise case, we propose a framework for the simulation of random-effects network meta-analyses including multi-arm trials with binary outcomes. The only one of the common data-generating models that is directly applicable to a random-effects network setting relies on strongly restrictive assumptions. To overcome these limitations, we modify this approach and derive a related simulation procedure using odds ratios as the effect measure. The performance of this procedure is evaluated with synthetic data and in an empirical example.
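The basic odds-ratio generation step can be sketched: given a baseline event probability and a log odds ratio for an arm, the arm-specific event probability comes from shifting on the logit scale. A minimal Python sketch (the procedure in the paper additionally handles random effects and multi-arm correlation, which are omitted here):

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def event_prob(p_baseline, log_or):
    """Event probability in an arm whose odds ratio versus
    the baseline arm is exp(log_or)."""
    return expit(logit(p_baseline) + log_or)

# An odds ratio of 2 applied to a baseline risk of 0.5 gives odds 2,
# i.e. an event probability of 2/3
print(event_prob(0.5, math.log(2.0)))
```

Binary outcomes for each simulated trial arm are then drawn as Bernoulli variables with these probabilities.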