10 datasets found

Understanding and Managing Missing Data.pdf
figshare.com
pdf
Updated Jun 9, 2025
Cite
Ibrahim Denis Fofanah (2025). Understanding and Managing Missing Data.pdf [Dataset]. http://doi.org/10.6084/m9.figshare.29265155.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.29265155.v1
Dataset updated
Jun 9, 2025
Dataset provided by
figshare
Authors
Ibrahim Denis Fofanah
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This document provides a clear and practical guide to understanding missing data mechanisms, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Through real-world scenarios and examples, it explains how different types of missingness impact data analysis and decision-making. It also outlines common strategies for handling missing data, including deletion techniques and imputation methods such as mean imputation, regression, and stochastic modeling. Designed for researchers, analysts, and students working with real-world datasets, this guide helps ensure statistical validity, reduce bias, and improve the overall quality of analysis in fields like public health, behavioral science, social research, and machine learning.
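As a rough illustration of the three mechanisms named in this description, the sketch below simulates them on synthetic data. It is not taken from the guide itself: the function name `simulate`, the missingness probabilities, and the data-generating model are all assumptions chosen for the example. It compares the mean of a variable `y` before and after values are removed, which is also the value that mean imputation would fill in.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=0):
    """Return (true_mean, observed_mean) of y under one missingness mechanism.

    Hypothetical setup for illustration: x is always observed, y = x + noise,
    and y is dropped with a probability that depends on the mechanism.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]

    observed = []
    for xi, yi in zip(x, y):
        if mechanism == "MCAR":
            p_missing = 0.3                      # unrelated to x and y
        elif mechanism == "MAR":
            p_missing = 0.6 if xi > 0 else 0.1   # depends only on observed x
        elif mechanism == "MNAR":
            p_missing = 0.6 if yi > 0 else 0.1   # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            observed.append(yi)

    return statistics.fmean(y), statistics.fmean(observed)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        true_mean, observed_mean = simulate(mechanism)
        print(f"{mechanism}: true mean {true_mean:.3f}, observed mean {observed_mean:.3f}")
```

In this setup the observed mean should stay close to the true mean under MCAR and shift downward under MAR and MNAR, because larger values are removed more often. Under MAR the shift could in principle be corrected by conditioning on `x`; under MNAR it cannot be corrected from the observed data alone.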
Additional file 4 of Heckman imputation models for binary or continuous MNAR...
springernature.figshare.com
txt
Updated Jun 1, 2023
Cite
Jacques-Emmanuel Galimard; Sylvie Chevret; Emmanuel Curis; Matthieu Resche-Rigon (2023). Additional file 4 of Heckman imputation models for binary or continuous MNAR outcomes and MAR predictors [Dataset]. http://doi.org/10.6084/m9.figshare.7038104.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.7038104.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Jacques-Emmanuel Galimard; Sylvie Chevret; Emmanuel Curis; Matthieu Resche-Rigon
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
R code to impute binary outcome. (R 1 kb)
Data from: Performance of standard imputation methods for missing quality of...
tandf.figshare.com
docx
Updated Jun 3, 2023
Cite
Marion Procter; Chris Robertson (2023). Performance of standard imputation methods for missing quality of life data as covariate in survival analysis based on simulations from the International Breast Cancer Study Group Trials VI and VII* [Dataset]. http://doi.org/10.6084/m9.figshare.6960167.v1
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.6960167.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Taylor & Francis
Authors
Marion Procter; Chris Robertson
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Imputation methods for missing data on a time-dependent variable within time-dependent Cox models are investigated in a simulation study. Quality of life (QoL) assessments were removed from the complete simulated datasets, which have a positive relationship between QoL and disease-free survival (DFS) and delayed chemotherapy and DFS, by missing at random and missing not at random (MNAR) mechanisms. Standard imputation methods were applied before analysis. Method performance was influenced by missing data mechanism, with one exception for simple imputation. The greatest bias occurred under MNAR and large effect sizes. It is important to carefully investigate the missing data mechanism.
MAPE and PB statistics for IBFI compared with other imputation methods...
plos.figshare.com
xls
Updated Jun 6, 2023
Cite
Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique (2023). MAPE and PB statistics for IBFI compared with other imputation methods (mean, median, mode, PMM, and Hotdeck) for 20% missingness of type MAR and all parameters tested (RN, TH, TC, RH, and PR). [Dataset]. http://doi.org/10.1371/journal.pone.0262131.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0262131.t003
Dataset updated
Jun 6, 2023
Dataset provided by
PLOS ONE
Authors
Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MAPE and PB statistics for IBFI compared with other imputation methods (mean, median, mode, PMM, and Hotdeck) for 20% missingness of type MAR and all parameters tested (RN, TH, TC, RH, and PR).
ComBat HarmonizR enables the integrated analysis of independently generated...
ebi.ac.uk
omicsdi.org
Updated May 23, 2022
Cite
Hannah Voß (2022). ComBat HarmonizR enables the integrated analysis of independently generated proteomic datasets through data harmonization with appropriate handling of missing values [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD027467
Dataset updated
May 23, 2022
Authors
Hannah Voß
Variables measured
Proteomics
Description
The integration of proteomic datasets generated by non-cooperating laboratories using different LC-MS/MS setups can overcome limitations in statistically underpowered sample cohorts but has not been demonstrated to this day. In proteomics, differences in sample preservation and preparation strategies, chromatography and mass spectrometry approaches, and the quantification strategy used distort protein abundance distributions in integrated datasets. The removal of these technical batch effects requires setup-specific normalization and strategies that can deal with missing at random (MAR) and missing not at random (MNAR) type values at the same time. Algorithms for batch effect removal, such as the ComBat algorithm, commonly used for other omics types, disregard proteins with MNAR missing values and significantly reduce the informational yield and the effect size for combined datasets. Here, we present a strategy for data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification approaches. To enable batch effect removal without the need for data reduction or error-prone imputation, we developed an extension to the ComBat algorithm, ComBat HarmonizR, that performs data harmonization with appropriate handling of MAR and MNAR missing values by matrix dissection. The ComBat HarmonizR based strategy enables the combined analysis of independently generated proteomic datasets for the first time. Furthermore, we found ComBat HarmonizR to be superior for removing batch effects between different Tandem Mass Tag (TMT)-plexes, compared to commonly used internal reference scaling (iRS). Due to the matrix dissection approach without the need of data imputation, the HarmonizR algorithm can be applied to any type of -omics data while assuring minimal data loss.
Statistical details of the SRGC time series dataset.
plos.figshare.com
xls
Updated Jun 4, 2023
Cite
Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique (2023). Statistical details of the SRGC time series dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0262131.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0262131.t001
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistical details of the SRGC time series dataset.
Replication Data for: Countering Non-Ignorable Nonresponse in Survey Models...
search.dataone.org
dataverse.harvard.edu
Updated Sep 24, 2024
Cite
Bailey, Michael (2024). Replication Data for: Countering Non-Ignorable Nonresponse in Survey Models with Randomized Response Instruments and Doubly Robust Estimation [Dataset]. http://doi.org/10.7910/DVN/L2NVRD
Unique identifier
https://doi.org/10.7910/DVN/L2NVRD
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Bailey, Michael
Description
Conventional survey tools such as weighting do not address non-ignorable nonresponse that occurs when nonresponse depends on the variable being measured. This paper describes non-ignorable nonresponse weighting and imputation models using randomized response instruments, which are variables that affect response but not the outcome of interest (Sun et al. 2018). The paper uses a doubly robust estimator that is valid if one, but not necessarily both, of the weighting and imputation models is correct. When applied to a national 2019 survey, these tools produce estimates that suggest there was non-trivial non-ignorable nonresponse related to turnout, and, for subgroups, Trump approval and policy questions. For example, the conventional MAR-based weighted estimates of Trump support in the Midwest were 10 percentage points lower than the MNAR-based estimates. Data to replicate estimation described in "Countering Non-Ignorable Nonresponse in Survey Models with Randomized Response Instruments and Doubly Robust Estimation".
Data provide.
plos.figshare.com
figshare.com
xlsx
Updated May 29, 2025
Cite
Xiaozhou Li; Zhenyu Yang; Chuanji Yuan; Jiaqing Liu; Zuojing Li (2025). Data provide. [Dataset]. http://doi.org/10.1371/journal.pone.0323496.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0323496.s001
Dataset updated
May 29, 2025
Dataset provided by
PLOS ONE
Authors
Xiaozhou Li; Zhenyu Yang; Chuanji Yuan; Jiaqing Liu; Zuojing Li
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Currently, various methods have been proposed to handle missing data in clinical trials. Some methods assume that the missing data are missing at random (MAR), which means that it is assumed that subjects who stopped treatment would still maintain the treatment effect. In many cases, however, researchers often assume that the missing data are missing not at random (MNAR) to conduct additional sensitivity analyses. Under the MNAR assumption, whether using some conservative imputation methods such as RTB (return to baseline) method, J2R (jump to reference) method, and CR (copy reference) method, or optimistic imputation methods like multiple imputation (MI) and its derivative RD (retrieved dropout) method, biases compared to the true treatment effect can occur in some scenarios. This paper aims to propose a method that can impute results while considering the occurrence of intercurrent events, thereby reducing the bias compared to the true treatment effect. This method combines the RD method with the RTB formula, reducing the biases and standard errors associated with using either method alone. Considering the differing treatment effects between RD subjects and non-RD subjects, our imputation results often align more closely with the true drug efficacy.
Summary statistics for the complete-case data, original data, and original...
plos.figshare.com
xls
Updated Jun 4, 2023
Cite
Jacqueline A. May; Zeny Feng; Sarah J. Adamowicz (2023). Summary statistics for the complete-case data, original data, and original data with imputed values. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010154.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1010154.t001
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS Computational Biology
Authors
Jacqueline A. May; Zeny Feng; Sarah J. Adamowicz
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary statistics for the complete-case data, original data, and original data with imputed values.
Estimated overall proportion of clustering for observed, incomplete-observed...
plos.figshare.com
xls
Updated May 31, 2023
Cite
Shelley H. Liu; Gabriel Erion; Vladimir Novitsky; Victor De Gruttola (2023). Estimated overall proportion of clustering for observed, incomplete-observed and imputed-complete datasets under MNAR. [Dataset]. http://doi.org/10.1371/journal.pone.0135469.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0135469.t002
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Shelley H. Liu; Gabriel Erion; Vladimir Novitsky; Victor De Gruttola
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clustering assessed at the 0.10 threshold, with m = 100 deletions. Clus stands for clustering; Obs for observed data; In-Obs for incomplete-observed data; Impute-Com for imputed-complete data; Cov stands for coverage. Estimated overall proportion of clustering for observed, incomplete-observed and imputed-complete datasets under MNAR.