5 datasets found

Understanding and Managing Missing Data.pdf
figshare.com
pdf
Updated Jun 9, 2025
Cite
Ibrahim Denis Fofanah (2025). Understanding and Managing Missing Data.pdf [Dataset]. http://doi.org/10.6084/m9.figshare.29265155.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.29265155.v1
Dataset updated
Jun 9, 2025
Dataset provided by
figshare
Authors
Ibrahim Denis Fofanah
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This document provides a clear and practical guide to understanding missing data mechanisms, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Through real-world scenarios and examples, it explains how different types of missingness impact data analysis and decision-making. It also outlines common strategies for handling missing data, including deletion techniques and imputation methods such as mean imputation, regression, and stochastic modeling. Designed for researchers, analysts, and students working with real-world datasets, this guide helps ensure statistical validity, reduce bias, and improve the overall quality of analysis in fields like public health, behavioral science, social research, and machine learning.
MAPE and PB statistics for IBFI compared with other imputation methods...
plos.figshare.com
xls
Updated Jun 6, 2023
Cite
Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique (2023). MAPE and PB statistics for IBFI compared with other imputation methods (mean, median, mode, PMM, and Hotdeck) for 20% missingness of type MAR and all parameters tested (RN, TH, TC, RH, and PR). [Dataset]. http://doi.org/10.1371/journal.pone.0262131.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0262131.t003
Dataset updated
Jun 6, 2023
Dataset provided by
PLOS ONE
Authors
Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MAPE and PB statistics for IBFI compared with other imputation methods (mean, median, mode, PMM, and Hotdeck) for 20% missingness of type MAR and all parameters tested (RN, TH, TC, RH, and PR).
Data from: A real data-driven simulation strategy to select an imputation...
zenodo.org
datadryad.org
bin
Updated Feb 16, 2023
Cite
Jacqueline A. May; Zeny Feng; Sarah J. Adamowicz (2023). Data from: A real data-driven simulation strategy to select an imputation method for mixed-type trait data [Dataset]. http://doi.org/10.5061/dryad.crjdfn37m
Available download formats: bin
Unique identifier
https://doi.org/10.5061/dryad.crjdfn37m
Dataset updated
Feb 16, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Jacqueline A. May; Zeny Feng; Sarah J. Adamowicz
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Missing observations in trait datasets pose an obstacle for analyses in myriad biological disciplines. Considering the mixed results of imputation, the wide variety of available methods, and the varied structure of real trait datasets, a framework for selecting a suitable imputation method is advantageous. We invoked a real data-driven simulation strategy to select an imputation method for a given mixed-type (categorical, count, continuous) target dataset. Candidate methods included mean/mode imputation, k-nearest neighbour, random forests, and multivariate imputation by chained equations (MICE). Using a trait dataset of squamates (lizards and amphisbaenians; order: Squamata) as a target dataset, a complete-case dataset consisting of species with nearly completed information was formed for the imputation method selection. Missing data were induced by removing values from this dataset under different missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). For each method, combinations with and without phylogenetic information from single gene (nuclear and mitochondrial) or multigene trees were used to impute the missing values for five numerical and two categorical traits. The performances of the methods were evaluated under each missing mechanism by determining the mean squared error and proportion falsely classified rates for numerical and categorical traits, respectively. A random forest method supplemented with a nuclear-derived phylogeny resulted in the lowest error rates for the majority of traits, and this method was used to impute missing values in the original dataset. Data with imputed values better reflected the characteristics and distributions of the original data compared to complete-case data. However, caution should be taken when imputing trait data as phylogeny did not always improve performance for every trait and in every scenario. 
Ultimately, these results support the use of a real data-driven simulation strategy for selecting a suitable imputation method for a given mixed-type trait dataset.
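The simulation loop described above (start from complete cases, remove values under a chosen mechanism, impute, then score against the held-back truth) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it covers only the MCAR mechanism and mean imputation on a single numeric trait, and the function names (`induce_mcar`, `mean_impute`, `mse_on_masked`), the toy values, and the 30% missingness rate are all assumptions made for the example.

```python
import random
from statistics import mean


def induce_mcar(values, rate, rng):
    """Return a copy of `values` with round(len(values) * rate) entries set to None.

    Positions are drawn uniformly at random, independent of the values
    themselves, which is what makes the mechanism MCAR.
    """
    n_missing = round(len(values) * rate)
    masked_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in masked_idx else v for i, v in enumerate(values)]


def mean_impute(values):
    """Fill every None with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def mse_on_masked(truth, masked, imputed):
    """Mean squared error, computed only at positions that were masked."""
    errors = [(t - p) ** 2 for t, m, p in zip(truth, masked, imputed) if m is None]
    return mean(errors)


# Hypothetical complete-case values for one numeric trait.
complete = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 7.1, 3.3, 5.8, 6.0]

rng = random.Random(0)  # fixed seed so the run is repeatable
masked = induce_mcar(complete, rate=0.3, rng=rng)
imputed = mean_impute(masked)
score = mse_on_masked(complete, masked, imputed)

print(f"masked {masked.count(None)} of {len(complete)} values, MSE = {score:.3f}")
```

Repeating this loop over several candidate imputation methods and keeping the one with the lowest error is the selection idea the abstract describes. MAR and MNAR would need masking probabilities that depend on other observed values or on the masked values themselves, which this sketch does not attempt.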
Summary statistics for the complete-case data, original data, and original...
plos.figshare.com
xls
Updated Jun 4, 2023
Cite
Jacqueline A. May; Zeny Feng; Sarah J. Adamowicz (2023). Summary statistics for the complete-case data, original data, and original data with imputed values. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010154.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1010154.t001
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS Computational Biology
Authors
Jacqueline A. May; Zeny Feng; Sarah J. Adamowicz
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary statistics for the complete-case data, original data, and original data with imputed values.
Statistical details of the SRGC time series dataset.
plos.figshare.com
xls
Updated Jun 4, 2023
Cite
Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique (2023). Statistical details of the SRGC time series dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0262131.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0262131.t001
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistical details of the SRGC time series dataset.