100+ datasets found
  1. Understanding and Managing Missing Data.pdf

    • figshare.com
    pdf
    Updated Jun 9, 2025
    Cite
    Ibrahim Denis Fofanah (2025). Understanding and Managing Missing Data.pdf [Dataset]. http://doi.org/10.6084/m9.figshare.29265155.v1
    Available download formats: pdf
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Ibrahim Denis Fofanah
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This document provides a clear and practical guide to understanding missing data mechanisms, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Through real-world scenarios and examples, it explains how different types of missingness affect data analysis and decision-making. It also outlines common strategies for handling missing data, including deletion techniques and imputation methods such as mean imputation, regression, and stochastic modeling. Designed for researchers, analysts, and students working with real-world datasets, this guide helps ensure statistical validity, reduce bias, and improve the overall quality of analysis in fields like public health, behavioral science, social research, and machine learning.
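
    The practical difference between MCAR and MAR shows up in a short simulation. This is an illustrative sketch, not material from the guide. It assumes x is always observed and y depends on x. The missingness rates are arbitrary choices: 30% for every y under MCAR, and 80% for y whenever x > 0.5 under MAR.

```python
import random
import statistics


def complete_case_means(n=10_000, seed=0):
    """Return (true mean, MCAR complete-case mean, MAR complete-case mean) of y."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]          # fully observed covariate
    ys = [x + rng.gauss(0.0, 0.1) for x in xs]     # outcome that depends on x

    # MCAR: each y is missing with the same 30% probability, regardless of x or y.
    mcar_kept = [y for y in ys if rng.random() >= 0.3]

    # MAR: y is missing with 80% probability whenever the observed x exceeds 0.5,
    # so missingness depends on x but not on y itself once x is known.
    mar_kept = [y for x, y in zip(xs, ys) if not (x > 0.5 and rng.random() < 0.8)]

    return statistics.fmean(ys), statistics.fmean(mcar_kept), statistics.fmean(mar_kept)


true_mean, mcar_mean, mar_mean = complete_case_means()
print(f"true={true_mean:.3f}  MCAR={mcar_mean:.3f}  MAR={mar_mean:.3f}")
```

    Under MCAR the complete-case mean stays close to the full-data mean. Under MAR it is pulled down, to roughly one third instead of one half with these settings. Filling the gaps with that complete-case mean (mean imputation) would not remove this bias.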

  2. Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 23, 2023
    Cite
    Lall, Ranjit; Robinson, Thomas (2023). Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning [Dataset]. http://doi.org/10.7910/DVN/UPL4TT
    Dataset updated
    Nov 23, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Lall, Ranjit; Robinson, Thomas
    Description

    Replication and simulation reproduction materials for the article "The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning." Please see the README file for a summary of the contents and the Replication Guide for a more detailed description. Article abstract: Principled methods for analyzing missing values, based chiefly on multiple imputation, have become increasingly popular yet can struggle to handle the kinds of large and complex data that are also becoming common. We propose an accurate, fast, and scalable approach to multiple imputation, which we call MIDAS (Multiple Imputation with Denoising Autoencoders). MIDAS employs a class of unsupervised neural networks known as denoising autoencoders, which are designed to reduce dimensionality by corrupting and attempting to reconstruct a subset of data. We repurpose denoising autoencoders for multiple imputation by treating missing values as an additional portion of corrupted data and drawing imputations from a model trained to minimize the reconstruction error on the originally observed portion. Systematic tests on simulated as well as real social science data, together with an applied example involving a large-scale electoral survey, illustrate MIDAS's accuracy and efficiency across a range of settings. We provide open-source software for implementing MIDAS.
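
    The training objective in the abstract has three parts: corrupt the input, reconstruct it, and score the reconstruction only on originally observed values. This sketch shows the loss in plain Python. It is not the MIDAS software, and it leaves out the neural network that would produce `reconstruction`. The names corrupt and observed_mse are hypothetical, and zero-filling missing cells is an assumption.

```python
import random


def corrupt(row, observed, drop_rate, rng):
    """Denoising-style corruption: zero out missing entries and, in addition,
    a random share (drop_rate) of the observed entries."""
    return [
        0.0 if (not obs or rng.random() < drop_rate) else value
        for value, obs in zip(row, observed)
    ]


def observed_mse(row, reconstruction, observed):
    """Reconstruction error restricted to entries that were originally observed,
    so the unknown true values of missing cells never enter the loss."""
    errors = [(r - v) ** 2 for v, r, obs in zip(row, reconstruction, observed) if obs]
    return sum(errors) / len(errors) if errors else 0.0


rng = random.Random(42)
row = [1.5, 0.0, -2.0, 3.0]           # 0.0 is a placeholder for a missing cell
observed = [True, False, True, True]
noisy_input = corrupt(row, observed, drop_rate=0.5, rng=rng)
```

    A network trained to minimize observed_mse on many such corrupted rows can then produce several plausible completions of the same row. The abstract describes drawing the multiple imputations from a model trained this way.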

  3. Missing data in the analysis of multilevel and dependent data (Examples)

    • data.niaid.nih.gov
    Updated Jul 20, 2023
    + more versions
    Cite
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch (2023). Missing data in the analysis of multilevel and dependent data (Examples) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7773613
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    University of Hamburg
    IPN - Leibniz Institute for Science and Mathematics Education
    Authors
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data sets and computer code for the book chapter titled "Missing Data in the Analysis of Multilevel and Dependent Data" submitted for publication in the second edition of "Dependent Data in Social Science Research" (Stemmler et al., 2015). This repository includes the computer code (".R") and the data sets from both example analyses (Examples 1 and 2). The data sets are available in two file formats (binary ".rda" for use in R; plain-text ".dat").

    The data sets contain simulated data from 23,376 (Example 1) and 23,072 (Example 2) individuals from 2,000 groups on four variables:

    ID = group identifier (1-2000)
    x = numeric (Level 1)
    y = numeric (Level 1)
    w = binary (Level 2)

    In all data sets, missing values are coded as "NA".
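
    Missing values are coded as "NA", so a useful first step is to tabulate how much is missing in each variable. This sketch assumes the .dat files are whitespace-separated and have a header row naming ID, x, y and w. That layout is a guess, so check the files, or load the .rda versions in R instead. The sample rows are invented and the function names are hypothetical.

```python
import io


def read_dat(stream):
    """Parse a whitespace-separated table whose first line is a header.
    The string "NA" is read as a missing value (None); everything else as float."""
    header = stream.readline().split()
    rows = []
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append({name: None if f == "NA" else float(f)
                     for name, f in zip(header, fields)})
    return header, rows


def missing_share(header, rows):
    """Proportion of missing values for each variable."""
    return {name: sum(r[name] is None for r in rows) / len(rows) for name in header}


# Made-up rows in the layout described above (ID, x, y, w); not the real data.
sample = io.StringIO("ID x y w\n1 0.5 NA 1\n1 NA 1.2 1\n2 -0.3 0.8 NA\n2 1.1 0.4 NA\n")
header, rows = read_dat(sample)
print(missing_share(header, rows))  # {'ID': 0.0, 'x': 0.25, 'y': 0.25, 'w': 0.5}
```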

  4. Data from: Problems in dealing with missing data and informative censoring...

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 7, 2025
    Cite
    National Institutes of Health (2025). Problems in dealing with missing data and informative censoring in clinical trials [Dataset]. https://catalog.data.gov/dataset/problems-in-dealing-with-missing-data-and-informative-censoring-in-clinical-trials
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    National Institutes of Health
    Description

    A common problem in clinical trials is missing data that arise when patients drop out of the study without further measurements. Missing data make the usual statistical analysis of complete or all available data subject to bias. No universally applicable method for handling missing data exists. We recommend the following: (1) report the reasons for dropout and the dropout proportions for each treatment group; (2) conduct sensitivity analyses covering different assumption scenarios, and discuss the consistency or discrepancy among them; (3) take steps to minimize the chance of dropout at the design stage and during trial monitoring; (4) collect post-dropout data on the primary endpoints, if at all possible; and (5) consider the dropout event itself an important endpoint in studies with many dropouts.
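
    One common way to carry out recommendation (2) is a delta-adjusted, or tipping-point, analysis. The source does not prescribe it. Each dropout is imputed as the completers' mean shifted by an assumed delta, and the treatment effect is recomputed over a grid of deltas to see where the conclusion would change. This sketch uses toy numbers and hypothetical function names. It uses single imputation, which ignores imputation uncertainty, so a real analysis would combine the idea with multiple imputation to get valid standard errors.

```python
import statistics


def delta_adjusted_mean(observed, n_dropouts, delta):
    """Arm mean after imputing every dropout as the completers' mean plus delta.
    delta = 0 mimics 'dropouts resemble completers'; delta < 0 assumes they fared worse."""
    base = statistics.fmean(observed)
    return statistics.fmean(list(observed) + [base + delta] * n_dropouts)


def tipping_point_table(treated, treated_dropouts, control, control_dropouts, deltas):
    """Treatment-minus-control difference for each delta, shifting only the
    treated arm's dropouts (one common scheme among several)."""
    control_mean = delta_adjusted_mean(control, control_dropouts, 0.0)
    return {d: delta_adjusted_mean(treated, treated_dropouts, d) - control_mean
            for d in deltas}


# Toy numbers, not trial data.
table = tipping_point_table([5.0, 6.0, 7.0], 1, [4.0, 5.0], 0, [0.0, -2.0, -4.0])
print(table)  # {0.0: 1.5, -2.0: 1.0, -4.0: 0.5}
```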

  5. Spaceship Titanic | No missing values

    • kaggle.com
    zip
    Updated Mar 12, 2022
    Cite
    Sardor Abdirayimov (2022). Spaceship Titanic | No missing values [Dataset]. https://www.kaggle.com/datasets/sardorabdirayimov/spaceship-titanic-no-missing-values
    Available download formats: zip (284931 bytes)
    Dataset updated
    Mar 12, 2022
    Authors
    Sardor Abdirayimov
    Description

    Context

    This dataset is the final result of handling the missing values in the Spaceship Titanic competition data. Kaggle Notebook: https://www.kaggle.com/sardorabdirayimov/best-way-of-dealing-with-missing-values-titanic-2/

  6. Methods for Handling Missing Item Values in Regression Models Using the...

    • catalog.data.gov
    • data.virginia.gov
    • +1more
    Updated Sep 7, 2025
    + more versions
    Cite
    Substance Abuse and Mental Health Services Administration (2025). Methods for Handling Missing Item Values in Regression Models Using the National Survey on Drug Use and Health (NSDUH) [Dataset]. https://catalog.data.gov/dataset/methods-for-handling-missing-item-values-in-regression-models-using-the-national-survey-on
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    Substance Abuse and Mental Health Services Administration (https://www.samhsa.gov/)
    Description

    The purpose of this report is to guide analysts interested in fitting regression models using data from the National Survey on Drug Use and Health (NSDUH) by providing them with methods for handling missing item values in regression analyses (MIVRA). The report includes a theoretical review of existing MIVRA methods, a simulation study that evaluates several of the more promising methods using existing NSDUH datasets, and a final chapter where the results of both the theoretical review and the simulation study are synthesized into guidance for analysts via decision trees.

  7. a guide to handle missing values for ML Model

    • kaggle.com
    zip
    Updated Feb 10, 2025
    Cite
    Feroz Shinwari (2025). a guide to handle missing values for ML Model [Dataset]. https://www.kaggle.com/datasets/ferozshahshinwari/a-guide-to-handle-missing-values-for-ml-model/code
    Available download formats: zip (36646 bytes)
    Dataset updated
    Feb 10, 2025
    Authors
    Feroz Shinwari
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Feroz Shinwari

    Released under Apache 2.0

  8. Data from: Missing data estimation in morphometrics: how much is too much?

    • data.niaid.nih.gov
    • datadryad.org
    • +1more
    zip
    Updated Dec 5, 2013
    Cite
    Julien Clavel; Gildas Merceron; Gilles Escarguel (2013). Missing data estimation in morphometrics: how much is too much? [Dataset]. http://doi.org/10.5061/dryad.f0b50
    Available download formats: zip
    Dataset updated
    Dec 5, 2013
    Dataset provided by
    Centre National de la Recherche Scientifique
    Authors
    Julien Clavel; Gildas Merceron; Gilles Escarguel
    License

    CC0 1.0, https://spdx.org/licenses/CC0-1.0.html

    Description

    Fossil-based estimates of diversity and evolutionary dynamics rely mainly on the study of morphological variation. Unfortunately, organism remains are often altered by post-mortem taphonomic processes such as weathering or distortion. This loss of information often prevents quantitative multivariate description and statistically controlled comparisons of extinct species based on morphometric data. A common way to deal with missing data is to use imputation methods that fill the missing cases directly with model estimates. In recent years, several empirically determined thresholds for the maximum acceptable proportion of missing values have been proposed in the literature, whereas other studies have shown that this limit depends on several properties of the study dataset and of the selected imputation method, and is by no means generalizable. We evaluate the relative performance of seven multiple imputation techniques through a simulation-based analysis under three distinct patterns of missing data distribution. Overall, the Fully Conditional Specification and Expectation-Maximization algorithms provide the best compromises between imputation accuracy and coverage probability. Multiple imputation (MI) techniques appear remarkably robust to violations of basic assumptions, such as taxonomically or anatomically biased patterns of missing data distribution. As a result, differences in simulation results between the three patterns of missing data distribution are much smaller than differences between the individual MI techniques. Based on these results, rather than proposing a new (set of) threshold value(s), we develop an approach that combines multiple imputation with Procrustes superimposition of principal component analysis results, in order to directly visualize the effect of individual missing data imputation on an ordinated space. We provide an R function for users to implement the proposed procedure.
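
    Coverage probability, one of the two criteria above, depends on how the results from the multiply imputed datasets are pooled. Rubin's rules are the standard way to combine them. This sketch implements them for a single scalar parameter. It is not the authors' R function, and the name pool_rubin is hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data results for one scalar parameter with Rubin's rules.

    estimates: point estimates from each imputed dataset
    variances: their squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputations")
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between    # total variance of q_bar
    return q_bar, total


pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

    The between-imputation term grows when the imputations disagree. Ignoring it, for example by analysing a single imputed dataset, understates the variance and lowers coverage.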

  9. Deep learning based Missing Data Imputation

    • scidb.cn
    Updated Mar 4, 2024
    Cite
    Mahjabeen Tahir (2024). Deep learning based Missing Data Imputation [Dataset]. http://doi.org/10.57760/sciencedb.16599
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 4, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Mahjabeen Tahir
    Description

    The code provided trains an autoencoder, evaluates its performance, and uses it to impute missing values in a dataset. Each part is described below.

    Training the autoencoder (train_autoencoder function): This function takes an autoencoder model and the input features. It trains the autoencoder using the input features as both input and target output (hence features, features). The autoencoder is trained for a specified number of epochs (epochs) with a given batch size (batch_size). The shuffle=True argument shuffles the data before each epoch to prevent the model from memorizing the input order. After training, the function returns the trained autoencoder model and the training history.

    Evaluating the autoencoder (evaluate_autoencoder function): This function takes a trained autoencoder model and the input features. It uses the trained autoencoder to predict the reconstructed features from the input features. It then calculates the Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) scores between the original and reconstructed features. These metrics show how well the autoencoder reconstructs the input features.

    Imputing with the autoencoder (impute_with_autoencoder function): This function takes a trained autoencoder model and the input features. It identifies missing values (e.g., -9999) in the input features. For each row with missing values, it predicts those values using the trained autoencoder and replaces them with the predictions. It returns the imputed features.

    To reuse this code:
    1. Load your dataset and preprocess it as necessary.
    2. Build an autoencoder model using the build_autoencoder function.
    3. Train the autoencoder using the train_autoencoder function with your input features.
    4. Evaluate the autoencoder's performance using the evaluate_autoencoder function.
    5. If your dataset contains missing values, use the impute_with_autoencoder function to impute them with the trained autoencoder.
    6. Use the trained autoencoder for any other relevant tasks, such as feature extraction or anomaly detection.
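
    The imputation and evaluation steps can be sketched without a deep-learning framework by treating the trained autoencoder as a function that maps a row to its reconstruction. The names impute_with_model, reconstruction_scores and row_mean_model are hypothetical. They are not the dataset's build_autoencoder, evaluate_autoencoder or impute_with_autoencoder functions. In the original code, the reconstruct step would be the trained model's prediction. Replacing the sentinel with a neutral fill value before prediction is an assumption.

```python
from typing import Callable, List

Row = List[float]
SENTINEL = -9999.0  # missing-value code mentioned in the description


def impute_with_model(features: List[Row], reconstruct: Callable[[Row], Row],
                      sentinel: float = SENTINEL, fill: float = 0.0) -> List[Row]:
    """Replace sentinel cells with the model's reconstruction, keeping observed cells.

    Assumption: sentinel cells are swapped for a neutral `fill` value before the row
    is passed to the model, so the raw -9999 never reaches the network.
    """
    imputed = []
    for row in features:
        if sentinel not in row:
            imputed.append(list(row))
            continue
        model_input = [fill if v == sentinel else v for v in row]
        predicted = reconstruct(model_input)
        imputed.append([p if v == sentinel else v for v, p in zip(row, predicted)])
    return imputed


def reconstruction_scores(original: List[float], reconstructed: List[float]):
    """MSE, MAE and R^2 between flattened original and reconstructed values."""
    n = len(original)
    sq = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    mae = sum(abs(o - r) for o, r in zip(original, reconstructed)) / n
    mean = sum(original) / n
    ss_tot = sum((o - mean) ** 2 for o in original)
    r2 = 1.0 - sq / ss_tot if ss_tot else float("nan")
    return sq / n, mae, r2


# Stand-in "model" (hypothetical): predicts the row mean for every cell.
def row_mean_model(row: Row) -> Row:
    return [sum(row) / len(row)] * len(row)


data = [[1.0, SENTINEL, 3.0], [2.0, 2.0, 2.0]]
result = impute_with_model(data, row_mean_model)
print(result)
```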

  10. Missing Values Data

    • kaggle.com
    zip
    Updated Aug 23, 2024
    Cite
    Ahmed F. ElTantawy (2024). Missing Values Data [Dataset]. https://www.kaggle.com/ahmedgt/missing-values-data
    Available download formats: zip (442 bytes)
    Dataset updated
    Aug 23, 2024
    Authors
    Ahmed F. ElTantawy
    License

    https://www.licenses.ai/ai-licenses

    Description

    Dataset

    This dataset was created by Ahmed F. ElTantawy

    Released under RAIL (specified in description)

  11. Data from: Imputation of Missing Covariates in Randomized Controlled Trials...

    • tandf.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Mutamba T. Kayembe; Shahab Jolani; Frans E.S. Tan; Gerard J.P. van Breukelen (2023). Imputation of Missing Covariates in Randomized Controlled Trials with Continuous Outcomes: Simple, Unbiased and Efficient Methods [Dataset]. http://doi.org/10.6084/m9.figshare.18637732.v1
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Mutamba T. Kayembe; Shahab Jolani; Frans E.S. Tan; Gerard J.P. van Breukelen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The literature on dealing with missing covariates in nonrandomized studies advocates sophisticated methods like multiple imputation (MI) and maximum likelihood (ML)-based approaches over simple methods. However, these methods are not necessarily optimal in terms of bias and efficiency of treatment effect estimation in randomized studies, where the covariate of interest (treatment group) is independent of all baseline (pre-randomization) covariates due to randomization. This has been shown in the literature, but only for missingness on a single baseline covariate. Here, we extend the setting to multiple baseline covariates with missingness and evaluate the performance of MI and ML compared with simple alternative methods under various missingness scenarios in RCTs with a quantitative outcome. We first derive asymptotic relative efficiencies of the simple methods under the missing completely at random (MCAR) scenario and then perform a simulation study for non-MCAR scenarios. Finally, a trial on chronic low back pain is used to illustrate the implementation of the methods. The results show that all simple methods give unbiased treatment effect estimates, but with an increased mean squared residual. It also turns out that mean imputation and the missing-indicator method are the most efficient under all covariate missingness scenarios and perform at least as well as MI and ML in each scenario.
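
    The two simple methods the abstract singles out are easy to state in code. For one baseline covariate, this sketch builds the mean-filled column and the 0/1 missingness indicator that the missing-indicator method adds to the outcome model. The function name is hypothetical. As the abstract notes, the justification relies on randomization making treatment independent of baseline covariates, so this should not be read as advice for nonrandomized studies.

```python
def missing_indicator(values):
    """Return (filled, indicator) for one baseline covariate with None for missing.

    filled:    missing entries replaced by the mean of the observed entries
    indicator: 1 where the value was missing, 0 otherwise
    Both columns would then enter the outcome regression next to treatment group.
    """
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


print(missing_indicator([2.0, None, 4.0]))  # ([2.0, 3.0, 4.0], [0, 1, 0])
```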

  12. NN5 Daily Dataset (without Missing Values)

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1more
    Updated Apr 1, 2021
    Cite
    Godahewa, Rakshitha; Bergmeir, Christoph; Webb, Geoff (2021). NN5 Daily Dataset (without Missing Values) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3889739
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Professor at Monash University
    PhD Student at Monash University
    Lecturer at Monash University
    Authors
    Godahewa, Rakshitha; Bergmeir, Christoph; Webb, Geoff
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was used in the NN5 forecasting competition. It contains 111 time series from the banking domain. The goal is to predict daily cash withdrawals from ATMs in the UK.

    The original dataset contains missing values. In this version, each missing value on a particular day has been replaced by the median of the values observed on the same day of the week across the whole series.
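
    The replacement rule above translates directly into a few lines of code. This sketch mirrors the stated rule but is not the preprocessing code behind the dataset. The function name and the toy series are hypothetical.

```python
import statistics
from datetime import date


def fill_by_weekday_median(start: date, values):
    """Replace None entries in a daily series with the median of the observed
    values that fall on the same weekday anywhere in the series."""
    first = start.weekday()  # Monday == 0 ... Sunday == 6
    by_weekday = {}
    for i, v in enumerate(values):
        if v is not None:
            by_weekday.setdefault((first + i) % 7, []).append(v)
    medians = {wd: statistics.median(vs) for wd, vs in by_weekday.items()}
    # A weekday with no observations at all stays missing (None).
    return [medians.get((first + i) % 7) if v is None else v
            for i, v in enumerate(values)]


series = [10.0] + [1.0] * 6 + [None] + [1.0] * 6 + [30.0]  # 15 days from a Monday
filled = fill_by_weekday_median(date(2024, 1, 1), series)
print(filled[7])  # 20.0, the median of the other two Mondays (10.0 and 30.0)
```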

  13. Data from: Using multiple imputation to estimate missing data in...

    • narcis.nl
    • data.niaid.nih.gov
    • +1more
    Updated Dec 10, 2014
    Cite
    Ellington, E. Hance; Bastille-Rousseau, Guillaume; Austin, Cayla; Landolt, Kristen N.; Pond, Bruce A.; Rees, Erin E.; Robar, Nicholas; Murray, Dennis L. (2014). Data from: Using multiple imputation to estimate missing data in meta-regression [Dataset]. http://doi.org/10.5061/dryad.m2v4m
    Dataset updated
    Dec 10, 2014
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Ellington, E. Hance; Bastille-Rousseau, Guillaume; Austin, Cayla; Landolt, Kristen N.; Pond, Bruce A.; Rees, Erin E.; Robar, Nicholas; Murray, Dennis L.
    Description

    1. There is a growing need for scientific synthesis in ecology and evolution. In many cases, meta-analytic techniques can be used to complement such synthesis. However, missing data are a serious problem for any synthetic effort and can compromise the integrity of meta-analyses in these and other disciplines. Currently, the prevalence of missing data in meta-analytic datasets in ecology, and the efficacy of different remedies for this problem, have not been adequately quantified.

    2. We generated meta-analytic datasets based on literature reviews of experimental and observational data and found that missing data were prevalent in meta-analytic ecological datasets. We then tested the performance of complete case removal (a widely used method when data are missing) and multiple imputation (an alternative method for data recovery), and assessed model bias, precision, and multi-model rankings under a variety of simulated conditions using published meta-regression datasets.

    3. We found that complete case removal led to biased and imprecise coefficient estimates and yielded poorly specified models. In contrast, multiple imputation provided unbiased parameter estimates with only a small loss in precision. The performance of multiple imputation, however, depended on the type of data missing. It performed best when the missing values were weighting variables, but performance was mixed when the missing values were predictor variables. Multiple imputation performed poorly when imputing raw data that were then used to calculate the effect size and the weighting variable.

    4. We conclude that complete case removal should not be used in meta-regression, and that multiple imputation has the potential to be an indispensable tool for meta-regression in ecology and evolution. However, we recommend that users assess the performance of multiple imputation by simulating missing data on a subset of their data before using it to recover actual missing data.
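
    The closing recommendation, to simulate missing data on a subset before imputing the real gaps, can be sketched as a small harness. This is an illustration under stated assumptions. The hidden values are deleted completely at random, and mean imputation is only a placeholder for the multiple-imputation procedure you would actually plug in. RMSE of the imputed values stands in for the coefficient bias and precision that the authors assessed. The function names are hypothetical.

```python
import math
import random


def mean_impute(masked):
    """Placeholder imputer: fill None with the mean of the observed values."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def evaluate_imputer(values, impute, missing_rate=0.2, seed=0):
    """Hide a random share of known values (MCAR), impute them, and return the
    RMSE between the hidden true values and their imputations."""
    rng = random.Random(seed)
    n_hide = max(1, int(len(values) * missing_rate))
    hidden = set(rng.sample(range(len(values)), n_hide))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    filled = impute(masked)
    squared_errors = [(filled[i] - values[i]) ** 2 for i in hidden]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(evaluate_imputer([float(i) for i in range(20)], mean_impute))
```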

  14. Bitcoin Dataset without Missing Values - Dataset - CryptoData Hub

    • cryptodata.center
    Updated Dec 4, 2024
    + more versions
    Cite
    (2024). Bitcoin Dataset without Missing Values - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/bitcoin-dataset-without-missing-values
    Dataset updated
    Dec 4, 2024
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains potential influencers of the bitcoin price: a total of 18 daily time series, including hash rate, block size, and mining difficulty. It also captures public opinion in the form of tweets and Google searches mentioning the keyword bitcoin. The data were scraped from the interactive web graphs available at https://bitinfocharts.com. The original dataset contains missing values; these have been replaced by carrying forward the last observed value of each series (last observation carried forward, LOCF).
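
    LOCF, as used for this dataset, fits in a few lines. This is a minimal sketch with a hypothetical function name. Note that gaps at the very start of a series cannot be filled this way.

```python
def locf(values):
    """Last observation carried forward: each None takes the most recent observed
    value; leading Nones stay missing because nothing precedes them."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(v if v is not None else last)
    return filled


print(locf([None, 1.0, None, None, 4.0, None]))  # [None, 1.0, 1.0, 1.0, 4.0, 4.0]
```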

  15. Data from: Hybrid imputation of missing values using KNN on MEWMA-based...

    • tandf.figshare.com
    png
    Updated Nov 21, 2025
    Cite
    Yijun Jiang; Tingting He; Miaomiao Yu; Yong Zhou (2025). Hybrid imputation of missing values using KNN on MEWMA-based adaptive process control [Dataset]. http://doi.org/10.6084/m9.figshare.30675585.v1
    Available download formats: png
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Yijun Jiang; Tingting He; Miaomiao Yu; Yong Zhou
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Missing data, a common issue in production processes owing to factors such as sample contamination and equipment malfunctions, can reduce the recognition accuracy of control charts, especially when the process shifts. To address this, we introduce an online adaptive weighted imputation technique that combines the strengths of K-Nearest Neighbor (KNN) and Exponentially Weighted Moving Average (EWMA) imputation. It uses an adaptive weight matrix to weight the two methods and an adaptive covariance matrix to account for the missing-data structure. When the data fluctuate, we assign a higher weight to the KNN method because of its sensitivity, while the EWMA method is preferred for stationary data. This approach does not require data stacking, so missing data are imputed online. Consequently, real-time process monitoring can be achieved with the online Multivariate EWMA (MEWMA) control chart. To make full use of the available information, we also adjust the covariance matrix with a weight matrix that emphasizes complete data. The proposed technique outperforms traditional methods in performance monitoring by avoiding false alarms and quickly detecting anomalies during process shifts.
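
    The core idea of blending a KNN imputation with an EWMA imputation can be illustrated in a much reduced form. The method above is multivariate (MEWMA) and adapts both a weight matrix and a covariance matrix online. This sketch simplifies heavily: one variable, a fixed scalar weight supplied by the caller, and an arbitrary smoothing constant lam = 0.2. It shows only how the two estimates are combined. All names and numbers are hypothetical.

```python
import math


def ewma_impute(history, lam=0.2):
    """EWMA of a variable's past observations, used as its imputed next value.
    lam (smoothing constant) = 0.2 is an arbitrary illustrative choice."""
    z = history[0]
    for x in history[1:]:
        z = lam * x + (1 - lam) * z
    return z


def knn_impute(row, missing_idx, reference_rows, k=3):
    """Mean of column missing_idx over the k reference rows closest to `row`
    on the remaining columns (Euclidean distance)."""
    def distance(other):
        return math.sqrt(sum((a - b) ** 2
                             for j, (a, b) in enumerate(zip(row, other))
                             if j != missing_idx))
    nearest = sorted(reference_rows, key=distance)[:k]
    return sum(r[missing_idx] for r in nearest) / len(nearest)


def blended_impute(knn_value, ewma_value, weight_knn):
    """Convex combination of the two imputations. Per the description, weight_knn
    should be larger when the data fluctuate and smaller when they are stationary;
    here it is simply passed in rather than adapted online."""
    return weight_knn * knn_value + (1 - weight_knn) * ewma_value


reference = [[0.0, 10.0], [0.1, 20.0], [5.0, 100.0]]
knn_value = knn_impute([0.0, None], missing_idx=1, reference_rows=reference, k=2)
ewma_value = ewma_impute([6.0, 6.0, 6.0])
print(blended_impute(knn_value, ewma_value, weight_knn=0.5))
```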

  16. R scripts used for Monte Carlo simulations and data analyses.

    • plos.figshare.com
    zip
    Updated Jan 19, 2024
    Cite
    Lateef Babatunde Amusa; Twinomurinzi Hossana (2024). R scripts used for Monte Carlo simulations and data analyses. [Dataset]. http://doi.org/10.1371/journal.pone.0297037.s001
    Available download formats: zip
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Lateef Babatunde Amusa; Twinomurinzi Hossana
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts used for Monte Carlo simulations and data analyses.

  17. Results of model performance in handling missing data.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Sep 9, 2019
    Cite
    Bradley, Alison; Van der Meer, Robert; McKay, Colin J. (2019). Results of model performance in handling missing data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000098498
    Dataset updated
    Sep 9, 2019
    Authors
    Bradley, Alison; Van der Meer, Robert; McKay, Colin J.
    Description

    Results of model performance in handling missing data.

  18. Replication Data for: A GMM Approach for Dealing with Missing Data on...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Donald, Stephen; Abrevaya, Jason (2023). Replication Data for: A GMM Approach for Dealing with Missing Data on Regressors [Dataset]. http://doi.org/10.7910/DVN/JMWMWW
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Donald, Stephen; Abrevaya, Jason
    Description

    Replication Data for: A GMM Approach for Dealing with Missing Data on Regressors

  19. Fill The Cell that has Missing Values

    • kaggle.com
    zip
    Updated Feb 28, 2021
    Cite
    CEMİL BAYHAN (2021). Fill The Cell that has Missing Values [Dataset]. https://www.kaggle.com/cemlbayhan/fill-cell-has-missing-values
    Available download formats: zip (2469 bytes)
    Dataset updated
    Feb 28, 2021
    Authors
    CEMİL BAYHAN
    Description

    Dataset

    This dataset was created by CEMİL BAYHAN

  20. Prediction of radionuclide diffusion enabled by missing data imputation and...

    • scidb.cn
    Updated May 6, 2025
    Cite
    Jun-Lei Tian; Jia-Xing Feng; Jia-Cong Shen; Lei Yao; Jing-Yan Wang; Tao Wu; Yao-Lin Zhao (2025). Prediction of radionuclide diffusion enabled by missing data imputation and ensemble machine learning [Dataset]. http://doi.org/10.57760/sciencedb.j00186.00710
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 6, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Jun-Lei Tian; Jia-Xing Feng; Jia-Cong Shen; Lei Yao; Jing-Yan Wang; Tao Wu; Yao-Lin Zhao
    Description

    Missing values in radionuclide diffusion datasets can undermine the predictive accuracy and robustness of machine learning models. A regression-based missing data imputation method using light gradient boosting machine algorithm was employed to impute over 60% of the missing data.
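
    Regression-based imputation follows a general pattern: fit a model on the rows where the target is observed, then predict the target where it is missing. The study used a light gradient boosting machine. To stay dependency-free, this sketch swaps in ordinary least squares with a single predictor. It therefore illustrates the workflow, not the authors' model. The function name is hypothetical.

```python
def regression_impute(x, y):
    """Fill None entries of y from a least-squares line y = a + b * x fitted on
    the rows where y is observed. Assumes x is fully observed and not constant."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mean_x = sum(xi for xi, _ in pairs) / n
    mean_y = sum(yi for _, yi in pairs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


print(regression_impute([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, None, 7.0]))
```

    A gradient-boosting model would replace the straight line with an ensemble of trees that can capture nonlinear relationships among several predictors. The fit-on-observed, predict-on-missing structure stays the same.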
