100+ datasets found

d
Data from: Problems in dealing with missing data and informative censoring...
catalog.data.gov
data.virginia.gov
Updated Sep 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institutes of Health (2025). Problems in dealing with missing data and informative censoring in clinical trials [Dataset]. https://catalog.data.gov/dataset/problems-in-dealing-with-missing-data-and-informative-censoring-in-clinical-trials
Explore at:
Dataset updated
Sep 7, 2025
Dataset provided by
National Institutes of Health
Description
A common problem in clinical trials is the missing data that occurs when patients do not complete the study and drop out without further measurements. Missing data cause the usual statistical analysis of complete or all available data to be subject to bias. There are no universally applicable methods for handling missing data. We recommend the following: (1) Report reasons for dropouts and proportions for each treatment group; (2) Conduct sensitivity analyses to encompass different scenarios of assumptions and discuss consistency or discrepancy among them; (3) Pay attention to minimize the chance of dropouts at the design stage and during trial monitoring; (4) Collect post-dropout data on the primary endpoints, if at all possible; and (5) Consider the dropout event itself an important endpoint in studies with many.
Methods for Handling Missing Item Values in Regression Models Using the...
catalog.data.gov
data.virginia.gov
+1more
Updated Sep 7, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Substance Abuse and Mental Health Services Administration (2025). Methods for Handling Missing Item Values in Regression Models Using the National Survey on Drug Use and Health (NSDUH) [Dataset]. https://catalog.data.gov/dataset/methods-for-handling-missing-item-values-in-regression-models-using-the-national-survey-on
Explore at:
Dataset updated
Sep 7, 2025
Dataset provided by
Substance Abuse and Mental Health Services Administrationhttps://www.samhsa.gov/
Description
The purpose of this report is to guide analysts interested in fitting regression models using data from the National Survey on Drug Use and Health (NSDUH) by providing them with methods for handling missing item values in regression analyses (MIVRA). The report includes a theoretical review of existing MIVRA methods, a simulation study that evaluates several of the more promising methods using existing NSDUH datasets, and a final chapter where the results of both the theoretical review and the simulation study are synthesized into guidance for analysts via decision trees.
Statistical Methods for Missing Data in Large Observational Studies [Methods...
icpsr.umich.edu
Updated Oct 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Long, Qi (2025). Statistical Methods for Missing Data in Large Observational Studies [Methods Study], Georgia, 2013-2018 [Dataset]. http://doi.org/10.3886/ICPSR39526.v1
Explore at:
Unique identifier
https://doi.org/10.3886/ICPSR39526.v1
Dataset updated
Oct 27, 2025
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
Long, Qi
License
https://www.icpsr.umich.edu/web/ICPSR/studies/39526/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/39526/terms
Time period covered
2013 - 2018
Area covered
Georgia, United States
Description
Health registries record data about patients with a specific health problem. These data may include age, weight, blood pressure, health problems, medical test results, and treatments received. But data in some patient records may be missing. For example, some patients may not report their weight or all of their health problems. Research studies can use data from health registries to learn how well treatments work. But missing data can lead to incorrect results. To address the problem, researchers often exclude patient records with missing data from their studies. But doing this can also lead to incorrect results. The fewer records that researchers use, the greater the chance for incorrect results. Missing data also lead to another problem: it is harder for researchers to find patient traits that could affect diagnosis and treatment. For example, patients who are overweight may get heart disease. But if data are missing, it is hard for researchers to be sure that trait could affect diagnosis and treatment of heart disease. In this study, the research team developed new statistical methods to fill in missing data in large studies. The team also developed methods to use when data are missing to help find patient traits that could affect diagnosis and treatment. To access the methods, software, and R package, please visit the Long Research Group website.
Z
Missing data in the analysis of multilevel and dependent data (Examples)
data.niaid.nih.gov
Updated Jul 20, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Simon Grund; Oliver Lüdtke; Alexander Robitzsch (2023). Missing data in the analysis of multilevel and dependent data (Examples) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7773613
Explore at:
Dataset updated
Jul 20, 2023
Dataset provided by
University of Hamburg
IPN - Leibniz Institute for Science and Mathematics Education
Authors
Simon Grund; Oliver Lüdtke; Alexander Robitzsch
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example data sets and computer code for the book chapter titled "Missing Data in the Analysis of Multilevel and Dependent Data" submitted for publication in the second edition of "Dependent Data in Social Science Research" (Stemmler et al., 2015). This repository includes the computer code (".R") and the data sets from both example analyses (Examples 1 and 2). The data sets are available in two file formats (binary ".rda" for use in R; plain-text ".dat").

The data sets contain simulated data from 23,376 (Example 1) and 23,072 (Example 2) individuals from 2,000 groups on four variables:

ID = group identifier (1-2000) x = numeric (Level 1) y = numeric (Level 1) w = binary (Level 2)

In all data sets, missing values are coded as "NA".
d
Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data...
search.dataone.org
dataverse.harvard.edu
Updated Nov 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lall, Ranjit; Robinson, Thomas (2023). Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning [Dataset]. http://doi.org/10.7910/DVN/UPL4TT
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/UPL4TT
Dataset updated
Nov 23, 2023
Dataset provided by
Harvard Dataverse
Authors
Lall, Ranjit; Robinson, Thomas
Description
Replication and simulation reproduction materials for the article "The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning." Please see the README file for a summary of the contents and the Replication Guide for a more detailed description. Article abstract: Principled methods for analyzing missing values, based chiefly on multiple imputation, have become increasingly popular yet can struggle to handle the kinds of large and complex data that are also becoming common. We propose an accurate, fast, and scalable approach to multiple imputation, which we call MIDAS (Multiple Imputation with Denoising Autoencoders). MIDAS employs a class of unsupervised neural networks known as denoising autoencoders, which are designed to reduce dimensionality by corrupting and attempting to reconstruct a subset of data. We repurpose denoising autoencoders for multiple imputation by treating missing values as an additional portion of corrupted data and drawing imputations from a model trained to minimize the reconstruction error on the originally observed portion. Systematic tests on simulated as well as real social science data, together with an applied example involving a large-scale electoral survey, illustrate MIDAS's accuracy and efficiency across a range of settings. We provide open-source software for implementing MIDAS.
A dataset from a survey investigating disciplinary differences in data...
zenodo.org
data.niaid.nih.gov
+1more
bin, csv, pdf, txt
Updated Jul 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anton Boudreau Ninkov; Anton Boudreau Ninkov; Chantal Ripp; Chantal Ripp; Kathleen Gregory; Kathleen Gregory; Isabella Peters; Isabella Peters; Stefanie Haustein; Stefanie Haustein (2024). A dataset from a survey investigating disciplinary differences in data citation [Dataset]. http://doi.org/10.5281/zenodo.7555363
Explore at:
csv, txt, pdf, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7555363
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Anton Boudreau Ninkov; Anton Boudreau Ninkov; Chantal Ripp; Chantal Ripp; Kathleen Gregory; Kathleen Gregory; Isabella Peters; Isabella Peters; Stefanie Haustein; Stefanie Haustein
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
GENERAL INFORMATION

Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation

Date of data collection: January to March 2022

Collection instrument: SurveyMonkey

Funding: Alfred P. Sloan Foundation

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license

Links to publications that cite or use the data:

Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437

Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data:
A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266

DATA & FILE OVERVIEW

File List

Filename: MDCDatacitationReuse2021Codebook.pdf
Codebook

Filename: MDCDataCitationReuse2021surveydata.csv
Dataset format in csv

Filename: MDCDataCitationReuse2021surveydata.sav
Dataset format in SPSS

Filename: MDCDataCitationReuseSurvey2021QNR.pdf
Questionnaire

Additional related data collected that was not included in the current data package: Open ended questions asked to respondents

METHODOLOGICAL INFORMATION

Description of methods used for collection/generation of data:

The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.

Received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses and an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).

Methods for processing the data:

Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.

Instrument- or software-specific information needed to interpret the data:

The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret to values.

DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata

Number of variables: 94

Number of cases/rows: 2,492

Missing data codes: 999 Not asked

Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
d
New Approach to Evaluating Supplementary Homicide Report (SHR) Data...
catalog.data.gov
icpsr.umich.edu
Updated Nov 14, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institute of Justice (2025). New Approach to Evaluating Supplementary Homicide Report (SHR) Data Imputation, 1990-1995 [Dataset]. https://catalog.data.gov/dataset/new-approach-to-evaluating-supplementary-homicide-report-shr-data-imputation-1990-1995-ff769
Explore at:
Dataset updated
Nov 14, 2025
Dataset provided by
National Institute of Justice
Description
The purpose of the project was to learn more about patterns of homicide in the United States by strengthening the ability to make imputations for Supplementary Homicide Report (SHR) data with missing values. Supplementary Homicide Reports (SHR) and local police data from Chicago, Illinois, St. Louis, Missouri, Philadelphia, Pennsylvania, and Phoenix, Arizona, for 1990 to 1995 were merged to create a master file by linking on overlapping information on victim and incident characteristics. Through this process, 96 percent of the cases in the SHR were matched with cases in the police files. The data contain variables for three types of cases: complete in SHR, missing offender and incident information in SHR but known in police report, and missing offender and incident information in both. The merged file allows estimation of similarities and differences between the cases with known offender characteristics in the SHR and those in the other two categories. The accuracy of existing data imputation methods can be assessed by comparing imputed values in an "incomplete" dataset (the SHR), generated by the three imputation strategies discussed in the literature, with the actual values in a known "complete" dataset (combined SHR and police data). Variables from both the Supplemental Homicide Reports and the additional police report offense data include incident date, victim characteristics, offender characteristics, incident details, geographic information, as well as variables regarding the matching procedure.
Data from: Multiple Imputation for the Supplementary Homicide Reports:...
icpsr.umich.edu
datasets.ai
+3more
Updated Mar 31, 2016
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Roberts, John; Roberts, Aki (2016). Multiple Imputation for the Supplementary Homicide Reports: Evaluation in Unique Test Data, 1990-1995, Chicago, Philadelphia, Phoenix and St. Louis [Dataset]. http://doi.org/10.3886/ICPSR36379.v1
Explore at:
Unique identifier
https://doi.org/10.3886/ICPSR36379.v1
Dataset updated
Mar 31, 2016
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
Roberts, John; Roberts, Aki
License
https://www.icpsr.umich.edu/web/ICPSR/studies/36379/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/36379/terms
Time period covered
1990 - 1995
Area covered
Philadelphia, Illinois, United States, Chicago, Arizona, Missouri, Pennsylvania, Phoenix, St. Louis
Description
This study was an evaluation of multiple imputation strategies to address missing data using the New Approach to Evaluating Supplementary Homicide Report (SHR) Data Imputation, 1990-1995 (ICPSR 20060) dataset.
Retail Product Dataset with Missing Values
kaggle.com
zip
Updated Feb 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Himel Sarder (2025). Retail Product Dataset with Missing Values [Dataset]. https://www.kaggle.com/datasets/himelsarder/retail-product-dataset-with-missing-values
Explore at:
zip(47826 bytes)Available download formats
Dataset updated
Feb 17, 2025
Authors
Himel Sarder
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).

The dataset includes:
- Category (Categorical): Product category (A, B, C, D)
- Price (Numerical): Randomized product prices
- Rating (Numerical): Ratings between 1 to 5
- Stock (Categorical): Availability status (In Stock, Out of Stock)
- Discount (Numerical): Discount percentage

This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
Handling of Missing Data Induced by Time-Varying Covariates in Comparative...
icpsr.umich.edu
Updated Oct 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Desai, Manisha (2025). Handling of Missing Data Induced by Time-Varying Covariates in Comparative Effectiveness Research HIV Patients [Methods Study], 2013-2018 [Dataset]. http://doi.org/10.3886/ICPSR39528.v1
Explore at:
Unique identifier
https://doi.org/10.3886/ICPSR39528.v1
Dataset updated
Oct 9, 2025
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
Desai, Manisha
License
https://www.icpsr.umich.edu/web/ICPSR/studies/39528/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/39528/terms
Time period covered
2013 - 2018
Description
Researchers can use data from health registries or electronic health records to compare two or more treatments. Registries store data about patients with a specific health problem. These data include how well those patients respond to treatments and information about patient traits, such as age, weight, or blood pressure. But sometimes data about patient traits are missing. Missing data about patient traits can lead to incorrect study results, especially when traits change over time. For example, weight can change over time, and the patient may not report their weight at some points along the way. Researchers use statistical methods to fill in these missing data. In this study, the research team compared a new statistical method to fill in missing data with traditional methods. Traditional methods remove patients with missing data or fill in each missing number with a single estimate. The new method creates multiple possible estimates to fill in each missing number. To access the methods, software, and R package, please visit the SimulateCER GitHub and SimTimeVar CRAN website.
f
Data_Sheet_1_Three Sample Estimates of Fraction of Missing Information From...
frontiersin.figshare.com
pdf
Updated Jun 2, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lihan Chen; Victoria Savalei (2023). Data_Sheet_1_Three Sample Estimates of Fraction of Missing Information From Full Information Maximum Likelihood.PDF [Dataset]. http://doi.org/10.3389/fpsyg.2021.667802.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fpsyg.2021.667802.s001
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Lihan Chen; Victoria Savalei
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In missing data analysis, the reporting of missing rates is insufficient for the readers to determine the impact of missing data on the efficiency of parameter estimates. A more diagnostic measure, the fraction of missing information (FMI), shows how the standard errors of parameter estimates increase from the information loss due to ignorable missing data. FMI is well-known in the multiple imputation literature (Rubin, 1987), but it has only been more recently developed for full information maximum likelihood (Savalei and Rhemtulla, 2012). Sample FMI estimates using this approach have since then been made accessible as part of the lavaan package (Rosseel, 2012) in the R statistical programming language. However, the properties of FMI estimates at finite sample sizes have not been the subject of comprehensive investigation. In this paper, we present a simulation study on the properties of three sample FMI estimates from FIML in two common models in psychology, regression and two-factor analysis. We summarize the performance of these FMI estimates and make recommendations on their application.
Additional file 2 of The reporting and handling of missing data in...
springernature.figshare.com
xlsx
Updated Feb 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chinenye Okpara; Chidozie Edokwe; George Ioannidis; Alexandra Papaioannou; Jonathan D. Adachi; Lehana Thabane (2024). Additional file 2 of The reporting and handling of missing data in longitudinal studies of older adults is suboptimal: a methodological survey of geriatric journals [Dataset]. http://doi.org/10.6084/m9.figshare.19663862.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.19663862.v1
Dataset updated
Feb 15, 2024
Dataset provided by
Figsharehttp://figshare.com/
Authors
Chinenye Okpara; Chidozie Edokwe; George Ioannidis; Alexandra Papaioannou; Jonathan D. Adachi; Lehana Thabane
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2.
Percentage (%) and number (n) of missing values in the outcome (maximum grip...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated May 29, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner (2024). Percentage (%) and number (n) of missing values in the outcome (maximum grip strength) among participants that were interviewed, by age group and sex using all available data. [Dataset]. http://doi.org/10.1371/journal.pone.0295726.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0295726.t003
Dataset updated
May 29, 2024
Dataset provided by
PLOShttp://plos.org/
Authors
Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Percentage (%) and number (n) of missing values in the outcome (maximum grip strength) among participants that were interviewed, by age group and sex using all available data.
Public Reporting of Missing Digital Contact Information
catalog.data.gov
data.virginia.gov
Updated Oct 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Centers for Medicare & Medicaid Services (2025). Public Reporting of Missing Digital Contact Information [Dataset]. https://catalog.data.gov/dataset/public-reporting-of-missing-digital-contact-information-ff7e7
Explore at:
Dataset updated
Oct 7, 2025
Dataset provided by
Centers for Medicare & Medicaid Services
Description
In the May 2020 CMS Interoperability and Patient Access final rule, CMS finalized the policy to publicly report the names and NPIs of those providers who do not have digital contact information included in the NPPES system (85 FR 25584). This data includes the NPI and provider name of providers and clinicians without digital contact information in NPPES.
Additional file 1 of Working with missing data in large-scale assessments
springernature.figshare.com
zip
Updated Apr 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Francis Huang; Brian Keller (2025). Additional file 1 of Working with missing data in large-scale assessments [Dataset]. http://doi.org/10.6084/m9.figshare.28853685.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.28853685.v1
Dataset updated
Apr 24, 2025
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Francis Huang; Brian Keller
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplementary material 1.
T
Indicator Zero or Missing Value Report
mi-treasury.data.socrata.com
csv, xlsx, xml
Updated Mar 24, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2017). Indicator Zero or Missing Value Report [Dataset]. https://mi-treasury.data.socrata.com/dataset/Indicator-Zero-or-Missing-Value-Report/p38z-ybgt
Explore at:
xml, xlsx, csvAvailable download formats
Dataset updated
Mar 24, 2017
Description
Latest database - 1/5/2017
Data from: Benchmarking imputation methods for categorical biological data
zenodo.org
data.niaid.nih.gov
zip
Updated Mar 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthieu Gendre; Torsten Hauffe; Torsten Hauffe; Catalina Pimiento; Catalina Pimiento; Daniele Silvestro; Daniele Silvestro; Matthieu Gendre (2024). Benchmarking imputation methods for categorical biological data [Dataset]. http://doi.org/10.5281/zenodo.10800016
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10800016
Dataset updated
Mar 10, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Matthieu Gendre; Torsten Hauffe; Torsten Hauffe; Catalina Pimiento; Catalina Pimiento; Daniele Silvestro; Daniele Silvestro; Matthieu Gendre
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 9, 2024
Description
Description:

Welcome to the Zenodo repository for Publication Benchmarking imputation methods for categorical biological data, a comprehensive collection of datasets and scripts utilized in our research endeavors. This repository serves as a vital resource for researchers interested in exploring the empirical and simulated analyses conducted in our study.

Contents:

empirical_analysis:

Trait Dataset of Elasmobranchs: A collection of trait data for elasmobranch species obtained from FishBase , stored as RDS file.

Phylogenetic Tree: A phylogenetic tree stored as a TRE file.

Imputations Replicates (Imputation): Replicated imputations of missing data in the trait dataset, stored as RData files.

Error Calculation (Results): Error calculation results derived from imputed datasets, stored as RData files.

Scripts: Collection of R scripts used for the implementation of empirical analysis.

simulation_analysis:

Input Files: Input files utilized for simulation analyses as CSV files

Data Distribution PDFs: PDF files displaying the distribution of simulated data and the missingness.

Output Files: Simulated trait datasets, trait datasets with missing data, and trait imputed datasets with imputation errors calculated as RData files.

Scripts: Collection of R scripts used for the simulation analysis.

TDIP_package:

Scripts of the TDIP Package: All scripts related to the Trait Data Imputation with Phylogeny (TDIP) R package used in the analyses.

Purpose:

This repository aims to provide transparency and reproducibility to our research findings by making the datasets and scripts publicly accessible. Researchers interested in understanding our methodologies, replicating our analyses, or building upon our work can utilize this repository as a valuable reference.

Citation:

When using the datasets or scripts from this repository, we kindly request citing Publication Benchmarking imputation methods for categorical biological data and acknowledging the use of this Zenodo repository.

Thank you for your interest in our research, and we hope this repository serves as a valuable resource in your scholarly pursuits.
n
Data from: Bias and sensitivity in the placement of fossil taxa resulting...
data.niaid.nih.gov
datasetcatalog.nlm.nih.gov
+2more
zip
Updated Nov 21, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Robert S. Sansom (2014). Bias and sensitivity in the placement of fossil taxa resulting from interpretations of missing data [Dataset]. http://doi.org/10.5061/dryad.7tq20
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.7tq20
Dataset updated
Nov 21, 2014
Dataset provided by
University of Manchester
Authors
Robert S. Sansom
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
The utility of fossils in evolutionary contexts is dependent on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g. appendage, tentacle or nerve) missing from a fossil because it was never there (phylogenetic absence), or just happened to not be preserved (taphonomic loss)? Missing data has not been tested in the context of interpretation of non-present anatomy nor in the context of directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both cause taxa to drift down trees, from their original position, toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (2 entries replaced, 6 % of present characters). The opposite threshold in empirical fossil taxa is also found to be low; two absent entries replaced with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to coding non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist. Absent characters therefore require explicit justification and taphonomic frameworks to support their interpretation.
Handling of missing values in python
kaggle.com
zip
Updated Jul 3, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
xodeum (2022). Handling of missing values in python [Dataset]. https://www.kaggle.com/datasets/xodeum/handling-of-missing-values-in-python
Explore at:
zip(2634 bytes)Available download formats
Dataset updated
Jul 3, 2022
Authors
xodeum
Description
In this Datasets i simply showed the handling of missing values in your data with help of python libraries such as NumPy and pandas. You can also see the use of Nan and Non values. Detecting, dropping and filling of null values.
f
Data from: Additional file 2 of Accommodating heterogeneous missing data...
datasetcatalog.nlm.nih.gov
springernature.figshare.com
Updated Jul 22, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Herkommer, Kathleen; Leach, Robin J.; Freedland, Stephen J.; Cooperberg, Matthew R.; Ankerst, Donna P.; Poyet, Cedric; Meissner, Valentin H.; De Hoedt, Amanda M.; Vickers, Andrew J.; Guerrios-Rivera, Lourdes; Neumair, Matthias; Haese, Alexander; Saba, Karim; Boorjian, Stephen A.; Kattan, Michael W.; Liss, Michael A. (2022). Additional file 2 of Accommodating heterogeneous missing data patterns for prostate cancer risk prediction [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000323195
Explore at:
Dataset updated
Jul 22, 2022
Authors
Herkommer, Kathleen; Leach, Robin J.; Freedland, Stephen J.; Cooperberg, Matthew R.; Ankerst, Donna P.; Poyet, Cedric; Meissner, Valentin H.; De Hoedt, Amanda M.; Vickers, Andrew J.; Guerrios-Rivera, Lourdes; Neumair, Matthias; Haese, Alexander; Saba, Karim; Boorjian, Stephen A.; Kattan, Michael W.; Liss, Michael A.
Description
Additional file 2: R code for all 1,024 models available at https://riskcalc.org/ExtendedPBCG/ .

Facebook

Twitter

Click to copy link

Link copied

Cite

National Institutes of Health (2025). Problems in dealing with missing data and informative censoring in clinical trials [Dataset]. https://catalog.data.gov/dataset/problems-in-dealing-with-missing-data-and-informative-censoring-in-clinical-trials

Data from: Problems in dealing with missing data and informative censoring in clinical trials

Explore at:

Dataset updated

Sep 7, 2025

Dataset provided by

National Institutes of Health

Description

A common problem in clinical trials is the missing data that occurs when patients do not complete the study and drop out without further measurements. Missing data cause the usual statistical analysis of complete or all available data to be subject to bias. There are no universally applicable methods for handling missing data. We recommend the following: (1) Report reasons for dropouts and proportions for each treatment group; (2) Conduct sensitivity analyses to encompass different scenarios of assumptions and discuss consistency or discrepancy among them; (3) Pay attention to minimize the chance of dropouts at the design stage and during trial monitoring; (4) Collect post-dropout data on the primary endpoints, if at all possible; and (5) Consider the dropout event itself an important endpoint in studies with many.

Clear search

Close search

Google apps

Main menu

Data from: Problems in dealing with missing data and informative censoring...

Methods for Handling Missing Item Values in Regression Models Using the...

Statistical Methods for Missing Data in Large Observational Studies [Methods...

Missing data in the analysis of multilevel and dependent data (Examples)

Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data...

A dataset from a survey investigating disciplinary differences in data...

New Approach to Evaluating Supplementary Homicide Report (SHR) Data...

Data from: Multiple Imputation for the Supplementary Homicide Reports:...

Retail Product Dataset with Missing Values

Handling of Missing Data Induced by Time-Varying Covariates in Comparative...

Data_Sheet_1_Three Sample Estimates of Fraction of Missing Information From...

Additional file 2 of The reporting and handling of missing data in...

Percentage (%) and number (n) of missing values in the outcome (maximum grip...

Public Reporting of Missing Digital Contact Information

Additional file 1 of Working with missing data in large-scale assessments

Indicator Zero or Missing Value Report

Data from: Benchmarking imputation methods for categorical biological data

Data from: Bias and sensitivity in the placement of fossil taxa resulting...

Handling of missing values in python

Data from: Additional file 2 of Accommodating heterogeneous missing data...

Data from: Problems in dealing with missing data and informative censoring in clinical trials