100+ datasets found
  1. Identifying Missing Data Handling Methods with Text Mining

    • openicpsr.org
    delimited
    Updated Mar 8, 2023
    Cite
    Krisztián Boros; Zoltán Kmetty (2023). Identifying Missing Data Handling Methods with Text Mining [Dataset]. http://doi.org/10.3886/E185961V1
    Explore at:
    delimited (available download format)
    Dataset updated
    Mar 8, 2023
    Dataset provided by
    Hungarian Academy of Sciences
    Authors
    Krisztián Boros; Zoltán Kmetty
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1999 - Dec 31, 2016
    Description

    Missing data is an inevitable aspect of all empirical research. Researchers have developed several techniques to handle missing data to avoid information loss and biases. Over the past 50 years, these methods have become more and more efficient and also more complex. Building on previous review studies, this paper aims to analyze what kinds of missing data handling methods are used across various scientific disciplines. For the analysis, we used nearly 50,000 scientific articles that were published between 1999 and 2016. JSTOR provided the data in text format. We utilized a text-mining approach to extract the necessary information from our corpus. Our results show that the usage of advanced missing data handling methods such as Multiple Imputation or Full Information Maximum Likelihood estimation grew steadily over the examination period. Additionally, simpler methods, like listwise and pairwise deletion, are still in widespread use.

  2. Retail Product Dataset with Missing Values

    • kaggle.com
    zip
    Updated Feb 17, 2025
    Cite
    Himel Sarder (2025). Retail Product Dataset with Missing Values [Dataset]. https://www.kaggle.com/datasets/himelsarder/retail-product-dataset-with-missing-values
    Explore at:
    zip, 47826 bytes (available download format)
    Dataset updated
    Feb 17, 2025
    Authors
    Himel Sarder
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).

    The dataset includes:
    - Category (Categorical): Product category (A, B, C, D)
    - Price (Numerical): Randomized product prices
    - Rating (Numerical): Ratings from 1 to 5
    - Stock (Categorical): Availability status (In Stock, Out of Stock)
    - Discount (Numerical): Discount percentage

    This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
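
    As a rough illustration of the imputation practice this dataset is meant for, the sketch below fills numeric gaps with the column mean and categorical gaps with the most frequent value. The rows are hypothetical stand-ins that only borrow the column names listed above, and `missing_share` and `impute` are illustrative helper names, not part of the dataset. In practice a library such as pandas would likely be used instead of plain Python.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows shaped like the columns listed above; None marks a missing value.
rows = [
    {"Category": "A", "Price": 10.0, "Rating": 4.0, "Stock": "In Stock", "Discount": 5.0},
    {"Category": None, "Price": None, "Rating": 3.0, "Stock": "Out of Stock", "Discount": None},
    {"Category": "A", "Price": 30.0, "Rating": None, "Stock": None, "Discount": 15.0},
    {"Category": "B", "Price": 20.0, "Rating": 5.0, "Stock": "In Stock", "Discount": 10.0},
]
NUMERIC = ("Price", "Rating", "Discount")
CATEGORICAL = ("Category", "Stock")


def missing_share(rows, column):
    """Fraction of rows in which `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute(rows):
    """Return a copy with numeric gaps filled by the column mean and
    categorical gaps filled by the most frequent observed value."""
    fill = {}
    for col in NUMERIC:
        fill[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in CATEGORICAL:
        observed = [r[col] for r in rows if r[col] is not None]
        fill[col] = Counter(observed).most_common(1)[0][0]
    return [{c: (fill[c] if v is None else v) for c, v in r.items()} for r in rows]


shares = {col: missing_share(rows, col) for col in rows[0]}
completed = impute(rows)
```

    Mean and mode imputation are only simple baselines; they keep every row but shrink the variance of the filled columns, which matters more as the missing share grows.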

  3. Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 23, 2023
    Cite
    Lall, Ranjit; Robinson, Thomas (2023). Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning [Dataset]. http://doi.org/10.7910/DVN/UPL4TT
    Explore at:
    Dataset updated
    Nov 23, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Lall, Ranjit; Robinson, Thomas
    Description

    Replication and simulation reproduction materials for the article "The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning." Please see the README file for a summary of the contents and the Replication Guide for a more detailed description. Article abstract: Principled methods for analyzing missing values, based chiefly on multiple imputation, have become increasingly popular yet can struggle to handle the kinds of large and complex data that are also becoming common. We propose an accurate, fast, and scalable approach to multiple imputation, which we call MIDAS (Multiple Imputation with Denoising Autoencoders). MIDAS employs a class of unsupervised neural networks known as denoising autoencoders, which are designed to reduce dimensionality by corrupting and attempting to reconstruct a subset of data. We repurpose denoising autoencoders for multiple imputation by treating missing values as an additional portion of corrupted data and drawing imputations from a model trained to minimize the reconstruction error on the originally observed portion. Systematic tests on simulated as well as real social science data, together with an applied example involving a large-scale electoral survey, illustrate MIDAS's accuracy and efficiency across a range of settings. We provide open-source software for implementing MIDAS.

  4. DataSheet_1_A Deep Learning Approach for Missing Data Imputation of Rating...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated Jun 6, 2023
    Cite
    Chung-Yuan Cheng; Wan-Ling Tseng; Ching-Fen Chang; Chuan-Hsiung Chang; Susan Shur-Fen Gau (2023). DataSheet_1_A Deep Learning Approach for Missing Data Imputation of Rating Scales Assessing Attention-Deficit Hyperactivity Disorder.pdf [Dataset]. http://doi.org/10.3389/fpsyt.2020.00673.s001
    Explore at:
    pdf (available download format)
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Chung-Yuan Cheng; Wan-Ling Tseng; Ching-Fen Chang; Chuan-Hsiung Chang; Susan Shur-Fen Gau
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A variety of tools and methods have been used to measure behavioral symptoms of attention-deficit/hyperactivity disorder (ADHD). Missing data is a major concern in ADHD behavioral studies. This study used a deep learning method to impute missing data in ADHD rating scales and evaluated the ability of the imputed dataset (i.e., the imputed data replacing the original missing values) to distinguish youths with ADHD from youths without ADHD. The data were collected from 1220 youths, 799 of whom had an ADHD diagnosis, and 421 were typically developing (TD) youths without ADHD, recruited in Northern Taiwan. Participants were assessed using the Conners’ Continuous Performance Test, the Chinese versions of the Conners’ rating scale-revised: short form for parent and teacher reports, and the Swanson, Nolan, and Pelham, version IV scale for parent and teacher reports. We used deep learning, with information from the original complete dataset (referred to as the reference dataset), to perform missing data imputation and generate an imputation order according to the imputed accuracy of each question. We evaluated the effectiveness of imputation using support vector machine to classify the ADHD and TD groups in the imputed dataset. The imputed dataset can classify ADHD vs. TD up to 89% accuracy, which did not differ from the classification accuracy (89%) using the reference dataset. Most of the behaviors related to oppositional behaviors rated by teachers and hyperactivity/impulsivity rated by both parents and teachers showed high discriminatory accuracy to distinguish ADHD from non-ADHD. Our findings support a deep learning solution for missing data imputation without introducing bias to the data.

  5. Data Cleaning - Feature Imputation

    • kaggle.com
    zip
    Updated Aug 13, 2022
    Cite
    Mr.Machine (2022). Data Cleaning - Feature Imputation [Dataset]. https://www.kaggle.com/datasets/ilayaraja07/data-cleaning-feature-imputation
    Explore at:
    zip, 116097 bytes (available download format)
    Dataset updated
    Aug 13, 2022
    Authors
    Mr.Machine
    Description

    Data cleaning (or data cleansing) means cleaning the data by imputing missing values, smoothing noisy data, and identifying or removing outliers. In general, missing values arise from collection errors or corrupted data.

    More details: Feature Engineering - Handling Missing Values

    The Wine_Quality.csv dataset has missing numerical data, and the students_Performance.mv.csv dataset has both numerical and categorical missing data.

  6. ICR - Identifying Age Related Conditions-Filtered

    • kaggle.com
    zip
    Updated May 22, 2023
    Cite
    Onkur7 (2023). ICR - Identifying Age Related Conditions-Filtered [Dataset]. https://www.kaggle.com/datasets/onkur7/icr-identifying-age-related-conditions-filtered
    Explore at:
    zip, 1372977 bytes (available download format)
    Dataset updated
    May 22, 2023
    Authors
    Onkur7
    Description

    The dataset is created by imputing the missing values of the ICR - Identifying Age Related Conditions competition dataset. Depending on feature selection, some subversions are also created.
    - Version 1: Created by dropping all rows with missing values.
    - Version 2: Created by dropping the 'BQ' and 'EL' columns, which contain most of the missing values. The remaining rows with missing values are then deleted.
    - Version 3: Created by imputing missing values with the column average, where the median is used as the measure of average.
    - Version 4: Created by imputing the missing values of 'BQ' and 'EL' with linear regression models; the remaining missing values are imputed with the average value of the column in which they occur. 'AB', 'AF', 'AH', 'AM', 'CD', 'CF', 'DN', 'FL' and 'GL' are used to predict the missing values of 'BQ'. 'CU', 'GE' and 'GL' are used to predict the missing values of 'EL'. The models are found in version4/imputer. Two subversions are created by extracting only the important features of the dataset.
    - Version 5: Created by imputing missing values using KNNImputer. Two subversions are created by extracting only the important features.

    For the categorical feature 'EJ', 'A' is encoded as 0 and 'B' is encoded as 1. For more details on how the dataset was transformed, visit this notebook.
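
    To make the difference between the first and third versions concrete, here is a minimal sketch of row deletion and median imputation. It is not the author's code: the table values are made up, only the column names 'AB', 'BQ' and 'EL' are borrowed from the description, and the helper names are hypothetical.

```python
from statistics import median

# Hypothetical numeric table; None marks a missing value. The column names
# are borrowed from the description, the values are invented for illustration.
table = [
    {"AB": 0.2, "BQ": 10.0, "EL": 1.0},
    {"AB": 0.4, "BQ": None, "EL": 3.0},
    {"AB": 0.6, "BQ": 30.0, "EL": None},
    {"AB": 0.8, "BQ": 50.0, "EL": 8.0},
]


def drop_incomplete_rows(table):
    """Version 1 style: keep only rows without any missing value."""
    return [row for row in table if all(v is not None for v in row.values())]


def impute_column_median(table):
    """Version 3 style: replace each missing value with its column's median."""
    medians = {
        col: median(row[col] for row in table if row[col] is not None)
        for col in table[0]
    }
    return [
        {col: (medians[col] if v is None else v) for col, v in row.items()}
        for row in table
    ]


complete_cases = drop_incomplete_rows(table)
median_filled = impute_column_median(table)
```

    Row deletion halves this toy table, while median imputation keeps all four rows. The regression and nearest-neighbour versions would need a modelling library and are not sketched here.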

  7. Statistical Methods for Missing Data in Large Observational Studies [Methods...

    • icpsr.umich.edu
    Updated Oct 27, 2025
    Cite
    Long, Qi (2025). Statistical Methods for Missing Data in Large Observational Studies [Methods Study], Georgia, 2013-2018 [Dataset]. http://doi.org/10.3886/ICPSR39526.v1
    Explore at:
    Dataset updated
    Oct 27, 2025
    Dataset provided by
    Inter-university Consortium for Political and Social Research: https://www.icpsr.umich.edu/web/pages/
    Authors
    Long, Qi
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/39526/terms

    Time period covered
    2013 - 2018
    Area covered
    Georgia, United States
    Description

    Health registries record data about patients with a specific health problem. These data may include age, weight, blood pressure, health problems, medical test results, and treatments received. But data in some patient records may be missing. For example, some patients may not report their weight or all of their health problems. Research studies can use data from health registries to learn how well treatments work. But missing data can lead to incorrect results. To address the problem, researchers often exclude patient records with missing data from their studies. But doing this can also lead to incorrect results. The fewer records that researchers use, the greater the chance for incorrect results. Missing data also lead to another problem: it is harder for researchers to find patient traits that could affect diagnosis and treatment. For example, patients who are overweight may get heart disease. But if data are missing, it is hard for researchers to be sure that trait could affect diagnosis and treatment of heart disease. In this study, the research team developed new statistical methods to fill in missing data in large studies. The team also developed methods to use when data are missing to help find patient traits that could affect diagnosis and treatment. To access the methods, software, and R package, please visit the Long Research Group website.

  8. Finding_And_Visualizing_Missing_Data_Python

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Dr. Nagendra (2025). Finding_And_Visualizing_Missing_Data_Python [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/finding-and-visualizing-missing-data-python
    Explore at:
    zip, 371581 bytes (available download format)
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    • This dataset is designed for learning how to identify missing data in Python.
    • It focuses on techniques to detect null, NaN, and incomplete values.
    • It includes examples of visualizing missing data patterns using Python libraries.
    • Useful for beginners practicing data preprocessing and data cleaning.
    • Helps users understand missing data handling methods for machine learning workflows.
    • Supports practical exploration of datasets before model training.
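
    The detection step described in the points above can be sketched roughly as follows. The records are hypothetical and not taken from the dataset, `is_missing` is an illustrative helper, and the text grid only stands in for the plots a visualization library would produce.

```python
import math


def is_missing(value):
    """Treat None, float NaN, and blank strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


# Hypothetical records, not taken from the dataset itself.
records = [
    {"age": 34, "city": "Oslo", "income": 52000.0},
    {"age": None, "city": "", "income": float("nan")},
    {"age": 51, "city": "Lima", "income": None},
]
columns = list(records[0])

# Count missing entries per column.
missing_counts = {c: sum(is_missing(r[c]) for r in records) for c in columns}

# A crude text "missingness matrix": one line per row, 'x' where a value is missing.
pattern = ["".join("x" if is_missing(r[c]) else "." for c in columns) for r in records]

print(missing_counts)
print("\n".join(pattern))
```

    Checking for NaN separately matters because NaN is a float that is not equal to itself, so a plain comparison against None or an empty string would miss it.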

  9. Morpho missing data? 2

    • dune.com
    Updated Nov 4, 2025
    Cite
    vegarsti (2025). Morpho missing data? 2 [Dataset]. https://dune.com/discover/content/relevant?resource-type=queries&q=code%3A%22morpho_blue_multichain.morphoblue_evt_supply%22
    Explore at:
    Dataset updated
    Nov 4, 2025
    Authors
    vegarsti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Blockchain data query: Morpho missing data? 2

  10. Datasheet3_Assessing disparities through missing race and ethnicity data:...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated Jul 24, 2024
    Cite
    Katelyn M. Banschbach; Jade Singleton; Xing Wang; Sheetal S. Vora; Julia G. Harris; Ashley Lytch; Nancy Pan; Julia Klauss; Danielle Fair; Erin Hammelev; Mileka Gilbert; Connor Kreese; Ashley Machado; Peter Tarczy-Hornoch; Esi M. Morgan (2024). Datasheet3_Assessing disparities through missing race and ethnicity data: results from a juvenile arthritis registry.pdf [Dataset]. http://doi.org/10.3389/fped.2024.1430981.s003
    Explore at:
    pdf (available download format)
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Frontiers Media: http://www.frontiersin.org/
    Authors
    Katelyn M. Banschbach; Jade Singleton; Xing Wang; Sheetal S. Vora; Julia G. Harris; Ashley Lytch; Nancy Pan; Julia Klauss; Danielle Fair; Erin Hammelev; Mileka Gilbert; Connor Kreese; Ashley Machado; Peter Tarczy-Hornoch; Esi M. Morgan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Ensuring high-quality race and ethnicity data within the electronic health record (EHR) and across linked systems, such as patient registries, is necessary to achieving the goal of inclusion of racial and ethnic minorities in scientific research and detecting disparities associated with race and ethnicity. The project goal was to improve race and ethnicity data completion within the Pediatric Rheumatology Care Outcomes Improvement Network and assess impact of improved data completion on conclusions drawn from the registry.

    Methods: This is a mixed-methods quality improvement study that consisted of five parts, as follows: (1) Identifying baseline missing race and ethnicity data, (2) Surveying current collection and entry, (3) Completing data through audit and feedback cycles, (4) Assessing the impact on outcome measures, and (5) Conducting participant interviews and thematic analysis.

    Results: Across six participating centers, 29% of the patients were missing data on race and 31% were missing data on ethnicity. Of patients missing data, most patients were missing both race and ethnicity. Rates of missingness varied by data entry method (electronic vs. manual). Recovered data had a higher percentage of patients with Other race or Hispanic/Latino ethnicity compared with patients with non-missing race and ethnicity data at baseline. Black patients had a significantly higher odds ratio of having a clinical juvenile arthritis disease activity score (cJADAS10) of ≥5 at first follow-up compared with White patients. There was no significant change in odds ratio of cJADAS10 ≥5 for race and ethnicity after data completion. Patients missing race and ethnicity were more likely to be missing cJADAS values, which may affect the ability to detect changes in odds ratio of cJADAS ≥5 after completion.

    Conclusions: About one-third of the patients in a pediatric rheumatology registry were missing race and ethnicity data. After three audit and feedback cycles, centers decreased missing data by 94%, primarily via data recovery from the EHR. In this sample, completion of missing data did not change the findings related to differential outcomes by race. Recovered data were not uniformly distributed compared with those with non-missing race and ethnicity data at baseline, suggesting that differences in outcomes after completing race and ethnicity data may be seen with larger sample sizes.

  11. A Simple Optimization Workflow to Enable Precise and Accurate Imputation of...

    • datasetcatalog.nlm.nih.gov
    • acs.figshare.com
    Updated May 3, 2021
    Cite
    Dabke, Kruttika; Jones, Michelle R.; Kreimer, Simion; Parker, Sarah J. (2021). A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000907442
    Explore at:
    Dataset updated
    May 3, 2021
    Authors
    Dabke, Kruttika; Jones, Michelle R.; Kreimer, Simion; Parker, Sarah J.
    Description

    Missing values in proteomic data sets have real consequences on downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single imputation method is best suited for a diverse range of data sets, and no clear strategy exists for evaluating imputation methods for clinical DIA-MS data sets, especially at different levels of protein quantification. To navigate through the different imputation strategies available in the literature, we have established a strategy to assess imputation methods on clinical label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a clinical proteomic data set comparing paired tumor and stroma tissue. We found that imputation methods based on local structures within the data, like local least-squares (LLS) and random forest (RF), worked well in our dilution series data set, whereas imputation methods based on global structures within the data, like BPCA, performed well in the other two data sets. We also found that imputation at the most basic protein quantification level (fragment level) improved accuracy and the number of proteins quantified. With this analytical framework, we quickly and cost-effectively evaluated different imputation methods using two smaller complementary data sets to narrow down to the larger proteomic data set’s most accurate methods. This acquisition strategy allowed us to provide reproducible evidence of the accuracy of the imputation method, even in the absence of a ground truth. Overall, this study indicates that the most suitable imputation method relies on the overall structure of the data set and provides an example of an analytic framework that may assist in identifying the most appropriate imputation strategies for the differential analysis of proteins.

  12. Data from: Benchmarking imputation methods for categorical biological data

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Mar 10, 2024
    Cite
    Matthieu Gendre; Torsten Hauffe; Catalina Pimiento; Daniele Silvestro (2024). Benchmarking imputation methods for categorical biological data [Dataset]. http://doi.org/10.5281/zenodo.10800016
    Explore at:
    zip (available download format)
    Dataset updated
    Mar 10, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Matthieu Gendre; Torsten Hauffe; Catalina Pimiento; Daniele Silvestro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 9, 2024
    Description

    Description:

    Welcome to the Zenodo repository for the publication "Benchmarking imputation methods for categorical biological data", a collection of the datasets and scripts used in our research. This repository is a resource for researchers interested in exploring the empirical and simulated analyses conducted in our study.

    Contents:

    1. empirical_analysis:

      • Trait Dataset of Elasmobranchs: A collection of trait data for elasmobranch species obtained from FishBase, stored as an RDS file.
      • Phylogenetic Tree: A phylogenetic tree stored as a TRE file.
      • Imputations Replicates (Imputation): Replicated imputations of missing data in the trait dataset, stored as RData files.
      • Error Calculation (Results): Error calculation results derived from imputed datasets, stored as RData files.
      • Scripts: Collection of R scripts used for the implementation of empirical analysis.
    2. simulation_analysis:

      • Input Files: Input files utilized for simulation analyses, stored as CSV files.
      • Data Distribution PDFs: PDF files displaying the distribution of simulated data and the missingness.
      • Output Files: Simulated trait datasets, trait datasets with missing data, and trait imputed datasets with imputation errors calculated as RData files.
      • Scripts: Collection of R scripts used for the simulation analysis.
    3. TDIP_package:

      • Scripts of the TDIP Package: All scripts related to the Trait Data Imputation with Phylogeny (TDIP) R package used in the analyses.

    Purpose:

    This repository aims to provide transparency and reproducibility to our research findings by making the datasets and scripts publicly accessible. Researchers interested in understanding our methodologies, replicating our analyses, or building upon our work can utilize this repository as a valuable reference.

    Citation:

    When using the datasets or scripts from this repository, we kindly request citing the publication "Benchmarking imputation methods for categorical biological data" and acknowledging the use of this Zenodo repository.

    Thank you for your interest in our research, and we hope this repository serves as a valuable resource in your scholarly pursuits.

  13. Imputation missing values in the nominal datasets

    • kaggle.com
    zip
    Updated Jan 29, 2023
    Cite
    Awsan thabet salem (2023). Imputation missing values in the nominal datasets [Dataset]. https://www.kaggle.com/datasets/awsanthabetsalem/imputation-in-arabic-dataset/data
    Explore at:
    zip, 16588335 bytes (available download format)
    Dataset updated
    Jan 29, 2023
    Authors
    Awsan thabet salem
    Description

    The folder contains three datasets: Zomato restaurants, Restaurants on Yellow Pages, and Arabic poetry. All datasets were taken from Kaggle and modified by adding missing values, which are marked with the symbol (?). The experiment evaluates the imputation of missing values in nominal data. The proportion of missing values in the three datasets ranges from 10% to 80%.

    The Arabic dataset has the following modifications:
    1. Columns that contain English values, such as Id, poem_link, and poet link, were deleted, because the ERAR method needs to be evaluated on the Arabic dataset.
    2. Diacritical marks were added to some records to check their effect during frequent itemset generation.

    Note: the results of the experiment on the Arabic dataset can be found in the paper titled "Missing values imputation in Arabic datasets using enhanced robust association rules".
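
    Because the missing values are marked with a literal "?", a loader has to translate that marker before any imputation can run. The sketch below shows one way to do so with Python's standard `csv` module. The CSV text, column names, and helper names are hypothetical; only the "?" marker comes from the description, and the ERAR method itself is not implemented here.

```python
import csv
import io

# Hypothetical CSV text; only the "?" missing-value marker comes from the description.
raw = """name,cuisine,city
Cafe One,Italian,Cairo
Cafe Two,?,Cairo
Cafe Three,Italian,?
Cafe Four,?,Cairo
"""


def load_with_missing(text, marker="?"):
    """Parse CSV text and turn the missing-value marker into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value.strip() == marker else value) for key, value in row.items()}
        for row in reader
    ]


def missing_rate(rows, column):
    """Share of rows whose value in `column` is missing."""
    return sum(row[column] is None for row in rows) / len(rows)


rows = load_with_missing(raw)
rates = {col: missing_rate(rows, col) for col in rows[0]}
```

    Computing the per-column rate afterwards is a quick way to confirm that a file falls in the stated 10%-80% range of missingness.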

  14. Data from: Anomalous values and missing data in clinical and experimental...

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Hélio Amante Miot (2023). Anomalous values and missing data in clinical and experimental studies [Dataset]. http://doi.org/10.6084/m9.figshare.8227163.v1
    Explore at:
    jpeg (available download format)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO: http://www.scielo.org/
    Authors
    Hélio Amante Miot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: During analysis of scientific research data, it is customary to encounter anomalous values or missing data. Anomalous values can be the result of errors of recording, typing, measurement by instruments, or may be true outliers. This review discusses concepts, examples and methods for identifying and dealing with such contingencies. In the case of missing data, techniques for imputation of the values are discussed in order to avoid exclusion of the research subject, if it is not possible to retrieve information from registration forms or to re-address the participant.

  15. Morpho missing data? 1

    • dune.com
    Updated Nov 4, 2025
    Cite
    vegarsti (2025). Morpho missing data? 1 [Dataset]. https://dune.com/discover/content/relevant?resource-type=queries&q=code%3A%22morpho_blue_multichain.morphoblue_evt_supply%22
    Explore at:
    Dataset updated
    Nov 4, 2025
    Authors
    vegarsti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Blockchain data query: Morpho missing data? 1

  16. Fantom.traces Missing Data Example

    • dune.com
    Updated Jun 4, 2023
    Cite
    cryptuschrist (2023). Fantom.traces Missing Data Example [Dataset]. https://dune.com/discover/content/relevant?resource-type=queries&q=code%3A%22fantom.traces%22
    Explore at:
    Dataset updated
    Jun 4, 2023
    Authors
    cryptuschrist
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Blockchain data query: Fantom.traces Missing Data Example

  17. Data from: Evaluating Supplemental Samples in Longitudinal Research:...

    • tandf.figshare.com
    txt
    Updated Feb 9, 2024
    Cite
    Laura K. Taylor; Xin Tong; Scott E. Maxwell (2024). Evaluating Supplemental Samples in Longitudinal Research: Replacement and Refreshment Approaches [Dataset]. http://doi.org/10.6084/m9.figshare.12162072.v1
    Explore at:
    txt (available download format)
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Taylor & Francis: https://taylorandfrancis.com/
    Authors
    Laura K. Taylor; Xin Tong; Scott E. Maxwell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Despite the wide application of longitudinal studies, they are often plagued by missing data and attrition. The majority of methodological approaches focus on participant retention or modern missing data analysis procedures. This paper, however, takes a new approach by examining how researchers may supplement the sample with additional participants. First, refreshment samples use the same selection criteria as the initial study. Second, replacement samples identify auxiliary variables that may help explain patterns of missingness and select new participants based on those characteristics. A simulation study compares these two strategies for a linear growth model with five measurement occasions. Overall, the results suggest that refreshment samples lead to less relative bias, greater relative efficiency, and more acceptable coverage rates than replacement samples or not supplementing the missing participants in any way. Refreshment samples also have high statistical power. The comparative strengths of the refreshment approach are further illustrated through a real data example. These findings have implications for assessing change over time when researching at-risk samples with high levels of permanent attrition.

  18. Count of missing data for the SAS datafile

    • figshare.com
    doc
    Updated Jan 19, 2016
    Cite
    Guillaume Béraud (2016). Count of missing data for the SAS datafile [Dataset]. http://doi.org/10.6084/m9.figshare.1466916.v1
    Explore at:
    doc (available download format)
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Guillaume Béraud
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Missing data counting for the SAS datafile

  19. Replication Data for: Qualitative Imputation of Missing Potential Outcomes

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 9, 2023
    Cite
    Coppock, Alexander; Kaur, Dipin (2023). Replication Data for: Qualitative Imputation of Missing Potential Outcomes [Dataset]. http://doi.org/10.7910/DVN/2IVKXD
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Coppock, Alexander; Kaur, Dipin
    Description

    We propose a framework for meta-analysis of qualitative causal inferences. We integrate qualitative counterfactual inquiry with an approach from the quantitative causal inference literature called extreme value bounds. Qualitative counterfactual analysis uses the observed outcome and auxiliary information to infer what would have happened had the treatment been set to a different level. Imputing missing potential outcomes is hard and when it fails, we can fill them in under best- and worst-case scenarios. We apply our approach to 63 cases that could have experienced transitional truth commissions upon democratization, 8 of which did. Prior to any analysis, the extreme value bounds around the average treatment effect on authoritarian resumption are 100 percentage points wide; imputation shrinks the width of these bounds to 51 points. We further demonstrate our method by aggregating specialists' beliefs about causal effects gathered through an expert survey, shrinking the width of the bounds to 44 points.

  20. Replication data for: A Unified Approach To Measurement Error And Missing...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Blackwell, Matthew; Honaker, James; King, Gary (2023). Replication data for: A Unified Approach To Measurement Error And Missing Data: Overview [Dataset]. http://doi.org/10.7910/DVN/29606
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Blackwell, Matthew; Honaker, James; King, Gary
    Description

    Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have if there had been no problem in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a companion paper with technical details and extensions (Blackwell, Honaker, and King, 2014b). Notes: This is the first of two articles to appear in the same issue of the same journal by the same authors. The second is “A Unified Approach to Measurement Error and Missing Data: Details and Extensions.” See also: Missing Data
