56 datasets found
  1. Data from: Multiple Imputation for the Supplementary Homicide Reports:...

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Mar 12, 2025
    Cite
    National Institute of Justice (2025). Multiple Imputation for the Supplementary Homicide Reports: Evaluation in Unique Test Data, 1990-1995, Chicago, Philadelphia, Phoenix and St. Louis [Dataset]. https://catalog.data.gov/dataset/multiple-imputation-for-the-supplementary-homicide-reports-evaluation-in-unique-test-data-
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice
    Area covered
    St. Louis, Chicago
    Description

    This study was an evaluation of multiple imputation strategies to address missing data using the New Approach to Evaluating Supplementary Homicide Report (SHR) Data Imputation, 1990-1995 (ICPSR 20060) dataset.

  2. Replication Data for: Comparative investigation of time series missing data...

    • dataverse.harvard.edu
    Updated Jul 24, 2020
    Cite
    LEIZHEN ZANG; Feng XIONG (2020). Replication Data for: Comparative investigation of time series missing data imputation in political science: Different methods, different results [Dataset]. http://doi.org/10.7910/DVN/GQHURF
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 24, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    LEIZHEN ZANG; Feng XIONG
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Missing data is a growing concern in social science research. This paper introduces novel machine-learning methods to explore imputation efficiency and its effect on results, using Internet and public service data as test examples. The empirical results not only confirm the robustness of the positive impact of Internet penetration on public services but also show that the machine-learning imputation method outperforms random and multiple imputation, greatly improving the model's explanatory power. After machine-learning imputation, the panel data show better continuity in the time trend and can also be analyzed with a dynamic panel model. The long-term effects of the Internet on public services were found to be significantly stronger than the short-term effects. Finally, some mechanisms in the empirical analysis are discussed.

  3. New Approach to Evaluating Supplementary Homicide Report (SHR) Data...

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Mar 12, 2025
    Cite
    National Institute of Justice (2025). New Approach to Evaluating Supplementary Homicide Report (SHR) Data Imputation, 1990-1995 [Dataset]. https://catalog.data.gov/dataset/new-approach-to-evaluating-supplementary-homicide-report-shr-data-imputation-1990-1995-ff769
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice
    Description

    The purpose of the project was to learn more about patterns of homicide in the United States by strengthening the ability to make imputations for Supplementary Homicide Report (SHR) data with missing values. Supplementary Homicide Reports (SHR) and local police data from Chicago, Illinois; St. Louis, Missouri; Philadelphia, Pennsylvania; and Phoenix, Arizona, for 1990 to 1995 were merged to create a master file by linking on overlapping information on victim and incident characteristics. Through this process, 96 percent of the cases in the SHR were matched with cases in the police files. The data contain variables for three types of cases: cases complete in the SHR; cases missing offender and incident information in the SHR but known from the police report; and cases missing offender and incident information in both. The merged file allows estimation of similarities and differences between the cases with known offender characteristics in the SHR and those in the other two categories. The accuracy of existing data imputation methods can be assessed by comparing imputed values in an "incomplete" dataset (the SHR), generated by the three imputation strategies discussed in the literature, with the actual values in a known "complete" dataset (combined SHR and police data). Variables from both the Supplementary Homicide Reports and the additional police report offense data include incident date, victim characteristics, offender characteristics, incident details, and geographic information, as well as variables describing the matching procedure.
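
    The comparison described above, where imputed values for the incomplete SHR are checked against the actual values from the combined SHR and police file, can be sketched in Python. This is a minimal illustration that assumes a categorical variable and one imputed value per case; the variable, the values, and the function names `imputation_accuracy` and `mean_accuracy` are hypothetical and not taken from the dataset. With several imputed versions of the same cases, the score is computed per version and averaged.

```python
from statistics import mean


def imputation_accuracy(actual, imputed):
    """Share of imputed categorical values that equal the known actual value."""
    if len(actual) != len(imputed):
        raise ValueError("actual and imputed must have the same length")
    if not actual:
        raise ValueError("need at least one case to score")
    matches = sum(a == i for a, i in zip(actual, imputed))
    return matches / len(actual)


def mean_accuracy(actual, imputed_sets):
    """Average accuracy over several imputed versions of the same cases."""
    return mean(imputation_accuracy(actual, imp) for imp in imputed_sets)


# Hypothetical offender-sex values for cases missing in the SHR but known from police files
actual_sex = ["M", "M", "F", "M", "F"]
imputed_sex = ["M", "F", "F", "M", "M"]
print(imputation_accuracy(actual_sex, imputed_sex))  # 0.6

# Two hypothetical imputed versions of the same five cases
print(mean_accuracy(actual_sex, [imputed_sex, ["M", "M", "F", "M", "F"]]))
```

    Scoring each imputation strategy this way on the cases where the truth is known from police files gives a direct, if simple, accuracy comparison between strategies.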

  4. Processed Datasets - Imputation in Well Log Data: A Benchmark

    • zenodo.org
    application/gzip
    Updated May 22, 2024
    Cite
    Pedro H. T. Gama; Jackson Faria; Jessica Sena; Francisco Neves; Vinícius R. Riffel; Lucas Perez; André Korenchendler; Matheus C. A. Sobreira; Alexei M. C. Machado (2024). Processed Datasets - Imputation in Well Log Data: A Benchmark [Dataset]. http://doi.org/10.5281/zenodo.10987946
    Available download formats: application/gzip
    Dataset updated
    May 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pedro H. T. Gama; Jackson Faria; Jessica Sena; Francisco Neves; Vinícius R. Riffel; Lucas Perez; André Korenchendler; Matheus C. A. Sobreira; Alexei M. C. Machado
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 17, 2024
    Description

    Imputation of well log data is a common task in the field. However, a quick review of the literature reveals a lack of standardization in how methods for the problem are evaluated. The goal of the benchmark is to introduce a standard evaluation protocol for any well log imputation method.

    In the proposed benchmark, three public datasets are used:

    • Geolink: The Geolink dataset is a public dataset of wells offshore Norway. The data are provided by the company of the same name, GEOLINK, under the NOLD 2.0 license.
      It contains a total of 223 wells, with lithology labels covering 36 lithology classes.
    • Taranaki Basin: The Taranaki Basin dataset is a curated set of wells and a convenient option for experimentation, especially because it is easy to access and use.
      This collection, under the CDLA-Sharing-1.0 license, contains well logs extracted from the New Zealand Petroleum & Minerals Online Exploration Database and Petlab.
      It contains a total of 407 wells, of which 289 are onshore and 118 are offshore exploration and production wells.
    • Teapot Dome: The Teapot Dome dataset is provided by the Rocky Mountain Oilfield Testing Center (RMOTC) and the US Department of Energy.
      It contains different types of data related to the Teapot Dome oil field, such as 2D and 3D seismic data, well logs, and GIS data. The data are licensed under a Creative Commons 4.0 license.
      In total, the dataset has 1,179 wells with available logs, and the number of available logs varies across wells. Only 91 wells have the gamma ray, bulk density, and neutron porosity logs, and only three wells have the complete basic suite.

    This record provides all three datasets, already preprocessed for use with our implementation.

    File Description:

    Each dataset has six files per fold partition (k is the fold index):

    • datasetname_fold_k_well_log_metadata_train.json: JSON file with general information about the slices in the training partition of fold k, including the total number of slices and the number of slices per well.
    • datasetname_fold_k_well_log_metadata_val.json: JSON file with the same general information for the validation partition of fold k.
    • datasetname_fold_k_well_log_slices_train.npy: NumPy (.npy) file with the preprocessed training slices of fold k, ready to load. When loaded, the array should have shape (total_slices, 256, number_of_logs).
    • datasetname_fold_k_well_log_slices_val.npy: NumPy (.npy) file with the preprocessed validation slices of fold k, ready to load.
    • datasetname_fold_k_well_log_slices_meta_train.json: JSON file with per-slice information for every slice in the training partition of fold k. Each slice has 7 fields; only the first three are used, in this order: origin well name, starting position in that well, and end position of the slice in that well. The last four fields hold other information that was not used and can be discarded.
    • datasetname_fold_k_well_log_slices_meta_val.json: JSON file with the same per-slice information for the validation partition of fold k.
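
    A minimal Python sketch for reading one fold with NumPy, based only on the file names and shapes described above. It assumes that each slices_meta JSON file holds a list with one 7-element list per slice (the exact JSON layout is not documented here), and the helper name `load_fold` is hypothetical. The demo writes small synthetic files to a temporary directory, so it runs without the real data.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

SLICE_LENGTH = 256  # second dimension stated in the file description


def load_fold(directory, dataset, k, split="train"):
    """Return (slices, meta) for one fold and split ("train" or "val").

    meta keeps only the first three fields of each slice record:
    (origin well name, start position, end position).
    """
    prefix = Path(directory) / f"{dataset}_fold_{k}_well_log"
    slices = np.load(f"{prefix}_slices_{split}.npy")
    with open(f"{prefix}_slices_meta_{split}.json") as fh:
        records = json.load(fh)  # assumed layout: one 7-element list per slice
    meta = [(rec[0], int(rec[1]), int(rec[2])) for rec in records]  # drop last four
    if slices.ndim != 3 or slices.shape[1] != SLICE_LENGTH:
        raise ValueError(f"unexpected slice array shape {slices.shape}")
    if len(meta) != slices.shape[0]:
        raise ValueError("number of metadata records does not match number of slices")
    return slices, meta


# Demo on synthetic files (fake values, not real well logs)
with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "demo_fold_0_well_log_slices_train.npy",
            np.zeros((2, SLICE_LENGTH, 4), dtype=np.float32))
    fake_records = [["WELL_A", 0, 256, 0, 0, 0, 0], ["WELL_A", 256, 512, 0, 0, 0, 0]]
    (Path(tmp) / "demo_fold_0_well_log_slices_meta_train.json").write_text(json.dumps(fake_records))
    slices, meta = load_fold(tmp, "demo", 0)
    print(slices.shape, meta[1])  # (2, 256, 4) ('WELL_A', 256, 512)
```

    The shape and count checks fail early if a file does not match the documented layout, which is cheaper than discovering a mismatch during training.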
  5. Multiple Imputation by Ordered Monotone Blocks With Application to the...

    • tandf.figshare.com
    Updated Jun 2, 2023
    Cite
    Fan Li; Michela Baccini; Fabrizia Mealli; Elizabeth R. Zell; Constantine E. Frangakis; Donald B. Rubin (2023). Multiple Imputation by Ordered Monotone Blocks With Application to the Anthrax Vaccine Research Program [Dataset]. http://doi.org/10.6084/m9.figshare.1067056.v2
    Available download formats: application/x-dosexec
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Fan Li; Michela Baccini; Fabrizia Mealli; Elizabeth R. Zell; Constantine E. Frangakis; Donald B. Rubin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multiple imputation (MI) has become a standard statistical technique for dealing with missing values. The CDC Anthrax Vaccine Research Program (AVRP) dataset created new challenges for MI due to the large number of variables of different types and the limited sample size. A common method for imputing missing data in such complex studies is to specify, for each of J variables with missing values, a univariate conditional distribution given all other variables, and then to draw imputations by iterating over the J conditional distributions. Such fully conditional imputation strategies have the theoretical drawback that the conditional distributions may be incompatible. When the missingness pattern is monotone, a theoretically valid approach is to specify, for each variable with missing values, a conditional distribution given the variables with fewer or the same number of missing values and sequentially draw from these distributions. In this article, we propose the “multiple imputation by ordered monotone blocks” approach, which combines these two basic approaches by decomposing any missingness pattern into a collection of smaller “constructed” monotone missingness patterns, and iterating. We apply this strategy to impute the missing data in the AVRP interim data. Supplemental materials, including all source code and a synthetic example dataset, are available online.
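
    The abstract contrasts fully conditional imputation with the monotone case. Whether a missingness pattern is monotone can be checked directly: sort the variables by their number of missing values and verify that, for every unit, once a variable is missing all later variables are missing too. The Python sketch below performs only that check; it is not the authors' ordered-monotone-blocks procedure, and the function name `monotone_order` is hypothetical.

```python
import numpy as np


def monotone_order(missing):
    """Return a column order under which the pattern is monotone, or None.

    missing: boolean array (units x variables), True where a value is missing.
    If a monotone order exists, sorting columns by their missing counts
    (fewest first) yields one, so only that order needs checking.
    """
    missing = np.asarray(missing, dtype=bool)
    order = np.argsort(missing.sum(axis=0), kind="stable")
    indicators = missing[:, order].astype(int)
    # Monotone: along each row, indicators never drop from 1 (missing) back to 0
    if np.all(np.diff(indicators, axis=1) >= 0):
        return order
    return None


T, F = True, False
print(monotone_order([[F, F, F], [F, F, T], [F, T, T]]))  # [0 1 2]
print(monotone_order([[T, F], [F, F]]))                   # [1 0]
print(monotone_order([[F, T], [T, F]]))                   # None
```

    Sorting by missing counts is enough because, in a monotone pattern, each variable's set of missing units contains the sets of all variables before it; ties therefore correspond to identical missingness and can be ordered arbitrarily.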

  6. Data from: Identifying Missing Data Handling Methods with Text Mining

    • openicpsr.org
    delimited
    Updated Mar 8, 2023
    Cite
    Krisztián Boros; Zoltán Kmetty (2023). Identifying Missing Data Handling Methods with Text Mining [Dataset]. http://doi.org/10.3886/E185961V1
    Available download formats: delimited
    Dataset updated
    Mar 8, 2023
    Dataset provided by
    Hungarian Academy of Sciences
    Authors
    Krisztián Boros; Zoltán Kmetty
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1999 - Dec 31, 2016
    Description

    Missing data is an inevitable aspect of all empirical research. Researchers have developed several techniques for handling missing data to avoid information loss and bias. Over the past 50 years, these methods have become more efficient and also more complex. Building on previous review studies, this paper analyzes which missing data handling methods are used across scientific disciplines. For the analysis, we used nearly 50,000 scientific articles published between 1999 and 2016, provided by JSTOR in text format, and applied a text-mining approach to extract the necessary information from the corpus. Our results show that the use of advanced missing data handling methods such as Multiple Imputation or Full Information Maximum Likelihood estimation grew steadily over the study period, while simpler methods such as listwise and pairwise deletion remain in widespread use.
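
    The abstract does not describe the text-mining pipeline in detail. As a rough illustration of the general idea only, the Python sketch below counts how many articles mention each method using regular expressions; the pattern dictionary, the sample snippets, and the function names are hypothetical assumptions, not the study's actual extraction rules.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; the study's actual extraction rules are not given here.
METHOD_PATTERNS = {
    "multiple imputation": r"\bmultipl(?:e|y)[\s-]+imput",
    "FIML": r"\bfull[\s-]+information[\s-]+maximum[\s-]+likelihood\b|\bFIML\b",
    "listwise deletion": r"\blist[\s-]?wise[\s-]+deletion\b",
    "pairwise deletion": r"\bpair[\s-]?wise[\s-]+deletion\b",
}
COMPILED = {name: re.compile(pattern, re.IGNORECASE) for name, pattern in METHOD_PATTERNS.items()}


def methods_mentioned(text):
    """Return the set of method labels whose pattern occurs anywhere in the text."""
    return {name for name, regex in COMPILED.items() if regex.search(text)}


def count_articles(articles):
    """Count, for each method, how many articles mention it at least once."""
    counts = Counter()
    for text in articles:
        counts.update(methods_mentioned(text))
    return counts


articles = [  # made-up snippets standing in for full article texts
    "We used multiple imputation (MI) via chained equations.",
    "Cases with missing data were removed by listwise deletion.",
    "Estimates used FIML; listwise deletion served as a robustness check.",
]
print(count_articles(articles)["listwise deletion"])  # 2
```

    Counting each method at most once per article (via a set) keeps a single paper that repeats a term many times from inflating the totals; tracking trends over time would additionally require grouping the counts by publication year.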

  7. Supplementary file 3_weIMPUTE: a user-friendly web-based genotype imputation...

    • frontiersin.figshare.com
    docx
    Updated Mar 17, 2025
    Cite
    Mingliang Li; Zhuo Li; Defu Liu; Qi Li; Xiaodong Hu; Jun Yu; Jian Lin; Chunguang Bi; Guanshi Ye; Helong Yu; You Tang (2025). Supplementary file 3_weIMPUTE: a user-friendly web-based genotype imputation platform.docx [Dataset]. http://doi.org/10.3389/fgene.2025.1532464.s003
    Available download formats: docx
    Dataset updated
    Mar 17, 2025
    Dataset provided by
    Frontiers
    Authors
    Mingliang Li; Zhuo Li; Defu Liu; Qi Li; Xiaodong Hu; Jun Yu; Jian Lin; Chunguang Bi; Guanshi Ye; Helong Yu; You Tang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Genotype imputation is a critical preprocessing step in genome-wide association studies (GWAS), enhancing statistical power for detecting associated single nucleotide polymorphisms (SNPs) by increasing marker size.

    Results: In response to the needs of researchers seeking user-friendly graphical tools for imputation that do not require informatics or computer expertise, we have developed weIMPUTE, a web-based imputation graphical user interface (GUI). Unlike existing genotype imputation software, weIMPUTE supports several imputation programs, including SHAPEIT, Eagle, Minimac4, Beagle, and IMPUTE2, and covers the entire workflow, from quality control to data format conversion. This comprehensive platform enables both novices and experienced users to readily perform imputation tasks. Owners of reference genotype data can install weIMPUTE on a server or workstation to offer web-based imputation services without sharing their data.

    Conclusion: weIMPUTE is a versatile imputation solution for researchers across various fields, offering the flexibility to create personalized imputation servers on different operating systems.

  8. Data from: Missing Value Imputation in Relational Data Using Variational...

    • tandf.figshare.com
    txt
    Updated Jul 11, 2025
    Cite
    Simon Fontaine; Jian Kang; Ji Zhu (2025). Missing Value Imputation in Relational Data Using Variational Inference [Dataset]. http://doi.org/10.6084/m9.figshare.29184891.v2
    Available download formats: txt
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Simon Fontaine; Jian Kang; Ji Zhu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In real-world networks, node attributes are often only partially observed, necessitating imputation to support analysis or enable downstream tasks. However, most existing imputation methods overlook the rich information contained in the connectivity among nodes. This research is inspired by the premise that leveraging all available information should yield improved imputation, provided there is sufficient association between attributes and edges. Consequently, we introduce a joint latent space model that produces a low-dimensional representation of the data and simultaneously captures the edge and node attribute information. This model relies on the pooling of information induced by shared latent variables, thus improving the prediction of node attributes and providing a more effective attribute imputation method. Our approach uses variational inference to approximate posterior distributions for these latent variables, resulting in predictive distributions for missing values. Through numerical experiments on both simulated data and real-world networks, we demonstrate that our proposed method successfully harnesses the joint structure information and significantly improves the imputation of missing attributes, especially when the observed information is weak. Additional results, implementation details, a Python implementation, and the code reproducing the results are available online as supplementary materials for this article.

  9. Data from: FINDHAP

    • catalog.data.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). FINDHAP [Dataset]. https://catalog.data.gov/dataset/findhap-76548
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The findhap.f90 program finds haplotypes and imputes genotypes using multiple chip sets and sequence data. Program and download information can be found at the Animal Improvement Program (AIP) website: http://aipl.arsusda.gov/software/findhap

    Downloads:
    • Version 4 program, example files, and executable (beta version: not quite ready for routine use on U.S. chip data, but performs better than version 3 for sequence data).
    • Example data files for the imputation study presented by VanRaden and Sun at the 2014 World Congress on Genetics Applied to Livestock Production. Files include actual pedigree, simulated true genotypes, simulated sequence reads, and imputed genotypes. This example used 500 reference bulls sequenced at 4× with 1% error and containing high-density SNPs; the 250 young bulls used to test imputation had only high-density SNPs. Other examples in the study can be generated by setting other options for the programs findhap4, geno2seq, and genosim.

    Resources in this dataset:
    • Resource title: FINDHAP. File name: Web Page. URL: https://www.ars.usda.gov/research/software/download/?softwareid=494&modecode=80-42-05-30 (download page)

  10. Data from: Uniform Crime Reports [United States]: Supplementary Homicide...

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Mar 12, 2025
    Cite
    Bureau of Justice Statistics (2025). Uniform Crime Reports [United States]: Supplementary Homicide Reports With Multiple Imputation, Cumulative Files 1976-2005 [Dataset]. https://catalog.data.gov/dataset/uniform-crime-reports-united-states-supplementary-homicide-reports-with-multiple-impu-1976-04b47
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    Bureau of Justice Statistics (http://bjs.ojp.gov/)
    Area covered
    United States
    Description

    These data provide incident-level information on criminal homicides including location, circumstances, and method of offense, as well as demographic characteristics of victims and perpetrators and the relationship between the two. To adjust for unit missingness, a multiple imputation approach and a weighting scheme were adopted, resulting in a fully-imputed SHR cumulative database of criminal homicides for the years 1976-2005. Unlike other versions of the SHR files, these are limited to incidents of murder and non-negligent manslaughter, excluding justifiable homicides, negligent manslaughter and homicides related to the September 11, 2001, terrorist attacks.

  11. Dataset for "Cognitive behavioural therapy self-help intervention...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 16, 2024
    Cite
    Hamilton, Alexander (2024). Dataset for "Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7104637
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Farrand, Paul
    Sanderman, Robbert
    Von Essen, Louise
    Coumoundouros, Chelsea
    Hamilton, Alexander
    Woodford, Joanne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and R code used for the analysis in the publication: Coumoundouros et al., "Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey," BMC Nephrology.

    Summary of study

    An online cross-sectional survey of informal caregivers (e.g. family and friends) of people living with chronic kidney disease in the United Kingdom. The study aimed to examine informal caregivers' cognitive behavioural therapy self-help intervention preferences and to describe the caregiving situation (e.g. types of care activities) and informal caregivers' mental health (depression, anxiety, and stress symptoms).

    Participants were eligible to participate if they were at least 18 years old, lived in the United Kingdom, and provided unpaid care to someone living with chronic kidney disease who was at least 18 years old.

    The online survey included questions regarding (1) the informal caregiver's characteristics; (2) the care recipient's characteristics; (3) intervention preferences (e.g. content, delivery format); and (4) the informal caregiver's mental health. Informal caregivers' mental health was assessed using the 21-item Depression, Anxiety, and Stress Scale (DASS-21), which is composed of three subscales measuring depression, anxiety, and stress, respectively.

    Sixty-five individuals participated in the survey.

    See the published article for full study details.

    Description of uploaded files

    1. ENTWINE_ESR14_Kidney Carer Survey Data_FULL_2022-08-30: Excel file with the complete, raw survey data. Note: the first half of each participant's postal code was collected; however, these data were removed from the uploaded dataset to ensure participant anonymity.

    2. ENTWINE_ESR14_Kidney Carer Survey Data_Clean DASS-21 Data_2022-08-30: Excel file with cleaned data for the DASS-21 scale. During data cleaning, if a participant was missing data for one item within a DASS-21 subscale, the missing value was imputed as the mean of all other items within that subscale.

    3. ENTWINE_ESR14_Kidney Carer Survey_KEY_2022-08-30: Excel file with key linking item labels in uploaded datasets with the corresponding survey question.

    4. R Code for Kidney Carer Survey_2022-08-30: R file of R code used to analyse survey data.

    5. R code for Kidney Carer Survey_PDF_2022-08-30: PDF file of R code used to analyse survey data.
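
    The DASS-21 cleaning rule described for file 2 (a single missing item within a subscale is replaced by the mean of the other items in that subscale) can be sketched in Python. The function name and the scores are hypothetical, and leaving subscales with two or more missing items unchanged is an assumption, since the description only covers the one-missing-item case; the authors' actual cleaning code is in the uploaded R files.

```python
from statistics import mean


def impute_subscale(items):
    """Return a copy of one participant's subscale items with a single gap filled.

    items: item scores for one DASS-21 subscale, None marking a missing response.
    If exactly one item is missing, it is replaced by the mean of the other items.
    Otherwise the items are returned unchanged (assumed handling, not from the source).
    """
    completed = list(items)
    missing = [i for i, v in enumerate(completed) if v is None]
    if len(missing) != 1:
        return completed
    completed[missing[0]] = mean(v for v in completed if v is not None)
    return completed


stress_items = [2, 1, None, 3, 2, 1, 3]  # made-up responses, one missing
print(impute_subscale(stress_items))  # the missing item becomes 2, the mean of the other six
```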

  12. Electronic Dialogue 2000 (ED2K)

    • search.gesis.org
    • pollux-fid.de
    • +1 more
    Updated Apr 10, 2014
    Cite
    Price, Vincent; Cappella, Joseph (2014). Electronic Dialogue 2000 (ED2K) [Dataset]. http://doi.org/10.4232/1.11884
    Available download formats: application/x-stata-dta (20849143), application/x-spss-sav (13454566)
    Dataset updated
    Apr 10, 2014
    Dataset provided by
    GESIS Data Archive
    GESIS search
    Authors
    Price, Vincent; Cappella, Joseph
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Variables measured
    gr1 -, tz1 -, b1a2 -, b1a3 -, b1d7 -, b1f1 -, b1f2 -, b1g1 -, b1m1 -, J1H5r -, and 2477 more
    Description

    Since the early stages of public opinion research, nonresponse has been identified as an important threat to how well a sample represents the population of interest. Researchers have documented a trend of declining response rates over the years. However, nonresponse becomes a concern only when it introduces error or bias into survey results. One way to estimate nonresponse bias is through imputation. Online panels, which maintain a pool of respondents who are invited to participate in research through electronic means, face unique opportunities as well as challenges with regard to nonresponse and its imputation. Using data from a nationwide online panel, this paper hypothesizes that nonresponse bias may exist because of common causes shared between response propensity and opinion placements. After testing for these common causes, imputations are made to estimate the missing values. Lastly, observed and imputed distributions on the variables of interest are compared to show the scope of nonresponse bias. This paper finds that nonresponse bias may exist in online panels. First, the theoretical model of nonresponse bias was supported because the common-cause pattern was found in the dataset. In other words, response propensity and the opinion items of interest appeared to share common causes, mostly demographic variables. Second, imputation analyses show that although most of the differences between imputed and measured opinions do not indicate serious bias, there were a few cases in which the differences seemed critical. The limitations of this study, especially those of the imputation method, are discussed at the end of this chapter, along with suggestions for future research.

  13. Data from: Convergence Properties of a Sequential Regression Multiple...

    • tandf.figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Jian Zhu; Trivellore E. Raghunathan (2023). Convergence Properties of a Sequential Regression Multiple Imputation Algorithm [Dataset]. http://doi.org/10.6084/m9.figshare.1597470.v1
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Jian Zhu; Trivellore E. Raghunathan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A sequential regression or chained equations imputation approach uses a Gibbs sampling-type iterative algorithm that imputes the missing values using a sequence of conditional regression models. It is a flexible approach for handling different types of variables and complex data structures. Many simulation studies have shown that multiple imputation inferences based on this procedure have desirable repeated sampling properties. However, a theoretical weakness of this approach is that a set of conditional regression models may not be compatible with any joint distribution of the variables being imputed. Hence, the convergence properties of the iterative algorithm are not well understood. This article develops conditions for convergence and assesses the properties of inferences from both compatible and incompatible sequences of regression models. The results are established for the missing data pattern in which each subject may be missing a value on at most one variable. The regression models in the sequence are assumed to fit the data well empirically, having been chosen by the imputer based on appropriate model diagnostics. The results are used to develop criteria for the choice of regression models. Supplementary materials for this article are available online.
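
    The Gibbs-type cycle described above can be sketched for continuous variables with NumPy: start from mean-filled values, then repeatedly regress each incomplete variable on all the others (using the rows where it is observed) and redraw its missing entries from the fitted model. This is a simplified illustration under stated assumptions, not the algorithm analyzed in the article: it uses linear regression only and adds normal residual noise without drawing the regression parameters from their posterior, as a proper multiple imputation would. The function name `chained_imputation` and the simulated data are hypothetical.

```python
import numpy as np


def chained_imputation(X, n_iter=10, seed=None):
    """Return one completed copy of X (2-D float array, np.nan = missing).

    Each sweep visits every column with missing values, fits ordinary least
    squares of that column on all other columns over the rows where it is
    observed, and replaces its missing entries with predictions plus normal
    noise scaled by the residual standard deviation.
    """
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)  # work on a copy
    miss = np.isnan(X)
    # Initial fill: observed column means
    rows, cols = np.where(miss)
    X[rows, cols] = np.nanmean(X, axis=0)[cols]
    incomplete = np.where(miss.any(axis=0))[0]
    for _ in range(n_iter):
        for j in incomplete:
            obs = ~miss[:, j]
            design = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(design[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - design[obs] @ beta
            dof = obs.sum() - design.shape[1]
            sigma = np.sqrt(resid @ resid / dof) if dof > 0 else 0.0
            pred = design[~obs] @ beta
            X[~obs, j] = pred + rng.normal(0.0, sigma, size=pred.shape)
    return X


# Simulated example: x2 depends linearly on x1, with gaps in both columns
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=200)
data = np.column_stack([x1, x2])
data[::5, 1] = np.nan
data[1::7, 0] = np.nan
completed = [chained_imputation(data, seed=s) for s in range(5)]  # five imputations
print(all(not np.isnan(c).any() for c in completed))  # True
```

    Running the procedure with different seeds produces several completed datasets, which is how chained equations are typically used for multiple imputation; whether the iterations settle down is exactly the convergence question the article addresses.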

  14. Uniform Crime Reports [United States]: Supplementary Homicide Reports With...

    • icpsr.umich.edu
    • catalog.data.gov
    ascii, delimited, sas +2
    Updated Feb 24, 2009
    Cite
    Fox, James Alan; Swatt, Marc L. (2009). Uniform Crime Reports [United States]: Supplementary Homicide Reports With Multiple Imputation, Cumulative Files 1976-2007 [Dataset]. http://doi.org/10.3886/ICPSR24801.v1
    Available download formats: ascii, stata, spss, delimited, sas
    Dataset updated
    Feb 24, 2009
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    Fox, James Alan; Swatt, Marc L.
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/24801/terms

    Time period covered
    1976 - 2007
    Area covered
    United States
    Description

    These data provide incident-level information on criminal homicides including location, circumstances, and method of offense, as well as demographic characteristics of victims and perpetrators and the relationship between the two. To adjust for unit missingness, a multiple imputation approach and a weighting scheme were adopted, resulting in a fully-imputed SHR cumulative database of criminal homicides for the years 1976-2007. Unlike other versions of the SHR files, these are limited to incidents of murder and non-negligent manslaughter, excluding justifiable homicides, negligent manslaughter and homicides related to the September 11, 2001, terrorist attacks.

  15. Online Supplemental Materials for: "Total Error and Variability Measures for...

    • zenodo.org
    • data.niaid.nih.gov
    bin, pdf
    Updated Jul 19, 2024
    Cite
    Kevin M. McKinney; Andrew Green; Lars Vilhuber; John M. Abowd (2024). Online Supplemental Materials for: "Total Error and Variability Measures for the Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap" [Dataset]. http://doi.org/10.5281/zenodo.3951670
    Explore at:
    Available download formats: bin, pdf
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kevin M. McKinney; Andrew Green; Lars Vilhuber; John M. Abowd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This archive contains supplementary materials for the published manuscript.

    We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total flow-employment, beginning-of-quarter employment, full-quarter employment, average monthly earnings of full-quarter employees, and total quarterly payroll. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM), including OnTheMap for Emergency Management. We account for errors due to coverage; record-level non-response; edit and imputation of item missing data; and statistical disclosure limitation. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs are a transition zone, where cells may be fit for use with caution. Tabulations involving one or two jobs, which are generally suppressed on fitness-for-use criteria in the QWI and synthesized in LODES, have substantial total variability but can still be used to estimate statistics for untabulated aggregates as long as the job count in the aggregate is more than 10.
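    The fitness-for-use thresholds above can be restated as a small rule. The Python sketch below encodes them directly. The function names and labels are hypothetical, and the boundaries follow the wording of the text: at least 10 jobs for accurate cells, and more than 10 jobs for aggregates built from small cells.

```python
def fitness_for_use(job_count):
    """Classify a QWI/LODES tabulation cell by its job count (thresholds from the text)."""
    if job_count < 0:
        raise ValueError("job count cannot be negative")
    if job_count >= 10:
        return "accurate"
    if job_count >= 3:
        return "transition zone: use with caution"
    if job_count >= 1:
        return "high total variability"
    return "empty cell"


def aggregate_usable(cell_job_counts):
    """Small cells can still feed an untabulated aggregate of more than 10 jobs."""
    return sum(cell_job_counts) > 10


for n in (25, 10, 9, 3, 2):
    print(n, fitness_for_use(n))
print(aggregate_usable([2, 1, 2, 2, 2, 2]))  # True: 11 jobs in total
print(aggregate_usable([2, 2, 2, 2, 2]))     # False: exactly 10 jobs
```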

  16. Common Core of Data: State Nonfiscal Survey, 1995-1996 - Version 1

    • search.gesis.org
    Updated Jan 18, 2006
    + more versions
    United States Department of Education. National Center for Education Statistics (2006). Common Core of Data: State Nonfiscal Survey, 1995-1996 - Version 1 [Dataset]. http://doi.org/10.3886/ICPSR02450.v1
    Dataset updated
    Jan 18, 2006
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    GESIS search
    Authors
    United States Department of Education. National Center for Education Statistics
    License

    https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de434779

    Description

    Abstract (en): The primary purpose of the State Nonfiscal Survey is to provide basic information on public elementary and secondary school students and staff for each of the 50 states, the District of Columbia, and outlying territories (American Samoa, Guam, Puerto Rico, the Virgin Islands, and the Marshall Islands). The database provides the following information on students and staff: general information (name, address, and telephone number of the state education agency), staffing information (number of FTEs on the instructional staff, guidance counselor staff, library staff, support staff, and administrative staff), and student information (membership counts by grade, counts of high school completers, counts of high school completers by racial/ethnic breakouts, and breakouts for dropouts by grade, sex, and race).

    ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing step for this data collection: checked for undocumented or out-of-range codes.

    Universe: All public elementary and secondary education agencies in the 50 states, the District of Columbia, United States territories (American Samoa, Guam, Puerto Rico, the Virgin Islands, and the Marshall Islands), and Department of Defense schools outside of the United States.

    Version history: 2006-01-18: File DOC2450.ALL.PDF was removed from any previous datasets and flagged as a study-level file, so that it will accompany all downloads. 2006-01-18: File CB2450.ALL.PDF was removed from any previous datasets and flagged as a study-level file, so that it will accompany all downloads.

    Notes:
    (1) Part 2, Imputed Data, is a different version of the data in Part 1, Reported Data. The National Center for Education Statistics (NCES) imputed and adjusted some reported values in order to create a data file (Part 2) that more accurately reflects student and staff counts and improves comparability between states. Imputations are defined as cases where the missing value is not reported at all, indicating that subtotals for the category are under-reported. An imputation by NCES assigns a value to the missing item, and the subtotals containing this item increase by the amount of the imputation. Imputations and adjustments were performed on the 50 states and Washington, DC, only. Since all states and Washington, DC, reported data in this survey, these imputations and adjustments were implemented to correct for item nonresponse only. This process consisted of several stages and steps and varied with the nature of the missing data. No adjustments or imputations were made to high school graduates or other high school completer categories, nor were any adjustments or imputations performed on the race/ethnicity data.
    (2) The Instruction Manual included with this data collection also applies to COMMON CORE OF DATA: PUBLIC EDUCATION AGENCY UNIVERSE, 1995-1996 (ICPSR 2468) and COMMON CORE OF DATA: PUBLIC SCHOOL UNIVERSE, 1995-1996 (ICPSR 2470).
    (3) The codebook, data collection instrument, and instruction manual are provided as two Portable Document Format (PDF) files. The PDF file format was developed by Adobe Systems Incorporated and can be accessed using the Adobe Acrobat Reader (version 3.0 or later). Information on how to obtain a copy of the Acrobat Reader is provided through the ICPSR Website on the Internet.
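    The NCES rule for item nonresponse is that a missing item receives an imputed value and every subtotal containing that item increases by the same amount. The Python sketch below shows only that bookkeeping step. How NCES chose the imputed value is not described here, so the value is passed in as an argument. The function name, item names and numbers are hypothetical.

```python
def impute_item(items, subtotals, membership, item, value):
    """Fill one missing item and raise each subtotal that contains it.

    items:      item name -> reported count, or None if not reported
    subtotals:  subtotal name -> reported subtotal
    membership: subtotal name -> set of item names summed into it
    value:      imputed count, chosen by some external rule (not modelled here)
    Returns new dicts; the inputs are left unchanged.
    """
    if items.get(item) is not None:
        raise ValueError(f"{item!r} was reported; nothing to impute")
    new_items = dict(items)
    new_subtotals = dict(subtotals)
    new_items[item] = value
    for name, members in membership.items():
        if item in members:
            # The subtotal was under-reported by exactly the missing amount.
            new_subtotals[name] += value
    return new_items, new_subtotals


# Hypothetical staff counts (FTEs) with library staff not reported.
items = {"instructional": 500.0, "guidance": 20.0, "library": None}
subtotals = {"total_staff": 520.0}
membership = {"total_staff": {"instructional", "guidance", "library"}}
new_items, new_subtotals = impute_item(items, subtotals, membership, "library", 8.0)
print(new_items["library"], new_subtotals["total_staff"])  # 8.0 528.0
```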

  17. Data from: The Prevalence and Nature of Intra-and Inter-group Violence in an...

    • datasets.ai
    • icpsr.umich.edu
    • +1more
    Updated Sep 11, 2024
    Department of Justice (2024). The Prevalence and Nature of Intra-and Inter-group Violence in an Era of Social and Demographic Change, 2000-2014 [UNITED STATES] [Dataset]. https://datasets.ai/datasets/the-prevalence-and-nature-of-intra-and-inter-group-violence-in-an-era-of-social-and-demogr-1a19e
    Dataset updated
    Sep 11, 2024
    Dataset authored and provided by
    Department of Justice
    Area covered
    United States
    Description

    These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study used the National Incident-Based Reporting System (NIBRS) to explore whether changes in the 2000-2010 decade were associated with changes in the prevalence and nature of violence between and among Whites, Blacks, and Hispanics. This study also aimed to construct more accessible NIBRS cross-sectional and longitudinal databases containing race/ethnic-specific measures of violent victimization, offending, and arrest. Researchers used NIBRS extract files to examine the influence of recent social changes on violence for Whites, Blacks, and Hispanics, and used advanced imputation techniques to account for missing values on race/ethnic variables. Data for this study were also drawn from the National Historical Geographic Information System, the Census Gazetteer, and Law Enforcement Officers Killed or Assaulted (LEOKA). The collection includes 1 Stata data file with 614 cases and 159 variables and 2 Stata syntax files.

  18. Publicly available GWAS summary statistics, harmonized and imputed to GTEx...

    • explore.openaire.eu
    Updated Jan 28, 2020
    Alvaro Numa Barbeira; Rodrigo Bonazzola; Eric R Gamazon; Yanyu Liang; YoSon Park; Kristin Ardlie; Tuuli Lappalainen; Hae Kyung Im (2020). Publicly available GWAS summary statistics, harmonized and imputed to GTEx v8' variant reference [Dataset]. http://doi.org/10.5281/zenodo.3629741
    Dataset updated
    Jan 28, 2020
    Authors
    Alvaro Numa Barbeira; Rodrigo Bonazzola; Eric R Gamazon; Yanyu Liang; YoSon Park; Kristin Ardlie; Tuuli Lappalainen; Hae Kyung Im
    Description

    harmonized and imputed GWAS summary statistics

    • harmonized_imputed_gwas.tar contains 114 publicly available GWAS traits, harmonized and imputed to the GTEx v8 reference.
    • gwas_metadata.txt is a table with useful information about each trait, such as:
      - Tag: trait name (also in the file name)
      - PUBMED_Paper_Link: PubMed or publication URL (if available)
      - Portal: URL of the web portal from which the data were downloaded
      - Consortium: GWAS consortium authoring the data
      - Sample_Size: number of individuals covered in the study
      - Population: individuals' ancestry (EUR, EAS, etc.)
      - abbreviation: short name used for figures
      - new_abbreviation: alternative name for additional figures
      - Deflation: whether the imputed summary statistics exhibited deflation (i.e., association p-values are less significant than expected by chance). The summary statistics imputation method is conservative, and for public GWAS with few observed variants (<2M), the distribution of p-values shifts toward lower significance.

    Data usage policy

    When using this data, you must acknowledge the source by citing the publication "Widespread dose-dependent effects of RNA expression and splicing on complex diseases and traits" (https://doi.org/10.1101/814350).

    Disclaimer

    The data is provided "as is", and the authors assume no responsibility for errors or omissions. The User assumes the entire risk associated with its use of these data. The authors shall not be held liable for any use or misuse of the data described and/or contained herein. The User bears all responsibility in determining whether these data are fit for the User's intended use. The information contained in these data is not better than the original sources from which they were derived, and both scale and accuracy may vary across the data set. These data may not have the accuracy, resolution, completeness, timeliness, or other characteristics appropriate for applications that potential users of the data may contemplate. The user is responsible for complying with any data usage policy from the original GWAS studies; refer to the list of traits described here to identify their respective consortia's requirements. THE DATA IS PROVIDED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATA OR THE USE OR OTHER DEALINGS IN THE DATA.
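    Suppose gwas_metadata.txt is a tab-separated table with the column names listed above. That is an assumption, because the delimiter and value formats are not stated. Traits could then be filtered with Python's standard csv module along these lines. The function name and demo rows are hypothetical.

```python
import csv
import io


def select_traits(metadata_text, population, min_sample_size):
    """Return the Tag of each trait with the given ancestry and sample size.

    Assumes gwas_metadata.txt is tab-separated with a header row naming the
    Tag, Population and Sample_Size columns; check the real file first.
    """
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    selected = []
    for row in reader:
        try:
            n = int(row.get("Sample_Size"))
        except (TypeError, ValueError):
            continue  # skip rows whose sample size is missing or non-numeric
        if row.get("Population") == population and n >= min_sample_size:
            selected.append(row["Tag"])
    return selected


# Hypothetical rows; the real trait names and values differ.
demo = (
    "Tag\tPopulation\tSample_Size\n"
    "TRAIT_A\tEUR\t300000\n"
    "TRAIT_B\tEAS\t150000\n"
    "TRAIT_C\tEUR\t5000\n"
)
print(select_traits(demo, "EUR", 10000))  # ['TRAIT_A']
```

    To use the real file, read its text first (for example with `open("gwas_metadata.txt", encoding="utf-8").read()`). Rows whose Sample_Size is not a plain integer are skipped silently, so it is worth checking how many traits were dropped.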

  19. Data from: National Survey of Alcohol, Drug, and Mental Health Problems...

    • icpsr.umich.edu
    • datamed.org
    Updated Mar 30, 2006
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wells, Kenneth B.; Sturm, Roland; Burnam, Audrey (2006). National Survey of Alcohol, Drug, and Mental Health Problems [Healthcare for Communities], 1997-1998 [Dataset]. http://doi.org/10.3886/ICPSR03025.v2
    Dataset updated
    Mar 30, 2006
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    Wells, Kenneth B.; Sturm, Roland; Burnam, Audrey
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/3025/terms

    Time period covered
    1997 - 1998
    Area covered
    United States
    Description

    This survey is a component of the Robert Wood Johnson Foundation's Health Tracking Initiative, a program designed to monitor changes within the health care system and their effects on people. Focusing on care and treatment for alcohol, drug, and mental health conditions, the survey reinterviewed respondents to the 1996-1997 CTS Household Survey (COMMUNITY TRACKING STUDY HOUSEHOLD SURVEY, 1996-1997, AND FOLLOWBACK SURVEY, 1997-1998: [UNITED STATES] [ICPSR 2524]). Topics covered by the questionnaire include (1) demographics, (2) health and daily activities, (3) mental health, (4) alcohol and illicit drug use, (5) use of medications, (6) health insurance coverage including coverage for mental health, (7) access, utilization, and quality of behavioral health care, (8) work, income, and wealth, and (9) life difficulties. Five imputed versions of the data are included in the collection for analysis with multiple imputation techniques.
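    Analyses of multiply imputed files like these five versions are commonly combined with Rubin's rules. The same model is fit on each version and the estimates are averaged. The pooled variance is the average within-imputation variance plus the between-imputation variance inflated by 1 + 1/m. The sketch below assumes one scalar estimate and standard error per imputed file. The collection's own documentation may recommend a specific procedure, and the function name and numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine per-imputation estimates with Rubin's rules.

    estimates:  the same coefficient estimated on each imputed dataset
    std_errors: its standard error from each of those analyses
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations with matching standard errors")
    q_bar = mean(estimates)                     # pooled point estimate
    w_bar = mean(se ** 2 for se in std_errors)  # average within-imputation variance
    b = variance(estimates)                     # between-imputation variance (m - 1 denominator)
    total = w_bar + (1 + 1 / m) * b             # total variance
    return q_bar, math.sqrt(total)


# Hypothetical results from fitting one model to each of the five imputed files.
est, se = pool_rubin([1.0, 2.0, 3.0, 2.0, 2.0], [1.0] * 5)
print(est, round(se, 4))  # 2.0 1.2649
```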

  20. Census 2011 - South Africa

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    • +1more
    Updated Mar 29, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statistics South Africa (2019). Census 2011 - South Africa [Dataset]. https://datacatalog.ihsn.org/catalog/4092
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Statistics South Africa (http://www.statssa.gov.za/)
    Time period covered
    2011
    Area covered
    South Africa
    Description

    Abstract

    Censuses are the principal means of collecting the basic population and housing statistics required for social and economic development and for policy interventions, their implementation and their evaluation. The census plays an essential role in public administration. Its results are used for, among other purposes:
    • ensuring equity in the distribution of government services;
    • distributing and allocating government funds among regions and districts for education and health services;
    • delineating electoral districts at national and local levels; and
    • measuring the impact of industrial development.

    The census also provides the benchmark for all surveys conducted by the national statistical office. Without the sampling frame derived from the census, the national statistical system would have difficulty providing reliable official statistics for use by government and the public. The census also provides information on small areas and population groups with minimal sampling error, which matters, for example, when planning the location of a school or clinic. Census information is also invaluable in the private sector for activities such as business planning and market analysis, and it serves as a benchmark in research and analysis.

    Census 2011 was the third democratic census to be conducted in South Africa. Its specific objectives included:
    - to provide statistics on population, demographic, social, economic and housing characteristics;
    - to provide a base for the selection of a new sampling frame;
    - to provide data at the lowest geographical level; and
    - to provide a primary base for the mid-year projections.

    Geographic coverage

    National

    Analysis unit

    Households, Individuals

    Kind of data

    Census/enumeration data [cen]

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    About the Questionnaire: Much emphasis has been placed on the need for a population census to help government direct its development programmes, but less has been written about how the census questionnaire is compiled. The main focus of a population and housing census is to take stock and produce a total count of the population without omission or duplication. Another major focus is to provide accurate demographic and socio-economic characteristics for each individual enumerated. Beyond individuals, the census focuses on collecting accurate data on housing characteristics and services. A population and housing census provides the data needed for informed decision-making on policy formulation and implementation, and for monitoring and evaluating programmes at the smallest possible area level. It is therefore important that Statistics South Africa collects statistical data that comply with United Nations recommendations and meet the needs of other relevant stakeholders.

    The United Nations underscores the following factors in determining the selection of topics to be investigated in population censuses: a) The needs of a broad range of data users in the country; b) Achievement of the maximum degree of international comparability, both within regions and on a worldwide basis; c) The probable willingness and ability of the public to give adequate information on the topics; and d) The total national resources available for conducting a census.

    In addition, the UN stipulates that census-takers should avoid collecting information that is no longer required simply because it was traditionally collected, and should instead focus on key demographic, social and socio-economic variables. It is therefore necessary, in consultation with a broad range of census data users, to review periodically the topics traditionally investigated and to re-evaluate the need for the series to which they contribute, particularly in light of new data needs and of alternative data sources that may have become available for topics formerly covered in the population census. Against this background, Statistics South Africa conducted user consultations in 2008 after the release of some of the Community Survey products. Some groundwork has also been done on the core questions recommended by all countries in Africa. In line with the users' meetings, the crucial demands of the Millennium Development Goals (MDGs) should also be met, and Stats SA must also meet the demands of users who require small-area data.

    Data accuracy depends on a well-designed questionnaire that is short and to the point: the interview to complete the questionnaire should not take longer than 18 minutes per household. Accuracy also depends on the diligence of the enumerator and the honesty of the respondent. Disadvantaged populations, owing to their small numbers, are best covered in the census rather than in household sample surveys, whereas variables such as employment/unemployment, religion, income and language are more accurately covered in household surveys than in censuses. Input from users and stakeholders during the planning phase is crucial to the census's success; however, the information they provide should fall within the scope of the census.

    • The Household Questionnaire is divided into the following sections:
      - Household identification particulars
      - Individual particulars
      - Section A: Demographics
      - Section B: Migration
      - Section C: General Health and Functioning
      - Section D: Parental Survival and Income
      - Section E: Education
      - Section F: Employment
      - Section G: Fertility (Women 12-50 Years Listed)
      - Section H: Housing, Household Goods and Services and Agricultural Activities
      - Section I: Mortality in the Last 12 Months
      The Household Questionnaire is available in Afrikaans, English, isiZulu, isiNdebele, Sepedi, Sesotho, siSwati, Tshivenda and Xitsonga.

    • The Transient and Tourist Hotel Questionnaire (English) covers: name, age, gender, date of birth, marital status, population group, country of birth, citizenship and province.

    • The Questionnaire for Institutions (English) is divided into the following sections:
      - Particulars of the institution
      - Availability of piped water for the institution
      - Main source of water for domestic use
      - Main type of toilet facility
      - Type of energy/fuel used for cooking, heating and lighting at the institution
      - Disposal of refuse or rubbish
      - Asset ownership (TV, radio, landline telephone, refrigerator, Internet facilities)
      - List of persons in the institution on census night (name, date of birth, sex, population group, marital status, barcode number)

    • The Post Enumeration Survey Questionnaire (English)

    These questionnaires are provided as external resources.

    Cleaning operations

    Data editing and validation system

    The execution of each phase of Census operations introduces some errors into Census data. Despite the quality assurance methodologies embedded in all phases (data collection, manual and automated data capturing, coding, and editing), a number of errors creep in and distort the collected information. To promote consistency and improve data quality, editing is a paramount phase for identifying and minimising errors such as invalid values, inconsistent entries, or unknown/missing values. The editing process for Census 2011 was based on defined rules (specifications).
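    The Census 2011 edit specifications are not reproduced here. The Python example below is a hedged sketch of what rule-based edit checks look like: it flags invalid, missing and inconsistent values for a hypothetical person record. Only the fertility rule comes from the questionnaire (Section G covers women aged 12-50). The field names and the upper age limit are assumptions, and the actual system used SQL and CSPro rather than Python.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical fields; the actual Census 2011 variable names differ.
    age: Optional[int]
    sex: Optional[str]                 # "M" or "F"
    children_ever_born: Optional[int]  # fertility item (Section G)


def edit_errors(p: Person) -> List[str]:
    """Apply a few illustrative edit rules and return the checks that fail."""
    errors = []
    if p.age is None:
        errors.append("age missing")
    elif not 0 <= p.age <= 120:  # upper bound is an assumed example value
        errors.append("age out of range")
    if p.sex not in ("M", "F"):
        errors.append("sex invalid or missing")
    if p.children_ever_born is not None:
        # Section G fertility questions apply only to women aged 12-50.
        eligible = p.sex == "F" and p.age is not None and 12 <= p.age <= 50
        if not eligible:
            errors.append("fertility answered by ineligible person")
    return errors


print(edit_errors(Person(30, "F", 2)))       # []
print(edit_errors(Person(30, "M", 1)))       # ['fertility answered by ineligible person']
print(edit_errors(Person(None, "X", None)))  # ['age missing', 'sex invalid or missing']
```

    Detecting a failed check is only the first step. Under the minimal-change strategy described in the documentation, each failure would then be resolved by logical deduction where possible, or otherwise by imputation.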

    The editing of Census 2011 data involved a number of sequential processes: selection of members of the editing team, review of Census 2001 and 2007 Community Survey editing specifications, development of editing specifications for the Census 2011 pre-tests (2009 pilot and 2010 Dress Rehearsal), development of firewall editing specifications and finalisation of specifications for the main Census.

    Editing team

    The Census 2011 editing team was drawn from various divisions of the organisation based on skills and experience in data editing. The team was composed of subject matter specialists (demographers and programmers), managers, and data processors.

    The Census 2011 questionnaire was very complex, characterised by many sections, interlinked questions and skip instructions. Editing such complex, interlinked data items required a combination of editing techniques. Errors relating to structure were resolved using Structured Query Language (SQL) in Oracle, and CSPro software was used to resolve content-related errors. The Census 2011 editing strategy was automated error detection and correction with minimal changes, using a combination of logical and dynamic imputation/editing. Logical imputations were preferred, and in many cases substantial effort was made to deduce a consistent value from the rest of the household's information. To profile the extent of changes in the dataset and assess the effects of imputation, a set of imputation flags is included in the edited dataset. The imputation flag values are:
    • 0: no imputation was performed; raw data were preserved
    • 1: logical editing was performed; raw data were blank
    • 2: logical editing was performed; raw data were not blank
    • 3: hot-deck imputation was performed; raw data were blank
    • 4: hot-deck imputation was performed; raw data were not blank
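    The imputation flags make it possible to measure how much of a variable was changed during editing. The sketch below tallies flag values for one variable. It assumes the flags have already been read into a list of integers, because the names of the flag variables in the released dataset are not given here. Any flag from 1 to 4 counts as a change, including logical edits of blank raw values.

```python
from collections import Counter

# Flag meanings as listed in the Census 2011 documentation.
IMPUTATION_FLAGS = {
    0: "no imputation; raw data preserved",
    1: "logical editing; raw data blank",
    2: "logical editing; raw data not blank",
    3: "hot-deck imputation; raw data blank",
    4: "hot-deck imputation; raw data not blank",
}


def summarize_flags(flags):
    """Return (share of records per flag, share of records changed in any way)."""
    if not flags:
        raise ValueError("no flag values supplied")
    counts = Counter(flags)
    unknown = set(counts) - set(IMPUTATION_FLAGS)
    if unknown:
        raise ValueError(f"unexpected flag values: {sorted(unknown)}")
    total = len(flags)
    shares = {code: counts[code] / total for code in IMPUTATION_FLAGS}
    changed = (total - counts[0]) / total
    return shares, changed


# Hypothetical flag column for one variable across ten person records.
shares, changed = summarize_flags([0] * 8 + [1, 3])
print(shares[1], shares[3], changed)  # 0.1 0.1 0.2
```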

    Data appraisal

    Independent monitoring and evaluation of Census field activities

    Independent monitoring of the Census 2011 field activities was carried out by a team of 31 professionals and 381 Monitoring
