This study was an evaluation of multiple imputation strategies to address missing data using the New Approach to Evaluating Supplementary Homicide Report (SHR) Data Imputation, 1990-1995 (ICPSR 20060) dataset.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Missing data is a growing concern in social science research. This paper introduces novel machine-learning methods for imputation and examines their efficiency and effect on analyses with missing data, using Internet and public service data as the test example. The empirical results confirm the robustness of the positive impact of Internet penetration on public services and show that the machine-learning imputation method outperformed random and multiple imputation, greatly improving the model's explanatory power. The panel data produced by machine-learning imputation show better continuity in the time trend, making them suitable for analysis, including with a dynamic panel model. The long-term effects of the Internet on public services were found to be significantly stronger than the short-term effects. Finally, some mechanisms underlying the empirical analysis are discussed.
The purpose of the project was to learn more about patterns of homicide in the United States by strengthening the ability to make imputations for Supplementary Homicide Report (SHR) data with missing values. Supplementary Homicide Reports (SHR) and local police data from Chicago, Illinois, St. Louis, Missouri, Philadelphia, Pennsylvania, and Phoenix, Arizona, for 1990 to 1995 were merged to create a master file by linking on overlapping information on victim and incident characteristics. Through this process, 96 percent of the cases in the SHR were matched with cases in the police files. The data contain variables for three types of cases: complete in SHR, missing offender and incident information in SHR but known in police report, and missing offender and incident information in both. The merged file allows estimation of similarities and differences between the cases with known offender characteristics in the SHR and those in the other two categories. The accuracy of existing data imputation methods can be assessed by comparing imputed values in an "incomplete" dataset (the SHR), generated by the three imputation strategies discussed in the literature, with the actual values in a known "complete" dataset (combined SHR and police data). Variables from both the Supplementary Homicide Reports and the additional police report offense data include incident date, victim characteristics, offender characteristics, incident details, geographic information, as well as variables regarding the matching procedure.
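The matching itself amounts to a record linkage on shared victim and incident fields. A minimal pandas sketch of that idea follows; the file and column names are hypothetical placeholders, not the study's actual linkage keys or procedure.

```python
import pandas as pd

# Hypothetical illustration of linking SHR records to local police files on
# overlapping victim/incident fields. File and column names are assumptions;
# the study's actual matching procedure is documented with the data.
keys = ["incident_date", "victim_age", "victim_sex", "victim_race", "weapon"]

shr = pd.read_csv("shr_1990_1995.csv")
police = pd.read_csv("police_files.csv")

merged = shr.merge(police, on=keys, how="left",
                   suffixes=("_shr", "_police"), indicator=True)
match_rate = (merged["_merge"] == "both").mean()
print(f"Matched {match_rate:.0%} of SHR cases")  # the study reports about 96%
```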
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Imputation of well log data is a common task in the field. However, a quick review of the literature reveals a lack of standardization in how methods for this problem are evaluated. The goal of this benchmark is to introduce a standard evaluation protocol for any imputation method for well log data.
In the proposed benchmark, three public datasets are used:
Here you can download all three datasets already preprocessed to be used with our implementation, found here.
There are six files for each fold partition of each dataset:
datasetname_fold_k_well_log_metadata_train.json: JSON file with general information on the slices of the training partition of fold k. Contains the total number of slices and the number of slices per well.
datasetname_fold_k_well_log_metadata_val.json: JSON file with general information on the slices of the validation partition of fold k. Contains the total number of slices and the number of slices per well.
datasetname_fold_k_well_log_slices_train.npy: .npy (numpy) file, ready to be loaded, with the already-processed training slices of fold k. When loaded it should have shape (total_slices, 256, number_of_logs).
datasetname_fold_k_well_log_slices_val.npy: .npy (numpy) file, ready to be loaded, with the already-processed validation slices of fold k.
datasetname_fold_k_well_log_slices_meta_train.json: JSON file with per-slice info for all slices in the training partition of fold k. For each slice, 7 data points are provided; the last four are discarded (they would contain other information that was not used). The first three are, in order, the origin well name, the starting position of the slice in that well, and the end position of the slice in that well.
datasetname_fold_k_well_log_slices_meta_val.json: JSON file with per-slice info for all slices in the validation partition of fold k.
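A minimal loading sketch for one fold is shown below; "datasetname" and the fold index are placeholders, and any JSON layout beyond what is described above is an assumption.

```python
import json
import numpy as np

# Load the preprocessed slices and metadata for one fold (names are placeholders).
fold = 0
slices = np.load(f"datasetname_fold_{fold}_well_log_slices_train.npy")
print(slices.shape)  # expected shape: (total_slices, 256, number_of_logs)

with open(f"datasetname_fold_{fold}_well_log_metadata_train.json") as fh:
    metadata = json.load(fh)  # total number of slices and slices per well

with open(f"datasetname_fold_{fold}_well_log_slices_meta_train.json") as fh:
    slice_meta = json.load(fh)  # per slice: well name, start, end, plus 4 unused fields
```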
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Multiple imputation (MI) has become a standard statistical technique for dealing with missing values. The CDC Anthrax Vaccine Research Program (AVRP) dataset created new challenges for MI due to the large number of variables of different types and the limited sample size. A common method for imputing missing data in such complex studies is to specify, for each of J variables with missing values, a univariate conditional distribution given all other variables, and then to draw imputations by iterating over the J conditional distributions. Such fully conditional imputation strategies have the theoretical drawback that the conditional distributions may be incompatible. When the missingness pattern is monotone, a theoretically valid approach is to specify, for each variable with missing values, a conditional distribution given the variables with fewer or the same number of missing values and sequentially draw from these distributions. In this article, we propose the “multiple imputation by ordered monotone blocks” approach, which combines these two basic approaches by decomposing any missingness pattern into a collection of smaller “constructed” monotone missingness patterns, and iterating. We apply this strategy to impute the missing data in the AVRP interim data. Supplemental materials, including all source code and a synthetic example dataset, are available online.
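As a small illustration of the monotone-pattern notion that the approach builds on, the sketch below (not the authors' code) orders variables by their amount of missingness and checks whether the resulting pattern is monotone.

```python
import numpy as np
import pandas as pd

def is_monotone(df: pd.DataFrame) -> bool:
    """Check whether a missingness pattern is monotone: with columns ordered
    from fewest to most missing values, any row missing column j must also be
    missing every later column."""
    order = df.isna().sum().sort_values().index
    miss = df[order].isna().to_numpy()
    # For booleans, miss[:, j] <= miss[:, j + 1] encodes "missing implies the next is missing".
    return bool(np.all(miss[:, :-1] <= miss[:, 1:]))

# Toy example: z is missing whenever y is, so the pattern is monotone.
example = pd.DataFrame({"x": [1, 2, 3], "y": [1, None, None], "z": [None, None, None]})
print(is_monotone(example))  # True
```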
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Missing data is an inevitable aspect of empirical research. Researchers have developed several techniques for handling missing data to avoid information loss and bias. Over the past 50 years, these methods have become more and more efficient, and also more complex. Building on previous review studies, this paper analyzes which missing data handling methods are used across scientific disciplines. For the analysis, we used nearly 50,000 scientific articles published between 1999 and 2016; JSTOR provided the data in text format. We applied a text-mining approach to extract the necessary information from our corpus. Our results show that the use of advanced missing data handling methods such as Multiple Imputation or Full Information Maximum Likelihood estimation grew steadily over the examination period. At the same time, simpler methods, such as listwise and pairwise deletion, remain in widespread use.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Background: Genotype imputation is a critical preprocessing step in genome-wide association studies (GWAS), enhancing statistical power for detecting associated single nucleotide polymorphisms (SNPs) by increasing the number of markers. Results: In response to the needs of researchers seeking user-friendly graphical tools for imputation that do not require informatics or computing expertise, we have developed weIMPUTE, a web-based imputation graphical user interface (GUI). Unlike existing genotype imputation software, weIMPUTE supports multiple imputation tools, including SHAPEIT, Eagle, Minimac4, Beagle, and IMPUTE2, while encompassing the entire workflow, from quality control to data format conversion. This comprehensive platform enables both novices and experienced users to readily perform imputation tasks. For owners of reference genotype data, weIMPUTE can be installed on a server or workstation, enabling web-based imputation services without data sharing. Conclusion: weIMPUTE represents a versatile imputation solution for researchers across various fields, offering the flexibility to create personalized imputation servers on different operating systems.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In real-world networks, node attributes are often only partially observed, necessitating imputation to support analysis or enable downstream tasks. However, most existing imputation methods overlook the rich information contained within the connectivity among nodes. This research is inspired by the premise that leveraging all available information should yield improved imputation, provided a sufficient association between attributes and edges. Consequently, we introduce a joint latent space model that produces a low-dimensional representation of the data and simultaneously captures the edge and node attribute information. This model relies on the pooling of information induced by shared latent variables, thus improving the prediction of node attributes and providing a more effective attribute imputation method. Our approach uses variational inference to approximate posterior distributions for these latent variables, resulting in predictive distributions for missing values. Through numerical experiments, conducted on both simulated data and real-world networks, we demonstrate that our proposed method successfully harnesses the joint structure information and significantly improves the imputation of missing attributes, specifically when the observed information is weak. Additional results, implementation details, a Python implementation, and the code reproducing the results are available online. Supplementary materials for this article are available online.
The findhap.f90 program finds haplotypes and imputes genotypes using multiple chip sets and sequence data. Program and download information can be found at the Animal Improvement Program (AIP) web site: http://aipl.arsusda.gov/software/findhap
Downloads: Version 4 program, example files, and executable (beta version; not quite ready for routine use on U.S. chip data, but performs better than version 3 for sequence data), and example data files for the imputation study presented by VanRaden and Sun at the 2014 World Congress on Genetics Applied to Livestock Production. The example files include actual pedigree, simulated true genotypes, simulated sequence reads, and imputed genotypes. This example used 500 reference bulls sequenced at 4× with 1% error and containing high-density SNPs; the 250 young bulls used to test imputation had only high-density SNPs. Other examples in the study can be generated by setting other options for the programs findhap4, geno2seq, and genosim.
Resources in this dataset: Resource Title: FINDHAP. File Name: Web Page, url: https://www.ars.usda.gov/research/software/download/?softwareid=494&modecode=80-42-05-30 (download page)
These data provide incident-level information on criminal homicides including location, circumstances, and method of offense, as well as demographic characteristics of victims and perpetrators and the relationship between the two. To adjust for unit missingness, a multiple imputation approach and a weighting scheme were adopted, resulting in a fully-imputed SHR cumulative database of criminal homicides for the years 1976-2005. Unlike other versions of the SHR files, these are limited to incidents of murder and non-negligent manslaughter, excluding justifiable homicides, negligent manslaughter and homicides related to the September 11, 2001, terrorist attacks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Data and R code used for the analysis of data for the publication: Coumoundouros et al., Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey. BMC Nephrology
Summary of study
An online cross-sectional survey of informal caregivers (e.g. family and friends) of people living with chronic kidney disease in the United Kingdom. The study aimed to examine informal caregivers' cognitive behavioural therapy self-help intervention preferences, and to describe the caregiving situation (e.g. types of care activities) and informal caregivers' mental health (depression, anxiety, and stress symptoms).
Participants were eligible to participate if they were at least 18 years old, lived in the United Kingdom, and provided unpaid care to someone living with chronic kidney disease who was at least 18 years old.
The online survey included questions regarding (1) informal caregivers' characteristics; (2) care recipients' characteristics; (3) intervention preferences (e.g. content, delivery format); and (4) informal caregivers' mental health. Informal caregivers' mental health was assessed using the 21-item Depression, Anxiety, and Stress Scale (DASS-21), which is composed of three subscales measuring depression, anxiety, and stress, respectively.
Sixty-five individuals participated in the survey.
See the published article for full study details.
Description of uploaded files
ENTWINE_ESR14_Kidney Carer Survey Data_FULL_2022-08-30: Excel file with the complete, raw survey data. Note: the first half of participants' postal codes was collected; however, these data were removed from the uploaded dataset to ensure participant anonymity.
ENTWINE_ESR14_Kidney Carer Survey Data_Clean DASS-21 Data_2022-08-30: Excel file with cleaned data for the DASS-21 scale. Data cleaning involved imputing missing data when a participant was missing one item within a subscale of the DASS-21; the missing value was imputed as the mean of all other items within the relevant subscale (see the sketch after this file list).
ENTWINE_ESR14_Kidney Carer Survey_KEY_2022-08-30: Excel file with key linking item labels in uploaded datasets with the corresponding survey question.
R Code for Kidney Carer Survey_2022-08-30: R file of R code used to analyse survey data.
R code for Kidney Carer Survey_PDF_2022-08-30: PDF file of R code used to analyse survey data.
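A minimal sketch of the DASS-21 cleaning rule described above is given here; the item column names are hypothetical placeholders, not the labels used in the uploaded files, and the actual cleaning was done in the authors' R scripts.

```python
import pandas as pd

# Hypothetical DASS-21 item columns (7 items per subscale); replace with the
# real labels from the survey key file.
SUBSCALES = {
    "depression": [f"dep_{i}" for i in range(1, 8)],
    "anxiety": [f"anx_{i}" for i in range(1, 8)],
    "stress": [f"str_{i}" for i in range(1, 8)],
}

def impute_dass21(df: pd.DataFrame) -> pd.DataFrame:
    """Impute a single missing item per subscale with the mean of the other items."""
    df = df.copy()
    for items in SUBSCALES.values():
        block = df[items]
        one_missing = block.isna().sum(axis=1) == 1   # rule applies only to one missing item
        means = block.mean(axis=1, skipna=True)       # mean of the observed items
        for col in items:
            fill = one_missing & block[col].isna()
            df.loc[fill, col] = means[fill]
    return df
```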
https://www.gesis.org/en/institute/data-usage-terms
Since the early stages of public opinion research, nonresponse has been identified as an important threat to the degree to which a sample can represent the population of interest. Researchers have documented a trend of declining response rates over the years. However, the nonresponse rate becomes a concern only when it introduces error or bias into survey results. One way to estimate nonresponse bias is through imputation. Online panels, which maintain a pool of respondents who are invited to participate in research through electronic means, face unique opportunities as well as challenges with regard to nonresponse and its imputation. Using data from a nationwide online panel, this paper hypothesizes that nonresponse bias may exist because of common causes shared between response propensity and opinion placements. After testing for these common causes, imputations are made to estimate the missing values. Lastly, the observed distributions on variables of interest are compared with the imputed distributions to show the scope of nonresponse bias. This paper finds that nonresponse biases may exist in online panels. First, the theoretical model of nonresponse bias was supported because the common-cause pattern was found in the dataset. In other words, response propensity and the opinion items of interest appeared to share common causes, mostly demographic variables. Second, the imputation analyses show that although most of the differences between imputed and measured opinions do not indicate serious biases, there were a few cases in which the differences appeared critical. The limitations of this study, especially those of the imputation method, are discussed at the end of the chapter, along with suggestions for future research.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
A sequential regression or chained equations imputation approach uses a Gibbs sampling-type iterative algorithm that imputes the missing values using a sequence of conditional regression models. It is a flexible approach for handling different types of variables and complex data structures. Many simulation studies have shown that the multiple imputation inferences based on this procedure have desirable repeated sampling properties. However, a theoretical weakness of this approach is that the specification of a set of conditional regression models may not be compatible with any joint distribution of the variables being imputed. Hence, the convergence properties of the iterative algorithm are not well understood. This article develops conditions for convergence and assesses the properties of inferences from both compatible and incompatible sequences of regression models. The results are established for the missing data pattern where each subject may be missing a value on at most one variable. The sequence of regression models is assumed to be an empirically good fit for the data, chosen by the imputer based on appropriate model diagnostics. The results are used to develop criteria for the choice of regression models. Supplementary materials for this article are available online.
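For readers unfamiliar with the chained-equations idea discussed here, the sketch below uses scikit-learn's off-the-shelf IterativeImputer as a generic illustration of iterating over conditional regression models; it is not the article's procedure or its diagnostics.

```python
import numpy as np
# IterativeImputer is an experimental, generic chained-equations implementation.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.2] = np.nan  # introduce roughly 20% missing values

# Each variable with missing values is regressed on the others in turn, and the
# cycle is iterated (Gibbs-sampling style); sample_posterior draws from the
# predictive distribution rather than using the conditional mean.
imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
X_completed = imputer.fit_transform(X)
```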
https://www.icpsr.umich.edu/web/ICPSR/studies/24801/terms
These data provide incident-level information on criminal homicides including location, circumstances, and method of offense, as well as demographic characteristics of victims and perpetrators and the relationship between the two. To adjust for unit missingness, a multiple imputation approach and a weighting scheme were adopted, resulting in a fully-imputed SHR cumulative database of criminal homicides for the years 1976-2007. Unlike other versions of the SHR files, these are limited to incidents of murder and non-negligent manslaughter, excluding justifiable homicides, negligent manslaughter and homicides related to the September 11, 2001, terrorist attacks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This archive contains supplementary materials for the published manuscript.
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total flow-employment, beginning-of-quarter employment, full-quarter employment, average monthly earnings of full-quarter employees, and total quarterly payroll. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM), including OnTheMap for Emergency Management. We account for errors due to coverage; record-level non-response; edit and imputation of item missing data; and statistical disclosure limitation. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs are a transition zone, where cells may be fit for use with caution. Tabulations involving one or two jobs, which are generally suppressed on fitness-for-use criteria in the QWI and synthesized in LODES, have substantial total variability but can still be used to estimate statistics for untabulated aggregates as long as the job count in the aggregate is more than 10.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de434779
Abstract (en): The primary purpose of the State Nonfiscal Survey is to provide basic information on public elementary and secondary school students and staff for each of the 50 states, the District of Columbia, and outlying territories (American Samoa, Guam, Puerto Rico, the Virgin Islands, and the Marshall Islands). The database provides the following information on students and staff: general information (name, address, and telephone number of the state education agency), staffing information (number of FTEs on the instructional staff, guidance counselor staff, library staff, support staff, and administrative staff), and student information (membership counts by grade, counts of high school completers, counts of high school completers by racial/ethnic breakouts, and breakouts for dropouts by grade, sex, and race).
ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing step for this data collection: checked for undocumented or out-of-range codes.
Universe: All public elementary and secondary education agencies in the 50 states, the District of Columbia, United States territories (American Samoa, Guam, Puerto Rico, the Virgin Islands, and the Marshall Islands), and Department of Defense schools outside of the United States.
2006-01-18: File DOC2450.ALL.PDF was removed from any previous datasets and flagged as a study-level file, so that it will accompany all downloads. 2006-01-18: File CB2450.ALL.PDF was removed from any previous datasets and flagged as a study-level file, so that it will accompany all downloads.
(1) Part 2, Imputed Data, is a different version of the data in Part 1, Reported Data. The National Center for Education Statistics (NCES) imputed and adjusted some reported values in order to create a data file (Part 2) that more accurately reflects student and staff counts and improves comparability between states. Imputations are defined as cases where the missing value is not reported at all, indicating that subtotals for the category are under-reported. An imputation by NCES assigns a value to the missing item, and the subtotals containing this item increase by the amount of the imputation. Imputations and adjustments were performed on the 50 states and Washington, DC, only. Since all states and Washington, DC, reported data in this survey, these imputations and adjustments were implemented to correct for item nonresponse only. This process consisted of several stages and steps, and varied as to the nature of the missing data. No adjustments or imputations were made to high school graduates or other high school completer categories, nor were any adjustments or imputations performed on the race/ethnicity data.
(2) The Instruction Manual that is included with this data collection also applies to COMMON CORE OF DATA: PUBLIC EDUCATION AGENCY UNIVERSE, 1995-1996 (ICPSR 2468) and COMMON CORE OF DATA: PUBLIC SCHOOL UNIVERSE, 1995-1996 (ICPSR 2470).
(3) The codebook, data collection instrument, and instruction manual are provided as two Portable Document Format (PDF) files. The PDF file format was developed by Adobe Systems Incorporated and can be accessed using the Adobe Acrobat Reader (version 3.0 or later).
Information on how to obtain a copy of the Acrobat Reader is provided through the ICPSR Website on the Internet.
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study used the National Incident-Based Reporting System (NIBRS) to explore whether changes in the 2000-2010 decade were associated with changes in the prevalence and nature of violence between and among Whites, Blacks, and Hispanics. This study also aimed to construct more accessible NIBRS cross-sectional and longitudinal databases containing race/ethnic-specific measures of violent victimization, offending, and arrest. Researchers used NIBRS extract files to examine the influence of recent social changes on violence for Whites, Blacks, and Hispanics, and used advanced imputation techniques to account for missing values on race/ethnic variables. Data for this study was also drawn from the National Historical Geographic Information System, the Census Gazetteer, and Law Enforcement Officers Killed or Assaulted (LEOKA). The collection includes 1 Stata data file with 614 cases and 159 variables and 2 Stata syntax files.
harmonized_imputed_gwas.tar contains 114 publicly available GWAS traits, harmonized and imputed to the GTEx v8 reference.
gwas_metadata.txt is a table with useful information about each trait, such as:
- Tag: trait name (also in the file name)
- PUBMED_Paper_Link: PUBMED or publication URL (if available)
- Portal: URL of the web portal from which the data were downloaded
- Consortium: GWAS consortium authoring the data
- Sample_Size: number of individuals covered in the study
- Population: individuals' ancestry (EUR, EAS, etc.)
- abbreviation: short name used for figures
- new_abbreviation: alternative name for additional figures
- Deflation: whether imputed summary statistics exhibited deflation (i.e., association p-values lower than expected by chance; the summary statistics imputation method is conservative, and in public GWAS with few observed variants (<2M), the distribution of p-values lags towards lower significance)
Data usage policy: When using this data, you must acknowledge the source by citing the publication "Widespread dose-dependent effects of RNA expression and splicing on complex diseases and traits" (https://doi.org/10.1101/814350).
Disclaimer: The data is provided "as is", and the authors assume no responsibility for errors or omissions. The User assumes the entire risk associated with its use of these data. The authors shall not be held liable for any use or misuse of the data described and/or contained herein. The User bears all responsibility in determining whether these data are fit for the User's intended use. The information contained in these data is not better than the original sources from which they were derived, and both scale and accuracy may vary across the data set. These data may not have the accuracy, resolution, completeness, timeliness, or other characteristics appropriate for applications that potential users of the data may contemplate. The user is responsible for complying with any data usage policy from the original GWAS studies; refer to the list of traits described here to identify their respective consortia's requirements. THE DATA IS PROVIDED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATA OR THE USE OR OTHER DEALINGS IN THE DATA.
https://www.icpsr.umich.edu/web/ICPSR/studies/3025/terms
This survey is a component of the Robert Wood Johnson Foundation's Health Tracking Initiative, a program designed to monitor changes within the health care system and their effects on people. Focusing on care and treatment for alcohol, drug, and mental health conditions, the survey reinterviewed respondents to the 1996-1997 CTS Household Survey (COMMUNITY TRACKING STUDY HOUSEHOLD SURVEY, 1996-1997, AND FOLLOWBACK SURVEY, 1997-1998: [UNITED STATES] [ICPSR 2524]). Topics covered by the questionnaire include (1) demographics, (2) health and daily activities, (3) mental health, (4) alcohol and illicit drug use, (5) use of medications, (6) health insurance coverage including coverage for mental health, (7) access, utilization, and quality of behavioral health care, (8) work, income, and wealth, and (9) life difficulties. Five imputed versions of the data are included in the collection for analysis with multiple imputation techniques.
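Since the collection ships five imputed versions of the data, analyses are typically run on each version and then pooled. The sketch below shows standard Rubin's-rules pooling with placeholder estimates, not results from these data.

```python
import numpy as np

def pool_rubin(estimates, std_errors):
    """Pool a scalar estimate across m imputed data sets with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(std_errors, dtype=float) ** 2
    m = len(estimates)
    q_bar = estimates.mean()            # pooled point estimate
    w_bar = variances.mean()            # within-imputation variance
    b = estimates.var(ddof=1)           # between-imputation variance
    t = w_bar + (1 + 1 / m) * b         # total variance
    return q_bar, np.sqrt(t)

# Placeholder numbers for illustration only (one estimate/SE per imputed data set).
est, se = pool_rubin([1.02, 0.97, 1.05, 0.99, 1.01], [0.11, 0.12, 0.10, 0.11, 0.12])
```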
Censuses are the principal means of collecting basic population and housing statistics required for social and economic development, policy interventions, and their implementation and evaluation. The census plays an essential role in public administration. The results are used to ensure:
• equity in distribution of government services
• distributing and allocating government funds among various regions and districts for education and health services
• delineating electoral districts at national and local levels, and
• measuring the impact of industrial development, to name a few
The census also provides the benchmark for all surveys conducted by the national statistical office. Without the sampling frame derived from the census, the national statistical system would face difficulties in providing reliable official statistics for use by government and the public. The census also provides information on small areas and population groups with minimum sampling errors. This is important, for example, in planning the location of a school or clinic. Census information is also invaluable for use in the private sector for activities such as business planning and market analyses. The information is used as a benchmark in research and analysis.
Census 2011 was the third democratic census to be conducted in South Africa. Census 2011 specific objectives included:
- To provide statistics on population, demographic, social, economic and housing characteristics;
- To provide a base for the selection of a new sampling frame;
- To provide data at the lowest geographical level; and
- To provide a primary base for the mid-year projections.
National
Households, Individuals
Census/enumeration data [cen]
Face-to-face [f2f]
About the Questionnaire:
Much emphasis has been placed on the need for a population census to help government direct its development programmes, but less has been written about how the census questionnaire is compiled. The main focus of a population and housing census is to take stock and produce a total count of the population without omission or duplication. Another major focus is to be able to provide accurate demographic and socio-economic characteristics pertaining to each individual enumerated. Apart from individuals, the focus is on collecting accurate data on housing characteristics and services. A population and housing census provides data needed to facilitate informed decision-making as far as policy formulation and implementation are concerned, as well as to monitor and evaluate their programmes at the smallest area level possible. It is therefore important that Statistics South Africa collects statistical data that comply with the United Nations recommendations and other relevant stakeholder needs.
The United Nations underscores the following factors in determining the selection of topics to be investigated in population censuses:
a) The needs of a broad range of data users in the country;
b) Achievement of the maximum degree of international comparability, both within regions and on a worldwide basis;
c) The probable willingness and ability of the public to give adequate information on the topics; and
d) The total national resources available for conducting a census.
In addition, the UN stipulates that census-takers should avoid collecting information that is no longer required simply because it was traditionally collected in the past, but rather focus on key demographic, social and socio-economic variables. It becomes necessary, therefore, in consultation with a broad range of users of census data, to review periodically the topics traditionally investigated and to re-evaluate the need for the series to which they contribute, particularly in the light of new data needs and alternative data sources that may have become available for investigating topics formerly covered in the population census. It was against this background that Statistics South Africa conducted user consultations in 2008 after the release of some of the Community Survey products. However, some groundwork in relation to core questions recommended by all countries in Africa has been done. In line with users' meetings, the crucial demands of the Millennium Development Goals (MDGs) should also be met. It is also imperative that Stats SA meet the demands of the users that require small area data.
Accuracy of data depends on a well-designed questionnaire that is short and to the point. The interview to complete the questionnaire should not take longer than 18 minutes per household. Accuracy also depends on the diligence of the enumerator and honesty of the respondent. On the other hand, disadvantaged populations, owing to their small numbers, are best covered in the census and not in household sample surveys. Variables such as employment/unemployment, religion, income, and language are more accurately covered in household surveys than in censuses. Users'/stakeholders' input in terms of providing information in the planning phase of the census is crucial in making it a success. However, the information provided should be within the scope of the census.
Individual particulars:
Section A: Demographics
Section B: Migration
Section C: General Health and Functioning
Section D: Parental Survival and Income
Section E: Education
Section F: Employment
Section G: Fertility (Women 12-50 Years Listed)
Section H: Housing, Household Goods and Services and Agricultural Activities
Section I: Mortality in the Last 12 Months
The Household Questionnaire is available in Afrikaans; English; isiZulu; IsiNdebele; Sepedi; SeSotho; SiSwati; Tshivenda; Xitsonga.
The Transient and Tourist Hotel Questionnaire (English) is divided into the following sections:
Name, Age, Gender, Date of Birth, Marital Status, Population Group, Country of birth, Citizenship, Province.
The Questionnaire for Institutions (English) is divided into the following sections:
Particulars of the institution
Availability of piped water for the institution
Main source of water for domestic use
Main type of toilet facility
Type of energy/fuel used for cooking, heating and lighting at the institution
Disposal of refuse or rubbish
Asset ownership (TV, Radio, Landline telephone, Refrigerator, Internet facilities)
List of persons in the institution on census night (name, date of birth, sex, population group, marital status, barcode number)
The Post Enumeration Survey Questionnaire (English)
These questionnaires are provided as external resources.
Data editing and validation system
The execution of each phase of Census operations introduces some form of error into Census data. Despite the quality assurance methodologies embedded in all the phases (data collection, data capturing (both manual and automated), coding, and editing), a number of errors creep in and distort the collected information. To promote consistency and improve data quality, editing is a paramount phase for identifying and minimising errors such as invalid values, inconsistent entries or unknown/missing values. The editing process for Census 2011 was based on defined rules (specifications).
The editing of Census 2011 data involved a number of sequential processes: selection of members of the editing team, review of Census 2001 and 2007 Community Survey editing specifications, development of editing specifications for the Census 2011 pre-tests (2009 pilot and 2010 Dress Rehearsal), development of firewall editing specifications and finalisation of specifications for the main Census.
Editing team
The Census 2011 editing team was drawn from various divisions of the organisation based on skills and experience in data editing. The team thus comprised subject matter specialists (demographers and programmers), managers, as well as data processors.
The Census 2011 questionnaire was very complex, characterised by many sections, interlinked questions and skipping instructions. Editing of such complex, interlinked data items required the application of a combination of editing techniques. Errors relating to structure were resolved using Structured Query Language (SQL) in the Oracle dataset, while CSPro software was used to resolve content-related errors. The strategy used for Census 2011 data editing was the implementation of automated error detection and correction with minimal changes. Combinations of logical and dynamic imputation/editing were used. Logical imputations were preferred, and in many cases substantial effort was undertaken to deduce a consistent value based on the rest of the household's information. To profile the extent of changes in the dataset and assess the effects of imputation, a set of imputation flags is included in the edited dataset. Imputation flag values are as follows:
0 - no imputation was performed; raw data were preserved
1 - logical editing was performed, raw data were blank
2 - logical editing was performed, raw data were not blank
3 - hot-deck imputation was performed, raw data were blank
4 - hot-deck imputation was performed, raw data were not blank
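As a generic illustration of the hot-deck step mentioned above (a simplified assumption, not Stats SA's production editing system), the sketch below fills each missing value with a randomly drawn observed value from donor records in the same group.

```python
import numpy as np
import pandas as pd

def hot_deck(df: pd.DataFrame, column: str, group: str, seed: int = 0) -> pd.DataFrame:
    """Group-based hot-deck imputation: missing values in `column` are replaced
    by values drawn at random from observed donors within the same `group`."""
    rng = np.random.default_rng(seed)
    out = df.copy()

    def fill(block: pd.Series) -> pd.Series:
        donors = block.dropna().to_numpy()
        if donors.size == 0:
            return block  # no donor in this group; leave values missing
        block = block.copy()
        missing = block.isna()
        block[missing] = rng.choice(donors, size=missing.sum())
        return block

    out[column] = out.groupby(group)[column].transform(fill)
    return out
```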
Independent monitoring and evaluation of Census field activities
Independent monitoring of the Census 2011 field activities was carried out by a team of 31 professionals and 381 Monitoring