10 datasets found

Data from: Missing data estimation in morphometrics: how much is too much?
zenodo.org
data.niaid.nih.gov
+2 more
Updated Jun 1, 2022
Julien Clavel; Gildas Merceron; Gilles Escarguel (2022). Data from: Missing data estimation in morphometrics: how much is too much? [Dataset]. http://doi.org/10.5061/dryad.f0b50
Unique identifier
https://doi.org/10.5061/dryad.f0b50
Dataset updated
Jun 1, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julien Clavel; Gildas Merceron; Gilles Escarguel
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Fossil-based estimates of diversity and evolutionary dynamics mainly rely on the study of morphological variation. Unfortunately, organism remains are often altered by post-mortem taphonomic processes such as weathering or distortion. Such a loss of information often prevents quantitative multivariate description and statistically controlled comparisons of extinct species based on morphometric data. A common way to deal with missing data involves imputation methods that directly fill the missing cases with model estimates. Over the last several years, several empirically determined thresholds for the maximum acceptable proportion of missing values have been proposed in the literature, whereas other studies showed that this limit actually depends on several properties of the study dataset and of the selected imputation method, and is by no way generalizable. We evaluate the relative performances of seven multiple imputation techniques through a simulation-based analysis under three distinct patterns of missing data distribution. Overall, Fully Conditional Specification and Expectation-Maximization algorithms provide the best compromises between imputation accuracy and coverage probability. Multiple imputation (MI) techniques appear remarkably robust to the violation of basic assumptions such as the occurrence of taxonomically or anatomically biased patterns of missing data distribution, making differences in simulation results between the three patterns of missing data distribution much smaller than differences between the individual MI techniques. Based on these results, rather than proposing a new (set of) threshold value(s), we develop an approach combining the use of multiple imputations with procrustean superimposition of principal component analysis results, in order to directly visualize the effect of individual missing data imputation on an ordinated space. We provide an R function for users to implement the proposed procedure.
Missing data in the analysis of multilevel and dependent data (Examples)
zenodo.org
data.niaid.nih.gov
bin
Updated Jul 20, 2023
Simon Grund; Oliver Lüdtke; Alexander Robitzsch (2023). Missing data in the analysis of multilevel and dependent data (Examples) [Dataset]. http://doi.org/10.5281/zenodo.8168054
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.8168054
Dataset updated
Jul 20, 2023
Dataset provided by
Springer
Authors
Simon Grund; Oliver Lüdtke; Alexander Robitzsch
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Example data sets and computer code for the book chapter titled "Missing Data in the Analysis of Multilevel and Dependent Data" submitted for publication in the second edition of "Dependent Data in Social Science Research" (Stemmler et al., 2015). This repository includes the computer code (".R") and the data sets from both example analyses (Examples 1 and 2). The data sets are available in two file formats (binary ".rda" for use in R; plain-text ".dat").

The data sets contain simulated data from 23,376 (Example 1) and 23,072 (Example 2) individuals from 2,000 groups on four variables:

ID = group identifier (1-2000)
x = numeric (Level 1)
y = numeric (Level 1)
w = binary (Level 2)

In all data sets, missing values are coded as "NA".
Replication Data for: In Search of Quality and Equity: The United Kingdom...
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Wieczorek, Oliver; Richard Münch (2023). Replication Data for: In Search of Quality and Equity: The United Kingdom and Germany in the Struggle for PISA Scores [Dataset]. https://search.dataone.org/view/sha256%3A41e1876d31cad38296a6f593d48ef07847666963642642290e973775f7d22d9b
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Wieczorek, Oliver; Richard Münch
Area covered
United Kingdom, Germany
Description
This repository contains the R-Code necessary to replicate the results of the multilevel regression in the article titled: In Search of Quality and Equity: The United Kingdom and Germany in the Struggle for PISA Scores, which will be published in the International Journal for Education Research Open.
Multiple Imputation by Ordered Monotone Blocks With Application to the...
tandf.figshare.com
Updated Jun 2, 2023
Fan Li; Michela Baccini; Fabrizia Mealli; Elizabeth R. Zell; Constantine E. Frangakis; Donald B. Rubin (2023). Multiple Imputation by Ordered Monotone Blocks With Application to the Anthrax Vaccine Research Program [Dataset]. http://doi.org/10.6084/m9.figshare.1067056.v2
Available download formats: application/x-dosexec
Unique identifier
https://doi.org/10.6084/m9.figshare.1067056.v2
Dataset updated
Jun 2, 2023
Dataset provided by
Taylor & Francis
Authors
Fan Li; Michela Baccini; Fabrizia Mealli; Elizabeth R. Zell; Constantine E. Frangakis; Donald B. Rubin
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Multiple imputation (MI) has become a standard statistical technique for dealing with missing values. The CDC Anthrax Vaccine Research Program (AVRP) dataset created new challenges for MI due to the large number of variables of different types and the limited sample size. A common method for imputing missing data in such complex studies is to specify, for each of J variables with missing values, a univariate conditional distribution given all other variables, and then to draw imputations by iterating over the J conditional distributions. Such fully conditional imputation strategies have the theoretical drawback that the conditional distributions may be incompatible. When the missingness pattern is monotone, a theoretically valid approach is to specify, for each variable with missing values, a conditional distribution given the variables with fewer or the same number of missing values and sequentially draw from these distributions. In this article, we propose the “multiple imputation by ordered monotone blocks” approach, which combines these two basic approaches by decomposing any missingness pattern into a collection of smaller “constructed” monotone missingness patterns, and iterating. We apply this strategy to impute the missing data in the AVRP interim data. Supplemental materials, including all source code and a synthetic example dataset, are available online.
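The monotone missingness condition mentioned in the description above can be illustrated with a short check. This is a minimal sketch, not the authors' code: it assumes a dataset held as a list of rows with None marking a missing value, and the function name is_monotone is hypothetical. A pattern is treated as monotone when the variables can be ordered so that the set of units missing one variable is contained in the set missing the next.

```python
from typing import Optional, Sequence


def is_monotone(rows: Sequence[Sequence[Optional[float]]]) -> bool:
    """Return True if the columns can be ordered so that missingness is nested.

    Assumption for this sketch: missing values are represented by None and
    every row has the same number of columns.
    """
    if not rows:
        return True
    n_cols = len(rows[0])
    # For each column, collect the indices of the rows where it is missing.
    missing_sets = [
        frozenset(i for i, row in enumerate(rows) if row[j] is None)
        for j in range(n_cols)
    ]
    # Order columns from fewest to most missing values.
    ordered = sorted(missing_sets, key=len)
    # The pattern is monotone only if each missing set contains the previous one.
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# A dropout-style pattern: once a value is missing, later ones are too.
monotone_rows = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [1.0, None, None],
]
# A pattern where two variables are each missing for different units.
non_monotone_rows = [
    [1.0, None, 3.0],
    [None, 2.0, 3.0],
]
```

Sorting by the number of missing values is enough here because nested sets of equal size must be identical, so any tie between different sets fails the containment test.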
Diallel analysis reveals Mx1-dependent and Mx1-independent effects on...
zenodo.org
data.niaid.nih.gov
application/gzip, csv +1
Updated Aug 3, 2024
Paul L Maurizio; Martin T Ferris; Gregory R Keele; Darla R Miller; Ginger D Shaw; Alan C Whitmore; Ande West; Clayton R Morrison; Kelsey E Noll; Kenneth S Plante; Adam S Cockrell; David W Threadgill; Fernando Pardo-Manuel de Villena; Ralph S Baric; Mark T Heise; William Valdar (2024). Diallel analysis reveals Mx1-dependent and Mx1-independent effects on response to influenza A virus in mice [Dataset]. http://doi.org/10.5281/zenodo.293015
Available download formats: application/gzip, csv, pdf
Unique identifier
https://doi.org/10.5281/zenodo.293015
Dataset updated
Aug 3, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Paul L Maurizio; Martin T Ferris; Gregory R Keele; Darla R Miller; Ginger D Shaw; Alan C Whitmore; Ande West; Clayton R Morrison; Kelsey E Noll; Kenneth S Plante; Adam S Cockrell; David W Threadgill; Fernando Pardo-Manuel de Villena; Ralph S Baric; Mark T Heise; William Valdar
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Data and analysis files for diallel analysis of weight loss in 8-12 week old male and female mice (n=1,043), mock treated or infected with influenza A virus (H1N1, PR8) across 4 days post-infection, as well as founder haplotype effect analysis at Mx1 for pre-CC and CC-RIX.
Data from: Quantifying the impacts of management and herbicide resistance on...
search.dataone.org
data.niaid.nih.gov
+1 more
Updated Nov 29, 2023
Robert Goodsell; David Comont; Helen Hicks; James Lambert; Richard Hull; Laura Crook; Paolo Fraccaro; Katharina Reusch; Robert Freckleton; Dylan Childs (2023). Quantifying the impacts of management and herbicide resistance on regional plant population dynamics in the face of missing data [Dataset]. http://doi.org/10.5061/dryad.9cnp5hqn5
Unique identifier
https://doi.org/10.5061/dryad.9cnp5hqn5
Dataset updated
Nov 29, 2023
Dataset provided by
Dryad Digital Repository
Authors
Robert Goodsell; David Comont; Helen Hicks; James Lambert; Richard Hull; Laura Crook; Paolo Fraccaro; Katharina Reusch; Robert Freckleton; Dylan Childs
Time period covered
Jan 1, 2022
Description
A key challenge in the management of populations is to quantify the impact of interventions in the face of environmental and phenotypic variability. However, accurate estimation of the effects of management and environment, in large-scale ecological research is often limited by the expense of data collection, the inherent trade-off between quality and quantity, and missing data. In this paper we develop a novel modelling framework, and demographically informed imputation scheme, to comprehensively account for the uncertainty generated by missing population, management, and herbicide resistance data. Using this framework and a large dataset (178 sites over 3 years) on the densities of a destructive arable weed (Alopecurus myosuroides) we investigate the effects of environment, management, and evolved herbicide resistance, on weed population dynamics. In this study we quantify the marginal effects of a suite of common management practices, including cropping, cultivation, and herbici...

Data were collected from a network of UK farms using a density structured survey method outlined in Queensborough 2011.

Quantifying the impacts of management and herbicide resistance on regional plant population dynamics in the face of missing data

Contained are the datasets and code required to replicate the analyses in Goodsell et al (2023), Quantifying the impacts of management and herbicide resistance on regional plant population dynamics in the face of missing data.

Description of the data and file structure

Data: Contains data required to run all stages in the analysis.

Many files contain the same variable names, important variables have been described in the first object they appear in.

all_imputation_data.rds - The data required to run the imputation scheme, this is an R list containing the following:

$Management - data frame containing missing and observed values for management imputation

FF & FFY: the specific field, and field year.

year: the year.

crop: crop

cult_cat : cultivation category

a_gly: number of autumn (post September 1st) glyphosate applicatio...
Survey of Consumer Finances (SCF)
dataverse.harvard.edu
Updated May 30, 2013
Anthony Damico (2013). Survey of Consumer Finances (SCF) [Dataset]. http://doi.org/10.7910/DVN/FRMKMF
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/FRMKMF
Dataset updated
May 30, 2013
Dataset provided by
Harvard Dataverse
Authors
Anthony Damico
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
analyze the survey of consumer finances (scf) with r

the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit-level (peu) - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pay the bills and therefore call the shots.

if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell.

the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.

this new github repository contains three scripts:

1989-2010 download all microdata.R
  initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
  loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
  break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
  save the five implicates and replicate weights to an r data file (.rda) for rapid future loading

2010 analysis examples.R
  prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
  load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
  demonstrate how to access the properties of a multiply-imputed survey design object
  cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
  run a quick t-test and regression, but only because you asked nicely

replicate FRB SAS output.R
  reproduce each and every statistic provided by the friendly folks at the federal reserve
  create a multiply-imputed, replicate-weighted survey design object
  re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences

click here to view these three scripts

for more detail about the survey of consumer finances (scf), visit:
  the federal reserve board of governors' survey of consumer finances homepage
  the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
  the survey of consumer finances wikipedia entry
  the official frequently asked questions

notes:

nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there's also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
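The warning above, that every analysis must account for all five implicates or the confidence intervals come out too tight, can be illustrated with the generic Rubin combining rules for multiply-imputed data. This is a minimal sketch under stated assumptions, not the method used in the scripts described here: it assumes a point estimate and a sampling variance have already been computed on each implicate, the function name pool_rubin is hypothetical, and the scf-specific variance conventions mentioned in the description may differ from this textbook form.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Combine per-implicate results with Rubin's rules.

    estimates: one point estimate per implicate (assumed already computed).
    variances: the sampling variance of each estimate, in the same order.
    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two implicates and matching lengths")
    pooled = mean(estimates)            # average of the per-implicate estimates
    within = mean(variances)            # average sampling variance
    between = variance(estimates)       # spread across implicates (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

Ignoring the between-implicate term would leave only the within variance, which is how the intervals end up too narrow.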
R-code of the simulation study and use case investigation.
figshare.com
plos.figshare.com
zip
Updated Nov 2, 2023
Tanja Bülow; Ralf-Dieter Hilgers; Nicole Heussen (2023). R-code of the simulation study and use case investigation. [Dataset]. http://doi.org/10.1371/journal.pone.0293640.s007
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0293640.s007
Dataset updated
Nov 2, 2023
Dataset provided by
PLOS ONE
Authors
Tanja Bülow; Ralf-Dieter Hilgers; Nicole Heussen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This code can be used to define the functions, to create the datasets, to generate the figures and tables for the simulation study and to generate the results from the use case. (ZIP)
Predictors of the need for psychosocial care: Multivariable logistic...
plos.figshare.com
xls
Updated Jun 11, 2023
Sabine Fischbeck; Barbara H. Imruck; Maria Blettner; Veronika Weyer; Harald Binder; Sylke R. Zeissig; Katharina Emrich; Peter Friedrich-Mai; Manfred E. Beutel (2023). Predictors of the need for psychosocial care: Multivariable logistic regression based on multiple imputation (10 imputation data sets, Nagelkerkes R2 .41-.44). [Dataset]. http://doi.org/10.1371/journal.pone.0132754.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0132754.t003
Dataset updated
Jun 11, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Sabine Fischbeck; Barbara H. Imruck; Maria Blettner; Veronika Weyer; Harald Binder; Sylke R. Zeissig; Katharina Emrich; Peter Friedrich-Mai; Manfred E. Beutel
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Multiple Imputation for a set of covariates selected by forward and backward selection (level of selection 5%) in single imputation data. Predictors of the need for psychosocial care: Multivariable logistic regression based on multiple imputation (10 imputation data sets, Nagelkerkes R2 .41-.44).
Unadjusted and adjusted Odds Ratios, p value, and 95% CI of individual...
plos.figshare.com
xls
Updated May 31, 2023
Rodolfo Rossi; Stanley Zammit; Katherine S. Button; Marcus R. Munafò; Glyn Lewis; Anthony S. David (2023). Unadjusted and adjusted Odds Ratios, p value, and 95% CI of individual N-Back parameters for participants with PEs—Multiple Imputation. [Dataset]. http://doi.org/10.1371/journal.pone.0153148.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0153148.t005
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Rodolfo Rossi; Stanley Zammit; Katherine S. Button; Marcus R. Munafò; Glyn Lewis; Anthony S. David
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Unadjusted and adjusted Odds Ratios, p value, and 95% CI of individual N-Back parameters for participants with PEs—Multiple Imputation.