100+ datasets found
  1. Missing data in the analysis of multilevel and dependent data (Example data...

    • zenodo.org
    bin
    Updated Jul 20, 2023
    Cite
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch (2023). Missing data in the analysis of multilevel and dependent data (Example data sets) [Dataset]. http://doi.org/10.5281/zenodo.7773614
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    Springer
    Authors
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data sets for the book chapter titled "Missing Data in the Analysis of Multilevel and Dependent Data" submitted for publication in the second edition of "Dependent Data in Social Science Research" (Stemmler et al., 2015). This repository includes the data sets used in both example analyses (Examples 1 and 2) in two file formats (binary ".rda" for use in R; plain-text ".dat").

    The data sets contain simulated data from 23,376 (Example 1) and 23,072 (Example 2) individuals from 2,000 groups on four variables:

    ID = group identifier (1-2000)
    x = numeric (Level 1)
    y = numeric (Level 1)
    w = binary (Level 2)

    In all data sets, missing values are coded as "NA".
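
    A minimal sketch of how the documented layout (ID, x, y, w, with "NA" as the missing-value code) could be read and summarised. The sample rows, the whitespace delimiter, the header row, and the helper names are assumptions made for illustration; they are not taken from the published files.

```python
from collections import defaultdict

# Hypothetical excerpt in the documented layout: ID, x, y, w, with "NA" for missing.
# The whitespace delimiter and header row are assumptions, not facts about the files.
SAMPLE = """ID x y w
1 0.52 NA 1
1 NA 1.10 1
2 -0.31 0.44 NA
2 0.08 NA NA
"""


def parse_value(token):
    """Convert a token to float, mapping the missing-value code "NA" to None."""
    return None if token == "NA" else float(token)


def read_rows(text):
    """Read whitespace-delimited rows into a list of dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = {name: parse_value(tok) for name, tok in zip(header, line.split())}
        row["ID"] = int(row["ID"])  # group identifier is an integer
        rows.append(row)
    return rows


def missing_rate(rows, name):
    """Share of rows in which the given variable is missing."""
    return sum(r[name] is None for r in rows) / len(rows)


def group_sizes(rows):
    """Number of individuals (rows) per group identifier."""
    sizes = defaultdict(int)
    for r in rows:
        sizes[r["ID"]] += 1
    return dict(sizes)


rows = read_rows(SAMPLE)
for name in ("x", "y", "w"):
    print(name, missing_rate(rows, name))
print(group_sizes(rows))
```

    Mapping "NA" to None at read time keeps missing values distinct from legitimate zeros in later computations.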

  2. Water-quality data imputation with a high percentage of missing values: a...

    • zenodo.org
    • explore.openaire.eu
    • +1more
    csv
    Updated Jun 8, 2021
    Cite
    Rafael Rodríguez; Marcos Pastorini; Lorena Etcheverry; Christian Chreties; Mónica Fossati; Alberto Castro; Angela Gorgoglione (2021). Water-quality data imputation with a high percentage of missing values: a machine learning approach [Dataset]. http://doi.org/10.5281/zenodo.4731169
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 8, 2021
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Rafael Rodríguez; Marcos Pastorini; Lorena Etcheverry; Christian Chreties; Mónica Fossati; Alberto Castro; Angela Gorgoglione
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The monitoring of surface-water quality followed by water-quality modeling and analysis is essential for generating effective strategies in water resource management. However, water-quality studies are limited by the lack of complete and reliable data sets on surface-water-quality variables. These deficiencies are particularly noticeable in developing countries.

    This work focuses on surface-water-quality data from Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. Data collected at six monitoring stations are publicly available at https://www.dinama.gub.uy/oan/datos-abiertos/calidad-agua/. The high temporal and spatial variability that characterizes water-quality variables and the high rate of missing values (between 50% and 70%) raise significant challenges.

    To deal with missing values, we applied several statistical and machine-learning imputation methods. The competing algorithms implemented belonged to both univariate and multivariate imputation methods (inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR), and K-nearest neighbors Regressor (KNNR)).

    IDW outperformed the others, achieving a very good performance (NSE greater than 0.8) in most cases.
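
    For orientation, the two quantities named above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: NSE is read here as the Nash-Sutcliffe efficiency, IDW is reduced to one dimension (for example, time), and the function names and the default power of 2 are hypothetical choices.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance around the observed mean.

    A value of 1 is a perfect match; 0 means no better than predicting the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


def idw(target, known, power=2.0):
    """Inverse distance weighting in one dimension (assumed setup).

    `known` is a list of (position, value) pairs; each value is weighted by
    1 / distance**power, so closer observations count more.
    """
    weighted = []
    for position, value in known:
        distance = abs(target - position)
        if distance == 0:
            return value  # an observation at the target itself is returned directly
        weighted.append((1.0 / distance ** power, value))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total_weight


# Impute a gap at position 1.0 from observations at positions 0.0 and 2.0.
print(idw(1.0, [(0.0, 10.0), (2.0, 20.0)]))
# Score a set of imputed values against held-out observations.
print(nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```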

    In this dataset, we include the original and imputed values for the following variables:

    • Water temperature (Tw)

    • Dissolved oxygen (DO)

    • Electrical conductivity (EC)

    • pH

    • Turbidity (Turb)

    • Nitrite (NO2-)

    • Nitrate (NO3-)

    • Total Nitrogen (TN)

    Each variable is identified as [STATION] VARIABLE FULL NAME (VARIABLE SHORT NAME) [UNIT METRIC].

    More details about the study area, the original datasets, and the methodology adopted can be found in our paper https://www.mdpi.com/2071-1050/13/11/6318.

    If you use this dataset in your work, please cite our paper:
    Rodríguez, R.; Pastorini, M.; Etcheverry, L.; Chreties, C.; Fossati, M.; Castro, A.; Gorgoglione, A. Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach. Sustainability 2021, 13, 6318. https://doi.org/10.3390/su13116318

  3. Example data simulated using the R package survtd

    • figshare.unimelb.edu.au
    txt
    Updated May 31, 2023
    Cite
    Margarita Moreno-Betancur (2023). Example data simulated using the R package survtd [Dataset]. http://doi.org/10.4225/49/58e58a8dc39a6
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    The University of Melbourne
    Authors
    Margarita Moreno-Betancur
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This example dataset is used to illustrate the usage of the R package survtd in the Supplementary Materials of the paper: Moreno-Betancur M, Carlin JB, Brilleman SL, Tanamas S, Peeters A, Wolfe R (2017). Survival analysis with time-dependent covariates subject to measurement error and missing data: Two-stage joint model using multiple imputation (submitted). The data was generated using the simjm function of the package, using the following code: dat

  4. Additional file 4 of Heckman imputation models for binary or continuous MNAR...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Jacques-Emmanuel Galimard; Sylvie Chevret; Emmanuel Curis; Matthieu Resche-Rigon (2023). Additional file 4 of Heckman imputation models for binary or continuous MNAR outcomes and MAR predictors [Dataset]. http://doi.org/10.6084/m9.figshare.7038104.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Jacques-Emmanuel Galimard; Sylvie Chevret; Emmanuel Curis; Matthieu Resche-Rigon
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code to impute binary outcome. (R 1 kb)

  5. Example of a missing data pattern, where 1 = available, and 0 = missing.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Joost A. Agelink van Rentergem; Jaap M. J. Murre; Hilde M. Huizenga (2023). Example of a missing data pattern, where 1 = available, and 0 = missing. [Dataset]. http://doi.org/10.1371/journal.pone.0173218.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Joost A. Agelink van Rentergem; Jaap M. J. Murre; Hilde M. Huizenga
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For each test, and each study, there are scores missing, although all tests co-occur at least once.
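
    A minimal sketch of such a pattern table, using the 1 = available, 0 = missing coding described here. The study names, test names, and scores are invented for illustration, and the helper names are hypothetical; the check at the end tests whether every pair of tests is observed together in at least one study.

```python
from itertools import combinations

# Hypothetical scores per study; None marks a test that the study did not report.
SCORES = {
    "study_A": {"test_1": 12.0, "test_2": None, "test_3": 7.5},
    "study_B": {"test_1": None, "test_2": 3.0, "test_3": 8.0},
    "study_C": {"test_1": 10.0, "test_2": 4.0, "test_3": None},
}
TESTS = ["test_1", "test_2", "test_3"]


def missing_pattern(scores, tests):
    """Indicator row per study: 1 = score available, 0 = score missing."""
    return {
        study: [1 if values.get(test) is not None else 0 for test in tests]
        for study, values in scores.items()
    }


def all_pairs_cooccur(pattern, n_tests):
    """True if every pair of tests is available together in at least one study."""
    for i, j in combinations(range(n_tests), 2):
        if not any(row[i] and row[j] for row in pattern.values()):
            return False
    return True


pattern = missing_pattern(SCORES, TESTS)
for study, row in pattern.items():
    print(study, row)
print(all_pairs_cooccur(pattern, len(TESTS)))
```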

  6. Slave Routes Datasets, 1650s - 1860s

    • search.dataone.org
    Updated Nov 22, 2023
    Cite
    Manning, Patrick; Liu, Yu (2023). Slave Routes Datasets, 1650s - 1860s [Dataset]. http://doi.org/10.7910/DVN/6HLXO3
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Manning, Patrick; Liu, Yu
    Time period covered
    Jan 1, 1650 - Jan 1, 1870
    Description

    Estimates of captives carried in the Atlantic slave trade by decade, 1650s to 1860s. Data: routes of voyages and recorded numbers of captives (10 variables and 33,345 cases of slave voyages). Data are organized into 40 routes linking African regions to overseas regions. Purpose: estimation of missing data and totals of captive flows. Method: techniques of Bayesian statistics to estimate missing data on routes and flows of captives. Also included is R-language code for simulating routes and populations

  7. Data from: Missing data estimation in morphometrics: how much is too much?

    • zenodo.org
    • data.niaid.nih.gov
    • +2more
    Updated Jun 1, 2022
    Cite
    Julien Clavel; Gildas Merceron; Gilles Escarguel (2022). Data from: Missing data estimation in morphometrics: how much is too much? [Dataset]. http://doi.org/10.5061/dryad.f0b50
    Explore at:
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Julien Clavel; Gildas Merceron; Gilles Escarguel
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Fossil-based estimates of diversity and evolutionary dynamics mainly rely on the study of morphological variation. Unfortunately, organism remains are often altered by post-mortem taphonomic processes such as weathering or distortion. Such a loss of information often prevents quantitative multivariate description and statistically controlled comparisons of extinct species based on morphometric data. A common way to deal with missing data involves imputation methods that directly fill the missing cases with model estimates. Over the last several years, several empirically determined thresholds for the maximum acceptable proportion of missing values have been proposed in the literature, whereas other studies showed that this limit actually depends on several properties of the study dataset and of the selected imputation method, and is in no way generalizable. We evaluate the relative performances of seven multiple imputation techniques through a simulation-based analysis under three distinct patterns of missing data distribution. Overall, Fully Conditional Specification and Expectation-Maximization algorithms provide the best compromises between imputation accuracy and coverage probability. Multiple imputation (MI) techniques appear remarkably robust to the violation of basic assumptions such as the occurrence of taxonomically or anatomically biased patterns of missing data distribution, making differences in simulation results between the three patterns of missing data distribution much smaller than differences between the individual MI techniques. Based on these results, rather than proposing a new (set of) threshold value(s), we develop an approach combining the use of multiple imputations with procrustean superimposition of principal component analysis results, in order to directly visualize the effect of individual missing data imputation on an ordinated space. We provide an R function for users to implement the proposed procedure.

  8. Air Quality Monitoring - 2019 (grouped by pollutant)

    • researchdata.edu.au
    • data.qld.gov.au
    • +1more
    Updated Apr 20, 2020
    Cite
    Environment, Tourism, Science and Innovation (2020). Air Quality Monitoring - 2019 (grouped by pollutant) [Dataset]. https://researchdata.edu.au/air-quality-monitoring-grouped-pollutant/1459199
    Explore at:
    Dataset updated
    Apr 20, 2020
    Dataset provided by
    data.qld.gov.au
    Authors
    Environment, Tourism, Science and Innovation
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annual hourly air quality and meteorological data by pollutant for the 2019 calendar year. For more information on air quality, including live air data, please visit www.qld.gov.au/environment/pollution/monitoring/air.

    Data resolution: One-hour average values
    Data row timestamp: Start of averaging period
    Missing data/not monitored: Blank cell
    Sampling height: Four metres above ground (unless otherwise indicated)

  9. A Simple Optimization Workflow to Enable Precise and Accurate Imputation of...

    • acs.figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Kruttika Dabke; Simion Kreimer; Michelle R. Jones; Sarah J. Parker (2023). A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets [Dataset]. http://doi.org/10.1021/acs.jproteome.1c00070.s004
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    ACS Publications
    Authors
    Kruttika Dabke; Simion Kreimer; Michelle R. Jones; Sarah J. Parker
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Missing values in proteomic data sets have real consequences on downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single imputation method is best suited for a diverse range of data sets, and no clear strategy exists for evaluating imputation methods for clinical DIA-MS data sets, especially at different levels of protein quantification. To navigate through the different imputation strategies available in the literature, we have established a strategy to assess imputation methods on clinical label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a clinical proteomic data set comparing paired tumor and stroma tissue. We found that imputation methods based on local structures within the data, like local least-squares (LLS) and random forest (RF), worked well in our dilution series data set, whereas imputation methods based on global structures within the data, like BPCA, performed well in the other two data sets. We also found that imputation at the most basic protein quantification level (fragment level) improved accuracy and the number of proteins quantified. With this analytical framework, we quickly and cost-effectively evaluated different imputation methods using two smaller complementary data sets to narrow down to the larger proteomic data set’s most accurate methods. This acquisition strategy allowed us to provide reproducible evidence of the accuracy of the imputation method, even in the absence of a ground truth. Overall, this study indicates that the most suitable imputation method relies on the overall structure of the data set and provides an example of an analytic framework that may assist in identifying the most appropriate imputation strategies for the differential analysis of proteins.

  10. Supplemental Material for Evaluating the Effects of Randomness on Missing...

    • osf.io
    Updated Nov 15, 2021
    Cite
    Robert J. Bischoff; Cecilia Padilla-Iglesias; Claudine Gravel-Miguel (2021). Supplemental Material for Evaluating the Effects of Randomness on Missing Data in Archaeological Networks [Dataset]. http://doi.org/10.17605/OSF.IO/J36EB
    Explore at:
    Dataset updated
    Nov 15, 2021
    Dataset provided by
    Center for Open Science, https://cos.io/
    Authors
    Robert J. Bischoff; Cecilia Padilla-Iglesias; Claudine Gravel-Miguel
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data and R code for the paper "Evaluating the Effects of Randomness on Missing Data in Archaeological Networks"

  11. R scripts used for Monte Carlo simulations and data analyses.

    • plos.figshare.com
    zip
    Updated Jan 19, 2024
    Cite
    Lateef Babatunde Amusa; Twinomurinzi Hossana (2024). R scripts used for Monte Carlo simulations and data analyses. [Dataset]. http://doi.org/10.1371/journal.pone.0297037.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Lateef Babatunde Amusa; Twinomurinzi Hossana
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts used for Monte Carlo simulations and data analyses.

  12. Sensitivity analysis for missing data in cost-effectiveness analysis: Stata...

    • figshare.com
    bin
    Updated May 31, 2023
    Cite
    Baptiste Leurent; Manuel Gomes; Rita Faria; Stephen Morris; Richard Grieve; James R Carpenter (2023). Sensitivity analysis for missing data in cost-effectiveness analysis: Stata code [Dataset]. http://doi.org/10.6084/m9.figshare.6714206.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Baptiste Leurent; Manuel Gomes; Rita Faria; Stephen Morris; Richard Grieve; James R Carpenter
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Stata do-files and data to support tutorial "Sensitivity Analysis for Not-at-Random Missing Data in Trial-Based Cost-Effectiveness Analysis" (Leurent, B. et al. PharmacoEconomics (2018) 36: 889). Do-files should be similar to the code provided in the article's supplementary material. Dataset based on 10 Top Tips trial, but modified to preserve confidentiality. Results will differ from those published.

  13. Data from: Benchmarking imputation methods for categorical biological data

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Mar 10, 2024
    Cite
    Matthieu Gendre; Torsten Hauffe; Catalina Pimiento; Daniele Silvestro (2024). Benchmarking imputation methods for categorical biological data [Dataset]. http://doi.org/10.5281/zenodo.10800016
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 10, 2024
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Matthieu Gendre; Torsten Hauffe; Catalina Pimiento; Daniele Silvestro
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 9, 2024
    Description

    Description:

    Welcome to the Zenodo repository for Publication Benchmarking imputation methods for categorical biological data, a comprehensive collection of datasets and scripts utilized in our research endeavors. This repository serves as a vital resource for researchers interested in exploring the empirical and simulated analyses conducted in our study.

    Contents:

    1. empirical_analysis:

      • Trait Dataset of Elasmobranchs: A collection of trait data for elasmobranch species obtained from FishBase, stored as an RDS file.
      • Phylogenetic Tree: A phylogenetic tree stored as a TRE file.
      • Imputations Replicates (Imputation): Replicated imputations of missing data in the trait dataset, stored as RData files.
      • Error Calculation (Results): Error calculation results derived from imputed datasets, stored as RData files.
      • Scripts: Collection of R scripts used for the implementation of empirical analysis.
    2. simulation_analysis:

      • Input Files: Input files utilized for simulation analyses as CSV files
      • Data Distribution PDFs: PDF files displaying the distribution of simulated data and the missingness.
      • Output Files: Simulated trait datasets, trait datasets with missing data, and trait imputed datasets with imputation errors calculated as RData files.
      • Scripts: Collection of R scripts used for the simulation analysis.
    3. TDIP_package:

      • Scripts of the TDIP Package: All scripts related to the Trait Data Imputation with Phylogeny (TDIP) R package used in the analyses.

    Purpose:

    This repository aims to provide transparency and reproducibility to our research findings by making the datasets and scripts publicly accessible. Researchers interested in understanding our methodologies, replicating our analyses, or building upon our work can utilize this repository as a valuable reference.

    Citation:

    When using the datasets or scripts from this repository, we kindly request citing Publication Benchmarking imputation methods for categorical biological data and acknowledging the use of this Zenodo repository.

    Thank you for your interest in our research, and we hope this repository serves as a valuable resource in your scholarly pursuits.

  14. Data from: Bayesian hierarchical models for spatially misaligned data in R

    • search.dataone.org
    • datadryad.org
    Updated Apr 1, 2025
    Cite
    Andrew O. Finley; Sudipto Banerjee; Bruce D. Cook (2025). Bayesian hierarchical models for spatially misaligned data in R [Dataset]. http://doi.org/10.5061/dryad.3g9s2
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Andrew O. Finley; Sudipto Banerjee; Bruce D. Cook
    Time period covered
    Jan 1, 2015
    Description

    Spatial misalignment occurs when at least one of multiple outcome variables is missing at an observed location. For spatial data, prediction of these missing observations should be informed by within-location association among outcomes and by proximate locations where measurements were recorded. This study details and illustrates a Bayesian regression framework for modelling spatially misaligned multivariate data. Particular attention is paid to developing valid probability models capable of estimating parameter posterior distributions and propagating uncertainty through to outcomes' predictive distributions at locations where some or all of the outcomes are not observed. Models and associated software are presented for both Gaussian and non-Gaussian outcomes. Model parameter and predictive inference within the proposed framework is illustrated using a synthetic and forest inventory data set. The proposed Markov chain Monte Carlo samplers were written in C++ and leverage R's Foreign Lan...

  15. Sensitivity of global terrestrial ecosystems to climate variability: data...

    • ora.ox.ac.uk
    zip
    Updated Jan 1, 2016
    Cite
    University of Oxford (2016). Sensitivity of global terrestrial ecosystems to climate variability: data and R code [Dataset]. http://doi.org/10.5287/bodleian:VY2PeyGX4
    Explore at:
    Available download formats: zip(31188451), zip(2143203213), zip(2482430666), zip(2756988208), zip(1932114998), zip(510097482), zip(963288447)
    Dataset updated
    Jan 1, 2016
    Dataset provided by
    University of Oxford
    License

    Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Time period covered
    2000 - 2013
    Area covered
    Global (-180 - 180), (-60 - 90)
    Description

    Data and coding scripts for Seddon et al. (2016) Nature (DOI 10.1038/nature16986). We derived monthly time-series of four key terrestrial ecosystem variables at 0.05 degree (~5km) resolution from observations by the MODIS sensor on Terra (AM) for the period February 2000-December 2013 inclusive, and developed a method to identify vegetation sensitivity to climate variability over this period (see Methods in main paper).

    This ORA item contains all data and files required to run the analysis described in the paper. Data required to run the script are provided in six zip files evi.zip, temp.zip, aetpet.zip, cld.zip, stdev.zip, numpxl.zip, each containing 167 text files, one per month of available data, in addition to a supporting files folder. Details are as follows.

    supporting_files.zip : This directory includes computer code and additional supporting files. Please see the 'read me.txt' file within this directory for more information.

    evi.zip: ENHANCED VEGETATION INDEX (EVI). We used the MOD13C2 product (Huete et al 2002) which comprises monthly, global EVI at 0.05 degree resolution. In some cases where no clear-sky observations are available, the MOD13C2 version 5 product replaces no-data values with climatological monthly means, so we removed these values where appropriate.

    EVI format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = dimensionless; scale factor = 10000 (divide the value by 10000 to get EVI); filenames = yyyymmevi.txt
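
    As a minimal sketch of how the codes above could be applied when reading a grid row: the sentinel values and the scale factor come from the format description, while the whitespace-delimited row layout, the sample row, and the function names are assumptions made for illustration.

```python
# Values taken from the format description above.
NODATA_SEA = -9999    # sea pixels
MISSING_LAND = -999   # land pixels with no valid observation
EVI_SCALE = 10000     # stored integer / 10000 = EVI


def decode_evi(raw):
    """Classify one stored integer and rescale it when it is a valid observation."""
    if raw == NODATA_SEA:
        return ("sea", None)
    if raw == MISSING_LAND:
        return ("missing", None)
    return ("valid", raw / EVI_SCALE)


def decode_row(line):
    """Decode one text row of the grid (whitespace-delimited integers are assumed)."""
    return [decode_evi(int(token)) for token in line.split()]


# Hypothetical row: one sea pixel, one missing land pixel, two valid land pixels.
print(decode_row("-9999 -999 2500 0"))
```

    Checking the two sentinel codes before rescaling keeps them from turning into spurious values such as -0.9999.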

    numpxl.zip - COUNTS OF THE NUMBER OF PIXELS USED IN EVI CALCULATION. The MOD13C2 product is the result of a spatially and temporally averaged mosaic of higher resolution (1km pixels). Data in this directory represent the number of 1km observations used to calculate the MODIS EVI product. See the online documentation for more details (Solano et al. 2010).

    numpxl format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = counts; filenames = yyyy_mm_numpxl_pt05deg.txt

    stdev.zip - STANDARD DEVIATION OF EVI VALUES. Standard deviation of the monthly EVI observations. See discussion in numpxl.zip item (above) and the online documentation for more details (Solano et al. 2010).

    stdev format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = dimensionless; scale factor = 10000 (divide the value by 10000 to get EVI); filenames = yyyy_mm_stdev_pt05deg.txt

    temp.zip: AIR TEMPERATURE. We used the MOD07_L2 Atmospheric Profile product (Seeman et al 2006) as a measure of air temperature. Five-minute swaths of Retrieved Temperature Profile were projected to geographic co-ordinates. Pixels from the highest available pressure level, corresponding to the temperature closest to the Earth's surface, were selected from each swath. Swaths were then mean-mosaicked into global daily grids, and the daily global grids were mean-composited to monthly grids of air temperature.

    Air temperature format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = degrees C; scale factor = 1 (divide the value by 1 to get Air temperature); filenames = yyyymmtemp.txt

    aetpet.zip: WATER AVAILABILITY. We used the MOD16 Global Evapotranspiration product (Mu et al 2011) to calculate the monthly 0.05 degree ratio of Actual to Potential Evapotranspiration (AET/PET).

    AET/PET format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = dimensionless; scale factor = 10000 (divide the value by 10000 to get AET/PET); filenames = yyyymmaetpet.txt

    cld.zip - CLOUDINESS. We used the MOD35_L2 Cloud Mask product (Ackerman et al 2010). This product provides daily records on the presence of cloudy vs cloudless skies, and we used this to make an index of the proportion of cloudy to clear days in a given pixel. After conversion to geographic co-ordinates, five-minute swaths at 1-km resolution were reclassed as clear sky or cloudy, and these daily swaths were mean-mosaicked to global coverages, mean-composited from daily to monthly, and mean-aggregated from 1km to 0.05 degree.

    cld format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = percentage of days in the month which were cloudy; scale factor = 100 (divide the value by 100 to get percentage cloudy days); filenames = yyyymmcld.txt

    References

    Ackerman, S. et al. (2010) Discriminating clear-sky from cloud with MODIS: Algorithm Theoretical Basis Document (MOD35), Version 6.1. (URL: http://modis-atmos.gsfc.nasa.gov/_docs/MOD35_ATBD_Collection6.pdf)

    Huete, A. et al. (2002) Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sensing of Environment 83, 195–213.

    Mu, Q., Zhao, M., Running, S.R. (2011) Improvements to a MODIS global terrestrial evapotranspiration algorithm. Remote Sensing of Environment 115, 1781-1800

    Seeman, S. W., Borbas, E. E., Li, J., Menzel, W. P. & Gumley, L. E. (2006) MODIS Atmospheric Profile Retrieval Algorithm Theoretical Basis Document, Version 6 (URL: http://modis-atmos.gsfc.nasa.gov/_docs/MOD07_atbd_v7_April2011.pdf)

    Solano, R. et al. (2010) MODIS Vegetation Index User’s Guide (MOD13 Series) Version 2.00, May 2010 (Collection 5) (URL: http://vip.arizona.edu/documents/MODIS/MODIS_VI_UsersGuide_01_2012.pdf)

    Seddon et al. (2016) Nature (DOI 10.1038/nature16986) ABSTRACT: Identification of properties that contribute to the persistence and resilience of ecosystems despite climate change constitutes a research priority of global significance. Here, we present a novel, empirical approach to assess the relative sensitivity of ecosystems to climate variability, one property of resilience that builds on theoretical modelling work recognising that systems closer to critical thresholds respond more sensitively to external perturbations. We develop a new metric, the Vegetation Sensitivity Index (VSI) which identifies areas sensitive to climate variability over the past 14 years. The metric uses time-series data of MODIS derived Enhanced Vegetation Index (EVI) and three climatic variables that drive vegetation productivity (air temperature, water availability and cloudiness). Underlying the analysis is an autoregressive modelling approach used to identify regions with memory effects and reduced response rates to external forcing. We find ecologically sensitive regions with amplified responses to climate variability in the arctic tundra, parts of the boreal forest belt, the tropical rainforest, alpine regions worldwide, steppe and prairie regions of central Asia and North and South America, the Caatinga deciduous forest in eastern South America, and eastern areas of Australia. Our study provides a quantitative methodology for assessing the relative response rate of ecosystems – be they natural or with a strong anthropogenic signature – to environmental variability, which is the first step to address why some regions appear to be more sensitive than others and what impact this has upon the resilience of ecosystem service provision and human wellbeing.

  16. o

    TOBS, Exercise 2: Basic data in R and loading data from files (TOBS, vežba 2: Osnovni podaci u R-u i učitavanje podataka iz fajlova)

    • explore.openaire.eu
    Updated Mar 21, 2021
    Cite
    Nadica Miljković (2021). TOBS, vežba 2: Osnovni podaci u R-u i učitavanje podataka iz fajlova [Dataset]. http://doi.org/10.5281/zenodo.4625586
    Explore at:
    Dataset updated
    Mar 21, 2021
    Authors
    Nadica Miljković
    Description

    Second exercise for the course Biomedical Signal Processing Techniques (Tehnike obrade biomedicinskih signala) in the master's academic studies program at the School of Electrical Engineering, University of Belgrade. The repository also contains the corresponding .txt and .csv data files used to complete the assignments.

  17. Z

    Data and Code for Publication "Estimating inter-individual Mahalanobis...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Jan 30, 2024
    Cite
    Rathmann, Hannes (2024). Data and Code for Publication "Estimating inter-individual Mahalanobis distances from mixed incomplete high-dimensional data: Application to human skeletal remains from 3rd to 1st millennia BC Southwest Germany" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7869074
    Explore at:
    Dataset updated
    Jan 30, 2024
    Dataset authored and provided by
    Rathmann, Hannes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and code for publication: H. Rathmann, S. Lismann, M. Francken, A. Spatzier, Estimating inter-individual Mahalanobis distances from mixed incomplete high-dimensional data: Application to human skeletal remains from 3rd to 1st millennia BC Southwest Germany. Journal of Archaeological Science 156: 105802. https://doi.org/10.1016/j.jas.2023.105802

    The repository contains:

    “R code for FLEXDIST.txt”: R code for executing FLEXDIST, a tool to estimate inter-individual Mahalanobis-type distances, taking correlations among variables into account, applicable to multiple variable scales (nominal, ordinal, continuous, or any mixture thereof), accommodating missing values, and handling high-dimensional data. Please refer to the latest version of this repository for the most up-to-date R code.

    “data.csv”: Pre-processed dataset comprising 85 dental morphological features collected from 64 archaeological human remains from Final Neolithic to Early Iron Age Southwest Germany used for analysis.

    “complete dataset.xlsx”: Complete dataset comprising 199 dental morphological features collected from 144 archaeological human remains from Final Neolithic to Early Iron Age Southwest Germany.

  18. d

    Data from: Joined at the hip: linked characters and the problem of missing...

    • datadryad.org
    zip
    Updated Apr 16, 2014
    Cite
    Andrew John Smith; Michael V. Rosario; Thomas P. Eiting; Elizabeth R. Dumont (2014). Joined at the hip: linked characters and the problem of missing data in studies of disparity [Dataset]. http://doi.org/10.5061/dryad.d380g
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 16, 2014
    Dataset provided by
    Dryad
    Authors
    Andrew John Smith; Michael V. Rosario; Thomas P. Eiting; Elizabeth R. Dumont
    Time period covered
    Apr 15, 2014
    Description

    Paleontological investigations into morphological diversity, or disparity, are often confronted with large amounts of missing data. We illustrate how missing discrete data affects disparity using a novel simulation for removing data based on parameters from published datasets that contain both extinct and extant taxa. We develop an algorithm that assesses the distribution of missing characters in extinct taxa, and simulates data loss by applying that distribution to extant taxa. We term this technique ‘linkage’. We compare differences in disparity metrics and ordination spaces produced by linkage and random character removal. When we incorporated linkage among characters, disparity metrics declined and ordination spaces shrank at a slower rate with increasing missing data, indicating that correlations among characters govern the sensitivity of disparity analysis. We also present and test a new disparity method that uses the linkage algorithm to correct for the bias caused by missing dat...

  19. f

    MAPE and PB statistics for IBFI compared with other imputation methods...

    • plos.figshare.com
    xls
    Updated Jun 6, 2023
    + more versions
    Cite
    Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique (2023). MAPE and PB statistics for IBFI compared with other imputation methods (mean, median, mode, PMM, and Hotdeck) for 20% missingness of type MAR and all parameters tested (RN, TH, TC, RH, and PR). [Dataset]. http://doi.org/10.1371/journal.pone.0262131.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MAPE and PB statistics for IBFI compared with other imputation methods (mean, median, mode, PMM, and Hotdeck) for 20% missingness of type MAR and all parameters tested (RN, TH, TC, RH, and PR).

  20. d

    Data from: When policy and psychology meet: mitigating the consequences of...

    • datadryad.org
    zip
    Updated Jun 16, 2020
    Cite
    Jason Okonofua (2020). When policy and psychology meet: mitigating the consequences of bias in schools [Dataset]. http://doi.org/10.6078/D1VT4T
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 16, 2020
    Dataset provided by
    Dryad
    Authors
    Jason Okonofua
    Time period covered
    2020
    Description

    This dataset was collected from K-12 teachers via online surveys (Qualtrics). The statistical analyses were conducted in R.

    In the present research, we tested whether a combination of getting perspective and exposure to relevant incremental theories can mitigate the consequences of bias on discipline decisions. We call this combination of approaches a “Bias-Consequence Alleviation” (BCA) intervention. The present research sought to determine how the following components can be integrated to reduce the process by which bias contributes to racial inequality in discipline decisions: (1) getting a misbehaving student’s perspective, “student-perspective”; (2) belief that others’ personalities can change, “student-growth”; and (3) belief that one’s own ability to sustain positive relationships can change, “relationship-growth.” Can a combination of these three components curb troublemaker-labeling and pattern-prediction responses to a Black student’s misbehavior (Exp...

