27 datasets found
  1. Missing data in the analysis of multilevel and dependent data (Example data...

    • zenodo.org
    bin
    Updated Jul 20, 2023
    + more versions
    Cite
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch (2023). Missing data in the analysis of multilevel and dependent data (Example data sets) [Dataset]. http://doi.org/10.5281/zenodo.7773614
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    Springer
    Authors
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data sets for the book chapter titled "Missing Data in the Analysis of Multilevel and Dependent Data" submitted for publication in the second edition of "Dependent Data in Social Science Research" (Stemmler et al., 2015). This repository includes the data sets used in both example analyses (Examples 1 and 2) in two file formats (binary ".rda" for use in R; plain-text ".dat").

    The data sets contain simulated data from 23,376 (Example 1) and 23,072 (Example 2) individuals from 2,000 groups on four variables:

    ID = group identifier (1-2000)
    x = numeric (Level 1)
    y = numeric (Level 1)
    w = binary (Level 2)

    In all data sets, missing values are coded as "NA".
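
    For users working outside R, a minimal sketch of loading the plain-text version with pandas; the file name is illustrative and whitespace-delimited columns are assumed (the archive itself ships ".rda" and ".dat" files):

    import pandas as pd

    # Illustrative file name; columns are ID (group), x, y (Level 1), w (Level 2),
    # with "NA" marking missing entries, per the description above.
    df = pd.read_csv("example1.dat", sep=r"\s+", na_values="NA")

    print(df.isna().mean())    # proportion of missing values per variable
    print(df["ID"].nunique())  # number of groups (about 2,000 here)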

  2. Air Quality Monitoring - 2019 (grouped by pollutant)

    • researchdata.edu.au
    • data.qld.gov.au
    • +1more
    Updated Apr 20, 2020
    + more versions
    Cite
    data.qld.gov.au (2020). Air Quality Monitoring - 2019 (grouped by pollutant) [Dataset]. https://researchdata.edu.au/air-quality-monitoring-grouped-pollutant/1459199
    Explore at:
    Dataset updated
    Apr 20, 2020
    Dataset provided by
    data.qld.gov.au
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annual hourly air quality and meteorological data by pollutant for the 2019 calendar year. For more information on air quality, including live air data, please visit www.qld.gov.au/environment/pollution/monitoring/air.

    Data resolution: One-hour average values
    Data row timestamp: Start of averaging period
    Missing data/not monitored: Blank cell
    Sampling height: Four metres above ground (unless otherwise indicated)
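
    As a rough illustration (not part of the dataset itself), hourly files with blank cells read naturally into pandas, where blanks become NaN; the file and column names below are assumptions:

    import pandas as pd

    # Hypothetical file and column names; actual headers vary by station and pollutant.
    aq = pd.read_csv("air_quality_2019_pm25.csv", parse_dates=["date"])

    # Rows are one-hour averages stamped at the start of the averaging period, and
    # blank cells arrive as NaN, so daily data coverage per column is simply:
    coverage = aq.set_index("date").notna().resample("D").mean()
    print(coverage.head())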

  3. Car Parts Dataset (without Missing Values)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 1, 2021
    + more versions
    Cite
    Godahewa, Rakshitha (2021). Car Parts Dataset (without Missing Values) [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3994910
    Explore at:
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Bergmeir, Christoph
    Webb, Geoff
    Godahewa, Rakshitha
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 2674 intermittent monthly time series that represent car parts sales from January 1998 to March 2002. It was extracted from the R expsmooth package.

    The original dataset contains missing values, which have been replaced by zeros.

  4. Data from: Missing data estimation in morphometrics: how much is too much?

    • zenodo.org
    • data.niaid.nih.gov
    • +2more
    Updated Jun 1, 2022
    Cite
    Julien Clavel; Gildas Merceron; Gilles Escarguel (2022). Data from: Missing data estimation in morphometrics: how much is too much? [Dataset]. http://doi.org/10.5061/dryad.f0b50
    Explore at:
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julien Clavel; Gildas Merceron; Gilles Escarguel
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Fossil-based estimates of diversity and evolutionary dynamics mainly rely on the study of morphological variation. Unfortunately, organism remains are often altered by post-mortem taphonomic processes such as weathering or distortion. Such a loss of information often prevents quantitative multivariate description and statistically controlled comparisons of extinct species based on morphometric data. A common way to deal with missing data involves imputation methods that directly fill the missing cases with model estimates. Over the last several years, several empirically determined thresholds for the maximum acceptable proportion of missing values have been proposed in the literature, whereas other studies showed that this limit actually depends on several properties of the study dataset and of the selected imputation method, and is in no way generalizable. We evaluate the relative performances of seven multiple imputation techniques through a simulation-based analysis under three distinct patterns of missing data distribution. Overall, Fully Conditional Specification and Expectation-Maximization algorithms provide the best compromises between imputation accuracy and coverage probability. Multiple imputation (MI) techniques appear remarkably robust to the violation of basic assumptions such as the occurrence of taxonomically or anatomically biased patterns of missing data distribution, making differences in simulation results between the three patterns of missing data distribution much smaller than differences between the individual MI techniques. Based on these results, rather than proposing a new (set of) threshold value(s), we develop an approach combining the use of multiple imputations with procrustean superimposition of principal component analysis results, in order to directly visualize the effect of individual missing data imputation on an ordinated space. We provide an R function for users to implement the proposed procedure.
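
    As a rough illustration of the kind of workflow the abstract describes (not the authors' R function), a fully-conditional-specification-style imputer can be run repeatedly to mimic multiple imputation before ordination; scikit-learn's IterativeImputer is used here as a stand-in:

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))              # toy morphometric measurements
    X[rng.random(X.shape) < 0.2] = np.nan      # 20% of values missing

    # Several imputations with different seeds stand in for multiple imputation (MI).
    scores = []
    for seed in range(5):
        imputer = IterativeImputer(random_state=seed, sample_posterior=True)
        scores.append(PCA(n_components=2).fit_transform(imputer.fit_transform(X)))

    # Each element of `scores` is one ordination; superimposing them (e.g. via a
    # Procrustes rotation) visualizes how imputation uncertainty moves specimens.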

  5. Conjunto de datos de: Water-quality data imputation with a high percentage...

    • redata.anii.org.uy
    tsv
    Updated Mar 6, 2024
    + more versions
    Cite
    Rafael Rodríguez; Marcos Pastorini; Lorena Etcheverry; Christian Chreties; Mónica Fossati; Alberto Castro; Angela Gorgoglione (2024). Conjunto de datos de: Water-quality data imputation with a high percentage of missing values: a machine learning approach [Dataset]. http://doi.org/10.60895/redata/TNRT8Q
    Explore at:
    Available download formats: tsv(11154), tsv(47413)
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Repositorio de datos abiertos de investigación de Uruguay
    Authors
    Rafael Rodríguez; Marcos Pastorini; Lorena Etcheverry; Christian Chreties; Mónica Fossati; Alberto Castro; Angela Gorgoglione
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    ANII
    Description

    The monitoring of surface-water quality followed by water-quality modeling and analysis is essential for generating effective strategies in water resource management. However, water-quality studies are limited by the lack of complete and reliable data sets on surface-water-quality variables. These deficiencies are particularly noticeable in developing countries. This work focuses on surface-water-quality data from the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. Data collected at six monitoring stations are publicly available at https://www.dinama.gub.uy/oan/datos-abiertos/calidad-agua/. The high temporal and spatial variability that characterizes water-quality variables and the high rate of missing values (between 50% and 70%) raise significant challenges. To deal with missing values, we applied several statistical and machine-learning imputation methods. The competing algorithms implemented belonged to both univariate and multivariate imputation methods: inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR), and K-nearest neighbors Regressor (KNNR). IDW outperformed the others, achieving a very good performance (NSE greater than 0.8) in most cases.

    In this dataset, we include the original and imputed values for the following variables:

    - Water temperature (Tw)
    - Dissolved oxygen (DO)
    - Electrical conductivity (EC)
    - pH
    - Turbidity (Turb)
    - Nitrite (NO2-)
    - Nitrate (NO3-)
    - Total Nitrogen (TN)

    Each variable is identified as [STATION] VARIABLE FULL NAME (VARIABLE SHORT NAME) [UNIT METRIC].

    More details about the study area, the original datasets, and the methodology adopted can be found in our paper: https://www.mdpi.com/2071-1050/13/11/6318. If you use this dataset in your work, please cite our paper: Rodríguez, R.; Pastorini, M.; Etcheverry, L.; Chreties, C.; Fossati, M.; Castro, A.; Gorgoglione, A. Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach. Sustainability 2021, 13, 6318. https://doi.org/10.3390/su13116318
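
    The performance criterion quoted above, the Nash-Sutcliffe efficiency (NSE), has a simple closed form; a minimal sketch for checking imputed values against held-out observations:

    import numpy as np

    def nash_sutcliffe(observed, simulated):
        """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2); 1 is a perfect fit."""
        observed = np.asarray(observed, float)
        simulated = np.asarray(simulated, float)
        return 1.0 - np.sum((observed - simulated) ** 2) / np.sum((observed - observed.mean()) ** 2)

    obs = np.array([7.1, 6.8, 7.4, 8.0, 7.6])   # held-out measurements (toy numbers)
    imp = np.array([7.0, 6.9, 7.5, 7.8, 7.7])   # imputed values
    print(nash_sutcliffe(obs, imp))             # values above 0.8 counted as very good here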

  6. Data from: pmartR: Quality Control and Statistics for Mass...

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Kelly G. Stratton; Bobbie-Jo M. Webb-Robertson; Lee Ann McCue; Bryan Stanfill; Daniel Claborne; Iobani Godinez; Thomas Johansen; Allison M. Thompson; Kristin E. Burnum-Johnson; Katrina M. Waters; Lisa M. Bramer (2023). pmartR: Quality Control and Statistics for Mass Spectrometry-Based Biological Data [Dataset]. http://doi.org/10.1021/acs.jproteome.8b00760.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Kelly G. Stratton; Bobbie-Jo M. Webb-Robertson; Lee Ann McCue; Bryan Stanfill; Daniel Claborne; Iobani Godinez; Thomas Johansen; Allison M. Thompson; Kristin E. Burnum-Johnson; Katrina M. Waters; Lisa M. Bramer
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Prior to statistical analysis of mass spectrometry (MS) data, quality control (QC) of the identified biomolecule peak intensities is imperative for reducing process-based sources of variation and extreme biological outliers. Without this step, statistical results can be biased. Additionally, liquid chromatography–MS proteomics data present inherent challenges due to large amounts of missing data that require special consideration during statistical analysis. While a number of R packages exist to address these challenges individually, there is no single R package that addresses all of them. We present pmartR, an open-source R package, for QC (filtering and normalization), exploratory data analysis (EDA), visualization, and statistical analysis robust to missing data. Example analysis using proteomics data from a mouse study comparing smoke exposure to control demonstrates the core functionality of the package and highlights the capabilities for handling missing data. In particular, using a combined quantitative and qualitative statistical test, 19 proteins whose statistical significance would have been missed by a quantitative test alone were identified. The pmartR package provides a single software tool for QC, EDA, and statistical comparisons of MS data that is robust to missing data and includes numerous visualization capabilities.

  7. A Simple Optimization Workflow to Enable Precise and Accurate Imputation of...

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated Jun 11, 2023
    Cite
    Kruttika Dabke; Simion Kreimer; Michelle R. Jones; Sarah J. Parker (2023). A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets [Dataset]. http://doi.org/10.1021/acs.jproteome.1c00070.s003
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    ACS Publications
    Authors
    Kruttika Dabke; Simion Kreimer; Michelle R. Jones; Sarah J. Parker
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Missing values in proteomic data sets have real consequences on downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single imputation method is best suited for a diverse range of data sets, and no clear strategy exists for evaluating imputation methods for clinical DIA-MS data sets, especially at different levels of protein quantification. To navigate through the different imputation strategies available in the literature, we have established a strategy to assess imputation methods on clinical label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a clinical proteomic data set comparing paired tumor and stroma tissue. We found that imputation methods based on local structures within the data, like local least-squares (LLS) and random forest (RF), worked well in our dilution series data set, whereas imputation methods based on global structures within the data, like BPCA, performed well in the other two data sets. We also found that imputation at the most basic protein quantification level (fragment level) improved accuracy and the number of proteins quantified. With this analytical framework, we quickly and cost-effectively evaluated different imputation methods using two smaller complementary data sets to narrow down to the larger proteomic data set’s most accurate methods. This acquisition strategy allowed us to provide reproducible evidence of the accuracy of the imputation method, even in the absence of a ground truth. Overall, this study indicates that the most suitable imputation method relies on the overall structure of the data set and provides an example of an analytic framework that may assist in identifying the most appropriate imputation strategies for the differential analysis of proteins.
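
    A generic way to compare imputers when the real missing cells have no ground truth, in the spirit of the evaluation above though not the authors' code, is to hide a fraction of the observed intensities, re-impute, and score the recovery; scikit-learn imputers act here as stand-ins for LLS/RF/BPCA:

    import numpy as np
    from sklearn.impute import KNNImputer, SimpleImputer

    rng = np.random.default_rng(1)
    X = np.exp(rng.normal(size=(200, 20)))   # toy fragment/peptide intensity matrix
    X[rng.random(X.shape) < 0.1] = np.nan    # pre-existing missing values

    def nrmse(imputer, X, mask_frac=0.05, seed=0):
        """Hide a fraction of observed cells, impute, return NRMSE on the hidden cells."""
        rng = np.random.default_rng(seed)
        hidden = ~np.isnan(X) & (rng.random(X.shape) < mask_frac)
        X_masked = np.where(hidden, np.nan, X)
        X_hat = imputer.fit_transform(X_masked)
        return np.sqrt(np.mean((X_hat[hidden] - X[hidden]) ** 2)) / np.std(X[hidden])

    for name, imp in [("knn", KNNImputer(n_neighbors=5)),
                      ("median", SimpleImputer(strategy="median"))]:
        print(name, round(nrmse(imp, X), 3))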

  8. MOCK Big Five Inventory dataset (German metadata demo)

    • cran.r-project.org
    • rubenarslan.github.io
    • +4more
    Updated Jun 1, 2016
    Cite
    Ruben Arslan (2016). MOCK Big Five Inventory dataset (German metadata demo) [Dataset]. http://doi.org/10.5281/zenodo.1326520
    Explore at:
    Dataset updated
    Jun 1, 2016
    Dataset provided by
    MPI Human Development, Berlin
    Authors
    Ruben Arslan
    Time period covered
    2016
    Area covered
    Germany, Goettingen
    Variables measured
    age, ended, created, expired, session, modified, BFIK_agree, BFIK_neuro, BFIK_agree_2, BFIK_neuro_3, and 5 more
    Description

    a small mock Big Five Inventory dataset

    Table of variables

    This table contains variable names, labels, and number of missing values. See the complete codebook for more.

    name | label | n_missing
    session | NA | 0
    created | user first opened survey | 0
    modified | user last edited survey | 0
    ended | user finished survey | 0
    expired | NA | 28
    BFIK_agree_4R | Ich kann mich schroff und abweisend anderen gegenüber verhalten. | 0
    BFIK_agree_1R | Ich neige dazu, andere zu kritisieren. | 0
    BFIK_neuro_2R | Ich bin entspannt, lasse mich durch Stress nicht aus der Ruhe bringen. | 0
    BFIK_agree_3R | Ich kann mich kalt und distanziert verhalten. | 0
    BFIK_neuro_3 | Ich mache mir viele Sorgen. | 0
    BFIK_neuro_4 | Ich werde leicht nervös und unsicher. | 0
    BFIK_agree_2 | Ich schenke anderen leicht Vertrauen, glaube an das Gute im Menschen. | 0
    BFIK_agree | 4 BFIK_agree items aggregated by aggregation_function | 0
    BFIK_neuro | 3 BFIK_neuro items aggregated by aggregation_function | 0
    age | Alter | 0

    Note

    This dataset was automatically described using the codebook R package (version 0.9.6).

  9. MOCK Qualtrics dataset

    • cran.r-universe.dev
    • rubenarslan.github.io
    Updated Aug 1, 2018
    Cite
    Ruben Arslan (2018). MOCK Qualtrics dataset [Dataset]. http://doi.org/10.5281/zenodo.1326520
    Explore at:
    Dataset updated
    Aug 1, 2018
    Dataset provided by
    MPI Human Development, Berlin
    Authors
    Ruben Arslan
    Time period covered
    2018
    Area covered
    Nowhere
    Variables measured
    Q7, Q10, ResponseSet
    Description

    a MOCK dataset used to show how to import Qualtrics metadata into the codebook R package

    Table of variables

    This table contains variable names, labels, and number of missing values. See the complete codebook for more.

    name | label | n_missing
    ResponseSet | NA | 0
    Q7 | NA | 0
    Q10 | NA | 0

    Note

    This dataset was automatically described using the codebook R package (version 0.9.6).

  10. Film Circulation dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Cite
    Samoilova, Evgenia (Zhenya) (2024). Film Circulation dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7887671
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Samoilova, Evgenia (Zhenya)
    Loist, Skadi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

    A peer-reviewed data paper for this dataset is under review for publication in NECSUS_European Journal of Media Studies, an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org.

    Please cite this when using the dataset.

    Detailed description of the dataset:

    1 Film Dataset: Festival Programs

    The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

    The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

    The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
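
    The long-to-wide reduction described above (one row per film, keeping the first sampled festival) corresponds to a standard pandas operation; a hedged sketch with assumed column names (the codebook lists the real ones):

    import pandas as pd

    long_df = pd.read_csv("1_film-dataset_festival-program_long.csv")

    # One row per (film, festival) pair -> one row per unique film, keeping the
    # earliest sampled festival; "film_id" and "festival_year" are assumed names.
    wide_df = (long_df
               .sort_values("festival_year")
               .drop_duplicates(subset="film_id", keep="first"))
    print(len(wide_df))   # should be close to the n=9,348 unique films reported above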

    2 Survey Dataset

    The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

    The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

    The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.

    3 IMDb & Scripts

    The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

    The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

    The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

    The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

    The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

    The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.

    The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

    The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

    The csv file “3_imdb-dataset_release-info_long” contains data about non-festival release (e.g., theatrical, digital, tv, dvd/blueray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

    The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

    The dataset includes 8 text files containing the script for webscraping. They were written using the R-3.6.3 version for Windows.

    The R script “r_1_unite_data” demonstrates the structure of the dataset, that we use in the following steps to identify, scrape, and match the film data.

    The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then potentially using an alternative title and a basic search if no matches are found in the advanced search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records on the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was done using data on directors, production year (+/- one year), and title, with a fuzzy matching approach using two methods, “cosine” and “osa”: the cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
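
    For illustration only (the repository ships its own R scripts), the two string measures named above can be sketched in Python: cosine similarity over character n-grams, and the optimal string alignment (OSA) distance, i.e. edit distance with adjacent transpositions:

    import math
    from collections import Counter

    def ngram_cosine(a, b, n=3):
        """Cosine similarity between character n-gram counts of two titles."""
        ga = Counter(a.lower()[i:i + n] for i in range(len(a) - n + 1))
        gb = Counter(b.lower()[i:i + n] for i in range(len(b) - n + 1))
        dot = sum(ga[g] * gb[g] for g in ga)
        norm = math.sqrt(sum(v * v for v in ga.values())) * math.sqrt(sum(v * v for v in gb.values()))
        return dot / norm if norm else 0.0

    def osa_distance(a, b):
        """Optimal string alignment: insertions, deletions, substitutions, adjacent swaps."""
        d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            d[i][0] = i
        for j in range(len(b) + 1):
            d[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
                if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                    d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
        return d[len(a)][len(b)]

    print(ngram_cosine("The Movie Title", "The Movie Titel"))
    print(osa_distance("The Movie Title", "The Movie Titel"))  # 1 (one adjacent swap)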

    The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of five categories: a) 100% match, i.e. a perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.

    The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

    The script “r_5a_extracting_info_sample” uses the function defined in “r_4_scraping_functions” in order to scrape the IMDb data for the identified matches. This script does that for the first 100 films, to check if everything works. Scraping for the entire dataset took a few hours; therefore, a test with a subsample of 100 films is advisable.

    The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

    The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.

    The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.

    4 Festival Library Dataset

    The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

    The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories, units of measurement, data sources and coding and missing data.

    The csv file “4_festival-library_dataset_imdb-and-survey” contains data on all unique festivals collected from both IMDb and survey sources. This dataset appears in wide format, i.e. all information for each festival is listed in one row. This

  11. Air Quality Monitoring - 2009 (grouped by pollutant)

    • researchdata.edu.au
    • data.qld.gov.au
    Updated Jan 7, 2021
    + more versions
    Cite
    data.qld.gov.au (2021). Air Quality Monitoring - 2009 (grouped by pollutant) [Dataset]. https://researchdata.edu.au/air-quality-monitoring-grouped-pollutant/1673367
    Explore at:
    Dataset updated
    Jan 7, 2021
    Dataset provided by
    data.qld.gov.au
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annual hourly air quality and meteorological data by pollutant for the 2009 calendar year. For more information on air quality, including live air data, please visit qld.gov.au/environment/pollution/monitoring/air.

    Data resolution: One-hour average values
    Data row timestamp: Start of averaging period
    Missing data/not monitored: Blank cell
    Sampling height: Four metres above ground (unless otherwise indicated)

  12. Catholics per Parish, Null Values

    • hub.arcgis.com
    • catholic-geo-hub-cgisc.hub.arcgis.com
    Updated Oct 26, 2019
    + more versions
    Cite
    burhansm2 (2019). Catholics per Parish, Null Values [Dataset]. https://hub.arcgis.com/content/0829ec89280f42babd17011489a0d240
    Explore at:
    Dataset updated
    Oct 26, 2019
    Dataset authored and provided by
    burhansm2
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0), https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Area covered
    Pacific Ocean, North Pacific Ocean
    Description

    Catholics per Parish {title at top of page}

    Data Developers: Burhans, Molly A., Cheney, David M., Emege, Thomas, Gerlt, R. “Catholics per Parish {title at top of page}”. Scale not given. Version 1.0. MO and CT, USA: GoodLands Inc., Catholic Hierarchy, Environmental Systems Research Institute, Inc., 2019.

    Web map developer: Molly Burhans, October 2019
    Web app developer: Molly Burhans, October 2019

    GoodLands’ polygon data layers, version 2.0, for global ecclesiastical boundaries of the Roman Catholic Church: Although care has been taken to ensure the accuracy, completeness and reliability of the information provided, due to this being the first developed dataset of global ecclesiastical boundaries curated from many sources it may have a higher margin of error than established geopolitical administrative boundary maps. Boundaries need to be verified with appropriate Ecclesiastical Leadership. The current information is subject to change without notice. No parties involved with the creation of this data are liable for indirect, special or incidental damage resulting from, arising out of or in connection with the use of the information. We referenced 1960 sources to build our global datasets of ecclesiastical jurisdictions. Often, they were isolated images of dioceses, historical documents and information about parishes that were cross checked. These sources can be viewed here: https://docs.google.com/spreadsheets/d/11ANlH1S_aYJOyz4TtG0HHgz0OLxnOvXLHMt4FVOS85Q/edit#gid=0

    To learn more or contact us please visit: https://good-lands.org/

    The Catholic Leadership global maps information is derived from the Annuario Pontificio, which is curated and published by the Vatican Statistics Office annually, and digitized by David Cheney at Catholic-Hierarchy.org; updates are supplemented with diocesan and news announcements. GoodLands maps this into global ecclesiastical boundaries.

    Admin 3 Ecclesiastical Territories: Burhans, Molly A., Cheney, David M., Gerlt, R. “Admin 3 Ecclesiastical Territories For Web”. Scale not given. Version 1.2. MO and CT, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2019.

    Derived from Global Diocesan Boundaries: Burhans, M., Bell, J., Burhans, D., Carmichael, R., Cheney, D., Deaton, M., Emge, T., Gerlt, B., Grayson, J., Herries, J., Keegan, H., Skinner, A., Smith, M., Sousa, C., Trubetskoy, S. “Diocesean Boundaries of the Catholic Church” [Feature Layer]. Scale not given. Version 1.2. Redlands, CA, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2016. Using: ArcGIS 10.4. Version 10.0. Redlands, CA: Environmental Systems Research Institute, Inc., 2016.

    Boundary Provenance (Statistics and Leadership Data): Cheney, D.M. “Catholic Hierarchy of the World” [Database]. Date Updated: August 2019. Catholic Hierarchy. Using: Paradox. Retrieved from Original Source. Annuario Pontificio per l’Anno... Città del Vaticano: Tipografia Poliglotta Vaticana, Multiple Years.

    The data for these maps was extracted from the gold standard of Church data, the Annuario Pontificio, published yearly by the Vatican. The collection and data development of the Vatican Statistics Office are unknown. GoodLands is not responsible for errors within this data. We encourage people to document and report errant information to us at data@good-lands.org or directly to the Vatican. Additional information about regular changes in bishops and sees comes from a variety of public diocesan and news announcements.

  13. Data from: Unforeseen consequences of excluding missing data from...

    • datadryad.org
    • zenodo.org
    zip
    Updated Jun 27, 2014
    Cite
    Huateng Huang; L. Lacey Knowles (2014). Unforeseen consequences of excluding missing data from next-generation sequences: simulation study of RAD sequences [Dataset]. http://doi.org/10.5061/dryad.jf361
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 27, 2014
    Dataset provided by
    Dryad
    Authors
    Huateng Huang; L. Lacey Knowles
    Time period covered
    2014
    Description

    speciestree: 20 species tree simulated under Yule model using Mesquite
    ms: Perl script to simulate coalescent genealogies for given species tree
    seqgene_paramter: short R script to generate random mutation rates (as the theta for simulating dna sequences) from a log normal distribution
    seqgen parameters (file seqge.parameter): the result from seqgene_paramter.R -- the mutation rates used in simulating sequences
    seq-gen: Perl script to simulate sequences given genealogies and mutation rate (as theta)
    infomissing: a Perl script to filter out sequences with mutations at enzyme cutting sites
    coverage: a Perl script for filtering out sequences with no read (coverage drawn from a Poisson distribution)
    cluster: a Perl script for generating post-sequencing missing data
    lociind: a Perl script for summarizing the number of individuals for each locus (output in a txt file)
    counts: a Perl script for counting the number of loci at different tolerance levels (output a txt file)
    phyliprand: a Perl script for generating phylip formatted sequenc...

  14. Data from: Quantifying the impacts of management and herbicide resistance on...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Nov 29, 2023
    Cite
    Robert Goodsell; David Comont; Helen Hicks; James Lambert; Richard Hull; Laura Crook; Paolo Fraccaro; Katharina Reusch; Robert Freckleton; Dylan Childs (2023). Quantifying the impacts of management and herbicide resistance on regional plant population dynamics in the face of missing data [Dataset]. http://doi.org/10.5061/dryad.9cnp5hqn5
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Robert Goodsell; David Comont; Helen Hicks; James Lambert; Richard Hull; Laura Crook; Paolo Fraccaro; Katharina Reusch; Robert Freckleton; Dylan Childs
    Time period covered
    Jan 1, 2022
    Description

    A key challenge in the management of populations is to quantify the impact of interventions in the face of environmental and phenotypic variability. However, accurate estimation of the effects of management and environment in large-scale ecological research is often limited by the expense of data collection, the inherent trade-off between quality and quantity, and missing data. In this paper we develop a novel modelling framework, and demographically informed imputation scheme, to comprehensively account for the uncertainty generated by missing population, management, and herbicide resistance data. Using this framework and a large dataset (178 sites over 3 years) on the densities of a destructive arable weed (Alopecurus myosuroides) we investigate the effects of environment, management, and evolved herbicide resistance on weed population dynamics. In this study we quantify the marginal effects of a suite of common management practices, including cropping, cultivation, and herbici...

    Data were collected from a network of UK farms using a density structured survey method outlined in Queensborough 2011.

    Quantifying the impacts of management and herbicide resistance on regional plant population dynamics in the face of missing data

    Contained are the datasets and code required to replicate the analyses in Goodsell et al (2023), Quantifying the impacts of management and herbicide resistance on regional plant population dynamics in the face of missing data.

    Description of the data and file structure

    Data: Contains data required to run all stages in the analysis.

    Many files contain the same variable names, important variables have been described in the first object they appear in.

    all_imputation_data.rds - The data required to run the imputation scheme, this is an R list containing the following:

    $Management - data frame containing missing and observed values for management imputation

    FF & FFY: the specific field, and field year.

    year: the year.

    crop: crop

    cult_cat : cultivation category

    a_gly: number of autumn (post September 1st) glyphosate applicatio...

  15. ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture...

    • researchdata.tuwien.ac.at
    • b2find.dkrz.de
    zip
    Updated Feb 14, 2025
    + more versions
    Cite
    Wolfgang Preimesberger; Pietro Stradiotti; Wouter Arnoud Dorigo (2025). ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture from merged multi-satellite observations [Dataset]. http://doi.org/10.48436/3fcxr-cde10
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    TU Wien
    Authors
    Wolfgang Preimesberger; Pietro Stradiotti; Wouter Arnoud Dorigo
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    This dataset was produced with funding from the European Space Agency (ESA) Climate Change Initiative (CCI) Plus Soil Moisture Project (CCN 3 to ESRIN Contract No: 4000126684/19/I-NB "ESA CCI+ Phase 1 New R&D on CCI ECVS Soil Moisture"). Project website: https://climate.esa.int/en/projects/soil-moisture/

    This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.

    Dataset paper (public preprint)

    A description of this dataset, including the methodology and validation results, is available at:

    Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: An independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2024-610, in review, 2025.

    Abstract

    ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
    However, data gaps are still found in the record. This is particularly notable in earlier periods when a limited number of satellites were in operation, but can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they have the potential to obscure relevant events within a study area or are incompatible with (machine learning) software that often relies on gap-free inputs.
    Since the requirement of a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such a gap-filling method is that it relies only on the original observational record, without the need for ancillary variables or model-based information. Due to this intrinsic challenge, no global, long-term, univariate gap-filled product has been available until now. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments to fill satellite-like gaps introduced to GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gap-filling performance.

    Summary

    • Gap-filled global estimates of volumetric surface soil moisture from 1991-2023 at 0.25° sampling
    • Fields of application (partial): climate variability and change, land-atmosphere interactions, global biogeochemical cycles and ecology, hydrological and land surface modelling, drought applications, and meteorology
    • Method: Modified version of DCT-PLS (Garcia, 2010) interpolation/smoothing algorithm, linear interpolation over periods of frozen soils. Uncertainty estimates are provided for all data points.
    • More information: See Preimesberger et al. (2025) and the ESA CCI SM Algorithm Theoretical Baseline Document [Chapter 7.2.9] (Dorigo et al., 2023), https://doi.org/10.5281/zenodo.8320869

    Programmatic Download

    You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following command will download and extract the complete data set to the local directory ~/Downloads on Linux or macOS systems.

    #!/bin/bash

    # Set download directory
    DOWNLOAD_DIR=~/Downloads

    base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"

    # Loop through years 1991 to 2023 and download & extract data
    for year in {1991..2023}; do
      echo "Downloading $year.zip..."
      wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
      unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
      rm "$DOWNLOAD_DIR/$year.zip"
    done

    Data details

    The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), each subdirectory containing one netCDF image file for a specific day (DD), month (MM) in a 2-dimensional (longitude, latitude) grid system (CRS: WGS84). The file name has the following convention:

    ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc

    Data Variables

    Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:

    • sm: (float) The Soil Moisture variable reflects estimates of daily average volumetric soil moisture content (m3/m3) in the soil surface layer (~0-5 cm) over a whole grid cell (0.25 degree).
    • sm_uncertainty: (float) The Soil Moisture Uncertainty variable reflects the uncertainty (random error) of the original satellite observations and of the predictions used to fill observation data gaps.
    • sm_anomaly: Soil moisture anomalies (reference period 1991-2020) derived from the gap-filled values (`sm`)
    • sm_smoothed: Contains DCT-PLS predictions used to fill data gaps in the original soil moisture field. These values are also available where an observation was initially available (compare `gapmask`); in that case, they provide a smoothed version of the original data.
    • gapmask: (0 | 1) Indicates grid cells where a satellite observation is available (0), and where the interpolated value is used (1) in the 'sm' field.
    • frozenmask: (0 | 1) Indicates grid cells where ERA5 soil temperature is <0 °C. In this case, a linear interpolation over time is applied.

    Additional information for each variable is given in the netCDF attributes.
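
    A minimal sketch of reading one daily file with xarray and separating observed from gap-filled cells via gapmask; the file name follows the convention above, and xarray/netCDF4 are assumed to be installed:

    import xarray as xr

    ds = xr.open_dataset("ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-20200615000000-fv09.1r1.nc")

    sm = ds["sm"]                              # daily mean volumetric soil moisture (m3/m3)
    observed = sm.where(ds["gapmask"] == 0)    # satellite-observed cells only
    gapfilled = sm.where(ds["gapmask"] == 1)   # DCT-PLS / linearly interpolated cells only

    print(float(observed.mean()), float(gapfilled.mean()))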

    Version Changelog

    Changes in v9.1r1 (previous version was v09.1):

    • This version uses a novel uncertainty estimation scheme as described in Preimesberger et al. (2025).

    Software to open netCDF files

    These data can be read by any software that supports the Climate and Forecast (CF) metadata conventions for netCDF files.

    References

    • Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: An independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2024-610, in review, 2025.
    • Dorigo, W., Preimesberger, W., Stradiotti, P., Kidd, R., van der Schalie, R., van der Vliet, M., Rodriguez-Fernandez, N., Madelon, R., & Baghdadi, N. (2023). ESA Climate Change Initiative Plus - Soil Moisture Algorithm Theoretical Baseline Document (ATBD) Supporting Product Version 08.1 (version 1.1). Zenodo. https://doi.org/10.5281/zenodo.8320869
    • Garcia, D., 2010. Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 54(4), pp.1167-1178. Available at: https://doi.org/10.1016/j.csda.2009.09.020
    • Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., Arsenault, K., Cosgrove, B., Radakovich, J., Bosilovich, M., Entin, J. K., Walker, J. P., Lohmann, D., and Toll, D.: The Global Land Data Assimilation System, Bulletin of the American Meteorological Society, 85, 381 – 394, https://doi.org/10.1175/BAMS-85-3-381, 2004.

    Related Records

    The following records are all part of the Soil Moisture Climate Data Records from satellites community:

    1. ESA CCI SM MODELFREE Surface Soil Moisture Record, https://doi.org/10.48436/svr1r-27j77

  16. Daily and Annual NO2 Concentrations for the Contiguous United States, 1-km...

    • data.nasa.gov
    • cmr.earthdata.nasa.gov
    • +2more
    application/rdfxml +5
    Updated Feb 5, 2024
    + more versions
    Cite
    (2024). Daily and Annual NO2 Concentrations for the Contiguous United States, 1-km Grids, Version 1.10 (2000-2016) [Dataset]. https://data.nasa.gov/dataset/Daily-and-Annual-NO2-Concentrations-for-the-Contig/ipab-ju3m
    Explore at:
    Available download formats: xml, csv, application/rssxml, application/rdfxml, tsv, json
    Dataset updated
    Feb 5, 2024
    Area covered
    United States, Contiguous United States
    Description

    The Daily and Annual NO2 Concentrations for the Contiguous United States, 1-km Grids, Version 1.10 (2000-2016) data set contains daily predictions of Nitrogen Dioxide (NO2) concentrations at a high resolution (1-km grid cells) for the years 2000 to 2016. An ensemble modeling framework was used to assess NO2 levels with high accuracy, which combined estimates from three machine learning models (neural network, random forest, and gradient boosting), with a generalized additive model. Predictor variables included NO2 column concentrations from satellites, land-use variables, meteorological variables, predictions from two chemical transport models, GEOS-Chem and the U.S. Environmental Protection Agency (EPA) CommUnity Multiscale Air Quality Modeling System (CMAQ), along with other ancillary variables. The annual predictions were calculated by averaging the daily predictions for each year in each grid cell. The ensemble produced a cross-validated R-squared value of 0.79 overall, a spatial R-squared value of 0.84, and a temporal R-squared value of 0.73. In version 1.10, the completeness of daily NO2 predictions have been enhanced by employing linear interpolation to impute missing values. Specifically, for days with small spatial patches of missing data with less than 100 grid cells, inverse distance weighting interpolation was used to fill the missing grid cells. Other missing daily NO2 predictions were interpolated from the nearest days with available data. Annual predictions were updated by averaging the imputed daily predictions for each year in each grid cell. These daily and annual NO2 predictions allow public health researchers to respectively estimate the short- and long-term effects of NO2 exposures on human health, supporting the U.S. EPA for the revision of the National Ambient Air Quality Standards for daily average and annual average concentrations of NO2. The data are available in RDS and GeoTIFF formats for statistical research and geospatial analysis.
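
    The inverse distance weighting step described above (used for small missing patches of fewer than 100 grid cells) has a simple form; a hedged sketch of the general technique, not the data producers' code:

    import numpy as np

    def idw_fill(known_xy, known_values, missing_xy, power=2.0):
        """Fill missing cells with a 1/distance**power weighted average of known cells."""
        known_xy = np.asarray(known_xy, float)
        known_values = np.asarray(known_values, float)
        filled = np.empty(len(missing_xy))
        for k, p in enumerate(np.asarray(missing_xy, float)):
            d = np.linalg.norm(known_xy - p, axis=1)
            if np.any(d == 0):                 # exact hit on a known cell
                filled[k] = known_values[d.argmin()]
                continue
            w = 1.0 / d ** power
            filled[k] = np.sum(w * known_values) / np.sum(w)
        return filled

    # Toy 2x2 grid of known NO2 values and one missing cell in the middle.
    print(idw_fill([(0, 0), (0, 1), (1, 0), (1, 1)], [10.0, 12.0, 11.0, 13.0], [(0.5, 0.5)]))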

  17. Wheat Corn Soy Estimates Red River of the North Basin.

    • datadiscoverystudio.org
    • data.usgs.gov
    • +2more
    Updated May 20, 2018
    + more versions
    Cite
    (2018). Wheat Corn Soy Estimates Red River of the North Basin. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/e9664ee0a6744b83a88f27325051bc01/html
    Explore at:
    Dataset updated
    May 20, 2018
    Description

    These data describe the percent of cropland harvested as wheat, corn, and soybean within each basin (basins 1-8, see accompanying shapefiles). Data are available for other crops; however, these three were chosen because wheat is a traditional crop that has been grown for a long time in the Basin and corn and soybeans have increased in recent times because of wetter conditions, the demand for biofuels, and advances in breeding short-season, drought-tolerant crops. The data come from the National Agricultural Statistics Service (NASS) Census of Agriculture (COA) and have estimates for 1974, 1978, 1982, 1986, 1992, 1997, 2002, 2007, and 2012. Years with missing data were estimated using multivariate imputation of missing values with principal components analysis (PCA) via the function imputePCA in the R (R Core Team, 2015) package missMDA (Husson and Josse, 2015). In the interest of dimension reduction, the scores of the first principal component of a principal component analysis, by basin, of the wheat, corn, and soy variables are included. Husson, F., and Josse, J., 2015, missMDA: Handling missing values with multivariate data analysis: R package version 1.9, https://CRAN.R-project.org/package=missMDA. R Core Team, 2015, R: A language and environment for statistical computing: R Foundation for Statistical Computing, Vienna, http://www.R-project.org.
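
    imputePCA is an R function; as a rough Python analogue of the same idea (an illustration, not a reimplementation of missMDA), missing cells can be filled by alternating a low-rank PCA reconstruction with re-imputation of the missing entries:

    import numpy as np

    def iterative_pca_impute(X, n_components=1, n_iter=50):
        """Fill NaNs by iterating a rank-k SVD reconstruction and re-imputing missing cells."""
        X = np.asarray(X, float)
        missing = np.isnan(X)
        filled = np.where(missing, np.nanmean(X, axis=0), X)   # start from column means
        for _ in range(n_iter):
            mu = filled.mean(axis=0)
            U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
            approx = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
            filled[missing] = approx[missing]                   # overwrite missing cells only
        return filled

    # Toy example: percent wheat/corn/soy by basin with one missing census value.
    X = np.array([[30.0, 20.0, 10.0],
                  [28.0, np.nan, 12.0],
                  [25.0, 26.0, 15.0]])
    print(iterative_pca_impute(X))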

  18. codebook_selection_data

    • laurabotzet.github.io
    Updated Oct 3, 2021
    + more versions
    Cite
    (2021). codebook_selection_data [Dataset]. https://laurabotzet.github.io/effects_of_contraception/4_codebook.html
    Explore at:
    Dataset updated
    Oct 3, 2021
    Variables measured
    age, bfi_open, bfi_agree, bfi_consc, bfi_extra, bfi_neuro, bfi_open_1, bfi_open_2, bfi_open_3, bfi_open_4, and 56 more
    Description

    The dataset has N=1179 rows and 66 columns. 622 rows have no missing values on any column.
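
    The summary figures above (66 columns, 622 complete rows) and the per-variable n_missing counts in the codebook table can be reproduced for any rectangular file with pandas; the file name is illustrative:

    import pandas as pd

    df = pd.read_csv("selection_data.csv")        # illustrative file name

    print(df.shape)                               # (rows, columns)
    print(df.notna().all(axis=1).sum())           # rows with no missing value on any column
    print(df.isna().sum().sort_values(ascending=False).head())  # n_missing per variable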

    Table of variables

    This table contains variable names, labels, and number of missing values. See the complete codebook for more.

    [truncated]

    Note

    This dataset was automatically described using the codebook R package (version 0.9.2).

  19. Data from: Effects of growth rate, size, and light availability on tree...

    • zenodo.org
    • narcis.nl
    Updated May 30, 2022
    Cite
    Aristides Moustakas; Matthew R. Evans (2022). Data from: Effects of growth rate, size, and light availability on tree survival across life stages: a demographic analysis accounting for missing values and small sample sizes [Dataset]. http://doi.org/10.5061/dryad.6f4qs
    Explore at:
    Dataset updated
    May 30, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Aristides Moustakas; Matthew R. Evans
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Background: Plant survival is a key factor in forest dynamics and survival probabilities often vary across life stages. Studies specifically aimed at assessing tree survival are unusual and so data initially designed for other purposes often need to be used; such data are more likely to contain errors than data collected for this specific purpose.

    Results: We investigate the survival rates of ten tree species in a dataset designed to monitor growth rates. As some individuals were not included in the census at some time points we use capture-mark-recapture methods both to allow us to account for missing individuals, and to estimate relocation probabilities. Growth rates, size, and light availability were included as covariates in the model predicting survival rates. The study demonstrates that tree mortality is best described as constant between years and size-dependent at early life stages and size-independent at later life stages for most species of UK hardwood. We have demonstrated that even with a twenty-year dataset it is possible to discern variability both between individuals and between species.

    Conclusions: Our work illustrates the potential utility of the method applied here for calculating plant population dynamics parameters in time-replicated datasets with small sample sizes and missing individuals without any loss of sample size, and including explanatory covariates.

  20. MAPE and PB statistics for IBFI compared with other imputation methods...

    • plos.figshare.com
    xls
    Updated Jun 6, 2023
    + more versions
    Cite
    Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique (2023). MAPE and PB statistics for IBFI compared with other imputation methods (mean, median, mode, PMM, and Hotdeck) for 20% missingness of type MAR and all parameters tested (RN, TH, TC, RH, and PR). [Dataset]. http://doi.org/10.1371/journal.pone.0262131.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Adil Aslam Mir; Kimberlee Jane Kearfott; Fatih Vehbi Çelebi; Muhammad Rafique
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MAPE and PB statistics for IBFI compared with other imputation methods (mean, median, mode, PMM, and Hotdeck) for 20% missingness of type MAR and all parameters tested (RN, TH, TC, RH, and PR).
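
    For reference, the two statistics in the table title can be computed as below; the exact percent-bias (PB) formula used in the paper is not restated here, so the common definition is an assumption:

    import numpy as np

    def mape(observed, imputed):
        """Mean absolute percentage error, in percent."""
        observed = np.asarray(observed, float)
        imputed = np.asarray(imputed, float)
        return 100.0 * np.mean(np.abs((observed - imputed) / observed))

    def percent_bias(observed, imputed):
        """Percent bias (assumed definition): signed relative deviation of totals."""
        observed = np.asarray(observed, float)
        imputed = np.asarray(imputed, float)
        return 100.0 * np.sum(imputed - observed) / np.sum(observed)

    obs = np.array([12.0, 15.0, 9.0, 11.0])
    imp = np.array([11.5, 15.5, 9.2, 10.4])
    print(mape(obs, imp), percent_bias(obs, imp))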
