32 datasets found
  1. SMDG, A Standardized Fundus Glaucoma Dataset

    • kaggle.com
    zip
    Updated Apr 23, 2023
    Cite
    Riley Kiefer (2023). SMDG, A Standardized Fundus Glaucoma Dataset [Dataset]. https://www.kaggle.com/datasets/deathtrooper/multichannel-glaucoma-benchmark-dataset/code
    Explore at:
    zip (3144020550 bytes)
    Dataset updated
    Apr 23, 2023
    Authors
    Riley Kiefer
    Description

    Standardized Multi-Channel Dataset for Glaucoma (SMDG-19), a standardization of 19 public glaucoma datasets for AI applications.

    Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public datasets, comprising full-fundus glaucoma images, associated image data such as optic disc segmentation, optic cup segmentation, and blood vessel segmentation, and any provided per-instance text metadata such as sex and age. This dataset is designed to be exploratory and open-ended, with multiple use cases and no established training/validation/test splits. It is the largest public repository of fundus images with glaucoma.

    Citation

    Please cite at least the first work in academic publications:
    1. Kiefer, Riley, et al. "A Catalog of Public Glaucoma Datasets for Machine Learning Applications: A detailed description and analysis of public glaucoma datasets available to machine learning engineers tackling glaucoma-related problems using retinal fundus images and OCT images." Proceedings of the 2023 7th International Conference on Information System and Data Mining. 2023.
    2. R. Kiefer, M. Abid, M. R. Ardali, J. Steen and E. Amjadian, "Automated Fundus Image Standardization Using a Dynamic Global Foreground Threshold Algorithm," 2023 8th International Conference on Image, Vision and Computing (ICIVC), Dalian, China, 2023, pp. 460-465, doi: 10.1109/ICIVC58118.2023.10270429.
    3. R. Kiefer, J. Steen, M. Abid, M. R. Ardali and E. Amjadian, "A Survey of Glaucoma Detection Algorithms using Fundus and OCT Images," 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 2022, pp. 0191-0196, doi: 10.1109/IEMCON56893.2022.9946629.

    Please also see the following optometry abstract publications:
    1. A Comprehensive Survey of Publicly Available Glaucoma Datasets for Automated Glaucoma Detection; AAO 2022; https://aaopt.org/past-meeting-abstract-archives/?SortBy=ArticleYear&ArticleType=&ArticleYear=2022&Title=&Abstract=&Authors=&Affiliation=&PROGRAMNUMBER=225129
    2. Standardized and Open-Access Glaucoma Dataset for Artificial Intelligence Applications; ARVO 2023; https://iovs.arvojournals.org/article.aspx?articleid=2790420
    3. Ground truth validation of publicly available datasets utilized in artificial intelligence models for glaucoma detection; ARVO 2023; https://iovs.arvojournals.org/article.aspx?articleid=2791017

    Please also see the DOI citations for this and related datasets:
    1. SMDG: @dataset{smdg, title={SMDG, A Standardized Fundus Glaucoma Dataset}, url={https://www.kaggle.com/ds/2329670}, DOI={10.34740/KAGGLE/DS/2329670}, publisher={Kaggle}, author={Riley Kiefer}, year={2023}}
    2. EyePACS-light-v1: @dataset{eyepacs-light-v1, title={Glaucoma Dataset: EyePACS AIROGS - Light}, url={https://www.kaggle.com/ds/3222646}, DOI={10.34740/KAGGLE/DS/3222646}, publisher={Kaggle}, author={Riley Kiefer}, year={2023}}
    3. EyePACS-light-v2: @dataset{eyepacs-light-v2, title={Glaucoma Dataset: EyePACS-AIROGS-light-V2}, url={https://www.kaggle.com/dsv/7300206}, DOI={10.34740/KAGGLE/DSV/7300206}, publisher={Kaggle}, author={Riley Kiefer}, year={2023}}

    Dataset Objective

    The objective of this dataset is to provide a machine-learning-ready resource for glaucoma-related applications. With the help of the community, new open-source glaucoma datasets will be reviewed for standardization and inclusion in this dataset.

    Data Standardization

    • Full fundus images (and corresponding segmentation maps) are standardized using a novel algorithm (Citation 1) by cropping the background, centering the fundus image, padding missing information, and resizing to 512x512 pixels. This standardization ensures that as much foreground information as possible is preserved during resizing, yielding machine-learning-ready images; a minimal sketch of these steps appears after the example table below.
    • All available text metadata are standardized into a CSV file with one row per fundus image and one column per fundus attribute.
    [Table: example dataset instances (sjchoi86-HRF, BEH) showing original vs. standardized fundus images; image links omitted.]
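
    A minimal R sketch of the standardization steps described above (crop the background, center, pad to a square, resize to 512x512). This is not the authors' dynamic global foreground threshold algorithm; it assumes a simple fixed intensity threshold and uses the magick package.

        library(magick)

        standardize_fundus <- function(path, size = 512, threshold = 10) {
          img <- image_read(path)
          # Grayscale pixels; image_data() returns a raw array [channel, width, height]
          px <- image_data(image_convert(img, colorspace = "gray"))[1, , ]
          fg <- matrix(as.integer(px) > threshold, nrow = nrow(px))  # foreground mask
          xs <- range(which(rowSums(fg) > 0))  # horizontal extent of the fundus
          ys <- range(which(colSums(fg) > 0))  # vertical extent of the fundus
          # 1) Crop the black background down to the foreground bounding box
          cropped <- image_crop(img, geometry_area(diff(xs) + 1, diff(ys) + 1,
                                                   xs[1] - 1, ys[1] - 1))
          # 2) Pad the shorter side so the fundus sits centered on a square canvas
          side <- max(diff(xs), diff(ys)) + 1
          square <- image_extent(cropped, geometry_area(side, side),
                                 gravity = "center", color = "black")
          # 3) Resize to the target resolution ("!" forces an exact 512x512 output)
          image_resize(square, paste0(size, "x", size, "!"))
        }
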
  2. CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Thilde Terkelsen; Anders Krogh; Elena Papaleo (2023). CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for the analysis of quantitative biological data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007665
    Explore at:
    pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Thilde Terkelsen; Anders Krogh; Elena Papaleo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline-CAMPP) intended to aid bioinformatic software users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R package updates, an renv.lock file is provided to ensure R package stability. Data management includes missing-value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.

  3. WoSIS snapshot - December 2023

    • data.isric.org
    • repository.soilwise-he.eu
    Updated Dec 20, 2023
    + more versions
    Cite
    ISRIC - World Soil Information (2023). WoSIS snapshot - December 2023 [Dataset]. https://data.isric.org/geonetwork/srv/api/records/e50f84e1-aa5b-49cb-bd6b-cd581232a2ec
    Explore at:
    www:link-1.0-http--related, www:link-1.0-http--link, www:download-1.0-ftp--download
    Dataset updated
    Dec 20, 2023
    Dataset provided by
    International Soil Reference and Information Centre
    Authors
    ISRIC - World Soil Information
    Time period covered
    Jan 1, 1918 - Dec 1, 2022
    Area covered
    Description

    ABSTRACT: The World Soil Information Service (WoSIS) provides quality-assessed and standardized soil profile data to support digital soil mapping and environmental applications at broad scale levels. Since the release of the 'WoSIS snapshot 2019', many new soil data were shared with us, registered in the ISRIC data repository, and subsequently standardized in accordance with the licenses specified by the data providers. The source data were contributed by a wide range of data providers; therefore, special attention was paid to the standardization of soil property definitions, soil analytical procedures and soil property values (and units of measurement). We presently consider the following soil chemical properties (organic carbon, total carbon, total carbonate equivalent, total nitrogen, phosphorus (extractable-P, total-P, and P-retention), soil pH, cation exchange capacity, and electrical conductivity) and physical properties (soil texture (sand, silt, and clay), bulk density, coarse fragments, and water retention), grouped according to analytical procedures (aggregates) that are operationally comparable. For each profile we provide the original soil classification (FAO, WRB, USDA, and version) and horizon designations as far as these have been specified in the source databases. Three measures of 'fitness-for-intended-use' are provided: positional uncertainty (for site locations), time of sampling/description, and a first approximation of the uncertainty associated with the operationally defined analytical methods. These measures should be considered during digital soil mapping and subsequent earth system modelling that use the present set of soil data.

    DATA SET DESCRIPTION: The 'WoSIS 2023 snapshot' comprises data for 228k profiles from 217k geo-referenced sites that originate from 174 countries. The profiles represent over 900k soil layers (or horizons) and over 6 million records. The actual number of measurements for each property varies (greatly) between profiles and with depth, generally depending on the objectives of the initial soil sampling programmes. The data are provided in TSV (tab-separated values) format and as GeoPackage. The zip-file (446 Mb) contains the following files:
    - Readme_WoSIS_202312_v2.pdf: provides a short description of the dataset, file structure, column names, units and category values (this file is also available directly under 'online resources'). The PDF includes links to tutorials for downloading the TSV files into R and Excel respectively. See also 'HOW TO READ TSV FILES INTO R AND PYTHON' below.
    - wosis_202312_observations.tsv: lists the four- to six-letter codes for each observation, whether the observation is for a site/profile or a layer (horizon), the unit of measurement, and the number of profiles and layers represented in the snapshot. It also provides an estimate of the inferred accuracy of the laboratory measurements.
    - wosis_202312_sites.tsv: characterizes the site location where profiles were sampled.
    - wosis_202312_profiles.tsv: presents the unique profile ID (i.e. primary key), site_id, source of the data, country ISO code and name, positional uncertainty, latitude and longitude (WGS 1984), maximum depth of soil described and sampled, as well as information on the soil classification system and edition. Depending on the soil classification system used, the number of fields will vary.
    - wosis_202312_layers.tsv: characterizes the layers (or horizons) per profile, and lists their upper and lower depths (cm).
    - wosis_202312_xxxx.tsv: this type of file presents results for each observation (e.g. "xxxx" = "BDFIOD"), as defined under "code" in wosis_202312_observations.tsv (e.g. wosis_202312_bdfiod.tsv).
    - wosis_202312.gpkg: contains the above data files in GeoPackage format (which stores the files within an SQLite database).

    HOW TO READ TSV FILES INTO R AND PYTHON:
    A) To read the data in R, uncompress the ZIP file and set the working directory to the uncompressed folder, for example:

        setwd("/YourFolder/WoSIS_2023_December/")  ## e.g. setwd("D:/WoSIS_2023_December/")

    Then use readr::read_tsv to read the TSV files, specifying the data type for each column (c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time):

        observations <- readr::read_tsv("wosis_202312_observations.tsv", col_types = "cccciid")
        observations  ## show columns and first 10 rows
        sites <- readr::read_tsv("wosis_202312_sites.tsv", col_types = "iddcccc")
        sites
        profiles <- readr::read_tsv("wosis_202312_profiles.tsv", col_types = "icciccddcccccciccccicccci")
        profiles
        layers <- readr::read_tsv("wosis_202312_layers.tsv", col_types = "iiciciiilcc")
        layers
        ## Do this for each observation "xxxx", e.g. file "wosis_202312_orgc.tsv":
        orgc <- readr::read_tsv("wosis_202312_orgc.tsv", col_types = "iicciilccdccddccccc")
        orgc

    Note: one may also use base R (example for wosis_202312_observations.tsv):

        observations <- read.table("wosis_202312_observations.tsv", sep = "\t", header = TRUE,
                                   quote = "", comment.char = "", stringsAsFactors = FALSE)

    B) To read the files into Python, first decompress the files to your selected folder. Then, in Python:

        # import the required library
        import pandas as pd
        # Read the observations data and print the data frame header and some rows
        observations = pd.read_csv("wosis_202312_observations.tsv", sep="\t")
        observations.head()
        # Read the sites, profiles, and layers data
        sites = pd.read_csv("wosis_202312_sites.tsv", sep="\t")
        profiles = pd.read_csv("wosis_202312_profiles.tsv", sep="\t")
        layers = pd.read_csv("wosis_202312_layers.tsv", sep="\t")
        # Read the soil property data, e.g. 'cfvo' (do this for each observation)
        cfvo = pd.read_csv("wosis_202312_cfvo.tsv", sep="\t")

    CITATION: Calisto, L., de Sousa, L.M., Batjes, N.H., 2023. Standardised soil profile data for the world (WoSIS snapshot – December 2023). https://doi.org/10.17027/isric-wdcsoils-20231130. Supplement to: Batjes, N.H., Calisto, L. and de Sousa, L.M., 2023. Providing quality-assessed and standardised soil data to support global mapping and modelling (WoSIS snapshot 2023). Earth System Science Data, https://doi.org/10.5194/essd-16-4735-2024.

  4. Database of Trends in Vegetation Properties and Climate Adaptation Variables...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Database of Trends in Vegetation Properties and Climate Adaptation Variables -- Standardized Precipitation Evapotranspiration Index Timeseries for the Upper Gila River Watershed: 1985 to 2021 [Dataset]. https://catalog.data.gov/dataset/database-of-trends-in-vegetation-properties-and-climate-adaptation-variables-standardized-
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Gila River
    Description

    We apply a research approach that can inform riparian restoration planning by developing products that show recent trends in vegetation conditions, identifying areas potentially more at risk of degradation, and the associated relationship between riparian vegetation dynamics and climate conditions. The full suite of data products and a link to the associated publication addressing this analysis can be found on the Parent data release. To characterize the climate conditions across the study period, we use the Standardized Precipitation Evapotranspiration Index (SPEI). The SPEI is a water balance index that includes both precipitation and evapotranspiration in its calculation. Conditions from the prior n months, generally ranging from 1 to 60, are compared to the same respective period over the prior years to identify the index value (Vicente-Serrano et al., 2010). Values generally range from -3 to 3, where values less than 0 suggest drought conditions while values greater than 0 suggest wetter-than-normal conditions. For this study, we use the 12-month, or 1-year, SPEI to compare annual conditions within the larger Upper Gila River watershed. The SPEI data were extracted into a CSV spreadsheet using data from the Gridded Surface Meteorological (GRIDMET) dataset, which provides a spatially explicit SPEI product in Google Earth Engine (GEE) at a 5-day interval and a spatial resolution of 4 km (Abatzoglou, 2013). In GEE, we quantify overall mean values of SPEI across each 5-day period for the watershed from January 1980 to December 2021. Using R software, we reduced the 5-day values to monthly mean values and constrained the analysis to water year 1980 (i.e., October 1980) through water year 2021 (i.e., October 2021). Using the monthly timeseries, we completed a breakpoint analysis in R to identify breaks within the SPEI time series. The algorithm identifies a seasonal pattern within the timeseries; when the seasonal pattern deviates, a breakpoint is detected. These breaks can be used to pinpoint unique climate periods in the time series. This is a Child Item for the Parent data release, Mapping Riparian Vegetation Response to Climate Change on the San Carlos Apache Reservation and Upper Gila River Watershed to Inform Restoration Priorities: 1935 to Present - Database of Trends in Vegetation Properties and Climate Adaptation Variables. The spreadsheet attached to this Child Item consists of 5 columns: (i) the month, from January 1985 through October 2021; (ii) the 1-year SPEI monthly time series; (iii) the dates identified as breaks by the breakpoint algorithm; (iv) the breakpoint trend identified by the breakpoint algorithm; and (v) the dates that were used as the climate period breaks in this study. The climate periods identified in this spreadsheet using the SPEI data were used as the climate periods in our riparian study.
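
    The description names the GEE extraction and an R breakpoint analysis but not the specific R packages. A minimal sketch of the R side, assuming the SPEI package for the index and bfast for the seasonal-break step, with pr and pet as monthly watershed means:

        library(SPEI)   # spei()
        library(bfast)  # bfast()

        # 12-month SPEI from the climatic water balance (precipitation minus PET)
        balance <- ts(pr - pet, start = c(1980, 10), frequency = 12)
        spei12 <- spei(balance, scale = 12)

        # Flag breakpoints where the seasonal pattern of the series deviates;
        # the breaks delimit candidate climate periods.
        fit <- bfast(na.omit(spei12$fitted), season = "harmonic", max.iter = 1)
        plot(fit)
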

  5. Example subjects for Mobilise-D data standardization

    • zenodo.org
    • data.niaid.nih.gov
    Updated Oct 11, 2022
    Cite
    Luca Palmerini; Luca Palmerini; Luca Reggi; Tecla Bonci; Silvia Del Din; Encarna Micó-Amigo; Francesca Salis; Stefano Bertuletti; Marco Caruso; Andrea Cereatti; Eran Gazit; Anisoara Paraschiv-Ionescu; Abolfazl Soltani; Felix Kluge; Arne Küderle; Martin Ullrich; Cameron Kirk; Hugo Hiden; Ilaria D'Ascanio; Clint Hansen; Lynn Rochester; Claudia Mazzà; Lorenzo Chiari; on behalf of the Mobilise-D consortium; Luca Reggi; Tecla Bonci; Silvia Del Din; Encarna Micó-Amigo; Francesca Salis; Stefano Bertuletti; Marco Caruso; Andrea Cereatti; Eran Gazit; Anisoara Paraschiv-Ionescu; Abolfazl Soltani; Felix Kluge; Arne Küderle; Martin Ullrich; Cameron Kirk; Hugo Hiden; Ilaria D'Ascanio; Clint Hansen; Lynn Rochester; Claudia Mazzà; Lorenzo Chiari; on behalf of the Mobilise-D consortium (2022). Example subjects for Mobilise-D data standardization [Dataset]. http://doi.org/10.5281/zenodo.7185429
    Explore at:
    Dataset updated
    Oct 11, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Luca Palmerini; Luca Palmerini; Luca Reggi; Tecla Bonci; Silvia Del Din; Encarna Micó-Amigo; Francesca Salis; Stefano Bertuletti; Marco Caruso; Andrea Cereatti; Eran Gazit; Anisoara Paraschiv-Ionescu; Abolfazl Soltani; Felix Kluge; Arne Küderle; Martin Ullrich; Cameron Kirk; Hugo Hiden; Ilaria D'Ascanio; Clint Hansen; Lynn Rochester; Claudia Mazzà; Lorenzo Chiari; on behalf of the Mobilise-D consortium; Luca Reggi; Tecla Bonci; Silvia Del Din; Encarna Micó-Amigo; Francesca Salis; Stefano Bertuletti; Marco Caruso; Andrea Cereatti; Eran Gazit; Anisoara Paraschiv-Ionescu; Abolfazl Soltani; Felix Kluge; Arne Küderle; Martin Ullrich; Cameron Kirk; Hugo Hiden; Ilaria D'Ascanio; Clint Hansen; Lynn Rochester; Claudia Mazzà; Lorenzo Chiari; on behalf of the Mobilise-D consortium
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Standardized data from Mobilise-D participants (YAR dataset) and pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder as an example of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization", currently under review in Scientific Data. Please refer to that publication for further information, and please cite it if using these data.

    The code to standardize an example subject (for the ICICLE dataset) and to open the standardized Matlab files in other languages (Python, R) is available on GitHub (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
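
    The consortium's own loaders live in the repository linked above. As a minimal illustration only (the file name below is hypothetical), a standardized Matlab file can be inspected in R with the R.matlab package:

        library(R.matlab)

        mat <- readMat("data.mat")  # hypothetical file name
        str(mat, max.level = 2)     # inspect the standardized structure
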

  6. Standardized Precipitation Index (SPI)

    • hub.arcgis.com
    • climate-center-lincolninstitute.hub.arcgis.com
    • +1 more
    Updated Sep 9, 2020
    + more versions
    Cite
    Esri (2020). Standardized Precipitation Index (SPI) [Dataset]. https://hub.arcgis.com/maps/c28f8aa27434404e8748e656c15e2e34
    Explore at:
    Dataset updated
    Sep 9, 2020
    Dataset authored and provided by
    Esri (http://esri.com/)
    Area covered
    Description

    Droughts are naturally occurring events in which dry conditions persist over time. Droughts are complex to characterize because they depend on water and energy balances at different temporal and spatial scales. The Standardized Precipitation Index (SPI) is used to analyze meteorological droughts. SPI estimates the deviation of precipitation from the long-term probability function at different time scales (e.g. 1, 3, 6, 9, or 12 months). SPI uses only monthly precipitation as an input, which makes it well suited to characterizing meteorological droughts; other variables (e.g. temperature or evapotranspiration) should be included when characterizing other types of droughts (e.g. agricultural droughts). This layer shows the SPI at different temporal periods, calculated using the SPEI library in R and precipitation data from the CHIRPS dataset.
    Sources:
    Climate Hazards Center InfraRed Precipitation with Station data (CHIRPS)
    SPEI R library
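
    A minimal sketch of the SPI calculation for a single location, assuming chirps_pr is a monthly CHIRPS precipitation vector for that point; the published layer applies the same computation per pixel.

        library(SPEI)

        pr <- ts(chirps_pr, start = c(1981, 1), frequency = 12)  # CHIRPS begins in 1981
        spi3  <- spi(pr, scale = 3)   # 3-month SPI
        spi12 <- spi(pr, scale = 12)  # 12-month SPI
        plot(spi12)
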

  7. n

    Methods for normalizing microbiome data: an ecological perspective

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Oct 30, 2018
    Cite
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
    Explore at:
    zip
    Dataset updated
    Oct 30, 2018
    Dataset provided by
    University of New England
    James Cook University
    Authors
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description
    1. Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data, instead advocating alternatives such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity). Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs, while suppressing the importance of differences among common OTUs.
    2. We tested these theoretical predictions via simulations and a real-world data set.
    3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed.
    4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
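
    A minimal R sketch of the two normalizations the study found most accurate for community-level comparisons, assuming otu is a samples-by-OTUs count matrix and using the vegan package:

        library(vegan)

        props <- otu / rowSums(otu)                           # proportions (relative abundance)
        rarefied <- rrarefy(otu, sample = min(rowSums(otu)))  # rarefy to the smallest read depth

        # Community-level (beta diversity) comparison on either normalization
        d_bray <- vegdist(props, method = "bray")
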
  8. Data from: Standardizing protocols for determining the cause of mortality in...

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    • +1 more
    zip
    Updated Jun 22, 2022
    Cite
    Bogdan Cristescu; Mark Elbroch; Tavis Forrester; Maximilian Allen; Derek Spitz; Christopher Wilmers; Heiko Wittmer (2022). Standardizing protocols for determining the cause of mortality in wildlife studies [Dataset]. http://doi.org/10.7291/D1GD50
    Explore at:
    zip
    Dataset updated
    Jun 22, 2022
    Dataset provided by
    Dryad
    Authors
    Bogdan Cristescu; Mark Elbroch; Tavis Forrester; Maximilian Allen; Derek Spitz; Christopher Wilmers; Heiko Wittmer
    Time period covered
    May 31, 2022
    Description

    The datasets can be opened using Microsoft Excel and R.

  9. Proteomics Wants cRacker: Automated Standardized Data Analysis of LC–MS...

    • acs.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Henrik Zauber; Waltraud X. Schulze (2023). Proteomics Wants cRacker: Automated Standardized Data Analysis of LC–MS Derived Proteomic Data [Dataset]. http://doi.org/10.1021/pr300413v.s002
    Explore at:
    xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Henrik Zauber; Waltraud X. Schulze
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The large-scale analysis of thousands of proteins under various experimental conditions or in mutant lines has become increasingly important in hypothesis-driven scientific research and systems biology in recent years. Quantitative analysis by large-scale proteomics using modern mass spectrometry usually results in long lists of peptide ion intensities. The main interest for most researchers, however, is to draw conclusions on the protein level. Postprocessing and combining peptide intensities of a proteomic data set requires expert knowledge, and the often repetitive and standardized manual calculations can be time-consuming. The analysis of complex samples can result in very large data sets (lists with several thousand to 100,000 entries of different peptides) that cannot easily be analyzed using standard spreadsheet programs. To improve the speed and consistency of the data analysis of LC–MS derived proteomic data, we developed cRacker. cRacker is an R-based program for automated downstream proteomic data analysis, including data normalization strategies for metabolic labeling and label-free quantitation. In addition, cRacker includes basic statistical analysis, such as clustering of data, or ANOVA and t-tests for comparison between treatments. Results are presented in editable graphic formats and in list files.

  10. SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Christopher R. Cabanski; Yuan Qi; Xiaoying Yin; Eric Bair; Michele C. Hayward; Cheng Fan; Jianying Li; Matthew D. Wilkerson; J. S. Marron; Charles M. Perou; D. Neil Hayes (2023). SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements [Dataset]. http://doi.org/10.1371/journal.pone.0009905
    Explore at:
    pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Christopher R. Cabanski; Yuan Qi; Xiaoying Yin; Eric Bair; Michele C. Hayward; Cheng Fan; Jianying Li; Matthew D. Wilkerson; J. S. Marron; Charles M. Perou; D. Neil Hayes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contemporary high dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e. gene selection), prior to deriving any biological conclusions. These steps can dramatically change the interpretation of an experiment. Evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods and investigators are often unsure of the best method. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternate data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements based on a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications give an indication of the variety of problems that SWISS can be helpful in solving. The SWISS analysis of one-color versus two-color microarrays provides investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits offered by this design. Analysis of the MAQC data shows differential intersite reproducibility by array platform. SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray.
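
    A minimal sketch of a SWISS-style score (within-class sum of squares divided by total sum of squares, with Euclidean distance), not the authors' exact implementation; x is assumed to be a features-by-samples matrix and class a vector of a priori labels.

        swiss_score <- function(x, class) {
          class <- as.factor(class)
          # Total sum of squares around the overall centroid (samples in rows after t())
          total <- sum(scale(t(x), scale = FALSE)^2)
          # Within-class sum of squares around each class centroid
          within <- sum(sapply(levels(class), function(k) {
            sum(scale(t(x[, class == k, drop = FALSE]), scale = FALSE)^2)
          }))
          within / total  # smaller = a priori classes cluster more tightly
        }
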

  11. Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 18, 2024
    Cite
    Mast, Austin R.; Paul, Deborah L.; Rios, Nelson; Bruhn, Robert; Dalton, Trevor; Krimmel, Erica R.; Pearson, Katelin D.; Sherman, Aja; Shorthouse, David P.; Simmons, Nancy B.; Soltis, Pam; Upham, Nathan; Abibou, Djihbrihou (2024). Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3974999
    Explore at:
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    Florida State University
    Yale University Peabody Museum of Natural History
    University of Florida
    Agriculture and Agri-Food Canada
    American Museum of Natural History
    Arizona State University
    Authors
    Mast, Austin R.; Paul, Deborah L.; Rios, Nelson; Bruhn, Robert; Dalton, Trevor; Krimmel, Erica R.; Pearson, Katelin D.; Sherman, Aja; Shorthouse, David P.; Simmons, Nancy B.; Soltis, Pam; Upham, Nathan; Abibou, Djihbrihou
    License

    https://creativecommons.org/licenses/publicdomain/

    Area covered
    World
    Description

    This repository is associated with NSF DBI 2033973, RAPID Grant: Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2033973). Specifically, this repository contains (1) raw data from iDigBio (http://portal.idigbio.org) and GBIF (https://www.gbif.org), (2) R code for reproducible data wrangling and improvement, (3) protocols associated with data enhancements, and (4) enhanced versions of the dataset published at various project milestones. Additional code associated with this grant can be found in the BIOSPEX repository (https://github.com/iDigBio/Biospex). Long-term data management of the enhanced specimen data created by this project is expected to be accomplished by the natural history collections curating the physical specimens, a list of which can be found in this Zenodo resource.

    Grant abstract: "The award to Florida State University will support research contributing to the development of georeferenced, vetted, and versioned data products of the world's specimens of horseshoe bats and their relatives for use by researchers studying the origins and spread of SARS-like coronaviruses, including the causative agent of COVID-19. Horseshoe bats and other closely related species are reported to be reservoirs of several SARS-like coronaviruses. Species of these bats are primarily distributed in regions where these viruses have been introduced to populations of humans. Currently, data associated with specimens of these bats are housed in natural history collections that are widely distributed both nationally and globally. Additionally, information tying these specimens to localities are mostly vague, or in many instances missing. This decreases the utility of the specimens for understanding the source, emergence, and distribution of SARS-COV-2 and similar viruses. This project will provide quality georeferenced data products through the consolidation of ancillary information linked to each bat specimen, using the extended specimen model. The resulting product will serve as a model of how data in biodiversity collections might be used to address emerging diseases of zoonotic origin. Results from the project will be disseminated widely in opensource journals, at scientific meetings, and via websites associated with the participating organizations and institutions. Support of this project provides a quality resource optimized to inform research relevant to improving our understanding of the biology and spread of SARS-CoV-2. The overall objectives are to deliver versioned data products, in formats used by the wider research and biodiversity collections communities, through an open-access repository; project protocols and code via GitHub and described in a peer-reviewed paper, and; sustained engagement with biodiversity collections throughout the project for reintegration of improved data into their local specimen data management systems improving long-term curation.

    This RAPID award will produce and deliver a georeferenced, vetted and consolidated data product for horseshoe bats and related species to facilitate understanding of the sources, distribution, and spread of SARS-CoV-2 and related viruses, a timely response to the ongoing global pandemic caused by SARS-CoV-2 and an important contribution to the global effort to consolidate and provide quality data that are relevant to understanding emergent and other properties of the current pandemic. This RAPID award is made by the Division of Biological Infrastructure (DBI) using funds from the Coronavirus Aid, Relief, and Economic Security (CARES) Act.

    This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria."

    Files included in this resource

    9d4b9069-48c4-4212-90d8-4dd6f4b7f2a5.zip: Raw data from iDigBio, DwC-A format

    0067804-200613084148143.zip: Raw data from GBIF, DwC-A format

    0067806-200613084148143.zip: Raw data from GBIF, DwC-A format

    1623690110.zip: Full export of this project's data (enhanced and raw) from BIOSPEX, CSV format

    bionomia-datasets-attributions.zip: Directory containing 103 Frictionless Data packages for datasets that have attributions made containing Rhinolophids or Hipposiderids, each package also containing a CSV file for mismatches in person date of birth/death and specimen eventDate. File bionomia-datasets-attributions-key_2021-02-25.csv included in this directory provides a key between dataset identifier (how the Frictionless Data package files are named) and dataset name.

    bionomia-problem-dates-all-datasets_2021-02-25.csv: List of 21 Hipposiderid or Rhinolophid records whose eventDate or dateIdentified mismatches a wikidata recipient’s date of birth or death across all datasets.

    flagEventDate.txt: file containing term definition to reference in DwC-A

    flagExclude.txt: file containing term definition to reference in DwC-A

    flagGeoreference.txt: file containing term definition to reference in DwC-A

    flagTaxonomy.txt: file containing term definition to reference in DwC-A

    georeferencedByID.txt: file containing term definition to reference in DwC-A

    identifiedByNames.txt: file containing term definition to reference in DwC-A

    instructions-to-get-people-data-from-bionomia-via-datasetKey: instructions given to data providers

    RAPID-code_collection-date.R: code associated with enhancing collection dates

    RAPID-code_compile-deduplicate.R: code associated with compiling and deduplicating raw data

    RAPID-code_external-linkages-bold.R: code associated with enhancing external linkages

    RAPID-code_external-linkages-genbank.R: code associated with enhancing external linkages

    RAPID-code_external-linkages-standardize.R: code associated with enhancing external linkages

    RAPID-code_people.R: code associated with enhancing data about people

    RAPID-code_standardize-country.R: code associated with standardizing country data

    RAPID-data-dictionary.pdf: metadata about terms included in this project’s data, in PDF format

    RAPID-data-dictionary.xlsx: metadata about terms included in this project’s data, in spreadsheet format

    rapid-data-providers_2021-05-03.csv: list of data providers and number of records provided to rapid-joined-records_country-cleanup_2020-09-23.csv

    rapid-final-data-product_2021-06-29.zip: Enhanced data from BIOSPEX, DwC-A format

    rapid-final-gazetteer.zip: Gazetteer providing georeference data and metadata for 10,341 localities assessed as part of this project

    rapid-joined-records_country-cleanup_2020-09-23.csv: data product initial version where raw data has been compiled and deduplicated, and country data has been standardized

    RAPID-protocol_collection-date.pdf: protocol associated with enhancing collection dates

    RAPID-protocol_compile-deduplicate.pdf: protocol associated with compiling and deduplicating raw data

    RAPID-protocol_external-linkages.pdf: protocol associated with enhancing external linkages

    RAPID-protocol_georeference.pdf: protocol associated with georeferencing

    RAPID-protocol_people.pdf: protocol associated with enhancing data about people

    RAPID-protocol_standardize-country.pdf: protocol associated with standardizing country data

    RAPID-protocol_taxonomic-names.pdf: protocol associated with enhancing taxonomic name data

    RAPIDAgentStrings1_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol

    recordedByNames.txt: file containing term definition to reference in DwC-A

    Rhinolophid-HipposideridAgentStrings_and_People2_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol

    wikidata-notes-for-bat-collectors_leachman_2020: please see https://zenodo.org/record/4724139 for this resource

  12. soilmap_simple: a simplified and standardized derivative of the digital soil...

    • data.niaid.nih.gov
    Updated Mar 24, 2025
    Cite
    Vanderhaeghe, Floris; De Vos, Bruno; Cools, Nathalie (2025). soilmap_simple: a simplified and standardized derivative of the digital soil map of the Flemish Region [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3732903
    Explore at:
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Research Institute for Nature and Forest (INBO)
    Authors
    Vanderhaeghe, Floris; De Vos, Bruno; Cools, Nathalie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Flanders, Flemish Region
    Description

    The data source soilmap_simple is a simplified and standardized derived form of the 'digital soil map of the Flemish Region' (the shapefile of which we named soilmap, for analytical workflows in R) published by 'Databank Ondergrond Vlaanderen’ (DOV). It is a GeoPackage that contains a spatial polygon layer ‘soilmap_simple’ in the Belgian Lambert 72 coordinate reference system (EPSG-code 31370), plus a non-spatial table ‘explanations’ with the meaning of category codes that occur in the spatial layer. Further documentation about the digital soil map of the Flemish Region is available in Van Ranst & Sys (2000) and Dudal et al. (2005).

    This version of soilmap_simple was derived from version 'soilmap_2017-06-20' (Zenodo DOI) as follows:

    all attribute variables received English names (purpose of standardization), starting with prefix bsm_ (referring to the 'Belgian soil map');

    attribute variables were reordered;

    the values of the morphogenetic substrate, texture and drainage variables (bsm_mo_substr, bsm_mo_tex and bsm_mo_drain + their _explan counterparts) were filled for most features in the 'coastal plain' area.

    To derive morphogenetic texture and drainage levels from the geomorphological soil types, a conversion table by Bruno De Vos & Carole Ampe was applied (for earlier work on this, see Ampe 2013).

    Substrate classes were copied over from bsm_ge_substr into bsm_mo_substr (bsm_ge_substr already followed the categories of bsm_mo_substr).

    These steps coincide with the approach that had been taken to construct the Unitype variable in the soilmap data source;
    

    only a minimal number of variables were selected: those that are most useful for analytical work.

    See R-code in the GitHub repository 'n2khab-preprocessing' at commit b3c6696 for the creation from the soilmap data source.

    A reading function that returns soilmap_simple (this data source) or soilmap into the R environment in a standardized way is provided by the R package n2khab.
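
    The n2khab reader is the documented route; as a generic alternative, a minimal sketch reading the two GeoPackage layers named above with the sf package (the drainage code queried at the end is hypothetical):

        library(sf)

        soilmap_simple <- st_read("soilmap_simple.gpkg", layer = "soilmap_simple")
        explanations   <- st_read("soilmap_simple.gpkg", layer = "explanations")  # non-spatial table

        # Look up the meaning of a category code, e.g. a (hypothetical) drainage class
        subset(explanations, subject == "bsm_mo_drain" & code == "d")
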

    The attributes of the spatial polygon layer soilmap_simple can have mo_ in their name to refer to the Belgian Morphogenetic System:

    bsm_poly_id: unique polygon ID (numeric)

    bsm_region: name of the region

    bsm_converted: boolean. Were morphogenetic texture and drainage variables (bsm_mo_tex and bsm_mo_drain) derived from a conversion table (see above)? Value TRUE is largely confined to the 'coastal plain' areas.

    bsm_mo_soilunitype: code of the soil type (applying morphogenetic codes within the coastal plain areas when possible, just as for the following three variables)

    bsm_mo_substr: code of the soil substrate

    bsm_mo_tex: code of the soil texture category

    bsm_mo_drain: code of the soil drainage category

    bsm_mo_prof: code of the soil profile category

    bsm_mo_parentmat: code of a variant regarding the parent material

    bsm_mo_profvar: code of a variant regarding the soil profile

    The non-spatial table explanations has following variables:

    subject: attribute name of the spatial layer: either bsm_mo_substr, bsm_mo_tex, bsm_mo_drain, bsm_mo_prof, bsm_mo_parentmat or bsm_mo_profvar

    code: category code that occurs as value for the corresponding attribute in the spatial layer

    name: explanation of the value of code

  13. Dataset: A three-dimensional approach to general plant fire syndromes

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jan 27, 2023
    Cite
    Pedro Jaureguiberry; Sandra Díaz (2023). Dataset: A three-dimensional approach to general plant fire syndromes [Dataset]. http://doi.org/10.5061/dryad.j6q573njb
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 27, 2023
    Dataset provided by
    Instituto Multidisciplinario de Biología Vegetal (CONICET-Universidad Nacional de Córdoba) and FCEFyN
    Authors
    Pedro Jaureguiberry; Sandra Díaz
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description
    1. Plant fire syndromes are usually defined as combinations of fire response traits, the most common being resprouting (R) and seeding (S). Plant flammability (F), on the other hand, refers to a plant's effects on communities and ecosystems. Despite its important ecological and evolutionary implications, F has rarely been considered in defining plant fire syndromes and, when it has, usually separately from response syndromes.
    2. We propose a three-dimensional model that combines R, S and F, encapsulating both plant response to fire regimes and the capacity to promote them. Each axis is divided into three possible standardized categories, reflecting low, medium and high values of each variable, with a total of 27 possible combinations of R, S and F.
    3. We hypothesized that different fire histories should be reflected in the position of species within the three-dimensional space and that this should help assess the importance of fire as an evolutionary force in determining R-S-F syndromes.
    4. To illustrate our approach, we compiled information on the fire syndromes of 24 dominant species of different growth forms from the Chaco seasonally dry forest of central Argentina, and we compared them to 33 species from different Mediterranean-type climate ecosystems (MTCEs) of the world.
    5. Chaco and MTCE species differed in the range of syndromes (seven vs. thirteen, respectively) and in the proportion of extreme syndromes (i.e. species with extreme values of R, S and/or F), which represented 29% of species in the Chaco vs. 45% in the MTCEs.
    6. Additionally, we explored the patterns of R, S and F of 4032 species from seven regions with contrasting fire histories, and found significantly higher frequencies of extreme values (predominantly high) of all three variables in MTCEs compared to the other regions, where intermediate and low values predominated, broadly supporting our general hypothesis.
    7. The proposed three-dimensional approach should help standardize comparisons of fire syndromes across taxa, growth forms and regions with different fire histories. This will contribute to the understanding of the role of fire in the evolution of plant traits and assist vegetation modelling in the face of changes in fire regimes.

    Methods

    Data collection for Chaco species
    From previous studies, we compiled data on post-fire resprouting (R) (Jaureguiberry 2012; Jaureguiberry et al. 2020), germination capacity after heat shock treatments (S) (Jaureguiberry & Díaz) and flammability (F) (Jaureguiberry et al. 2011) of 24 dominant species of the seasonally dry Chaco forest of central Argentina (hereafter Chaco). We then transformed the original data from those studies into three possible categorical ordinal values: 1, 2 or 3, indicating low, medium and high values of each variable, respectively. We used the following criteria:
    1) For R data: we focused on the survival percentage recorded for each species (Jaureguiberry et al., 2020) as a proxy for resprouting capacity (Pérez-Harguindeguy et al., 2013), because this variable is widely used in fire studies and has a standard scale and range of values, facilitating comparisons between species from different regions. Survival percentages were assigned to one of three intervals (0 to 33%, 34 to 66%, and 67 to 100%), and each interval was then assigned the value 1, 2 or 3 respectively, indicating low, medium and high resprouting capacity.
    2) For S data: based on germination response to heat shock treatments, we classified species as heat-sensitive (germination lower than the control), heat-tolerant (germination similar to the control) or heat-stimulated (germination higher than the control) (see details in Jaureguiberry and Díaz 2015). Each of these categories was respectively assigned a value of 1, 2 or 3.
    3) For F data: the original measurements included burning rate, maximum temperature and biomass consumed (see details in Jaureguiberry et al. 2011); to compare Chaco species with species from other regions, and considering that burning rate is rarely reported, data on the two latter variables were collected from studies that followed Jaureguiberry et al. (2011). A PCA followed by cluster analysis allowed classifying species into the following categories: 1 = low flammability; 2 = moderate flammability; 3 = high flammability.
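
    A minimal R sketch of the interval coding described above for R values, assuming survival_pct is a numeric vector of post-fire survival percentages:

        r_value <- cut(survival_pct,
                       breaks = c(0, 33, 66, 100),
                       labels = c(1, 2, 3),   # low / medium / high resprouting capacity
                       include.lowest = TRUE)
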

    Data collection for other regions
    We performed an unstructured literature review of fire-related traits relevant to our model. Whenever possible, we searched for the same or similar variables to those used for the Chaco, namely survival percentage, germination response to heat shock, and variables related to flammability (e.g. maximum temperature, biomass consumed and burning rate), as proxies for R, S and F, respectively. Classification into the R intervals was based either on quantitative data on survival percentage or on qualitative information from major databases. For example, resprouting capacity reported as “low” or “high” (e.g. Tavşanoğlu & Pausas, 2018) was assigned R values of 1 and 3, respectively. For Southern Australian species, those reported as “fire killed” and “weak resprouting” (Falster et al., 2021) were assigned a value of 1, while those reported as “intermediate resprouting” and “strong resprouting” were assigned values of 2 and 3, respectively. The vast majority of records in our dataset refer to resprouting of individuals one growing season after the fire. Flammability data for most of the species were based on quantitative measurements using the method of Jaureguiberry et al. (2011), standardized following the criteria explained earlier. For some species, however, classification was based either on other quantitative measures following other methodologies (e.g. measures based on plant parts such as twigs or leaves, or on fuel beds) or on qualitative classifications reported in the literature (most of which are in turn based on reviews of quantitative measurements from previous studies). We standardized the original data collected for the other regions following the same approach as for the Chaco, and built contingency tables to analyse each region and to compare between regions. The curated total number of records from our literature review was 4411 (3399 for R, 678 for S and 334 for F), covering 4032 species (many species had information on two variables, and very few on all three). The database covers a wide taxonomic range, encompassing species from approximately 1250 genera and 180 botanical families, belonging to ten different growth forms, and coming from seven major regions with a wide range of evolutionary histories of fire, from long and intense (Mediterranean-type climate ecosystems) to very recent (New Zealand).

  14. Data from: Standardized O2 concentrations and interfacial fluxes at the...

    • doi.pangaea.de
    html, tsv
    Updated Feb 2, 2018
    Cite
    Fanny Noisette; Catriona L Hurd (2018). Standardized O2 concentrations and interfacial fluxes at the surface of the blade in the different experimental conditions of pH, flow, presence/absence of bryozoans and in saturated light and dark conditions (table 3) [Dataset]. http://doi.org/10.1594/PANGAEA.885869
    Explore at:
    tsv, html
    Dataset updated
    Feb 2, 2018
    Dataset provided by
    PANGAEA
    Authors
    Fanny Noisette; Catriona L Hurd
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Oct 19, 2015 - Oct 29, 2015
    Area covered
    Variables measured
    Type, Species, Ammonium, Salinity, Phosphate, Sample ID, Treatment, Carbonate ion, Carbon dioxide, Bicarbonate ion, and 22 more
    Description

    In order to allow full comparability with other ocean acidification data sets, the R package seacarb (Gattuso et al., 2016) was used to compute a complete and consistent set of carbonate system variables, as described by Nisumaa et al. (2010). In this dataset, the original values were archived together with the recalculated parameters (see related PI). The date of carbonate chemistry calculation by seacarb is 2018-02-02.
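
    A minimal seacarb sketch; the input pair and conditions below (pH plus total alkalinity, i.e. flag = 8, at illustrative values) are assumptions for illustration, not necessarily the measured pair used for this dataset.

        library(seacarb)

        # flag = 8: var1 = pH (total scale), var2 = total alkalinity (mol/kg)
        out <- carb(flag = 8, var1 = 8.05, var2 = 2300e-6, S = 35, T = 12, P = 0)
        out$pCO2  # recomputed pCO2, consistent with the other derived variables
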

  15. Replication Data for: A Standardized Method for the Construction of Tracer...

    • dataverse.nl
    bin
    Updated Nov 17, 2017
    + more versions
    Cite
    D. Vállez Garcia; C. Casteels; A. Schwarz; R. Dierckx; M. Koole; J. Doorduin; D. Vállez Garcia; C. Casteels; A. Schwarz; R. Dierckx; M. Koole; J. Doorduin (2017). Replication Data for: A Standardized Method for the Construction of Tracer Specific PET and SPECT Rat Brain Templates: Validation and Implementation of a Toolbox [Dataset]. http://doi.org/10.34894/QZTMV4
    Explore at:
    bin (13125152 bytes)
    Dataset updated
    Nov 17, 2017
    Dataset provided by
    DataverseNL
    Authors
    D. Vállez Garcia; C. Casteels; A. Schwarz; R. Dierckx; M. Koole; J. Doorduin; D. Vállez Garcia; C. Casteels; A. Schwarz; R. Dierckx; M. Koole; J. Doorduin
    License

    https://dataverse.nl/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34894/QZTMV4

    Description

    Data set used in "A standardized method for the construction of tracer specific PET and SPECT rat brain templates: validation and implementation of a toolbox"

  16. Correlations coefficients (r) for six horticultural traits and nine...

    • datasetcatalog.nlm.nih.gov
    Updated Jul 16, 2014
    Cite
    van Eeuwijk, Fred A.; van Bueren, Edith T. Lammerts; Myers, James R.; Paulo, Maria João; Zhu, Ning; Renaud, Erica N. C.; Juvik, John A. (2014). Correlations coefficients (r) for six horticultural traits and nine phytochemicals, calculated using data standardized across trials. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001170148
    Explore at:
    Dataset updated
    Jul 16, 2014
    Authors
    van Eeuwijk, Fred A.; van Bueren, Edith T. Lammerts; Myers, James R.; Paulo, Maria João; Zhu, Ning; Renaud, Erica N. C.; Juvik, John A.
    Description

    Correlation results include means from 23 cultivars, across eight pair combinations of location (Maine/Oregon), season (Fall/Spring) and management system (Conventional/Organic), 2006–2008 (a).
    (a) For empty cells, r is not significantly different from zero (P < 0.05).

  17. Data from: cvs data file of Length-standardized surface area index and...

    • datasetcatalog.nlm.nih.gov
    Updated Jan 17, 2021
    + more versions
    Cite
    Sato, Katsufumi; Akiyama, Yu; Ramp, Christian; Swift, René; Hall, Ailsa; López, Lucía Martina Martín; Narazaki, Tomoko; Aoki, Kagari; Iwata, Takashi; Pomeroy, Patrick; Kershaw, Joanna; Miller, Patrick J. O.; Bellot, Charlotte; Wensveen, Paul J.; Biuw, Martin; Isojunno, Saana (2021). cvs data file of Length-standardized surface area index and tissue body density for R script from Aerial photogrammetry and tag-derived tissue density reveal patterns of lipid-store body condition of humpback whales on their feeding grounds [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000743536
    Explore at:
    Dataset updated
    Jan 17, 2021
    Authors
    Sato, Katsufumi; Akiyama, Yu; Ramp, Christian; Swift, René; Hall, Ailsa; López, Lucía Martina Martín; Narazaki, Tomoko; Aoki, Kagari; Iwata, Takashi; Pomeroy, Patrick; Kershaw, Joanna; Miller, Patrick J. O.; Bellot, Charlotte; Wensveen, Paul J.; Biuw, Martin; Isojunno, Saana
    Description

    See electronic supplementary materials for details

  18. Standardized NEON organismal data (neonDivData)

    • portal.edirepository.org
    bin, csv
    Updated Apr 12, 2022
    Cite
    Daijiang Li; Sydne Record; Eric Sokol; Matthew Bitters; Melissa Chen; Anny Chung; Matthew Helmus; Ruvi Jaimes; Lara Jansen; Marta Jarzyna; Michael Just; Jalene LaMontagne; Brett Melbourne; Wynne Moss; Kari Norman; Stephanie Parker; Natalie Robinson; Bijan Seyednasrollah; Sarah Spaulding; Thilina Surasinghe; Sarah Thomsen; Phoebe Zarnetske (2022). Standardized NEON organismal data (neonDivData) [Dataset]. http://doi.org/10.6073/pasta/c28dd4f6e7989003505ea02e9a92afbf
    Explore at:
    Available download formats: csv(67793652 bytes), csv(266884330 bytes), csv(4643854 bytes), csv(12011 bytes), csv(944312 bytes), csv(6879 bytes), csv(25181268 bytes), csv(1949590 bytes), csv(375200 bytes), csv(3062147 bytes), csv(35160044 bytes), csv(738408 bytes), csv(18427828 bytes), csv(604110 bytes), csv(35684117 bytes), csv(86101256 bytes), bin(20729 bytes), bin(4674 bytes)
    Dataset updated
    Apr 12, 2022
    Dataset provided by
    EDI
    Authors
    Daijiang Li; Sydne Record; Eric Sokol; Matthew Bitters; Melissa Chen; Anny Chung; Matthew Helmus; Ruvi Jaimes; Lara Jansen; Marta Jarzyna; Michael Just; Jalene LaMontagne; Brett Melbourne; Wynne Moss; Kari Norman; Stephanie Parker; Natalie Robinson; Bijan Seyednasrollah; Sarah Spaulding; Thilina Surasinghe; Sarah Thomsen; Phoebe Zarnetske
    Time period covered
    Jun 5, 2013 - Jul 28, 2020
    Area covered
    Variables measured
    sex, unit, year, State, endRH, month, sites, units, value, boutID, and 113 more
    Description

    To standardize NEON organismal data for major taxonomic groups, we first systematically reviewed NEON's documentation for each taxonomic group. We then discussed, as a group and with NEON staff, how to wrangle and standardize NEON organismal data; see Li et al. 2022 for details. All R code to process NEON data products can be obtained through the R package 'ecocomDP'. Once the data are in ecocomDP format, we further processed them into long data frames with code on GitHub (https://github.com/daijiang/neonDivData/tree/master/data-raw), which is also archived here; a minimal sketch of the ecocomDP step follows.
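
    A hedged sketch of that workflow, assuming the ecocomDP interface as documented in its vignette; the data product id (aquatic macroinvertebrates), sites, and date range below are illustrative choices, not taken from this dataset.

        # Hedged sketch: pull a NEON data product in the ecocomDP format, then
        # flatten the normalized tables into one long data frame. The id and
        # sites are illustrative assumptions.
        library(ecocomDP)

        d <- read_data(
          id         = "neon.ecocomdp.20120.001.001",  # assumed NEON macroinvertebrate id
          site       = c("ARIK", "POSE"),
          startdate  = "2017-01",
          enddate    = "2019-12",
          check.size = FALSE
        )

        # ecocomDP bundles normalized tables (observation, location, taxon, ...);
        # flatten_data() joins them into a single long data frame.
        long_df <- flatten_data(d$tables)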

  19. Data applied to automatic method to transform routine otolith images for a...

    • seanoe.org
    image/*
    Updated 2022
    Cite
    Nicolas Andrialovanirina; Alizee Hache; Kelig Mahe; Sébastien Couette; Emilie Poisson Caillault (2022). Data applied to automatic method to transform routine otolith images for a standardized otolith database using R [Dataset]. http://doi.org/10.17882/91023
    Explore at:
    Available download formats: image/*
    Dataset updated
    2022
    Dataset provided by
    SEANOE
    Authors
    Nicolas Andrialovanirina; Alizee Hache; Kelig Mahe; Sébastien Couette; Emilie Poisson Caillault
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fisheries management is generally based on age-structure models, so fish ageing data are collected by experts who visually analyze and interpret calcified structures (scales, vertebrae, fin rays, otoliths, etc.). The otolith, in the inner ear of the fish, is the most commonly used calcified structure because it is metabolically inert and historically one of the first proxies developed. It contains information covering the whole life of the fish and provides age-structure data for stock assessments of all commercial species. The traditional human reading method for age determination is very time-consuming. Automated image analysis can be a low-cost alternative, but the first step is transforming routinely taken otolith images into standardized images within a database so that machine learning techniques can be applied to the ageing data. Otolith shape, resulting from the synthesis of genetic heritage and environmental effects, is a useful tool to identify stock units, so a database of standardized images could also serve this aim. Using the routinely measured otolith data of plaice (Pleuronectes platessa; Linnaeus, 1758) and striped red mullet (Mullus surmuletus; Linnaeus, 1758) in the eastern English Channel and North-East Arctic cod (Gadus morhua; Linnaeus, 1758), a greyscale image matrix was generated from the raw images in different formats. Contour detection was then applied to identify broken otoliths, the orientation of each otolith, and the number of otoliths per image. To finalize the standardization process, all images were resized and binarized. Several mathematical morphology tools were developed from these new images to align and orient them, placing the otoliths in the same layout in each image. For this study, we used three databases from two different laboratories covering three species (cod, plaice and striped red mullet). The method was validated on these three species and could be applied to other species for age determination and stock identification.
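
    For illustration only (this is not the authors' published R pipeline), the core steps — greyscale conversion, binarization, and resizing to fixed dimensions — might look like the following with the imager package; the helper name, file name, and output size are assumptions.

        # Hedged sketch of the core standardization steps described above,
        # using the 'imager' package; not the authors' published code.
        library(imager)

        standardize_otolith <- function(path, size = 256) {
          im <- load.image(path)          # routine otolith photograph, any common format
          g  <- grayscale(im)             # greyscale image matrix
          bw <- as.cimg(threshold(g))     # automatic binarization (foreground vs background)
          resize(bw, size, size)          # fixed dimensions for the standardized database
        }

        # std <- standardize_otolith("otolith_plaice_001.jpg")  # hypothetical file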

  20. EPAdata_MLS_paper1

    • catalog.data.gov
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). EPAdata_MLS_paper1 [Dataset]. https://catalog.data.gov/dataset/epadata-mls-paper1
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    EPA Draft Method C qPCR cycle threshold (Ct) measurements of standardized reference materials, as described in D-EMMD-MEB-025-QAPP-01 and in the journal article below. This dataset is associated with the following publication: Sivaganesan, M., T. Aw, S. Briggs, E. Dreelin, A. Aslan, S. Dorevitch, A. Shrestha, N. Isaacs, J. Kinzelman, G. Kleinheinz, R. Noble, R. Rediske, B. Scull, S. Rosenberg, B. Weberman, T. Sivy, B. Southwell, S. Siefring, K. Oshima, and R. Haugland. Standardized data quality acceptance criteria for a rapid Escherichia coli qPCR method (Draft Method C) for water quality monitoring at recreational beaches. WATER RESEARCH. Elsevier Science Ltd, New York, NY, USA, 156: 456-464, (2019).

SMDG, A Standardized Fundus Glaucoma Dataset

Standardized Multi-Channel Dataset for Glaucoma of 19 public datasets (SMDG-19)


Dataset Objective

This dataset aims to provide a machine-learning-ready resource for glaucoma-related applications. With the help of the community, new open-source glaucoma datasets will be reviewed for standardization and inclusion in this dataset.

Data Standardization

  • Full fundus images (and corresponding segmentation maps) are standardized using a novel algorithm (Citation 1) that crops the background, centers the fundus, pads missing information, and resizes to 512x512 pixels. This standardization preserves as much foreground information as possible during resizing, yielding machine-learning-ready images; a minimal sketch of these steps appears after the table below.
  • All available text metadata are standardized into a CSV file in which each fundus image is a row and each fundus attribute is a column.
Dataset Instance | Original Fundus | Standardized Fundus Image
sjchoi86-HRF | https://user-images.githubusercontent.com/65875562/204170005-2d4dd051-0032-40c8-ba0b-390b6080bb69.png | https://user-images.githubusercontent.com/65875562/204170011-51b7d001-4d43-4f0d-835e-984d45116b18.png
BEH | https://user-images.githubusercontent.com/65875562/211052753-93f8a3aa-cc65-4790-8da6-229f512a6afb.PNG | (image link truncated in source)
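
A minimal R sketch (using the imager package) of the crop-center-pad-resize sequence listed above. This is not the cited dynamic global foreground threshold algorithm: the fixed background threshold, helper name, and file name are assumptions.

    # Hedged sketch of the standardization steps: crop the dark background,
    # pad to a square (centering the fundus), and resize to 512x512.
    library(imager)

    standardize_fundus <- function(path, out_size = 512, bg_thresh = 0.05) {
      im <- load.image(path)
      fg <- grayscale(im) > bg_thresh        # crude mask: assumes a near-black background
      xy <- where(fg)                        # coordinates of foreground pixels
      crop <- imsub(im,                      # crop to the foreground bounding box
                    x %inr% range(xy$x),
                    y %inr% range(xy$y))
      d  <- abs(width(crop) - height(crop))  # pad the short side to make a square
      ax <- if (width(crop) < height(crop)) "x" else "y"
      sq <- if (d > 0) pad(crop, d, ax, pos = 0) else crop
      resize(sq, out_size, out_size)         # final 512x512 machine-learning-ready image
    }

    # std <- standardize_fundus("fundus_raw.png")  # hypothetical file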