20 datasets found

f
Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...
frontiersin.figshare.com
zip
Updated Jun 1, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2019.00400.s002
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data normalization is a crucial step in the gene expression analysis as it ensures the validity of its downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for the single-cell RNA sequencing (scRNA-seq) data. The worst situations could be that one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. Here raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric mSCC to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) in the principle of the consistency of metrics and the consistency of datasets.
Z
Example subjects for Mobilise-D data standardization
data.niaid.nih.gov
Updated Oct 11, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Soltani, Abolfazl (2022). Example subjects for Mobilise-D data standardization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7185428
Explore at:
Dataset updated
Oct 11, 2022
Dataset provided by
Cereatti, Andrea
Kluge, Felix
Salis, Francesca
Hansen, Clint
D'Ascanio, Ilaria
Mazzà, Claudia
Bertuletti, Stefano
Palmerini, Luca
on behalf of the Mobilise-D consortium
Soltani, Abolfazl
Ullrich, Martin
Rochester, Lynn
Caruso, Marco
Paraschiv-Ionescu, Anisoara
Bonci, Tecla
Micó-Amigo, Encarna
Reggi, Luca
Küderle, Arne
Hiden, Hugo
Chiari, Lorenzo
Gazit, Eran
Del Din, Silvia
Kirk, Cameron
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Standardized data from Mobilise-D participants (YAR dataset) and pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder, as an example of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization" that is currently under review in Scientific data. Please refer to that publication for further information. Please cite that publication if using these data.

The code to standardize an example subject (for the ICICLE dataset) and to open the standardized Matlab files in other languages (Python, R) is available in github (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
q
REMNet Tutorial, R Part 5: Normalizing Microbiome Data in R 5.2.19
qubeshub.org
Updated Aug 28, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jessica Joyner (2019). REMNet Tutorial, R Part 5: Normalizing Microbiome Data in R 5.2.19 [Dataset]. http://doi.org/10.25334/M13H-XT81
Explore at:
Unique identifier
https://doi.org/10.25334/M13H-XT81
Dataset updated
Aug 28, 2019
Dataset provided by
QUBES
Authors
Jessica Joyner
Description
Video on normalizing microbiome data from the Research Experiences in Microbiomes Network
Dataset supporting: Normalizing and denoising protein expression data from...
nih.figshare.com
figshare.com
zip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthew P. Mulé; Andrew J. Martins; John Tsang (2023). Dataset supporting: Normalizing and denoising protein expression data from droplet-based single cell profiling [Dataset]. http://doi.org/10.35092/yhjc.13370915.v2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.35092/yhjc.13370915.v2
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Matthew P. Mulé; Andrew J. Martins; John Tsang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data for reproducing analysis in the manuscript:Normalizing and denoising protein expression data from droplet-based single cell profilinglink to manuscript: https://www.biorxiv.org/content/10.1101/2020.02.24.963603v1

Data deposited here are for the purposes of reproducing the analysis results and figures reported in the manuscript above. These data are all publicly available downloaded and converted to R datasets prior to Dec 4, 2020. For a full description of all the data included in this repository and instructions for reproducing all analysis results and figures, please see the repository: https://github.com/niaid/dsb_manuscript.

For usage of the dsb R package for normalizing CITE-seq data please see the repository: https://github.com/niaid/dsb

If you use the dsb R package in your work please cite:Mulè MP, Martins AJ, Tsang JS. Normalizing and denoising protein expression data from droplet-based single cell profiling. bioRxiv. 2020;2020.02.24.963603.

General contact: John Tsang (john.tsang AT nih.gov)

Questions about software/code: Matt Mulè (mulemp AT nih.gov)
Simulation Data Set
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”. Metadata (including data dictionary) • y: Vector of binary responses (1: adverse outcome, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate) Code Abstract We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities. Description “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript). Optional Information (complete as necessary) Required R packages: • For running “CWVS_LMC.txt”: • msm: Sampling from the truncated normal distribution • mnormt: Sampling from the multivariate normal distribution • BayesLogit: Sampling from the Polya-Gamma distribution • For running “Results_Summary.txt”: • plotrix: Plotting the posterior means and credible intervals Instructions for Use Reproducibility (Mandatory) What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. How to use the information: • Load the “Simulated_Dataset.RData” workspace • Run the code contained in “CWVS_LMC.txt” • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”. Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set: Data The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publically available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
n
Methods for normalizing microbiome data: an ecological perspective
data.niaid.nih.gov
datadryad.org
zip
Updated Oct 30, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.tn8qs35
Dataset updated
Oct 30, 2018
Dataset provided by
University of New England
James Cook University
Authors
Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data and instead advocate alternatives, such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity), Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs, while suppressing the importance of differences among common OTUs. 2. We tested these theoretical predictions via simulations and a real-world data set. 3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed. 4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
e
Standardized NEON organismal data (neonDivData)
portal.edirepository.org
bin, csv
Updated Apr 12, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daijiang Li; Sydne Record; Eric Sokol; Matthew Bitters; Melissa Chen; Anny Chung; Matthew Helmus; Ruvi Jaimes; Lara Jansen; Marta Jarzyna; Michael Just; Jalene LaMontagne; Brett Melbourne; Wynne Moss; Kari Norman; Stephanie Parker; Natalie Robinson; Bijan Seyednasrollah; Sarah Spaulding; Thilina Surasinghe; Sarah Thomsen; Phoebe Zarnetske (2022). Standardized NEON organismal data (neonDivData) [Dataset]. http://doi.org/10.6073/pasta/c28dd4f6e7989003505ea02e9a92afbf
Explore at:
csv(67793652 bytes), csv(266884330 bytes), csv(4643854 bytes), csv(12011 bytes), csv(944312 bytes), csv(6879 bytes), csv(25181268 bytes), csv(1949590 bytes), csv(375200 bytes), csv(3062147 bytes), csv(35160044 bytes), csv(738408 bytes), csv(18427828 bytes), csv(604110 bytes), csv(35684117 bytes), csv(86101256 bytes), bin(20729 bytes), bin(4674 bytes)Available download formats
Unique identifier
https://doi.org/10.6073/pasta/c28dd4f6e7989003505ea02e9a92afbf
Dataset updated
Apr 12, 2022
Dataset provided by
EDI
Authors
Daijiang Li; Sydne Record; Eric Sokol; Matthew Bitters; Melissa Chen; Anny Chung; Matthew Helmus; Ruvi Jaimes; Lara Jansen; Marta Jarzyna; Michael Just; Jalene LaMontagne; Brett Melbourne; Wynne Moss; Kari Norman; Stephanie Parker; Natalie Robinson; Bijan Seyednasrollah; Sarah Spaulding; Thilina Surasinghe; Sarah Thomsen; Phoebe Zarnetske
Time period covered
Jun 5, 2013 - Jul 28, 2020
Area covered

Variables measured
sex, unit, year, State, endRH, month, sites, units, value, boutID, and 113 more
Description
To standardize NEON organismal data for major taxonomic groups, we first systematically reviewed NEON’s documentations for each taxonomic group. We then discussed as a group and with NEON staff to decide how to wrangle and standardize NEON organismal data. See Li et al. 2022 for more details. All R code to process NEON data products can be obtained through the R package ‘ecocomDP’. Once the data are in ecocomDP format, we further processed them to convert them into long data frames with code on Github (https://github.com/daijiang/neonDivData/tree/master/data-raw), which is also archived here.
The global spectrum of plant form and function dataset: taxonomic...
zenodo.org
Updated May 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Roeland Kindt; Roeland Kindt (2025). The global spectrum of plant form and function dataset: taxonomic standardization of 45,955 taxa to World Flora Online version 2023.12 [Dataset]. http://doi.org/10.5281/zenodo.15563432
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15563432
Dataset updated
May 31, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Roeland Kindt; Roeland Kindt
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The global spectrum of plant form and function dataset (Diaz et al. 2022; Diaz et al. 2016; TRY 2022, accessed 15-May-2025) provides mean trait values for (i) plant height; (ii) stem specific density; (iii) leaf area; (iv) leaf mass per area; (v) leaf nitrogen content per dry mass; and (vi) diaspore (seed or spore) mass for 46,047 taxa.

Here I provide a dataset where the taxa covered by that database were standardized to World Flora Online (Borsch et al. 2020; taxonomic backbone version 2023.12) by matching names with those in the Agroforestry Species Switchboard (Kindt et al. 2025; version 4). Taxa for which no matches could be found were standardized with the WorldFlora package (Kindt 2020), using similar R scripts and the same taxonomic backbone data as those used to standardize species names for the Switchboard. Where still no matches could be found, taxa were matched with those matched previously with a harmonized data set for TRY 6.0 (Kindt 2024).

References

Díaz, S., Kattge, J., Cornelissen, J.H.C. et al. The global spectrum of plant form and function: enhanced species-level trait dataset. Sci Data 9, 755 (2022). https://doi.org/10.1038/s41597-022-01774-9

Díaz, S., Kattge, J., Cornelissen, J. et al. The global spectrum of plant form and function. Nature 529, 167–171 (2016). https://doi.org/10.1038

TRY. 2022. The global spectrum of plant form and function dataset. https://www.try-db.org/TryWeb/Data.php#81

Borsch, T., Berendsohn, W., Dalcin, E., Delmas, M., Demissew, S., Elliott, A., Fritsch, P., Fuchs, A., Geltman, D., Güner, A., Haevermans, T., Knapp, S., le Roux, M.M., Loizeau, P.-A., Miller, C., Miller, J., Miller, J.T., Palese, R., Paton, A., Parnell, J., Pendry, C., Qin, H.-N., Sosa, V., Sosef, M., von Raab-Straube, E., Ranwashe, F., Raz, L., Salimov, R., Smets, E., Thiers, B., Thomas, W., Tulig, M., Ulate, W., Ung, V., Watson, M., Jackson, P.W. and Zamora, N. (2020), World Flora Online: Placing taxonomists at the heart of a definitive and comprehensive global resource on the world's plants. TAXON, 69: 1311-1341. https://doi.org/10.1002/tax.12373

Roeland Kindt, Ilyas Siddique, Ian Dawson, Innocent John, Fabio Pedercini, Jens-Peter B. Lillesø, Lars Graudal. 2025. The Agroforestry Species Switchboard, a global resource to explore information for 107,269 plant species. bioRxiv 2025.03.09.642182; doi: https://doi.org/10.1101/2025.03.09.642182

Kindt, R. 2020. WorldFlora: An R package for exact and fuzzy matching of plant names against the World Flora Online taxonomic backbone data. Applications in Plant Sciences 8(9): e11388. https://doi.org/10.1002/aps3.11388

Kindt, R. (2024). TRY 6.0 - Species List from Taxonomic Harmonization – Matches with World Flora Online version 2023.12 (2024.10b) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.13906338

Funding

The development of this dataset was supported by the German International Climate Initiative (IKI) to the regional tree seed programme on The Right Tree for the Right Place for the Right Purpose in Africa, by Norway’s International Climate and Forest Initiative through the Royal Norwegian Embassy in Ethiopia to the Provision of Adequate Tree Seed Portfolio project in Ethiopia, and by the Bezos Earth Fund to the Quality Tree Seed for Africa in Kenya and Rwanda project.
CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thilde Terkelsen; Anders Krogh; Elena Papaleo (2023). CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for the analysis of quantitative biological data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007665
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pcbi.1007665
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Thilde Terkelsen; Anders Krogh; Elena Papaleo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline -CAMPP) intended to aid bioinformatic software-users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R-package updates, a renv .lock file is provided to ensure R-package stability. Data-management includes missing value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.
Z
Data from: Rapid Creation of a Data Product for the World's Specimens of...
data.niaid.nih.gov
zenodo.org
Updated Jul 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pearson, Katelin D. (2024). Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3974999
Explore at:
Dataset updated
Jul 18, 2024
Dataset provided by
Pearson, Katelin D.
Soltis, Pam
Upham, Nathan
Mast, Austin R.
Sherman, Aja
Bruhn, Robert
Krimmel, Erica R.
Rios, Nelson
Paul, Deborah L.
Shorthouse, David P.
Dalton, Trevor
Simmons, Nancy B.
Abibou, Djihbrihou
License
https://creativecommons.org/licenses/publicdomain/https://creativecommons.org/licenses/publicdomain/
Description
This repository is associated with NSF DBI 2033973, RAPID Grant: Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2033973). Specifically, this repository contains (1) raw data from iDigBio (http://portal.idigbio.org) and GBIF (https://www.gbif.org), (2) R code for reproducible data wrangling and improvement, (3) protocols associated with data enhancements, and (4) enhanced versions of the dataset published at various project milestones. Additional code associated with this grant can be found in the BIOSPEX repository (https://github.com/iDigBio/Biospex). Long-term data management of the enhanced specimen data created by this project is expected to be accomplished by the natural history collections curating the physical specimens, a list of which can be found in this Zenodo resource.

Grant abstract: "The award to Florida State University will support research contributing to the development of georeferenced, vetted, and versioned data products of the world's specimens of horseshoe bats and their relatives for use by researchers studying the origins and spread of SARS-like coronaviruses, including the causative agent of COVID-19. Horseshoe bats and other closely related species are reported to be reservoirs of several SARS-like coronaviruses. Species of these bats are primarily distributed in regions where these viruses have been introduced to populations of humans. Currently, data associated with specimens of these bats are housed in natural history collections that are widely distributed both nationally and globally. Additionally, information tying these specimens to localities are mostly vague, or in many instances missing. This decreases the utility of the specimens for understanding the source, emergence, and distribution of SARS-COV-2 and similar viruses. This project will provide quality georeferenced data products through the consolidation of ancillary information linked to each bat specimen, using the extended specimen model. The resulting product will serve as a model of how data in biodiversity collections might be used to address emerging diseases of zoonotic origin. Results from the project will be disseminated widely in opensource journals, at scientific meetings, and via websites associated with the participating organizations and institutions. Support of this project provides a quality resource optimized to inform research relevant to improving our understanding of the biology and spread of SARS-CoV-2. The overall objectives are to deliver versioned data products, in formats used by the wider research and biodiversity collections communities, through an open-access repository; project protocols and code via GitHub and described in a peer-reviewed paper, and; sustained engagement with biodiversity collections throughout the project for reintegration of improved data into their local specimen data management systems improving long-term curation.

This RAPID award will produce and deliver a georeferenced, vetted and consolidated data product for horseshoe bats and related species to facilitate understanding of the sources, distribution, and spread of SARS-CoV-2 and related viruses, a timely response to the ongoing global pandemic caused by SARS-CoV-2 and an important contribution to the global effort to consolidate and provide quality data that are relevant to understanding emergent and other properties the current pandemic. This RAPID award is made by the Division of Biological Infrastructure (DBI) using funds from the Coronavirus Aid, Relief, and Economic Security (CARES) Act.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria."

Files included in this resource

9d4b9069-48c4-4212-90d8-4dd6f4b7f2a5.zip: Raw data from iDigBio, DwC-A format

0067804-200613084148143.zip: Raw data from GBIF, DwC-A format

0067806-200613084148143.zip: Raw data from GBIF, DwC-A format

1623690110.zip: Full export of this project's data (enhanced and raw) from BIOSPEX, CSV format

bionomia-datasets-attributions.zip: Directory containing 103 Frictionless Data packages for datasets that have attributions made containing Rhinolophids or Hipposiderids, each package also containing a CSV file for mismatches in person date of birth/death and specimen eventDate. File bionomia-datasets-attributions-key_2021-02-25.csv included in this directory provides a key between dataset identifier (how the Frictionless Data package files are named) and dataset name.

bionomia-problem-dates-all-datasets_2021-02-25.csv: List of 21 Hipposiderid or Rhinolophid records whose eventDate or dateIdentified mismatches a wikidata recipient’s date of birth or death across all datasets.

flagEventDate.txt: file containing term definition to reference in DwC-A

flagExclude.txt: file containing term definition to reference in DwC-A

flagGeoreference.txt: file containing term definition to reference in DwC-A

flagTaxonomy.txt: file containing term definition to reference in DwC-A

georeferencedByID.txt: file containing term definition to reference in DwC-A

identifiedByNames.txt: file containing term definition to reference in DwC-A

instructions-to-get-people-data-from-bionomia-via-datasetKey: instructions given to data providers

RAPID-code_collection-date.R: code associated with enhancing collection dates

RAPID-code_compile-deduplicate.R: code associated with compiling and deduplicating raw data

RAPID-code_external-linkages-bold.R: code associated with enhancing external linkages

RAPID-code_external-linkages-genbank.R: code associated with enhancing external linkages

RAPID-code_external-linkages-standardize.R: code associated with enhancing external linkages

RAPID-code_people.R: code associated with enhancing data about people

RAPID-code_standardize-country.R: code associated with standardizing country data

RAPID-data-dictionary.pdf: metadata about terms included in this project’s data, in PDF format

RAPID-data-dictionary.xlsx: metadata about terms included in this project’s data, in spreadsheet format

rapid-data-providers_2021-05-03.csv: list of data providers and number of records provided to rapid-joined-records_country-cleanup_2020-09-23.csv

rapid-final-data-product_2021-06-29.zip: Enhanced data from BIOSPEX, DwC-A format

rapid-final-gazetteer.zip: Gazetteer providing georeference data and metadata for 10,341 localities assessed as part of this project

rapid-joined-records_country-cleanup_2020-09-23.csv: data product initial version where raw data has been compiled and deduplicated, and country data has been standardized

RAPID-protocol_collection-date.pdf: protocol associated with enhancing collection dates

RAPID-protocol_compile-deduplicate.pdf: protocol associated with compiling and deduplicating raw data

RAPID-protocol_external-linkages.pdf: protocol associated with enhancing external linkages

RAPID-protocol_georeference.pdf: protocol associated with georeferencing

RAPID-protocol_people.pdf: protocol associated with enhancing data about people

RAPID-protocol_standardize-country.pdf: protocol associated with standardizing country data

RAPID-protocol_taxonomic-names.pdf: protocol associated with enhancing taxonomic name data

RAPIDAgentStrings1_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol

recordedByNames.txt: file containing term definition to reference in DwC-A

Rhinolophid-HipposideridAgentStrings_and_People2_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol

wikidata-notes-for-bat-collectors_leachman_2020: please see https://zenodo.org/record/4724139 for this resource
Data for: Patel et al. Carbon flux estimates are sensitive to data source: A...
osti.gov
knb.ecoinformatics.org
+2more
Updated Jan 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) (United States) (2022). Data for: Patel et al. Carbon flux estimates are sensitive to data source: A comparison of field and lab temperature sensitivity data [Dataset]. http://doi.org/10.15485/1889750
Explore at:
Unique identifier
https://doi.org/10.15485/1889750
Dataset updated
Jan 1, 2022
Dataset provided by
Department of Energy Biological and Environmental Research Program
Office of Sciencehttp://www.er.doe.gov/
Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) (United States)
Soil Carbon Biogeochemistry
Description
This dataset contains data and code used for the paper "Carbon flux estimates are sensitive to data source: A comparison of field and lab temperature sensitivity data" [DOI COMING SOON]A large literature exists on mechanisms driving soil production of the greenhouse gases CO2 and CH4. Measurements of these gases’ fluxes are often performed using closed-chamber incubations in the laboratory or in situ, i.e., in the field. Although it is common knowledge that measurements obtained through field studies vs. laboratory incubations can diverge because of the vastly different conditions of these environments, few studies have systematically examined these patterns. It is crucial to understand the magnitude and reasons for any differences, as these data are used to parametrize and benchmark ecosystem- to global-scale models, which are then susceptible to the biases of the source data. Here, we specifically examine how greenhouse gas measurements may be influenced by whether the measurement/incubation was conducted in the field vs. laboratory, focusing on CO2 and CH4 measurements. We use Q10 of greenhouse gas flux (temperature sensitivity) for our analyses, because of the ubiquity of this metric in biological and Earth system sciences and its importance to many modeling frameworks. We predicted that laboratory measurements would be less variable, but also less representative of true field conditions. However, there was greater variability in the Q10 values calculated from lab-based measurements of CO2 fluxes, because lab experiments explore extremes rarely seen in situ, and reflect the physical and chemical disturbances occurring during sampling, transport, and incubation. Overall, respiration Q10 values were significantly greater in laboratory incubations (mean = 4.19) than field measurements (mean = 3.05), with strong influences of incubation temperature and climate region/biome. However, this was in part because field measurements typically represent total respiration (Rs), whereas lab incubations typically represent heterotrophic respiration (Rh), making direct comparisons difficult to interpret. Focusing only on Rh-derived Q10, these values showed almost identical distributions across laboratory (n = 1110) and field (n = 581) experiments, providing strong support for using the former as an experimental proxy for the latter, although we caution that geographic biases in the extant data make this conclusion tentative. Due to a smaller sample size of CH4 Q10 data, we were unable to perform a comparable robust analysis, but we expect similar interactions with soil temperature, moisture, and environmental/climatic variables. Our results here suggest the need for more concerted efforts to document and standardize these data, including sample and site metadata. This dataset contains a compressed (.zip) archive of the data and R scripts used for this manuscript. The dataset includes files in .csv format, which can be accessed and processed using MS Excel or R. This archive can also be accessed on GitHub at https://github.com/kaizadp/field_lab_q10 (DOI: 10.5281/zenodo.7106554).
Water Rights Demand Analysis Methodology Datasets
catalog.data.gov
data.cnra.ca.gov
+1more
Updated Mar 30, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
California State Water Resources Control Board (2024). Water Rights Demand Analysis Methodology Datasets [Dataset]. https://catalog.data.gov/dataset/water-rights-demand-analysis-methodology-datasets-92ed2
Explore at:
Dataset updated
Mar 30, 2024
Dataset provided by
California State Water Resources Control Board
Description
The following datasets are used for the Water Rights Demand Analysis project and are formatted to be used in the calculations. The State Water Resources Control Board Division of Water Rights (Division) has developed a methodology to standardize and improve the accuracy of water diversion and use data that is used to determine water availability and inform water management and regulatory decisions. The Water Rights Demand Data Analysis Methodology (Methodology https://www.waterboards.ca.gov/drought/drought_tools_methods/demandanalysis.html ) is a series of data pre-processing steps, R Scripts, and data processing modules that identify and help address data quality issues related to both the self-reported water diversion and use data from water right holders or their agents and the Division of Water Rights electronic water rights data.
Naturalistic Neuroimaging Database
openneuro.org
Updated Apr 20, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v2.0.0
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds002837.v2.0.0
Dataset updated
Apr 20, 2021
Dataset provided by
OpenNeurohttps://openneuro.org/
Authors
Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Overview

The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants doing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI).The participants were all right-handed, native English speakers, with no history of neurological/psychiatric illnesses, with no hearing impairments, unimpaired or corrected vision and taking no medication. Each movie was stopped in 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10 minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.

The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

v2.0 Changes

Overview

We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.

Normalization

Emily Finn and Clare Grall at Dartmouth and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not thought through that an implication of this is that the resulting normalized timeseries amplitudes will be affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend’s help text). To demonstrate this, I wrote a version of 3dDetrend’s -normalize for R so you can see for yourselves by running the following code:

# Generate a resting state (rs) timeseries (ts) # Install / load package to make fake fMRI ts # install.packages("neuRosim") library(neuRosim) # Generate a ts ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1) # 3dDetrend -normalize # R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1" # Do for the full timeseries ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2)); # Do this again for a shorter version of the same timeseries ts.shorter.length <- length(ts.normalised.long)/4 ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2)); # By looking at the summaries, it can be seen that the median values become larger summary(ts.normalised.long) summary(ts.normalised.short) # Plot results for the long and short ts # Truncate the longer ts for plotting only ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length] # Give the plot a title title <- "3dDetrend -normalize for long (blue) and short (red) timeseries"; plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short))); # Add zero line lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey'); # 3dDetrend -normalize -polort 0 for long timeseries lines(ts.normalised.long.made.shorter, col='blue'); # 3dDetrend -normalize -polort 0 for short timeseries lines(ts.normalised.short, col='red');

Standardization/modernization

The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob’s @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewers steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analyses practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control (‘QC’) html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.

New afni_proc.py command line

The following is the afni_proc.py command line that we used to generate blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples. As such, you can quickly understand our preprocessing decisions by scrutinising the below. Specifically, the following command is most similar to Example 11 for ‘Resting state analysis’ in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html): afni_proc.py \ -subj_id "$sub_id_name_1" \ -blocks despike tshift align tlrc volreg mask blur scale regress \ -radial_correlate_blocks tcat volreg \ -copy_anat anatomical_warped/anatSS.1.nii.gz \ -anat_has_skull no \ -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \ -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \ -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \ -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \ -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \ -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \ -anat_follower_erode fsvent fswm \ -dsets media_?.nii.gz \ -tcat_remove_first_trs 8 \ -tshift_opts_ts -tpattern alt+z2 \ -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \ -tlrc_base "$basedset" \ -tlrc_NL_warp \ -tlrc_NL_warped_dsets \ anatomical_warped/anatQQ.1.nii.gz \ anatomical_warped/anatQQ.1.aff12.1D \ anatomical_warped/anatQQ.1_WARP.nii.gz \ -volreg_align_to MIN_OUTLIER \ -volreg_post_vr_allin yes \ -volreg_pvra_base_index MIN_OUTLIER \ -volreg_align_e2a \ -volreg_tlrc_warp \ -mask_opts_automask -clfrac 0.10 \ -mask_epi_anat yes \ -blur_to_fwhm -blur_size $blur \ -regress_motion_per_run \ -regress_ROI_PC fsvent 3 \ -regress_ROI_PC_per_run fsvent \ -regress_make_corr_vols aeseg fsvent \ -regress_anaticor_fast \ -regress_anaticor_label fswm \ -regress_censor_motion 0.3 \ -regress_censor_outliers 0.1 \ -regress_apply_mot_types demean deriv \ -regress_est_blur_epits \ -regress_est_blur_errts \ -regress_run_clustsim no \ -regress_polort 2 \ -regress_bandpass 0.01 1 \ -html_review_style pythonic We used similar command lines to generate ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb).

We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, with the average being ~40 minutes but this number can be variable (thus leading to the above issue with 3dDetrend’s -normalise). A discussion on the AFNI message board with one of our team (starting here, https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256), led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.

Which timeseries file you use is up to you but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words: * Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice). * Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere). * For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data). * For censored data: * Performing ISC requires the users to unionize the censoring patterns during the correlation calculation. * If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data. In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.

Effect on results

From numerous tests on our own analyses, we have qualitatively found that results using our old vs the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an
Additional file 4: of DBNorm: normalizing high-density oligonucleotide...
springernature.figshare.com
txt
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy (2023). Additional file 4: of DBNorm: normalizing high-density oligonucleotide microarray data based on distributions [Dataset]. http://doi.org/10.6084/m9.figshare.5648956.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5648956.v1
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
DBNorm installation. Describes how to install DBNorm via devtools in R. (TXT 4Â kb)
d
Pinyon jay (Gymnorhinus cyanocephalus) nest site selection in central New...
datadryad.org
zip
Updated Sep 22, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Michael Novak; Loren Smith; Scott McMurry (2020). Pinyon jay (Gymnorhinus cyanocephalus) nest site selection in central New Mexico: Habitat Data to be used with AICc [Dataset]. http://doi.org/10.5061/dryad.ngf1vhhrd
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.ngf1vhhrd
Dataset updated
Sep 22, 2020
Dataset provided by
Dryad
Authors
Michael Novak; Loren Smith; Scott McMurry
Time period covered
Jun 18, 2020
Description
Anyone looking to process these data should standardize it, then processes it using AICc in R.
f
Data_Sheet_1_Best Practice Data Standards for Discrete Chemical...
frontiersin.figshare.com
txt
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Li-Qing Jiang; Denis Pierrot; Rik Wanninkhof; Richard A. Feely; Bronte Tilbrook; Simone Alin; Leticia Barbero; Robert H. Byrne; Brendan R. Carter; Andrew G. Dickson; Jean-Pierre Gattuso; Dana Greeley; Mario Hoppema; Matthew P. Humphreys; Johannes Karstensen; Nico Lange; Siv K. Lauvset; Ernie R. Lewis; Are Olsen; Fiz F. Pérez; Christopher Sabine; Jonathan D. Sharp; Toste Tanhua; Thomas W. Trull; Anton Velo; Andrew J. Allegra; Paul Barker; Eugene Burger; Wei-Jun Cai; Chen-Tung A. Chen; Jessica Cross; Hernan Garcia; Jose Martin Hernandez-Ayon; Xinping Hu; Alex Kozyr; Chris Langdon; Kitack Lee; Joe Salisbury; Zhaohui Aleck Wang; Liang Xue (2023). Data_Sheet_1_Best Practice Data Standards for Discrete Chemical Oceanographic Observations.csv [Dataset]. http://doi.org/10.3389/fmars.2021.705638.s001
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.3389/fmars.2021.705638.s001
Dataset updated
Jun 4, 2023
Dataset provided by
Frontiers
Authors
Li-Qing Jiang; Denis Pierrot; Rik Wanninkhof; Richard A. Feely; Bronte Tilbrook; Simone Alin; Leticia Barbero; Robert H. Byrne; Brendan R. Carter; Andrew G. Dickson; Jean-Pierre Gattuso; Dana Greeley; Mario Hoppema; Matthew P. Humphreys; Johannes Karstensen; Nico Lange; Siv K. Lauvset; Ernie R. Lewis; Are Olsen; Fiz F. Pérez; Christopher Sabine; Jonathan D. Sharp; Toste Tanhua; Thomas W. Trull; Anton Velo; Andrew J. Allegra; Paul Barker; Eugene Burger; Wei-Jun Cai; Chen-Tung A. Chen; Jessica Cross; Hernan Garcia; Jose Martin Hernandez-Ayon; Xinping Hu; Alex Kozyr; Chris Langdon; Kitack Lee; Joe Salisbury; Zhaohui Aleck Wang; Liang Xue
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Effective data management plays a key role in oceanographic research as cruise-based data, collected from different laboratories and expeditions, are commonly compiled to investigate regional to global oceanographic processes. Here we describe new and updated best practice data standards for discrete chemical oceanographic observations, specifically those dealing with column header abbreviations, quality control flags, missing value indicators, and standardized calculation of certain properties. These data standards have been developed with the goals of improving the current practices of the scientific community and promoting their international usage. These guidelines are intended to standardize data files for data sharing and submission into permanent archives. They will facilitate future quality control and synthesis efforts and lead to better data interpretation. In turn, this will promote research in ocean biogeochemistry, such as studies of carbon cycling and ocean acidification, on regional to global scales. These best practice standards are not mandatory. Agencies, institutes, universities, or research vessels can continue using different data standards if it is important for them to maintain historical consistency. However, it is hoped that they will be adopted as widely as possible to facilitate consistency and to achieve the goals stated above.
Additional file 3: of DBNorm: normalizing high-density oligonucleotide...
springernature.figshare.com
txt
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy (2023). Additional file 3: of DBNorm: normalizing high-density oligonucleotide microarray data based on distributions [Dataset]. http://doi.org/10.6084/m9.figshare.5648932.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5648932.v1
Dataset updated
May 31, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
DBNorm test script. Code of how we test DBNorm package. (TXT 2Â kb)
f
Data_Sheet_2_Best Practice Data Standards for Discrete Chemical...
frontiersin.figshare.com
docx
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Li-Qing Jiang; Denis Pierrot; Rik Wanninkhof; Richard A. Feely; Bronte Tilbrook; Simone Alin; Leticia Barbero; Robert H. Byrne; Brendan R. Carter; Andrew G. Dickson; Jean-Pierre Gattuso; Dana Greeley; Mario Hoppema; Matthew P. Humphreys; Johannes Karstensen; Nico Lange; Siv K. Lauvset; Ernie R. Lewis; Are Olsen; Fiz F. Pérez; Christopher Sabine; Jonathan D. Sharp; Toste Tanhua; Thomas W. Trull; Anton Velo; Andrew J. Allegra; Paul Barker; Eugene Burger; Wei-Jun Cai; Chen-Tung A. Chen; Jessica Cross; Hernan Garcia; Jose Martin Hernandez-Ayon; Xinping Hu; Alex Kozyr; Chris Langdon; Kitack Lee; Joe Salisbury; Zhaohui Aleck Wang; Liang Xue (2023). Data_Sheet_2_Best Practice Data Standards for Discrete Chemical Oceanographic Observations.docx [Dataset]. http://doi.org/10.3389/fmars.2021.705638.s002
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/fmars.2021.705638.s002
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Li-Qing Jiang; Denis Pierrot; Rik Wanninkhof; Richard A. Feely; Bronte Tilbrook; Simone Alin; Leticia Barbero; Robert H. Byrne; Brendan R. Carter; Andrew G. Dickson; Jean-Pierre Gattuso; Dana Greeley; Mario Hoppema; Matthew P. Humphreys; Johannes Karstensen; Nico Lange; Siv K. Lauvset; Ernie R. Lewis; Are Olsen; Fiz F. Pérez; Christopher Sabine; Jonathan D. Sharp; Toste Tanhua; Thomas W. Trull; Anton Velo; Andrew J. Allegra; Paul Barker; Eugene Burger; Wei-Jun Cai; Chen-Tung A. Chen; Jessica Cross; Hernan Garcia; Jose Martin Hernandez-Ayon; Xinping Hu; Alex Kozyr; Chris Langdon; Kitack Lee; Joe Salisbury; Zhaohui Aleck Wang; Liang Xue
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Effective data management plays a key role in oceanographic research as cruise-based data, collected from different laboratories and expeditions, are commonly compiled to investigate regional to global oceanographic processes. Here we describe new and updated best practice data standards for discrete chemical oceanographic observations, specifically those dealing with column header abbreviations, quality control flags, missing value indicators, and standardized calculation of certain properties. These data standards have been developed with the goals of improving the current practices of the scientific community and promoting their international usage. These guidelines are intended to standardize data files for data sharing and submission into permanent archives. They will facilitate future quality control and synthesis efforts and lead to better data interpretation. In turn, this will promote research in ocean biogeochemistry, such as studies of carbon cycling and ocean acidification, on regional to global scales. These best practice standards are not mandatory. Agencies, institutes, universities, or research vessels can continue using different data standards if it is important for them to maintain historical consistency. However, it is hoped that they will be adopted as widely as possible to facilitate consistency and to achieve the goals stated above.
Dataset: A three-dimensional approach to general plant fire syndromes
zenodo.org
data.niaid.nih.gov
+1more
bin
Updated Jan 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pedro Jaureguiberry; Pedro Jaureguiberry; Sandra Díaz; Sandra Díaz (2023). Dataset: A three-dimensional approach to general plant fire syndromes [Dataset]. http://doi.org/10.5061/dryad.j6q573njb
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.j6q573njb
Dataset updated
Jan 27, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Pedro Jaureguiberry; Pedro Jaureguiberry; Sandra Díaz; Sandra Díaz
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
1. Plant fire syndromes are usually defined as combinations of fire response traits, the most common being resprouting (R) and seeding (S). Plant flammability (F), on the other hand, refers to a plant's effects on communities and ecosystems. Despite its important ecological and evolutionary implications, F has rarely been considered to define plant fire syndromes and, if so, usually separated from response syndromes.

2. We propose a three-dimensional model that combines R, S and F, encapsulating both plant response to fire regimes and the capacity to promote them. Each axis is divided into three possible standardized categories, reflecting low, medium and high values of each variable, with a total of 27 possible combinations of R, S and F.

3. We hypothesized that different fire histories should be reflected in the position of species within the three-dimensional space and that this should help assess the importance of fire as an evolutionary force in determining R-S-F syndromes.

4. To illustrate our approach we compiled information on the fire syndromes of 24 dominant species of different growth forms from the Chaco seasonally-dry forest of central Argentina, and we compared them to 33 species from different Mediterranean-type climate ecosystems (MTCEs) of the world.

5. Chaco and MTCEs species differed in the range (seven syndromes vs. thirteen syndromes, respectively) and proportion of extreme syndromes (i.e. species with extreme values of R, S and/or F) representing 29% of species in the Chaco vs. 45% in the MTCEs.

6. Additionally, we explored the patterns of R, S and F of 4032 species from seven regions with contrasting fire histories, and found significantly higher frequencies of extreme values (predominantly high) of all three variables in MTCEs compared to the other regions, where intermediate and low values predominated, broadly supporting our general hypothesis.

7. The proposed three-dimensional approach should help standardize comparisons of fire syndromes across taxa, growth forms and regions with different fire histories. This will contribute to the understanding of the role of fire in the evolution of plant traits and assist vegetation modelling in the face of changes in fire regimes.
f
Data repository.
plos.figshare.com
xlsx
Updated Mar 27, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kayleigh R. Cook; Zebenay B. Zeleke; Ephrem Gebrehana; Daniel Burssa; Bantalem Yeshanew; Atkilt Michael; Yoseph Tediso; Taylor Jaraczewski; Chris Dodgion; Andualem Beyene; Katherine R. Iverson (2024). Data repository. [Dataset]. http://doi.org/10.1371/journal.pgph.0002600.s002
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pgph.0002600.s002
Dataset updated
Mar 27, 2024
Dataset provided by
PLOS Global Public Health
Authors
Kayleigh R. Cook; Zebenay B. Zeleke; Ephrem Gebrehana; Daniel Burssa; Bantalem Yeshanew; Atkilt Michael; Yoseph Tediso; Taylor Jaraczewski; Chris Dodgion; Andualem Beyene; Katherine R. Iverson
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In 2015, the Ethiopian Federal Ministry of Health (FMOH) developed the Saving Lives through Safe Surgery (SaLTS) initiative to improve national surgical care. Previous work led to development and implementation of 15 surgical key performance indicators (KPIs) to standardize surgical data practices. The objective of this project is to investigate current practices of KPI data collection and assess quality to improve data management and strengthen surgical systems. The first portion of the study documented the surgical data collection process including methods, instruments, and effectiveness at 10 hospitals across 2 regions in Ethiopia. Secondly, data for KPIs of focus [1. Surgical Volume, 2. Perioperative Mortality Rate (POMR), 3. Adverse Anesthetic Outcome (AAO), 4. Surgical Site Infection (SSI), and 5. Safe Surgery Checklist (SSC) Utilization] were compared between registries, KPI reporting forms, and the DHIS2 (district health information system) electronic database for a 6-month period (January—June 2022). Quality was assessed based on data completeness and consistency. The data collection process involved hospital staff recording data elements in registries, quality officers calculating KPIs, completing monthly KPI reporting forms, and submitting data into DHIS2 for the national and regional health bureaus. Data quality verifications revealed discrepancies in consistency at all hospitals, ranging from 1–3 indicators. For all hospitals, average monthly surgical volume was 57 cases, POMR was 0.38% (13/3399), inpatient SSI rate was 0.79% (27/3399), AAO rate was 0.15% (5/3399), and mean SSC utilization monthly was 93% (100% median). Half of the hospitals had incomplete data within the registries, ranging from 2–5 indicators. AAO, SSC, and SSI were commonly missing data in registries. Non-standardized KPI reporting forms contributed significantly to the findings. Facilitators to quality data collection included continued use of registries from previous interventions and use of a separate logbook to document specific KPIs. Delayed rollout of these indicators in each region contributed to issues in data quality. Barriers involved variable indicator recording from different personnel, data collection tools that generate false positives (i.e. completeness of SSC defined as paper form filled out prior to patient discharge) or missing data because of reporting time period (i.e. monthly SSI may miss infections outside of one month), inadequate data elements in registries, and lack of standardized monthly KPI reporting forms. As the FMOH introduces new indicators and changes, we recommend continuous and consistent quality checks and data capacity building, including the use of routinely generated health information for quality improvement projects at the department level.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002

Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.3389/fgene.2019.00400.s002

Dataset updated

Jun 1, 2023

Dataset provided by

Frontiers

Authors

Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Data normalization is a crucial step in the gene expression analysis as it ensures the validity of its downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for the single-cell RNA sequencing (scRNA-seq) data. The worst situations could be that one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. Here raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric mSCC to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) in the principle of the consistency of metrics and the consistency of datasets.

Clear search

Close search

Google apps

Main menu

Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...

Example subjects for Mobilise-D data standardization

REMNet Tutorial, R Part 5: Normalizing Microbiome Data in R 5.2.19

Dataset supporting: Normalizing and denoising protein expression data from...

Simulation Data Set

Methods for normalizing microbiome data: an ecological perspective

Standardized NEON organismal data (neonDivData)

The global spectrum of plant form and function dataset: taxonomic...

CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for...

Data from: Rapid Creation of a Data Product for the World's Specimens of...

Data for: Patel et al. Carbon flux estimates are sensitive to data source: A...

Water Rights Demand Analysis Methodology Datasets

Naturalistic Neuroimaging Database

Overview

v2.0 Changes

Additional file 4: of DBNorm: normalizing high-density oligonucleotide...

Pinyon jay (Gymnorhinus cyanocephalus) nest site selection in central New...

Data_Sheet_1_Best Practice Data Standards for Discrete Chemical...

Additional file 3: of DBNorm: normalizing high-density oligonucleotide...

Data_Sheet_2_Best Practice Data Standards for Discrete Chemical...

Dataset: A three-dimensional approach to general plant fire syndromes

Data repository.

Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zipSee More Versions

Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip