10 datasets found

f
Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...
frontiersin.figshare.com
zip
Updated Jun 1, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2019.00400.s002
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data normalization is a crucial step in the gene expression analysis as it ensures the validity of its downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for the single-cell RNA sequencing (scRNA-seq) data. The worst situations could be that one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. Here raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric mSCC to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) in the principle of the consistency of metrics and the consistency of datasets.
n
Methods for normalizing microbiome data: an ecological perspective
data.niaid.nih.gov
datadryad.org
zip
Updated Oct 30, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.tn8qs35
Dataset updated
Oct 30, 2018
Dataset provided by
James Cook University
University of New England
Authors
Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data and instead advocate alternatives, such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity), Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs, while suppressing the importance of differences among common OTUs. 2. We tested these theoretical predictions via simulations and a real-world data set. 3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed. 4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
Naturalistic Neuroimaging Database
openneuro.org
Updated Apr 20, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v2.0.0
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds002837.v2.0.0
Dataset updated
Apr 20, 2021
Dataset provided by
OpenNeurohttps://openneuro.org/
Authors
Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Overview

The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants doing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI).The participants were all right-handed, native English speakers, with no history of neurological/psychiatric illnesses, with no hearing impairments, unimpaired or corrected vision and taking no medication. Each movie was stopped in 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10 minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.

The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

v2.0 Changes

Overview

We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.

Normalization

Emily Finn and Clare Grall at Dartmouth and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not thought through that an implication of this is that the resulting normalized timeseries amplitudes will be affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend’s help text). To demonstrate this, I wrote a version of 3dDetrend’s -normalize for R so you can see for yourselves by running the following code:

# Generate a resting state (rs) timeseries (ts) # Install / load package to make fake fMRI ts # install.packages("neuRosim") library(neuRosim) # Generate a ts ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1) # 3dDetrend -normalize # R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1" # Do for the full timeseries ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2)); # Do this again for a shorter version of the same timeseries ts.shorter.length <- length(ts.normalised.long)/4 ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2)); # By looking at the summaries, it can be seen that the median values become larger summary(ts.normalised.long) summary(ts.normalised.short) # Plot results for the long and short ts # Truncate the longer ts for plotting only ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length] # Give the plot a title title <- "3dDetrend -normalize for long (blue) and short (red) timeseries"; plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short))); # Add zero line lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey'); # 3dDetrend -normalize -polort 0 for long timeseries lines(ts.normalised.long.made.shorter, col='blue'); # 3dDetrend -normalize -polort 0 for short timeseries lines(ts.normalised.short, col='red');

Standardization/modernization

The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob’s @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewers steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analyses practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control (‘QC’) html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.

New afni_proc.py command line

The following is the afni_proc.py command line that we used to generate blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples. As such, you can quickly understand our preprocessing decisions by scrutinising the below. Specifically, the following command is most similar to Example 11 for ‘Resting state analysis’ in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html): afni_proc.py \ -subj_id "$sub_id_name_1" \ -blocks despike tshift align tlrc volreg mask blur scale regress \ -radial_correlate_blocks tcat volreg \ -copy_anat anatomical_warped/anatSS.1.nii.gz \ -anat_has_skull no \ -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \ -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \ -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \ -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \ -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \ -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \ -anat_follower_erode fsvent fswm \ -dsets media_?.nii.gz \ -tcat_remove_first_trs 8 \ -tshift_opts_ts -tpattern alt+z2 \ -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \ -tlrc_base "$basedset" \ -tlrc_NL_warp \ -tlrc_NL_warped_dsets \ anatomical_warped/anatQQ.1.nii.gz \ anatomical_warped/anatQQ.1.aff12.1D \ anatomical_warped/anatQQ.1_WARP.nii.gz \ -volreg_align_to MIN_OUTLIER \ -volreg_post_vr_allin yes \ -volreg_pvra_base_index MIN_OUTLIER \ -volreg_align_e2a \ -volreg_tlrc_warp \ -mask_opts_automask -clfrac 0.10 \ -mask_epi_anat yes \ -blur_to_fwhm -blur_size $blur \ -regress_motion_per_run \ -regress_ROI_PC fsvent 3 \ -regress_ROI_PC_per_run fsvent \ -regress_make_corr_vols aeseg fsvent \ -regress_anaticor_fast \ -regress_anaticor_label fswm \ -regress_censor_motion 0.3 \ -regress_censor_outliers 0.1 \ -regress_apply_mot_types demean deriv \ -regress_est_blur_epits \ -regress_est_blur_errts \ -regress_run_clustsim no \ -regress_polort 2 \ -regress_bandpass 0.01 1 \ -html_review_style pythonic We used similar command lines to generate ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb).

We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, with the average being ~40 minutes but this number can be variable (thus leading to the above issue with 3dDetrend’s -normalise). A discussion on the AFNI message board with one of our team (starting here, https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256), led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.

Which timeseries file you use is up to you but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words: * Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice). * Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere). * For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data). * For censored data: * Performing ISC requires the users to unionize the censoring patterns during the correlation calculation. * If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data. In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.

Effect on results

From numerous tests on our own analyses, we have qualitatively found that results using our old vs the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an
Z
Example subjects for Mobilise-D data standardization
data.niaid.nih.gov
Updated Oct 11, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Soltani, Abolfazl (2022). Example subjects for Mobilise-D data standardization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7185428
Explore at:
Dataset updated
Oct 11, 2022
Dataset provided by
Bertuletti, Stefano
Palmerini, Luca
D'Ascanio, Ilaria
Micó-Amigo, Encarna
Reggi, Luca
Soltani, Abolfazl
Mazzà, Claudia
Gazit, Eran
Caruso, Marco
on behalf of the Mobilise-D consortium
Bonci, Tecla
Salis, Francesca
Hansen, Clint
Cereatti, Andrea
Rochester, Lynn
Ullrich, Martin
Kluge, Felix
Paraschiv-Ionescu, Anisoara
Kirk, Cameron
Del Din, Silvia
Küderle, Arne
Hiden, Hugo
Chiari, Lorenzo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Standardized data from Mobilise-D participants (YAR dataset) and pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder, as an example of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization" that is currently under review in Scientific data. Please refer to that publication for further information. Please cite that publication if using these data.

The code to standardize an example subject (for the ICICLE dataset) and to open the standardized Matlab files in other languages (Python, R) is available in github (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thilde Terkelsen; Anders Krogh; Elena Papaleo (2023). CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for the analysis of quantitative biological data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007665
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pcbi.1007665
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Thilde Terkelsen; Anders Krogh; Elena Papaleo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline -CAMPP) intended to aid bioinformatic software-users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R-package updates, a renv .lock file is provided to ensure R-package stability. Data-management includes missing value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.
Dataset for: Navigating Ecosystem Services Trade-offs: A Global...
zenodo.org
bin, pdf
Updated Aug 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maria Jose Martinez-Harms; Maria Jose Martinez-Harms; Barbara Larrain Barrios; Barbara Larrain Barrios (2024). Dataset for: Navigating Ecosystem Services Trade-offs: A Global Comprehensive Review [Dataset]. http://doi.org/10.5281/zenodo.13249080
Explore at:
pdf, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.13249080
Dataset updated
Aug 7, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Maria Jose Martinez-Harms; Maria Jose Martinez-Harms; Barbara Larrain Barrios; Barbara Larrain Barrios
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Aug 2024
Description
Methods

The dataset is the output of a comprehensive literature-based search that aims to collate all the evidence on where ES relationships have been mentioned and addressed. We applied systematic mapping which is based on the “Guidelines for Systematic Review in Environmental Management” developed by the Centre for Evidence-Based Conservation at Bangor University (Pullin and Stewart 2006).

The methodological framework followed the standard stages outlined for systematic mapping in environmental sciences (James et al. 2016). Briefly, we defined the scope and objectives:

· We comprehensively review and further explore the global evidence of ES trade-offs and synergies focusing on all systems including terrestrial, freshwater, and marine.

· We compiled the evidence on trade-offs and synergies among multiple ES interacting across various ecosystems.

· We performed a geographical and temporal trend analysis exploring the distribution of studies across the world examining how the focus on various ecosystem types and ES categories has evolved to highlight gaps and biases.

Then we set the criteria for study inclusion (Table 1), searched the evidence, coded, and produced the database. Extracted article information including the specific criteria is detailed in Table 1.

The first step was to search the ISI Web of Knowledge core collection (http://apps.webofknowledge.com) database, targeting the search on the ecosystem services literature and studies dealing with trade-offs/synergies, win-win outcomes or bundles when managing different ecosystem services in the landscape/seascape. All peer-reviewed journal articles written in English and Spanish have been considered for review.

The peer-reviewed literature from 2005 to 2021 was reviewed identifying relevant studies according to specific search terms. The relevant search terms and descriptive words derived from (Howe et al. 2014) adding “bundles” and “co-benefits”. Boolean nomenclatures ‘*’ = all letters were allowed after the *, were used on the root of words where several different endings applied (Figure 1). Search terms used were:

(“*ecosystem service*” OR “environment* service*” OR “ecosystem* approach*” OR “ecosystem good*” OR “environment* good*”)

AND

(“*trade-off*” OR “tradeoff*” OR “synerg*” OR “win-win*” OR “bundle*” OR “cost*and benefit*” OR “co-benefit*”) n=5194

Papers were preliminarily coded with a semantic analysis using the R package Bibliometrix (http://www.bibliometrix.org).

In the second step (Figure 1) papers were preliminarily coded with a semantic analysis using the R package Bibliometrix (http://www.bibliometrix.org). Papers were classified according to three systems: terrestrial, marine, and freshwater (Table 1). Papers with multiple systems, transitional habitats or those that could not be classified were classified as “other” (Mazor et al. 2018). Articles were classified based on the occurrence of the most frequent system words in their title, keywords, and abstract (Mazor et al. 2018). The set of system-specific words was determined by extracting the 250 most frequently used keywords from all considered articles and assigning each word to either system (articles could fall into just one of the four categories). Using this technique, we managed to classify 100% of the papers. To further enrich the dataset and make it a useful repository for science and policy, an additional sub-classification was performed, categorizing papers into the following categories: Coastal, Urban, Wetlands, Forest, Mountain, Freshwater, Agroecosystems, and Others that mainly represented multiple ecosystems (Table S1). This comprehensive classification approach enhances the dataset’s utility for various scientific and policy-making applications.

In the third step (Figure 1), applying the same technique, we classified the papers into four ES categories: habitat (supporting biodiversity related), provisioning, regulating, and cultural services (De Groot et al. 2010; MEA 2005; Sukhdev 2010; Wallace 2007). For the classification into ES categories, articles could fall into one or more of the four categories (see Table 1 for example the keywords used to classify ecosystems, ES categories, and countries). Applying this technique, we excluded 2149 papers that weren’t classified in any of the ecosystem services types categories resulting in 3629 papers (see Figure 1).

In the fourth step (Figure 1), an initial screening was conducted to identify papers that did not align with the review objectives of assessing ecosystem services trade-offs and synergies to inform policy and management decisions. We manually reviewed the titles of each paper in the dataset, excluding those that were from other fields or did not align with the review objectives. In this initial assessment, we excluded 347 papers, leaving a total of 3,286 papers for further review. A descriptive analysis of this 3286 article dataset was performed to examine the distribution of ES categories within each ecosystem type over the specified period. This analysis allowed us to conclude the prevalence of each ecosystem service category in different ecosystem types, identifying temporal trends and patterns. The number of occurrences was calculated for each ES category within each ecosystem type, expressed as counts. This allowed for the comparison of ecosystem service distributions across the selected ecosystem types.

In the fifth step (Figure 1), we employed an approach to visually represent the geographical distribution and focus of ES studies across the world. With the classification of studies in ES categories and the types of ecosystems, the papers were coded according to the country where the study was performed. It was possible to assign a specific country to 2636 studies, removing 650 studies that did not specify the country of study. From these 2636 papers classified, a proportion were global studies that consider several countries under study (499 global studies).

We developed global maps (Figure 1), each offering a unique perspective on the ES research landscape. The first map presents the total number of ES trade-off studies conducted worldwide, illustrating the geographical spread and concentration of research efforts to provide a clear overview of regions that have been extensively studied and those that may require more attention in future research. Additionally, we calculated two key metrics to assess research productivity more comprehensively: the number of research papers per capita and the number of research papers relative to Gross Domestic Product (GDP). For population and GDP, we used the most recent available data from the World Bank (https://data.worldbank.org). These alternative metrics normalize the data based on economic output and population size, providing a more balanced view of research activity across different countries (Figures S3).

Detailed maps were created featuring pie charts that highlight the different categories of ES and ecosystem types addressed for each country. These charts offer an understanding of how various ES categories and ecosystems are represented in different parts of the world. Finally, we assessed ES trade-off studies to world regions (Africa, Antarctica, Asia, Australasia, Europe, Latin America, and North America) looking at the relationships between the categories of ES. We considered papers that evaluated more than one category of ES and the papers that considered only one category of ES. This country-level analysis offers insights into regional research trends and priorities, contributing to a more localized understanding of ES studies.

In the sixth step (Figure 1), each publication in this review was critically appraised to evaluate the quality of the papers included in the review. The foundation for our critical appraisal stems from the comprehensive and multidimensional approach of Belcher et al. (2016) that is framed to evaluate research quality, which aligns well with the interdisciplinary nature of our study. Belcher et al. (2016) developed a robust framework that incorporates essential principles and criteria for assessing the quality of transdisciplinary research. This is particularly relevant for ecosystem services science and our review that contributes to advancing current knowledge by systematically synthesizing evidence on relationships among various ES across these diverse systems.

The Belcher et al. (2016) framework emphasizes four main principles: relevance, credibility (which we have adapted as methodological transparency), legitimacy (generalizability in our context), and effectiveness (significance). A continuous scoring system (ranging from 0 to 1) was applied for the four main criteria to maintain simplicity and consistency across the large number of studies. In this system, a value closer to 0 indicates that the criteria are not met, while a value closer to 1 indicates that the criteria are more closely met. This scoring method was a useful indicator of the overall quality of the paper and how well the article met the review's goals overall.

Methodological Transparency was assessed based on the clarity and completeness of methodological descriptions, including data availability, the rigor of statistical analyses, methodological detail, and reproducibility of the findings. This criterion assesses the transparency and rigor of the study's methodology, including data collection, analysis, and reporting (Belcher et al. 2016). Relevance was evaluated by the study's alignment with the review's objectives, its importance to the field, and its practical applicability. This includes the extent to which the study addresses pertinent research
f
Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq...
figshare.com
frontiersin.figshare.com
xlsx
Updated Jun 3, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach (2023). Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.XLSX [Dataset]. http://doi.org/10.3389/fgene.2020.00594.s001
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2020.00594.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven-fold change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well as did simple normalization procedures counts per million (CPM) and total counts (TCs). These results suggest that for two class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
b
Underway Data (SAS) from R/V Roger Revelle KNOX22RR in the Patagonian Shelf...
bco-dmo.org
search.dataone.org
+1more
csv
Updated Jun 28, 2010
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
William M. Balch (2010). Underway Data (SAS) from R/V Roger Revelle KNOX22RR in the Patagonian Shelf (SW South Atlantic) from 2008-2009 (COPAS08 project) [Dataset]. https://www.bco-dmo.org/dataset/3356
Explore at:
csv(1.56 MB)Available download formats
Dataset updated
Jun 28, 2010
Dataset provided by
Biological and Chemical Data Management Office
Authors
William M. Balch
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Wt, chl, lat, lon, sal, date, time, Lt443, Lt491, Lt510, and 34 more
Measurement technique
Wet Labs ECO Triplet, Radiometer, Fluorometer, MicroTSG Thermosalinograph
Description
Along track temperature, Salinity, backscatter, Chlorophyll Fluoresence, and normalized water leaving radiance (nLw).

On the bow of the R/V Roger Revelle was a Satlantic SeaWiFS Aircraft Simulator (MicroSAS) system, used to estimate water-leaving radiance from the ship, analogous to to the nLw derived by the SeaWiFS and MODIS satellite sensors, but free from atmospheric error (hence, it can provide data below clouds).

The system consisted of a down-looking radiance sensor and a sky-viewing radiance sensor, both mounted on a steerable holder on the bow. A downwelling irradiance sensor was mounted at the top of the ship's meterological mast, on the bow, far from any potentially shading structures. These data were used to estimate normalized water-leaving radiance as a function of wavelength. The radiance detector was set to view the water at 40deg from nadir as recommended by Mueller et al. [2003b]. The water radiance sensor was able to view over an azimuth range of ~180deg across the ship's heading with no viewing of the ship's wake. The direction of the sensor was adjusted to view the water 90-120deg from the sun's azimuth, to minimize sun glint. This was continually adjusted as the time and ship's gyro heading were used to calculate the sun's position using an astronomical solar position subroutine interfaced with a stepping motor which was attached to the radiometer mount (designed and fabricated at Bigelow Laboratory for Ocean Sciences). Protocols for operation and calibration were performed according to Mueller [Mueller et al., 2003a; Mueller et al., 2003b; Mueller et al., 2003c]. Before 1000h and after 1400h, data quality was poorer as the solar zenith angle was too low. Post-cruise, the 10Hz data were filtered to remove as much residual white cap and glint as possible (we accept the lowest 5% of the data). Reflectance plaque measurements were made several times at local apparent noon on sunny days to verify the radiometer calibrations.

Within an hour of local apparent noon each day, a Satlantic OCP sensor was deployed off the stern of the R/V Revelle after the ship oriented so that the sun was off the stern. The ship would secure the starboard Z-drive, and use port Z-drive and bow thruster to move the ship ahead at about 25cm s-1. The OCP was then trailed aft and brought to the surface ~100m aft of the ship, then allowed to sink to 100m as downwelling spectral irradiance and upwelling spectral radiance were recorded continuously along with temperature and salinity. This procedure ensured there were no ship shadow effects in the radiometry.

Instruments include a WETLabs wetstar fluorometer, a WETLabs ECOTriplet and a SeaBird microTSG.
Radiometry was done using a Satlantic 7 channel microSAS system with Es, Lt and Li sensors.

Chl data is based on inter calibrating surface discrete Chlorophyll measure with the temporally closest fluorescence measurement and applying the regression results to all fluorescence data.

Data have been corrected for instrument biofouling and drift based on weekly purewater calibrations of the system. Radiometric data has been processed using standard Satlantic processing software and has been checked with periodic plaque measurements using a 2% spectralon standard.

Lw is calculated from Lt and Lsky and is "what Lt would be if the
sensor were looking straight down". Since our sensors are mounted at
40o, based on various NASA protocols, we need to do that conversion.

Lwn adds Es to the mix. Es is used to normalize Lw. Nlw is related to Rrs, Remote Sensing Reflectance

Techniques used are as described in:
Balch WM, Drapeau DT, Bowler BC, Booth ES, Windecker LA, Ashe A (2008) Space-time variability of carbon standing stocks and fixation rates in the Gulf of Maine, along the GNATS transect between Portland, ME, USA, and Yarmouth, Nova Scotia, Canada. J Plankton Res 30:119-139
f
snRNA-seq, Primary-Recurrent GBM (Mikolajewicz Cohort)
figshare.com
bin
Updated Jun 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nicholas Mikolajewicz (2024). snRNA-seq, Primary-Recurrent GBM (Mikolajewicz Cohort) [Dataset]. http://doi.org/10.6084/m9.figshare.25917628.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.25917628.v1
Dataset updated
Jun 4, 2024
Dataset provided by
figshare
Authors
Nicholas Mikolajewicz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary.10 primary GBM and 8 recurrent GBM samples (14/18 matched) profiled using single nucleus RNA- sequencing (sci-RNA-seq3 protocol).Data Format.Data is provided as preprocessed dataset, stored in Seurat Object.Sample processing, sci-RNA-seq3 library generation, and sequencingSnap-frozen patient pGBM and rGBM tissues were chopped with a razor blade or scissors before nucleus isolation. Nuclei extraction and fixation were performed as previously described (Cao 2019), except for the use of a modified CST lysis buffer50 plus 1% of SUPERase-In RNase Inhibitor (Invitrogen, #AM2696). Lysis time and washing steps were further optimized based on human GBM tissue. Nuclei quality was checked with DAPI and Wheat Germ Agglutinin (WGA) staining. Sci-RNA-seq3 libraries were generated as previously described49 using three-level combinatorial indexing. The final libraries were sequenced on Illumina NovaSeq as follows: read 1: 34bp, read 2: >=69bp, index 1: 10bp, index 2: 10bp.Demultiplexing and read alignments.Raw sequencing reads were first demultiplexed based on i5/i7 PCR barcodes. FASTQ files were then processed using the sci-RNA-Seq3 pipeline. After barcodes and unique molecular identifiers (UMIs) were extracted from the read1 of FASTQ files, read alignment was performed using STAR short-read aligner (v2.5.2b) with the human genome (hg19) and Gencode v24 gene annotations. After removing duplicate reads based on UMI, barcode, chromosome and alignment position, reads were summarized into a count matrix of M genes × N nuclei.Filtering, normalization, integration, and dimensional reduction.Raw count matrices were loaded into a Seurat object (version 4.0.1) and filtered to retain cells with (i) 200 – 9000 recovered genes per cell, (ii) less than 60% mitochondrial content, and (iii) unmatched rate within 3 median absolute deviations of the median. To normalize count matrix, we adopted the modeling framework previously described and implemented in sctransform (R Package, version 0.3.2). In brief, count data were modelled by regularized negative binomial regression, using sequencing depth as a model covariate to regress out the influence of technical effects, and Pearson residuals were used as the normalized and variance stabilized biological signal for downstream analysis. Data from each patient were integrated with the reciprocal PCA method (Seurat) using the top 2000 variable features. PCA was performed on the integrated dataset, and the top N components that accounted for 90% of the observed variance were used for UMAP embedding, RunUMAP(max_components = 2, n_neighbours = 50, min_dist = 01, metric = cosine).Contact.Contact Dr. Nicholas Mikolajewicz regarding any questions about the data or analysis (n.mikolajewicz@utoronto.ca)
f
PlotTwist: A web app for plotting and annotating continuous data
figshare.com
plos.figshare.com
docx
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joachim Goedhart (2023). PlotTwist: A web app for plotting and annotating continuous data [Dataset]. http://doi.org/10.1371/journal.pbio.3000581
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pbio.3000581
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS Biology
Authors
Joachim Goedhart
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Experimental data can broadly be divided in discrete or continuous data. Continuous data are obtained from measurements that are performed as a function of another quantitative variable, e.g., time, length, concentration, or wavelength. The results from these types of experiments are often used to generate plots that visualize the measured variable on a continuous, quantitative scale. To simplify state-of-the-art data visualization and annotation of data from such experiments, an open-source tool was created with R/shiny that does not require coding skills to operate it. The freely available web app accepts wide (spreadsheet) and tidy data and offers a range of options to normalize the data. The data from individual objects can be shown in 3 different ways: (1) lines with unique colors, (2) small multiples, and (3) heatmap-style display. Next to this, the mean can be displayed with a 95% confidence interval for the visual comparison of different conditions. Several color-blind-friendly palettes are available to label the data and/or statistics. The plots can be annotated with graphical features and/or text to indicate any perturbations that are relevant. All user-defined settings can be stored for reproducibility of the data visualization. The app is dubbed PlotTwist and runs locally or online: https://huygens.science.uva.nl/PlotTwist
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002

Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.3389/fgene.2019.00400.s002

Dataset updated

Jun 1, 2023

Dataset provided by

Frontiers

Authors

Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Data normalization is a crucial step in the gene expression analysis as it ensures the validity of its downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for the single-cell RNA sequencing (scRNA-seq) data. The worst situations could be that one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. Here raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric mSCC to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) in the principle of the consistency of metrics and the consistency of datasets.

Clear search

Close search

Google apps

Main menu

Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...

Methods for normalizing microbiome data: an ecological perspective

Naturalistic Neuroimaging Database

Overview

v2.0 Changes

Example subjects for Mobilise-D data standardization

CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for...

Dataset for: Navigating Ecosystem Services Trade-offs: A Global...

Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq...

Underway Data (SAS) from R/V Roger Revelle KNOX22RR in the Patagonian Shelf...

snRNA-seq, Primary-Recurrent GBM (Mikolajewicz Cohort)

PlotTwist: A web app for plotting and annotating continuous data

Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zipSee More Versions

Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip