Video on normalizing microbiome data from the Research Experiences in Microbiomes Network
https://spdx.org/licenses/CC0-1.0.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DBNorm test script. Code showing how we test the DBNorm package. (TXT 2 kb)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DBNorm installation. Describes how to install DBNorm via devtools in R. (TXT 4 kb)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for reproducing analysis in the manuscript: Normalizing and denoising protein expression data from droplet-based single cell profiling. Link to manuscript: https://www.biorxiv.org/content/10.1101/2020.02.24.963603v1
Data deposited here are for the purpose of reproducing the analysis results and figures reported in the manuscript above. These data are all publicly available and were downloaded and converted to R datasets prior to Dec 4, 2020. For a full description of all the data included in this repository and instructions for reproducing all analysis results and figures, please see the repository: https://github.com/niaid/dsb_manuscript.
For usage of the dsb R package for normalizing CITE-seq data please see the repository: https://github.com/niaid/dsb
If you use the dsb R package in your work please cite: Mulè MP, Martins AJ, Tsang JS. Normalizing and denoising protein expression data from droplet-based single cell profiling. bioRxiv. 2020;2020.02.24.963603.
General contact: John Tsang (john.tsang AT nih.gov)
Questions about software/code: Matt Mulè (mulemp AT nih.gov)
Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells, 91 H9 cells), each sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA were pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.
R script to reproduce "Improved normalization of species count data in ecology by scaling with ranked subsampling (SRS): application to microbial communities".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This R script performs normalisation of data obtained with the MitoPlate S-1 commercialised by Biolog. In addition, it creates a scatterplot of initial rate values between conditions of interest. The script includes a first normalisation step using the "No substrate" well (A1) required for the rows A to H and a second normalisation step using the "L-Malic Acid 100 µM" (G1) only required for the rows G and H. Initial rate values are calculated as the slope of a linear regression fitted between 30 minutes and 2 hours.
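The initial-rate calculation described above can be sketched in a few lines. This is a minimal Python illustration, not the deposited R script: the readings, the well names other than A1, and the choice to correct by subtracting the blank rate are all assumptions made for the example (the description does not say whether the normalisation subtracts or divides).

```python
def initial_rate(times_min, values, start=30.0, end=120.0):
    """Least-squares slope of values against time, using only the
    readings taken between `start` and `end` minutes (inclusive)."""
    pts = [(t, v) for t, v in zip(times_min, values) if start <= t <= end]
    if len(pts) < 2:
        raise ValueError("need at least two readings inside the window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    return sxy / sxx


# Hypothetical readings every 15 minutes (made-up values).
times = [0, 15, 30, 45, 60, 75, 90, 105, 120, 135]
well_a1 = [0.10 + 0.001 * t for t in times]  # "No substrate" blank
well_b3 = [0.10 + 0.004 * t for t in times]  # a hypothetical substrate well

rate_blank = initial_rate(times, well_a1)
rate_b3 = initial_rate(times, well_b3)

# Assumed form of the first normalisation step: subtract the blank rate.
rate_b3_corrected = rate_b3 - rate_blank
print(rate_blank, rate_b3, rate_b3_corrected)
```

The second step for rows G and H would apply the same kind of correction with the G1 well as the reference.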
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven-fold change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well as did simple normalization procedures counts per million (CPM) and total counts (TCs). These results suggest that for two class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
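For readers unfamiliar with the simpler methods named above, the following Python sketch shows counts per million (CPM) and one common form of upper quartile (UQ) scaling. It is an illustration under stated assumptions, not the study's code: the counts are made up, and the UQ variant shown (75th percentile of non-zero counts, rescaled by the mean across samples) is only one of several in use.

```python
import statistics


def cpm(counts):
    """Counts per million: scale one sample so its counts sum to 1e6."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def upper_quartile(counts):
    """75th percentile of the non-zero counts in one sample."""
    nonzero = [c for c in counts if c > 0]
    return statistics.quantiles(nonzero, n=4, method="inclusive")[2]


def uq_normalize(samples):
    """Divide each sample by its upper quartile, rescaled so that the
    scaling factors average to 1 across samples (an assumed convention)."""
    uqs = {name: upper_quartile(c) for name, c in samples.items()}
    mean_uq = sum(uqs.values()) / len(uqs)
    return {
        name: [c / (uqs[name] / mean_uq) for c in counts]
        for name, counts in samples.items()
    }


# Hypothetical counts: "treated" was sequenced twice as deeply.
samples = {
    "untreated": [0, 10, 20, 30, 40],
    "treated": [0, 20, 40, 60, 80],
}

cpm_untreated = cpm(samples["untreated"])
uq = uq_normalize(samples)
print(cpm_untreated)
print(uq["untreated"], uq["treated"])
```

In this toy case the two samples differ only in depth, so both methods bring them onto the same scale. The methods diverge when a subset of highly expressed genes changes, which is the situation the simulation above was designed to probe.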
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a compilation of published and new SNW data with corresponding environmental data extracted from CMIP6 that are used in the at depth species level Bayesian regression modelling. Environmental data for G. truncatulinoides comes from 200m depth, all other environmental data is from the sea surface (≤ 20 m).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Normalization
# Generate a resting state (rs) timeseries (ts)
# Install / load package to make fake fMRI ts
# install.packages("neuRosim")
library(neuRosim)
# Generate a ts
ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
# 3dDetrend -normalize
# R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
# Do for the full timeseries
ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
# Do this again for a shorter version of the same timeseries
ts.shorter.length <- length(ts.normalised.long)/4
ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
# The summaries show that the normalised values (min, quartiles, max) are larger in magnitude for the shorter timeseries
summary(ts.normalised.long)
summary(ts.normalised.short)
# Plot results for the long and short ts
# Truncate the longer ts for plotting only
ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
# Give the plot a title
title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
# Add zero line
lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
# 3dDetrend -normalize -polort 0 for long timeseries
lines(ts.normalised.long.made.shorter, col='blue');
# 3dDetrend -normalize -polort 0 for short timeseries
lines(ts.normalised.short, col='red');
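The length dependence shown by the R code above follows directly from the arithmetic: once a demeaned series is scaled to a sum of squares of 1, its root-mean-square amplitude is 1/sqrt(N), so a series a quarter as long comes out twice as large. The short Python sketch below checks this on a deterministic toy series (a sine wave chosen for the example, not fMRI data).

```python
import math


def normalize_unit_ss(ts):
    """Demean a series and scale it so its sum of squares equals 1."""
    mean = sum(ts) / len(ts)
    centred = [x - mean for x in ts]
    norm = math.sqrt(sum(x * x for x in centred))
    return [x / norm for x in centred]


def rms(ts):
    """Root-mean-square amplitude of a series."""
    return math.sqrt(sum(x * x for x in ts) / len(ts))


# Toy series: 2000 points, and its first quarter.
long_ts = [math.sin(0.1 * i) for i in range(2000)]
short_ts = long_ts[: len(long_ts) // 4]

long_norm = normalize_unit_ss(long_ts)
short_norm = normalize_unit_ss(short_ts)

# RMS is 1/sqrt(N) in both cases, so the ratio is sqrt(2000/500) = 2.
ratio = rms(short_norm) / rms(long_norm)
print(rms(long_norm), rms(short_norm), ratio)
```

Runs of different lengths therefore end up on different amplitude scales after this kind of normalisation, independent of the signal itself.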
Standardization/modernization
New afni_proc.py command line
afni_proc.py \
-subj_id "$sub_id_name_1" \
-blocks despike tshift align tlrc volreg mask blur scale regress \
-radial_correlate_blocks tcat volreg \
-copy_anat anatomical_warped/anatSS.1.nii.gz \
-anat_has_skull no \
-anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
-anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
-anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
-anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
-anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
-anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
-anat_follower_erode fsvent fswm \
-dsets media_?.nii.gz \
-tcat_remove_first_trs 8 \
-tshift_opts_ts -tpattern alt+z2 \
-align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
-tlrc_base "$basedset" \
-tlrc_NL_warp \
-tlrc_NL_warped_dsets \
anatomical_warped/anatQQ.1.nii.gz \
anatomical_warped/anatQQ.1.aff12.1D \
anatomical_warped/anatQQ.1_WARP.nii.gz \
-volreg_align_to MIN_OUTLIER \
-volreg_post_vr_allin yes \
-volreg_pvra_base_index MIN_OUTLIER \
-volreg_align_e2a \
-volreg_tlrc_warp \
-mask_opts_automask -clfrac 0.10 \
-mask_epi_anat yes \
-blur_to_fwhm -blur_size $blur \
-regress_motion_per_run \
-regress_ROI_PC fsvent 3 \
-regress_ROI_PC_per_run fsvent \
-regress_make_corr_vols aeseg fsvent \
-regress_anaticor_fast \
-regress_anaticor_label fswm \
-regress_censor_motion 0.3 \
-regress_censor_outliers 0.1 \
-regress_apply_mot_types demean deriv \
-regress_est_blur_epits \
-regress_est_blur_errts \
-regress_run_clustsim no \
-regress_polort 2 \
-regress_bandpass 0.01 1 \
-html_review_style pythonic
We used similar command lines to generate the ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb).
We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, averaging ~40 minutes, though run length is variable (which is what led to the above issue with 3dDetrend’s -normalize). A discussion on the AFNI message board with one of our team (starting here, https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256), led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.
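To make the 1 + int(D/150) rule concrete, here is a small Python sketch. It assumes D is the run duration in seconds, which is how we read the AFNI convention; the function name is ours, for illustration only.

```python
def suggested_polort(run_duration_s):
    """Polynomial detrending order from the 1 + int(D/150) rule,
    with D assumed to be the run duration in seconds."""
    return 1 + int(run_duration_s / 150)


print(suggested_polort(10 * 60))  # 10-minute run -> 5
print(suggested_polort(40 * 60))  # 40-minute run -> 17
```

For a run of about 40 minutes the rule asks for a 17th-order polynomial, and the order changes from run to run as durations vary, whereas a fixed polort 2 combined with a bandpass filter stays the same for every run.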
Which timeseries file you use is up to you but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words:
* Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
* Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
* For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
* For censored data:
  * Performing ISC requires the users to unionize the censoring patterns during the correlation calculation.
  * If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data.
In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.
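As an illustration of the censoring point for ISC, the sketch below combines two censoring patterns before correlating two timeseries. It is a minimal Python example with made-up numbers, not part of our pipeline. It assumes keep vectors in the usual 1 = keep, 0 = censored form, and reads "unionize the censoring patterns" as dropping any time point censored in either subject.

```python
import math


def union_censor(keep_a, keep_b):
    """Keep a time point only if it is kept (1) in both subjects, i.e.
    censor it if it is censored (0) in either one."""
    return [1 if a == 1 and b == 1 else 0 for a, b in zip(keep_a, keep_b)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def censored_correlation(ts_a, keep_a, ts_b, keep_b):
    """Correlate two timeseries over the time points kept in both."""
    keep = union_censor(keep_a, keep_b)
    a = [v for v, k in zip(ts_a, keep) if k]
    b = [v for v, k in zip(ts_b, keep) if k]
    return pearson(a, b)


# Hypothetical data: each subject has one motion spike at a different time.
ts_a = [1.0, 2.0, 50.0, 4.0, 5.0, 6.0]
keep_a = [1, 1, 0, 1, 1, 1]
ts_b = [2.0, 4.0, 6.0, 8.0, -30.0, 12.0]
keep_b = [1, 1, 1, 1, 0, 1]

keep = union_censor(keep_a, keep_b)
r = censored_correlation(ts_a, keep_a, ts_b, keep_b)
print(keep, r)
```

Correlating without the combined pattern would let each subject's spike contaminate the estimate, since a point censored in one subject still carries real data in the other.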
Effect on results
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table includes the new SNW data produced for this manuscript. The foraminiferal weight data is normalized using the measurement-based weight (MBW) method of Barker (2002). SNW measurements were collected from Atlantic core-tops and sediment cores for G. truncatulinoides, G. ruber, O. universa, N. pachyderma, N. incompta and G. bulloides.
Output files from the 8. Metadata Analysis Workflow page of the SWELTR high-temp study. In this workflow, we compared environmental metadata with microbial communities. The workflow is split into two parts.
metadata_ssu18_wf.rdata : Part 1 contains all variables and objects for the 16S rRNA analysis. To see the Objects, in R run _load("metadata_ssu18_wf.rdata", verbose=TRUE)_
metadata_its18_wf.rdata : Part 2 contains all variables and objects for
the ITS analysis. To see the Objects, in R run
_load("metadata_its18_wf.rdata", verbose=TRUE)_
Additional files:
In both workflows, we run the following steps:
1) Metadata Normality Tests: Shapiro-Wilk Normality Test to test whether
each metadata parameter is normally distributed.
2) Normalize Parameters: R package bestNormalize to find and execute the
best normalizing transformation.
3) Split Metadata parameters into groups: a) Environmental and edaphic
properties, b) Microbial functional responses, and c) Temperature adaptation
properties.
4) Autocorrelation Tests: Test all possible pair-wise comparisons, on both
normalized and non-normalized data sets, for each group.
5) Remove autocorrelated parameters from each group.
6) Dissimilarity Correlation Tests: Use Mantel Tests to see if any of the
metadata groups are significantly correlated with the community data.
7) Best Subset of Variables: Determine which of the metadata parameters
from each group are the most strongly correlated with the community data. For
this we use the bioenv function from the vegan package.
8) Distance-based Redundancy Analysis: Ordination analysis of samples and
metadata vector overlays using capscale, also from the vegan package.
Source code for the workflow can be found here:
https://github.com/sweltr/high-temp/blob/master/metadata.Rmd
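To give a sense of what the Mantel test in step 6 computes, here is a minimal Python sketch using only the standard library. It is not the workflow's code (which is in R at the link above): the distance matrices are made up, and the choice of Pearson correlation with a one-sided permutation p-value is an assumption about a typical setup.

```python
import math
import random


def upper_triangle(m):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Correlate two distance matrices; the p-value comes from jointly
    permuting the rows and columns of the second matrix."""
    rng = random.Random(seed)
    n = len(d1)
    v1 = upper_triangle(d1)
    observed = pearson(v1, upper_triangle(d2))
    idx = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(idx)
        permuted = [[d2[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(v1, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical data: four samples along one gradient. The "community"
# distances are exactly twice the "metadata" distances.
positions = [0.0, 1.0, 3.0, 7.0]
meta_dist = [[abs(a - b) for b in positions] for a in positions]
comm_dist = [[2.0 * d for d in row] for row in meta_dist]

r_obs, p_value = mantel(meta_dist, comm_dist)
print(r_obs, p_value)
```

With only four samples there are just 24 distinct permutations, so the p-value here is coarse. Real analyses use many more samples.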
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Microbial Counts - Picophytoplankton
# values used for normalizing from "out" by group
# group    fals(rel)  redFL(rel)  FL/fals ratio
# group1   0.09       0.62        7.19
# group2   0.92       0.61        6.84
#
Fertility depends, in part, on interactions between male and female reproductive proteins inside the female reproductive tract (FRT) that mediate postmating changes in female behavior, morphology, and physiology. Coevolution between interacting proteins within species may drive reproductive incompatibilities between species, yet the mechanisms underlying postmating-prezygotic isolating barriers remain poorly resolved. Here, we used quantitative proteomics in sibling Drosophila species to investigate the molecular composition of the FRT environment and its role in mediating species-specific postmating responses. We found that (1) FRT proteomes in D. simulans and D. mauritiana virgin females express unique combinations of secreted proteins and are enriched for distinct functional categories, (2) mating induces substantial changes to the FRT proteome in D. mauritiana but not in D. simulans, and (3) the D. simulans FRT pr...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 3 types of data.
GPS data (the ones starting with "GPS") of sampling plot centers collected with a Trimble GPS and post processed to ensure positioning errors lower than 2 meters.
TLS data (the ones starting with "ID_"): these data were collected at the end of August 2019 with a mobile terrestrial laser scanner (mobile ZEB TLS) in a square area of approximately 30x30 m. Data have been normalized using the TreeLS package in R.
ALS data collected in the end of July 2019. For the entire study area, we upload 2 different ALS data: "merged.las" is the original point cloud; "myLas_norm_lt22.las" is the normalised point cloud, cut at 22 meters from the ground in order to perform specific analysis (i.e. paper under submission).
Data collection was funded by the AGRIDIGIT Selvicoltura project.
# values used for normalizing from "out" by group
# group    fals(rel)  redFL(rel)  FL/fals ratio
# group1   0.45       0.64        1.58
# group2   2.55       9.01        3.57
# group3   0.33       7.94        27.74
# group4   nd         nd          nd
#
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a compilation of published and new SNW data with corresponding sea surface (≤ 20 m) environmental data extracted from CMIP6 that are used in the group level Bayesian regression modelling.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Standardized data from Mobilise-D participants (YAR dataset) and pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder, as an example of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization" that is currently under review in Scientific data. Please refer to that publication for further information. Please cite that publication if using these data.
The code to standardize an example subject (for the ICICLE dataset) and to open the standardized Matlab files in other languages (Python, R) is available in github (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Various authors have reported conflicting values for the energy return on investment (rE) of ethanol manufacture. Energy policy analysts predisposed to or against ethanol frequently cite selections from these studies to support their positions. This literature review takes an objective look at the disagreement by normalizing and comparing the data sets from ten such studies. Six of the reviewed studies treat starch ethanol from corn, and four treat cellulosic ethanol. Each normalized data set is also submitted to a uniform calculation of rE defined as the total product energy divided by nonrenewable energy input to its manufacture. Defined this way rE > 1 indicates that the ethanol product has nominally captured at least some renewable energy, and rE > 0.76 indicates that it consumes less nonrenewable energy in its manufacture than gasoline. The reviewed corn ethanol studies imply 0.84 ≤ rE ≤ 1.65; three of the cellulosic ethanol studies imply 4.40 ≤ rE ≤ 6.61. The fourth cellulosic ethanol study reports rE = 0.69 and may reasonably be considered an outlier.
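The rE definition and its two thresholds can be written out as a short Python sketch. The function names are ours, chosen for illustration; the thresholds (1 and 0.76) and the example rE values are taken from the abstract above.

```python
def energy_return(product_energy, nonrenewable_input):
    """rE: total product energy divided by the nonrenewable energy
    input to its manufacture (both in the same units)."""
    if nonrenewable_input <= 0:
        raise ValueError("nonrenewable input must be positive")
    return product_energy / nonrenewable_input


def interpret(r_e, gasoline_threshold=0.76):
    """Apply the two thresholds stated in the abstract."""
    return {
        "captures_renewable_energy": r_e > 1.0,
        "less_nonrenewable_than_gasoline": r_e > gasoline_threshold,
    }


# rE values quoted in the abstract: the cellulosic outlier, the two ends
# of the corn ethanol range, and the low end of the cellulosic range.
for r_e in (0.69, 0.84, 1.65, 4.40):
    print(r_e, interpret(r_e))
```

Read this way, the low end of the corn range (0.84) clears the gasoline threshold without returning more energy than the nonrenewable energy put in, while the outlier cellulosic study (0.69) clears neither.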