License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
This dataset contains intermediate data from RiboBase used to calculate translation efficiency (TE). The code used to generate the files is available at https://github.com/CenikLab/TE_model.
We have uploaded demo HeLa .ribo files. Because the full dataset requires a large amount of storage, please contact Dr. Can Cenik directly to request access to the complete version of RiboBase if you need the original data.
Description of each file:
* human_flatten_ribo_clr.rda: CLR-normalized ribosome profiling data (human), with genes in rows and GEO GSM IDs in columns.
* human_flatten_rna_clr.rda: CLR-normalized matched RNA-seq data (human), with genes in rows and GEO GSM IDs in columns.
* human_flatten_te_clr.rda: TE CLR data (human), with genes in rows and GEO GSM IDs in columns.
* human_TE_cellline_all_plain.csv: TE CLR data (human), with genes in rows and cell lines in columns.
* human_RNA_rho_new.rda: proportional similarity (rho) of matched RNA-seq data (human), as a genes-by-genes matrix.
* human_TE_rho.rda: proportional similarity (rho) of TE (human), as a genes-by-genes matrix.
* mouse_flatten_ribo_clr.rda: CLR-normalized ribosome profiling data (mouse), with genes in rows and GEO GSM IDs in columns.
* mouse_flatten_rna_clr.rda: CLR-normalized matched RNA-seq data (mouse), with genes in rows and GEO GSM IDs in columns.
* mouse_flatten_te_clr.rda: TE CLR data (mouse), with genes in rows and GEO GSM IDs in columns.
* mouse_TE_cellline_all_plain.csv: TE CLR data (mouse), with genes in rows and cell lines in columns.
* mouse_RNA_rho_new.rda: proportional similarity (rho) of matched RNA-seq data (mouse), as a genes-by-genes matrix.
* mouse_TE_rho.rda: proportional similarity (rho) of TE (mouse), as a genes-by-genes matrix.
All data passed quality control. The dataset contains 1054 human samples and 835 mouse samples, each meeting the following criteria:
* coverage > 0.1X
* CDS percentage > 70%
* R² between RNA-seq and ribosome profiling data >= 0.188 (to remove outliers)
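The three QC criteria above can be expressed as a single filter. A minimal R sketch, assuming a hypothetical per-sample QC table: the column names `coverage`, `cds_percent`, and `r2_rna_ribo`, the sample names, and all values are invented for illustration and are not columns of the files listed above.

```r
# Hypothetical per-sample QC table (names and values are illustrative only).
qc <- data.frame(
  sample      = c("sample_1", "sample_2", "sample_3", "sample_4"),
  coverage    = c(0.50, 0.05, 1.20, 0.30),  # ribosome profiling coverage (X)
  cds_percent = c(85, 90, 65, 75),          # percentage of reads in the CDS
  r2_rna_ribo = c(0.40, 0.50, 0.60, 0.30),  # R^2 between RNA-seq and ribosome profiling
  stringsAsFactors = FALSE
)

# Keep only samples that satisfy all three thresholds.
pass_qc <- function(qc) {
  keep <- qc$coverage > 0.1 &
    qc$cds_percent > 70 &
    qc$r2_rna_ribo >= 0.188
  qc[keep, , drop = FALSE]
}

passed <- pass_qc(qc)
print(passed$sample)
```

Here sample_2 fails the coverage threshold and sample_3 fails the CDS threshold, so only sample_1 and sample_4 are kept.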
All ribosome profiling data here are non-deduplicated and winsorized, and are paired with RNA-seq data that are deduplicated but not winsorized. (Although the file names contain "flatten", this only reflects the shared naming format.)
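Winsorizing caps extreme values at an upper quantile instead of discarding them, which limits the influence of positions with unusually high read counts. The exact RiboBase procedure (which values are capped and at which quantile) is not described here, so the sketch below is only a generic one-sided winsorization; the function name `winsorize_upper()` and its 0.995 default are hypothetical.

```r
# One-sided winsorization: values above the chosen upper quantile are set to it.
# The 0.995 default is a hypothetical choice, not necessarily the RiboBase setting.
winsorize_upper <- function(x, prob = 0.995) {
  cap <- stats::quantile(x, probs = prob, names = FALSE, type = 7)
  pmin(x, cap)
}

# Toy per-position footprint counts along one transcript, with a single spike.
counts <- c(3, 5, 4, 6, 2, 5, 4, 3, 200, 5)

# A lower quantile is used here only because the toy vector is short.
wins <- winsorize_upper(counts, prob = 0.9)
print(wins)
```

Only the spike at position 9 is reduced; all other counts are below the cap and stay unchanged.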
#### Code
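To read an .rda file in R, use load("rdaname.rda"); it restores the saved objects into the current environment and returns their names.
To calculate proportional similarity (rho) from CLR data, note that propr:::lr2rho() compares columns, so a CLR matrix with genes in rows (as in the files above) must be transposed first:

    library(propr)
    # clr_data: CLR matrix with genes in rows and samples in columns
    # t() puts genes in columns, so the result is a genes-by-genes rho matrix
    human_TE_homo_rho <- propr:::lr2rho(t(as.matrix(clr_data)))
    rownames(human_TE_homo_rho) <- colnames(human_TE_homo_rho) <- rownames(clr_data)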
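The triple colon in `propr:::lr2rho()` means it is an internal, unexported propr function whose behavior may change between package versions. As a hedged cross-check, rho can also be computed in base R from its definition, rho_ij = 1 - var(clr_i - clr_j) / (var(clr_i) + var(clr_j)). Because var(a - b) = var(a) + var(b) - 2 cov(a, b), this simplifies to 2 cov_ij / (var_i + var_j). The function name `rho_from_clr()` is hypothetical, and its agreement with a specific propr version has not been verified here.

```r
# Proportionality rho for every pair of genes, computed in base R.
# clr_mat: genes in rows, samples in columns (the layout of the *_clr.rda files).
rho_from_clr <- function(clr_mat) {
  x <- t(as.matrix(clr_mat))   # samples x genes, so genes are columns
  s <- stats::cov(x)           # gene-by-gene covariance of clr values
  v <- diag(s)                 # variance of each gene's clr values
  # rho_ij = 1 - var(clr_i - clr_j) / (var_i + var_j) = 2 cov_ij / (var_i + var_j)
  rho <- 2 * s / outer(v, v, "+")
  dimnames(rho) <- list(rownames(clr_mat), rownames(clr_mat))
  rho
}

# Toy example: 3 genes x 6 samples, centered per sample so columns behave like clr values.
set.seed(1)
toy <- matrix(rnorm(18), nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:6)))
toy <- sweep(toy, 2, colMeans(toy))
toy_rho <- rho_from_clr(toy)
print(round(toy_rho, 3))
```

The closed form avoids a double loop over gene pairs, which matters for genes-by-genes matrices with thousands of genes.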
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Periodontal disease (PD) is a chronic, progressive polymicrobial disease that induces a strong host immune response. Culture-independent methods, such as next-generation sequencing (NGS) of bacterial 16S amplicon and shotgun metagenomic libraries, have greatly expanded our understanding of PD biodiversity, identified novel PD microbial associations, and shown that PD biodiversity increases with pocket depth. NGS studies have also found PD communities to be highly host-specific in terms of both biodiversity and the response of microbial communities to periodontal treatment. As with most microbiome work, the majority of PD microbiome studies use standard data normalization procedures that do not account for the compositional nature of NGS microbiome data. Here, we apply recently developed compositional data analysis (CoDA) approaches and software tools to reanalyze multiomics (16S, metagenomics, and metabolomics) data generated in previously published periodontal disease studies. CoDA methods, such as the centered log-ratio (clr) transformation, compensate for the compositional nature of these data, which can not only remove spurious correlations but also allow the identification of novel associations between microbial features and disease conditions. We validated many of the studies' original findings, but also identified new features associated with periodontal disease, including the genera Schwartzia and Aerococcus and the inflammatory marker C-reactive protein (CRP). Furthermore, our network analysis revealed lower connectivity among taxa in deeper periodontal pockets, potentially indicative of a more "random" microbiome. Our findings illustrate the utility of CoDA techniques in multiomics compositional data analysis of the oral microbiome.
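The property that lets the clr transformation handle compositional data is that it depends only on ratios between parts: each value is the log of a count divided by the geometric mean of its sample. A minimal R sketch with hypothetical counts for one sample (taxon names and numbers are invented) shows that scaling the whole sample, as happens with a different sequencing depth, leaves its clr values unchanged. Real microbiome tables contain zeros, which need a pseudocount or another zero-replacement step before taking logs; that step is omitted here.

```r
# Centered log-ratio: log of each part minus the mean log of the sample,
# i.e. log(x / geometric_mean(x)). Assumes strictly positive counts.
clr <- function(x) {
  lx <- log(x)
  lx - mean(lx)
}

# Hypothetical counts for four taxa in one sample.
counts <- c(taxon_A = 12, taxon_B = 30, taxon_C = 150, taxon_D = 80)

# The same community sequenced 10 times deeper gives identical clr values.
shallow <- clr(counts)
deep    <- clr(counts * 10)
print(round(shallow, 3))
```

The clr values of a sample always sum to zero, which is why they are interpreted relative to the sample's geometric mean rather than as absolute abundances.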
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Introduction
High-throughput sequencing (HTS) provides an efficient and cost-effective way to generate large amounts of sequence data, making it a very powerful tool for analyzing the biodiversity of soil organisms. However, marker-based methods and the resulting datasets come with a range of challenges and disputes, including incomplete reference databases, controversial sequence similarity thresholds for delimiting taxa, and downstream compositional data analysis.
Methods
Here, we use HTS data from a soil nematode biodiversity experiment to explore standardized HTS data processing procedures. We compared the taxonomic assignment performance of two main rDNA reference databases (SILVA and PR2). We tested whether the same ecological patterns are detected with Amplicon Sequence Variants (ASV; 100% similarity) versus classical Operational Taxonomic Units (OTU; 97% similarity). Further, we tested how different HTS data normalization methods affect the recovery of beta diversity patterns and the identification of differentially abundant taxa.
Results
At this time, the SILVA 138 eukaryotic database performed better than the PR2 4.12 database, assigning more reads to family level and providing higher phylogenetic resolution. ASV- and OTU-based alpha and beta diversity of nematodes correlated closely, indicating that OTU-based studies represent useful reference points. For downstream data analyses, our results indicate that the loss of data during subsampling in rarefaction-based methods might reduce sensitivity, e.g., underestimating the differences between nematode communities under different treatments, while clr-transformation-based methods may overestimate effects. The Analysis of Compositions of Microbiome with Bias Correction approach (ANCOM-BC) retains all data and accounts for uneven sampling fractions across samples, suggesting that it is currently the optimal method for analyzing compositional data.
Discussion
Overall, our study highlights the importance of comparing and selecting taxonomic reference databases before data analyses, and provides solid evidence for the similarity and comparability of OTU- and ASV-based nematode studies. Further, the results highlight the potential weaknesses of rarefaction-based and clr-transformation-based methods. We recommend that future studies use ASVs and that both taxonomic reference databases and normalization strategies be carefully tested and selected before analyzing the data.
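The data loss attributed to rarefaction comes from subsampling every sample down to a common depth. A generic R sketch of rarefying to the smallest library size by random subsampling without replacement; the ASV table and the function name `rarefy_sample()` are hypothetical, and this is not the pipeline used in the study.

```r
# Rarefy one sample's counts to a fixed depth by subsampling reads without replacement.
rarefy_sample <- function(counts, depth) {
  stopifnot(depth <= sum(counts))
  reads  <- rep(seq_along(counts), times = counts)      # one entry per read, labelled by ASV index
  picked <- reads[sample.int(length(reads), size = depth)]
  out <- tabulate(picked, nbins = length(counts))
  names(out) <- names(counts)
  out
}

# Hypothetical ASV table: samples in rows, ASVs in columns.
asv <- rbind(s1 = c(ASV1 = 50,  ASV2 = 30,  ASV3 = 20),
             s2 = c(ASV1 = 500, ASV2 = 300, ASV3 = 200))

depth <- min(rowSums(asv))   # rarefy to the smallest library size (100 reads)
set.seed(42)
rarefied <- t(apply(asv, 1, rarefy_sample, depth = depth))
print(rarefied)
```

Sample s1 is kept intact, while s2 discards 900 of its 1000 reads; this discarded information is what can reduce sensitivity when comparing communities.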
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
XRF core scanning for the elements Si, S, Cl, K, Ca, Ti, Mn, Fe, Br, and Sr was performed with an ITRAX core scanner equipped with a chromium X-ray tube, at a 200 μm step size, 30 kV tube voltage, 30 mA tube current, and a counting time of 10 s. To minimize sample-geometry effects related to differences in water content, surface irregularities, and sediment density, raw element intensities (counts per second, cps) were normalized using the centered log-ratio (CLR) transformation.
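For XRF data, the CLR is computed separately at each scan position: the log of each element's intensity minus the mean log intensity of all elements at that position. A sketch, assuming a hypothetical matrix of cps values (rows are scan positions, columns are a subset of the scanned elements; all numbers are invented). Elements with zero counts at a position would need zero replacement first, since log(0) is -Inf.

```r
# Hypothetical ITRAX intensities (cps): one row per scan position, one column per element.
cps <- rbind(
  pos1 = c(Si = 1200, K = 800, Ca = 15000, Ti = 600, Fe = 9000),
  pos2 = c(Si = 900,  K = 650, Ca = 22000, Ti = 450, Fe = 7000)
)

# Row-wise CLR: log(cps) minus the mean log(cps) of the same scan position.
clr_rows <- function(m) {
  lg <- log(m)
  sweep(lg, 1, rowMeans(lg), "-")
}

xrf_clr <- clr_rows(cps)
print(round(xrf_clr, 3))
```

Because only ratios between elements enter the result, a factor that scales all intensities at one position (for example, a change in measurement geometry) cancels out.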
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Dataset used to generate Figure 4: QCM/QCC plots using different normalizations of the SCA input counts table.
* A) Log10 transformation (figure4/setA/Results/setAMIRNA_SIMLR/5/setA_StabilitySignificativityJittered.pdf)
* B) Centred log-ratio normalization (CLR) (figure4/setA/Results/CLR_FNMIRNA_SIMLR/5/normalized_CLR_FN_StabilitySignificativityJittered.pdf)
* C) Relative log expression (RLE) (figure4/setA/Results/DESEQ_FNMIRNA_SIMLR/5/normalized_DESEQ_FN_StabilitySignificativityJittered.pdf)
* D) Full-quantile normalization (FQ) (figure4/setA/Results/FQ_FNMIRNA_SIMLR/5/normalized_FQ_FN_StabilitySignificativityJittered.pdf)
* E) Sum scaling normalization (SUM) (/figure4/setA/Results/SUM_FNMIRNA_SIMLR/5/normalized_SUM_FN_StabilitySignificativityJittered.pdf)
* F) Weighted trimmed mean of M-values (TMM) (figure4/setA/Results/TMM_FNMIRNA_SIMLR/5/normalized_TMM_FN_StabilitySignificativityJittered.pdf)
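Two of the normalizations listed above can be sketched in a few lines of base R: SUM divides each column by its total, and RLE computes DESeq-style median-of-ratios size factors. This is an illustration on a hypothetical genes-by-cells matrix, with hypothetical function names, not the code that produced the figure.

```r
# Hypothetical counts table: genes in rows, cells in columns.
counts <- matrix(c(10, 20, 30,
                    5, 10, 15,
                    8, 16, 40,
                    2,  4,  6),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("g", 1:4), paste0("cell", 1:3)))

# SUM: divide each column by its total, giving relative abundances.
sum_scale <- function(m) sweep(m, 2, colSums(m), "/")

# RLE (median of ratios): a cell's size factor is the median, over genes, of
# count / geometric mean of that gene across cells. Genes with a zero in any
# cell have a non-finite log geometric mean and are skipped.
rle_size_factors <- function(m) {
  lg <- log(m)
  log_geo_mean <- rowMeans(lg)
  ok <- is.finite(log_geo_mean)
  apply(lg[ok, , drop = FALSE], 2,
        function(col) exp(stats::median(col - log_geo_mean[ok])))
}

sf <- rle_size_factors(counts)
rle_norm <- sweep(counts, 2, sf, "/")
print(round(sf, 3))
```

Cell 2 is exactly twice cell 1, so its size factor is twice as large; cell 3 has one gene (g3) that deviates from the 3x pattern, and the median keeps that single gene from shifting its size factor.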
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Normalized variation matrix of data in table (3).
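For reference, the variation matrix of a compositional table has entries t_ij = var(ln(x_i / x_j)), taken across samples. One common convention defines the normalized variation matrix as var(ln(x_i / x_j) / √2) = t_ij / 2; which convention table (3) uses is not stated here, so treat it as an assumption. The function name `variation_matrix()` and the example data are hypothetical.

```r
# Variation matrix of a composition table X (samples in rows, parts in columns):
# t_ij = var(log(x_i / x_j)). With normalized = TRUE the entries are t_ij / 2,
# i.e. var(log(x_i / x_j) / sqrt(2)), one common convention (an assumption here).
variation_matrix <- function(X, normalized = TRUE) {
  L <- log(as.matrix(X))
  D <- ncol(L)
  vm <- matrix(0, D, D, dimnames = list(colnames(L), colnames(L)))
  for (i in seq_len(D)) {
    for (j in seq_len(D)) {
      vm[i, j] <- stats::var(L[, i] - L[, j])
    }
  }
  if (normalized) vm / 2 else vm
}

# Hypothetical 3-part compositions for three samples.
X <- rbind(c(a = 0.2, b = 0.3, c = 0.5),
           c(a = 0.1, b = 0.4, c = 0.5),
           c(a = 0.3, b = 0.3, c = 0.4))
print(round(variation_matrix(X), 4))
```

Small entries mean the two parts keep a nearly constant ratio across samples; the diagonal is always zero.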