7 datasets found
  1. Intermediate data for TE calculation

    • zenodo.org
    bin, csv
    Updated May 9, 2025
    Cite
    Yue Liu; Yue Liu (2025). Intermediate data for TE calculation [Dataset]. http://doi.org/10.5281/zenodo.10373032
    Available download formats: csv, bin
    Dataset updated
    May 9, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yue Liu; Yue Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains intermediate data from RiboBase used to calculate translation efficiency (TE). The code that generates these files is available at https://github.com/CenikLab/TE_model.

    Demo HeLa .ribo files are included. Because the full dataset requires a large amount of storage, please contact Dr. Can Cenik directly to request access to the complete RiboBase if you need the original data.

    The files are described below:

    human_flatten_ribo_clr.rda: human ribosome profiling data, clr-normalized, with genes in rows and GEO GSM IDs in columns.

    human_flatten_rna_clr.rda: matched human RNA-seq data, clr-normalized, with genes in rows and GEO GSM IDs in columns.

    human_flatten_te_clr.rda: human TE (clr) data with genes in rows and GEO GSM IDs in columns.

    human_TE_cellline_all_plain.csv: human TE (clr) data with genes in rows and cell lines in columns.

    human_RNA_rho_new.rda: proportional similarity (rho) of the matched human RNA-seq data, as a gene-by-gene matrix.

    human_TE_rho.rda: proportional similarity (rho) of human TE, as a gene-by-gene matrix.

    mouse_flatten_ribo_clr.rda: mouse ribosome profiling data, clr-normalized, with genes in rows and GEO GSM IDs in columns.

    mouse_flatten_rna_clr.rda: matched mouse RNA-seq data, clr-normalized, with genes in rows and GEO GSM IDs in columns.

    mouse_flatten_te_clr.rda: mouse TE (clr) data with genes in rows and GEO GSM IDs in columns.

    mouse_TE_cellline_all_plain.csv: mouse TE (clr) data with genes in rows and cell lines in columns.

    mouse_RNA_rho_new.rda: proportional similarity (rho) of the matched mouse RNA-seq data, as a gene-by-gene matrix.

    mouse_TE_rho.rda: proportional similarity (rho) of mouse TE, as a gene-by-gene matrix.
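    The *_clr files hold centered log-ratio (clr) values. Within each sample, each gene's value is its log abundance minus the mean log abundance of all genes in that sample. Below is a minimal base-R sketch of that transform for a genes-by-samples count matrix. The pseudocount of 1 used for zero counts is an assumption, and the RiboBase pipeline (https://github.com/CenikLab/TE_model) may handle zeros and normalization differently.

```r
# Centered log-ratio (clr) transform of a genes x samples count matrix.
# Assumption: a pseudocount of 1 is added so that zero counts can be logged.
clr_by_sample <- function(counts, pseudocount = 1) {
  logged <- log(as.matrix(counts) + pseudocount)
  # Subtract each sample's (column's) mean log value, i.e. the log of its
  # geometric mean, so every column sums to zero.
  sweep(logged, 2, colMeans(logged), FUN = "-")
}

# Toy counts: genes in rows, samples (hypothetical GSM ids) in columns.
counts <- matrix(c(10, 20, 70,
                   5, 15, 80),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("GSM_1", "GSM_2")))
clr_counts <- clr_by_sample(counts)
colSums(clr_counts)  # both close to 0, up to floating-point error
```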

    All data passed quality control, leaving 1,054 human samples and 835 mouse samples. The criteria were:
    * coverage > 0.1X
    * CDS percentage > 70%
    * R² between RNA-seq and ribosome profiling (Ribo) >= 0.188 (to remove outliers)
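    The three QC criteria can be applied as a simple row filter. The sketch below uses a per-sample table with hypothetical column names (coverage, cds_pct, rna_ribo_r2) and made-up values. It is not part of the released files.

```r
# Hypothetical per-sample QC table: column names and values are illustrative.
qc <- data.frame(
  gsm         = c("GSM_A", "GSM_B", "GSM_C", "GSM_D"),
  coverage    = c(0.50, 0.05, 1.20, 0.30),   # fold coverage (X)
  cds_pct     = c(85, 90, 65, 75),           # CDS percentage
  rna_ribo_r2 = c(0.40, 0.50, 0.60, 0.188),  # R^2 between RNA-seq and Ribo
  stringsAsFactors = FALSE
)

# Keep samples meeting all three criteria (note >= for the R^2 cut-off).
passes_qc <- with(qc, coverage > 0.1 & cds_pct > 70 & rna_ribo_r2 >= 0.188)
kept <- qc$gsm[passes_qc]
kept  # "GSM_A" "GSM_D"
```

    GSM_B fails on coverage and GSM_C on CDS percentage. GSM_D sits exactly at the R² cut-off and is kept because that comparison is inclusive.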

    All ribosome profiling data here are non-deduplicated and winsorized. They are paired with RNA-seq data that are deduplicated and not winsorized. (Although the file names contain "flatten", the word only follows the naming convention.)
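    Winsorizing caps extreme values at a chosen quantile instead of removing them. This limits the influence of a few very high counts, such as a pileup at a single position. The generic sketch below caps only the upper tail. The quantile level and the unit being winsorized are assumptions; the exact scheme used for RiboBase is defined in the TE_model repository, not here.

```r
# Generic upper-tail winsorization: values above the `prob` quantile are
# replaced by that quantile. The default level is a hypothetical choice.
winsorize_upper <- function(x, prob = 0.995) {
  cap <- quantile(x, probs = prob, names = FALSE, type = 7)
  pmin(x, cap)
}

# Toy per-position counts with one extreme spike.
position_counts <- c(rep(2, 99), 500)
capped <- winsorize_upper(position_counts, prob = 0.99)
max(capped)  # the spike is pulled down to the 99th percentile (about 6.98)
```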

    Code
    To read an .rda file in R, use load("rdaname.rda").
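    load() restores objects under the names they were saved with, and those names are not listed here. One way to inspect a file without overwriting anything in the workspace is to load it into a separate environment. load_rda() below is a hypothetical helper, not part of the dataset's code.

```r
# Load an .rda file into a fresh environment and return its objects as a
# named list, so nothing in the current workspace is overwritten.
load_rda <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object names
  mget(object_names, envir = env)
}

# Usage (the object name stored inside each file is not documented here):
# te <- load_rda("human_flatten_te_clr.rda")
# names(te)
# clr_data <- te[[1]]
```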

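    To calculate proportional similarity (rho) from clr data in R: propr:::lr2rho() treats columns as features, so the genes-by-samples matrices above are transposed first.
    library(propr)
    # clr_data: genes in rows, samples in columns (as in the *_clr.rda files)
    human_TE_homo_rho <- propr:::lr2rho(t(as.matrix(clr_data)))
    rownames(human_TE_homo_rho) <- colnames(human_TE_homo_rho) <- rownames(clr_data)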
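    For a quick check without propr, the proportionality coefficient can be written out directly. For features i and j with clr values across samples, rho_ij = 1 - var(clr_i - clr_j) / (var(clr_i) + var(clr_j)).

    The base-R sketch below expects samples in rows and features in columns, so a genes-by-samples file would be passed as t(clr_data). It is an illustrative re-implementation, not the propr code, and it is slow for full gene-by-gene matrices.

```r
# Proportionality coefficient rho for every pair of features (columns):
#   rho_ij = 1 - var(clr_i - clr_j) / (var(clr_i) + var(clr_j))
# Input: samples in rows, features (genes) in columns, values already clr.
rho_from_clr <- function(clr_mat) {
  clr_mat <- as.matrix(clr_mat)
  n <- ncol(clr_mat)
  feature_var <- apply(clr_mat, 2, var)
  rho <- matrix(1, n, n, dimnames = list(colnames(clr_mat), colnames(clr_mat)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # clr_i - clr_j equals log(x_i / x_j): the per-sample mean cancels
      vlr <- var(clr_mat[, i] - clr_mat[, j])
      rho[i, j] <- rho[j, i] <- 1 - vlr / (feature_var[i] + feature_var[j])
    }
  }
  rho
}

# Toy example: geneB is always twice geneA, so the pair is perfectly proportional.
counts <- rbind(s1 = c(10, 20, 5), s2 = c(20, 40, 9),
                s3 = c(40, 80, 2), s4 = c(15, 30, 7))
colnames(counts) <- c("geneA", "geneB", "geneC")
log_counts <- log(counts)
clr_toy <- log_counts - rowMeans(log_counts)  # clr within each sample (row)
rho_toy <- rho_from_clr(clr_toy)
rho_toy["geneA", "geneB"]  # 1, up to floating-point error
```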

  2. Data_Sheet_3_Compositional Data Analysis of Periodontal Disease Microbial...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated May 31, 2023
    Cite
    Laura Sisk-Hackworth; Adrian Ortiz-Velez; Micheal B. Reed; Scott T. Kelley (2023). Data_Sheet_3_Compositional Data Analysis of Periodontal Disease Microbial Communities.ZIP [Dataset]. http://doi.org/10.3389/fmicb.2021.617949.s003
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Laura Sisk-Hackworth; Adrian Ortiz-Velez; Micheal B. Reed; Scott T. Kelley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Periodontal disease (PD) is a chronic, progressive polymicrobial disease that induces a strong host immune response. Culture-independent methods, such as next-generation sequencing (NGS) of bacterial 16S amplicon and shotgun metagenomic libraries, have greatly expanded our understanding of PD biodiversity, identified novel PD microbial associations, and shown that PD biodiversity increases with pocket depth. NGS studies have also found PD communities to be highly host-specific in terms of both biodiversity and the response of microbial communities to periodontal treatment. As with most microbiome work, the majority of PD microbiome studies use standard data normalization procedures that do not account for the compositional nature of NGS microbiome data. Here, we apply recently developed compositional data analysis (CoDA) approaches and software tools to reanalyze multiomics (16S, metagenomics, and metabolomics) data generated from previously published periodontal disease studies. CoDA methods, such as centered log-ratio (clr) transformation, compensate for the compositional nature of these data, which can not only remove spurious correlations but also allow the identification of novel associations between microbial features and disease conditions. We validated many of the studies’ original findings, but also identified new features associated with periodontal disease, including the genera Schwartzia and Aerococcus and the cytokine C-reactive protein (CRP). Furthermore, our network analysis revealed a lower connectivity among taxa in deeper periodontal pockets, potentially indicative of a more “random” microbiome. Our findings illustrate the utility of CoDA techniques in multiomics compositional data analysis of the oral microbiome.

  3. DataSheet_1_Optimising high-throughput sequencing data analysis, from gene...

    • frontiersin.figshare.com
    pdf
    Updated Mar 4, 2024
    Cite
    Simin Wang; Dominik Schneider; Tamara R. Hartke; Johannes Ballauff; Carina Carneiro de Melo Moura; Garvin Schulz; Zhipeng Li; Andrea Polle; Rolf Daniel; Oliver Gailing; Bambang Irawan; Stefan Scheu; Valentyna Krashevska (2024). DataSheet_1_Optimising high-throughput sequencing data analysis, from gene database selection to the analysis of compositional data: a case study on tropical soil nematodes.pdf [Dataset]. http://doi.org/10.3389/fevo.2024.1168288.s001
    Available download formats: pdf
    Dataset updated
    Mar 4, 2024
    Dataset provided by
    Frontiers
    Authors
    Simin Wang; Dominik Schneider; Tamara R. Hartke; Johannes Ballauff; Carina Carneiro de Melo Moura; Garvin Schulz; Zhipeng Li; Andrea Polle; Rolf Daniel; Oliver Gailing; Bambang Irawan; Stefan Scheu; Valentyna Krashevska
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: High-throughput sequencing (HTS) provides an efficient and cost-effective way to generate large amounts of sequence data, providing a very powerful tool to analyze biodiversity of soil organisms. However, marker-based methods and the resulting datasets come with a range of challenges and disputes, including incomplete reference databases, controversial sequence similarity thresholds for delimitating taxa, and downstream compositional data analysis.

    Methods: Here, we use HTS data from a soil nematode biodiversity experiment to explore standardized HTS data processing procedures. We compared the taxonomic assignment performance of two main rDNA reference databases (SILVA and PR2). We tested whether the same ecological patterns are detected with Amplicon Sequence Variants (ASV; 100% similarity) versus classical Operational Taxonomic Units (OTU; 97% similarity). Further, we tested how different HTS data normalization methods affect the recovery of beta diversity patterns and the identification of differentially abundant taxa.

    Results: At this time, the SILVA 138 eukaryotic database performed better than the PR2 4.12 database, assigning more reads to family level and providing higher phylogenetic resolution. ASV- and OTU-based alpha and beta diversity of nematodes correlated closely, indicating that OTU-based studies represent useful reference points. For downstream data analyses, our results indicate that loss of data during subsampling under rarefaction-based methods might reduce the sensitivity of the method, e.g. underestimate the differences between nematode communities under different treatments, while the clr-transformation-based methods may overestimate effects. The Analysis of Compositions of Microbiome with Bias Correction approach (ANCOM-BC) retains all data and accounts for uneven sampling fractions for each sample, suggesting that this is currently the optimal method to analyze compositional data.

    Discussion: Overall, our study highlights the importance of comparing and selecting taxonomic reference databases before data analyses, and provides solid evidence for the similarity and comparability between OTU- and ASV-based nematode studies. Further, the results highlight the potential weakness of rarefaction-based and clr-transformation-based methods. We recommend future studies use ASV and that both the taxonomic reference databases and normalization strategies are carefully tested and selected before analyzing the data.
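    Rarefaction, one of the approaches compared above, subsamples every sample down to a common read depth; this is where its data loss comes from. Below is a minimal base-R sketch for one sample, assuming integer counts per taxon and an arbitrary illustrative depth. It is not the workflow used in the study.

```r
# Rarefy one sample's taxon counts to a fixed depth by drawing reads
# without replacement. The depth below is an arbitrary illustration.
rarefy_counts <- function(counts, depth) {
  stopifnot(depth <= sum(counts))
  reads <- rep(seq_along(counts), times = counts)  # one entry per read
  picked <- reads[sample.int(length(reads), size = depth)]
  out <- tabulate(picked, nbins = length(counts))
  names(out) <- names(counts)
  out
}

set.seed(1)
taxon_counts <- c(taxonA = 500, taxonB = 300, taxonC = 150, taxonD = 50)
rarefied <- rarefy_counts(taxon_counts, depth = 200)  # 800 of 1000 reads discarded
```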

  4. Normalized element ratios of sediment core LVL15-1 from Lake Vouliagmeni,...

    • doi.pangaea.de
    html, tsv
    Updated Nov 20, 2023
    Cite
    Andreas Koutsodendris; Achim Brauer; Oliver Friedrich; Rik Tjallingii; Victoria Putyrskaya; Barbara Hennrich; Robert Kühn; Eckehard Klemt; Jörg Pross (2023). Normalized element ratios of sediment core LVL15-1 from Lake Vouliagmeni, Greece [Dataset]. http://doi.org/10.1594/PANGAEA.963433
    Available download formats: tsv, html
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    PANGAEA
    Authors
    Andreas Koutsodendris; Achim Brauer; Oliver Friedrich; Rik Tjallingii; Victoria Putyrskaya; Barbara Hennrich; Robert Kühn; Eckehard Klemt; Jörg Pross
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 7, 2015
    Area covered
    Variables measured
    Core, Section, File name, Sample ID, Replicates, Logger voltage, Logger Amperage, Iron, normalized, Position, length, Total count rate, and 16 more
    Description

    XRF core scanning for the elements Si, S, Cl, K, Ca, Ti, Mn, Fe, Br, and Sr was performed with an ITRAX core scanner equipped with a chromium X-ray tube, at a 200 μm step size, 30 kV tube voltage, 30 mA tube current, and a counting time of 10 s. To minimize sample-geometry effects related to differences in water content, surface irregularities, and sediment density, raw element intensities (cps) were normalized by centered log-ratio (CLR) transformation.
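    The clr transform reduces geometry effects because a factor that scales all element intensities at one scan position equally (for example, a wet or uneven spot) cancels in the log-ratios. The base-R sketch below applies clr across elements at each position, with positions in rows and elements in columns. The intensities are made-up toy values, not measurements from core LVL15-1, and all intensities must be positive.

```r
# Centered log-ratio transform of XRF intensities (cps), applied across
# elements at each scan position: rows = positions, columns = elements.
xrf_clr <- function(cps) {
  cps <- as.matrix(cps)
  stopifnot(all(cps > 0))          # log-ratios need strictly positive values
  log_cps <- log(cps)
  log_cps - rowMeans(log_cps)      # subtract each position's mean log intensity
}

# Toy intensities for four of the scanned elements (not real LVL15-1 data).
cps <- rbind(pos1 = c(Si = 1200, Ca = 5400, Ti = 600, Fe = 8100),
             pos2 = c(Si = 1500, Ca = 4100, Ti = 750, Fe = 9000),
             pos3 = c(Si = 900,  Ca = 6200, Ti = 450, Fe = 7000))
clr_xrf <- xrf_clr(cps)

# Scaling each position by its own factor (0.5, 2, 1.3) leaves clr unchanged.
all.equal(xrf_clr(cps * c(0.5, 2, 1.3)), clr_xrf)  # TRUE
```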

  5. Figure 4 from manuscript Sparsely-Connected Autoencoder (SCA) for single...

    • figshare.com
    zip
    Updated Aug 26, 2020
    Cite
    Raffaele Calogero (2020). Figure 4 from manuscript Sparsely-Connected Autoencoder (SCA) for single cell RNAseq data mining [Dataset]. http://doi.org/10.6084/m9.figshare.12866717.v1
    Available download formats: zip
    Dataset updated
    Aug 26, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Raffaele Calogero
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset used to generate Figure 4: QCM/QCC plots using different normalizations for the SCA input counts table.
    A) Log10 transformed (figure4/setA/Results/setAMIRNA_SIMLR/5/setA_StabilitySignificativityJittered.pdf)
    B) Centred log-ratio normalization (CLR) (figure4/setA/Results/CLR_FNMIRNA_SIMLR/5/normalized_CLR_FN_StabilitySignificativityJittered.pdf)
    C) Relative log-expression (RLE) (figure4/setA/Results/DESEQ_FNMIRNA_SIMLR/5/normalized_DESEQ_FN_StabilitySignificativityJittered.pdf)
    D) Full-quantile normalization (FQ) (figure4/setA/Results/FQ_FNMIRNA_SIMLR/5/normalized_FQ_FN_StabilitySignificativityJittered.pdf)
    E) Sum scaling normalization (SUM) (/figure4/setA/Results/SUM_FNMIRNA_SIMLR/5/normalized_SUM_FN_StabilitySignificativityJittered.pdf)
    F) Weighted trimmed mean of M-values (TMM) (figure4/setA/Results/TMM_FNMIRNA_SIMLR/5/normalized_TMM_FN_StabilitySignificativityJittered.pdf)
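    Panel C uses relative log-expression (RLE) normalization. The result folder name (DESEQ_FN) suggests, but does not confirm, DESeq-style median-of-ratios size factors; the sketch below assumes that scheme. Each sample's size factor is the median, over genes with no zero counts, of its count divided by that gene's geometric mean across samples.

```r
# Median-of-ratios (RLE) size factors, assuming genes in rows and
# cells/samples in columns.
rle_size_factors <- function(counts) {
  log_counts <- log(as.matrix(counts))
  log_geo_mean <- rowMeans(log_counts)     # -Inf for genes with any zero count
  usable <- is.finite(log_geo_mean)        # keep genes with no zeros
  apply(log_counts[usable, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_mean[usable]))
  })
}

# Toy counts: cell2 is 2x cell1 and cell3 is 0.5x cell1 on the usable genes;
# gene 4 has a zero and is ignored.
counts <- cbind(cell1 = c(10, 50, 4, 0),
                cell2 = c(20, 100, 8, 3),
                cell3 = c(5, 25, 2, 7))
size_factors <- rle_size_factors(counts)
size_factors  # approximately 1, 2 and 0.5
normalized <- sweep(counts, 2, size_factors, FUN = "/")
```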

  6. Normalized variation matrix of data in table (3).

    • figshare.com
    xls
    Updated Jun 11, 2023
    Cite
    Asghar Khan; Muhammad Saleem Khan; Juan José Egozcue; Munib Ahmed Shafique; Sidra Nadeem; Ghulam Saddiq (2023). Normalized variation matrix of data in table (3). [Dataset]. http://doi.org/10.1371/journal.pone.0279083.t006
    Available download formats: xls
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Asghar Khan; Muhammad Saleem Khan; Juan José Egozcue; Munib Ahmed Shafique; Sidra Nadeem; Ghulam Saddiq
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Normalized variation matrix of data in table (3).

  7. Beta weights during imagined standing, normalized to rest.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Daniel S. Peterson; Kristen A. Pickett; Ryan Duncan; Joel Perlmutter; Gammon M. Earhart (2023). Beta weights during imagined standing, normalized to rest. [Dataset]. http://doi.org/10.1371/journal.pone.0090634.t003
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Daniel S. Peterson; Kristen A. Pickett; Ryan Duncan; Joel Perlmutter; Gammon M. Earhart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Paired sample t-test comparing stand and rest beta weights.
    $Univariate ANCOVA with UPDRS as covariate.
    *Significantly different from rest at the 0.05 level.
    **Significantly different from rest at the 0.005 level.
    Abbreviations: SMA: supplementary motor area, GP: globus pallidus, MLR: mesencephalic locomotor region, CLR: cerebellar locomotor region.
