100+ datasets found
  1. Comparison of alternative approaches for analysing multi-level RNA-seq data

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Irina Mohorianu; Amanda Bretman; Damian T. Smith; Emily K. Fowler; Tamas Dalmay; Tracey Chapman (2023). Comparison of alternative approaches for analysing multi-level RNA-seq data [Dataset]. http://doi.org/10.1371/journal.pone.0182694
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Irina Mohorianu; Amanda Bretman; Damian T. Smith; Emily K. Fowler; Tamas Dalmay; Tracey Chapman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression call in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples. The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments.
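
    The description mentions a normalization based on subsampling without replacement. A minimal sketch of that general idea follows; it is not the authors' implementation. The function names, the choice of the smallest library as the common depth, and the toy counts are all assumptions made for illustration.

```python
import numpy as np

def subsample_counts(counts, target_depth, seed=0):
    """Draw target_depth reads without replacement from one sample's gene counts."""
    counts = np.asarray(counts, dtype=np.int64)
    total = int(counts.sum())
    if target_depth > total:
        raise ValueError("target depth exceeds the sample's total read count")
    rng = np.random.default_rng(seed)
    # Drawing reads without replacement from gene "bins" is a
    # multivariate hypergeometric draw.
    return rng.multivariate_hypergeometric(counts, target_depth)

def normalize_by_subsampling(matrix, seed=0):
    """Bring every sample (row) down to the depth of the smallest sample."""
    matrix = np.asarray(matrix, dtype=np.int64)  # rows = samples, columns = genes
    depth = int(matrix.sum(axis=1).min())        # assumed common depth
    rows = [subsample_counts(row, depth, seed + i) for i, row in enumerate(matrix)]
    return np.vstack(rows)

# Toy example with made-up counts: two samples, three genes.
toy = np.array([[10, 0, 30],
                [100, 50, 50]])
normalized = normalize_by_subsampling(toy)
print(normalized.sum(axis=1))  # every row now has the same total
```

    After subsampling, all samples share the same total read count, so differences between them are no longer driven by sequencing depth alone.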

  2. Reference-based RNA-seq data analysis (training data)

    • zenodo.org
    bin
    Updated Apr 26, 2023
    Cite
    Bérénice Batut; Pavankumar Videm; Anika Erxleben; Björn Grüning (2023). Reference-based RNA-seq data analysis (training data) [Dataset]. http://doi.org/10.5281/zenodo.1185122
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bérénice Batut; Pavankumar Videm; Anika Erxleben; Björn Grüning
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data provided here are part of a Galaxy Training Network tutorial that analyzes RNA-Seq data from a study published by Brooks et al. (2011) to identify genes and exons that are regulated by the Pasilla gene.

  3. RNA-seq data analysis summary.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 26, 2021
    Cite
    Klemm, Paul; Becker, Stephan; Biedenkopf, Nadine; Lechner, Marcus; Weber, Friedemann; Schlereth, Julia; Hartmann, Roland K.; Schoen, Andreas; Kämper, Lennart; Bach, Simone; Demper, Jana-Christin (2021). RNA-seq data analysis summary. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000808954
    Authors
    Klemm, Paul; Becker, Stephan; Biedenkopf, Nadine; Lechner, Marcus; Weber, Friedemann; Schlereth, Julia; Hartmann, Roland K.; Schoen, Andreas; Kämper, Lennart; Bach, Simone; Demper, Jana-Christin
    Description

    For methodological details, see S1 Text, paragraph "RNA-Seq Analysis". (XLSX)

  4. Data Sheet 1_From bench to bytes: a practical guide to RNA sequencing data...

    • frontiersin.figshare.com
    docx
    Updated Oct 27, 2025
    Cite
    Prabin Dawadi; Bivek Pokharel; Anita Shrestha; Dikshya Niraula; Afifa Naeem; Sayaka Miura; Mishal Roy; Saroj Nepal (2025). Data Sheet 1_From bench to bytes: a practical guide to RNA sequencing data analysis.docx [Dataset]. http://doi.org/10.3389/fgene.2025.1697922.s001
    Dataset provided by
    Frontiers
    Authors
    Prabin Dawadi; Bivek Pokharel; Anita Shrestha; Dikshya Niraula; Afifa Naeem; Sayaka Miura; Mishal Roy; Saroj Nepal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RNA sequencing (RNA-Seq) is a high-throughput sequencing approach that enables comprehensive quantification of transcriptomes at a genome-wide scale. As a result, RNA-Seq has become a routine component of molecular biology research, and more researchers are now expected to analyze RNA-Seq data as part of their projects. However, unlike the largely experimental nature of benchwork, RNA-Seq analysis demands proficiency with computational and statistical approaches to manage technical issues and large data sizes. Although numerous manuals and reviews on RNA-Seq data analysis are available, many are either highly specialized, fragmented, or overly superficial, leaving beginners to use tools without understanding the underlying principles. To address this gap, we provide a decision-oriented guide tailored for molecular biologists encountering RNA-Seq analysis for the first time. This review is designed to enable readers to decide which tools and statistical approaches to use based on their data, goals, and constraints. We aim to equip beginners with the knowledge required to perform RNA-Seq analysis rigorously and with confidence.

  5. Data from: RNA-seq-analysis-of-mycobacteria-stress-response-to-microgravity

    • osdr.nasa.gov
    • s.cnmilf.com
    • +4more
    Updated Nov 19, 2025
    Cite
    Lynn Harrison (2025). RNA-seq-analysis-of-mycobacteria-stress-response-to-microgravity [Dataset]. https://osdr.nasa.gov/bio/repo/data/studies/OSD-90
    Dataset provided by
    NASA (http://nasa.gov/)
    Authors
    Lynn Harrison
    License

    Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    The aim of this work is to determine whether mycobacteria have enhanced virulence during space travel and what mechanisms they use to adapt to microgravity. M. marinum and LHM4 were grown in high aspect ratio vessels (HARV) in a rotary cell culture system (RCCS) under normal gravity (NG) or low shear simulated microgravity (MG). To determine the effect of MG on the stress responses activated by the growth conditions, we used RNAseq to examine which genes were expressed. For RNAseq, the bacteria are harvested, the RNA is isolated and converted to complementary DNA (cDNA), and the cDNA is sequenced. Using bioinformatics, the expression levels of the different M. marinum genes were compared between the NG and MG samples. To make sure that we were examining only gene expression changes due to MG, only bacteria in early exponential growth were used in the RNAseq studies. Triplicate NG and MG cultures were used to generate samples of bacteria grown for ~40 hrs. We also grew triplicate cultures for 4 days and then diluted them again and grew them for another ~40 hrs so we could examine gene expression from bacteria exposed for a longer time. In summary, this study determined that waterborne mycobacteria alter their growth, expression of stress responses, and their sensitivity to oxidizing conditions when subjected to growth under MG.

  6. SCANPY Python package for scRNA-seq analysis

    • kaggle.com
    zip
    Updated Feb 5, 2022
    Cite
    Alexander Chervov (2022). SCANPY Python package for scRNA-seq analysis [Dataset]. https://www.kaggle.com/datasets/alexandervc/scanpy-python-package-for-scrnaseq-analysis
    Available download formats: zip (915767 bytes)
    Authors
    Alexander Chervov
    Description

    Remark: for cell cycle analysis, see the paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" by Alexander Chervov and Andrei Zinovyev (Scanpy is not always reliable for cell cycle analysis).

    https://scanpy.readthedocs.io/en/stable/

    Scanpy – Single-Cell Analysis in Python

    Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.

    Single-cell RNA sequencing data are stored as count matrices: rows correspond to cells, columns correspond to genes, and each value indicates how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

    SCANPY is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells (https://github.com/theislab/Scanpy). Along with SCANPY, we present ANNDATA, a generic class for handling annotated data matrices (https://github.com/theislab/anndata).
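
    To make the cells-by-genes layout concrete, here is a minimal sketch that builds a toy count matrix and applies a common library-size normalization followed by a log transform. It deliberately uses only NumPy rather than Scanpy itself (Scanpy offers comparable steps through `sc.pp.normalize_total` and `sc.pp.log1p`); the counts, the helper name and the target sum of 10,000 are assumptions chosen for illustration.

```python
import numpy as np

# Toy count matrix: 3 cells (rows) x 4 genes (columns); values are made up.
counts = np.array([[0, 3, 1, 6],
                   [2, 0, 0, 8],
                   [5, 5, 0, 0]], dtype=float)

def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to the same total count, then apply log(1 + x)."""
    per_cell = counts.sum(axis=1, keepdims=True)
    per_cell[per_cell == 0] = 1.0          # avoid dividing by zero for empty cells
    return np.log1p(counts / per_cell * target_sum)

expr = normalize_log1p(counts)
print(expr.shape)  # one row per cell, one column per gene
```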

    Paper:

    Wolf, F., Angerer, P. & Theis, F. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19, 15 (2018). https://doi.org/10.1186/s13059-017-1382-0 https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1382-0

    Inspiration

    Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6 Also see the review by P. Kharchenko in Nature Methods: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

  7. RNA_Seq_Data_Preprocessing_DGE analysis

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Dr. Nagendra (2025). RNA_Seq_Data_Preprocessing_DGE analysis [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/rna-seq-data-preprocessing-dge-analysis
    Available download formats: zip (75256 bytes)
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains RNA-Seq data preprocessing and differential gene expression (DGE) analysis.

    It is designed for researchers, bioinformaticians, and students interested in transcriptomics.

    The dataset includes raw count data and step-by-step preprocessing instructions.

    It demonstrates quality control, normalization, and filtering of RNA-Seq data.

    Differential expression analysis using popular tools and methods is included.

    Results include differentially expressed genes with statistical significance.

    It provides visualizations like PCA plots, heatmaps, and volcano plots.

    The dataset is suitable for learning and reproducing RNA-Seq workflows.

    Both human-readable explanations and code snippets are included for guidance.

    It can serve as a reference for new RNA-Seq projects and research pipelines.

  8. RNA-seq example data

    • kaggle.com
    zip
    Updated Jun 16, 2023
    Cite
    Tuhin Rana (2023). RNA-seq example data [Dataset]. https://www.kaggle.com/datasets/rana2hin/rna-seq-example-data/discussion
    Available download formats: zip (2193914798 bytes)
    Authors
    Tuhin Rana
    License

    https://cdla.io/sharing-1-0/

    Description

    Dataset Description

    This dataset contains RNA-seq data from human cells. The data was collected using the Illumina HiSeq 2500 platform. The data includes raw sequencing reads, gene annotations, and phenotypic data for the samples.

    Files and Folders

    Files can be downloaded using the following command:

    wget ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz
    

    Once the file has been downloaded, it can be extracted using the following command:

    tar xvzf chrX_data.tar.gz
    

    This will create a directory called chrX_data containing the following files:

    genes/chrX.gtf
    genome/chrX.fa
    geuvadis_phenodata.csv
    indexes/
    mergelist.txt
    samples/
    

    Here are some additional details about the files in the chrX_data directory:

    • genes/chrX.gtf - This file contains gene annotations for the human X chromosome. It is in the GTF format, which is a standard format for gene annotations. The GTF file contains information about the start and end positions of genes, as well as their transcripts.
    • genome/chrX.fa - This file contains the reference genome sequence for the human X chromosome. It is in the FASTA format, which is a standard format for storing DNA sequences.
    • geuvadis_phenodata.csv - This file contains phenotypic data for the samples in the dataset, such as the sex and population of each sample.
    • indexes/ - This directory contains index files for HISAT2. Index files are used to speed up the alignment of sequencing reads to a reference genome.
    • mergelist.txt - This file lists the per-sample transcript assembly (GTF) files to be merged into a single set of transcripts, for example with StringTie's merge mode.
    • samples/ - This directory contains the raw sequencing data. The raw sequencing data is in the FASTQ format, which is a standard format for storing sequencing reads.
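
    As an illustration of the GTF format used by genes/chrX.gtf, the sketch below splits one GTF record into its nine tab-separated columns and turns the attribute column into a dictionary. The function name and the example record are hypothetical, and the attribute parsing is simplified: it assumes that attribute values contain no semicolons.

```python
def parse_gtf_line(line):
    """Split one GTF record into its eight fixed fields plus an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF record has exactly 9 tab-separated columns")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        # Each attribute looks like: key "value"
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),   # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }

# Hypothetical record, not taken from chrX.gtf.
example = 'chrX\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
record = parse_gtf_line(example)
print(record["feature"], record["attributes"]["gene_id"])
```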

    Usage

    This dataset can be used to perform RNA-seq analysis using a variety of tools, such as HISAT2, StringTie, and Ballgown.

    Here are some examples of how this dataset can be used:

    • To identify differentially expressed genes between two groups of samples.
    • To build a gene expression atlas for a particular tissue or cell type.
    • To study the expression of genes involved in a particular disease.

    source: ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz

  9. ReCount - A multi-experiment resource of analysis-ready RNA-seq gene count...

    • dknet.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). ReCount - A multi-experiment resource of analysis-ready RNA-seq gene count datasets [Dataset]. http://identifiers.org/RRID:SCR_001774
    Description

    RNA-seq gene count datasets built using the raw data from 18 different studies. The raw sequencing data (.fastq files) were processed with Myrna to obtain tables of counts for each gene. For ease of statistical analysis, each count table was combined with sample phenotype data to form an R object of class ExpressionSet. The count tables, ExpressionSets, and phenotype tables are ready to use and freely available. By taking care of several preprocessing steps and combining many datasets into one easily-accessible website, we make finding and analyzing RNA-seq data considerably more straightforward.

  10. Investigating Highly Variable Genes in Single-cell RNA-seq Data across...

    • data.mendeley.com
    Updated May 16, 2023
    Cite
    Jantarika Kumar Arora (2023). Investigating Highly Variable Genes in Single-cell RNA-seq Data across Multiple Cell Types and Conditions [Dataset]. http://doi.org/10.17632/6ry3x7r8hf.3
    Authors
    Jantarika Kumar Arora
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The peripheral blood mononuclear cell (PBMC) samples were collected from patients infected with dengue virus (DENV) at four time points: two days and one day before defervescence (febrile phase), at defervescence (critical phase), and at two-week convalescence. The raw and filtered matrix files were generated using CellRanger version 3.0.2 (10x Genomics, USA) with the reference human genome GRCh38 1.2.0. Potential contamination by ambient RNA was corrected using SoupX. Low-quality cells, including cells with more than 10% mitochondrial gene expression and doublets/multiplets, were excluded using Seurat and doubletFinder, respectively. The individual samples were then integrated using the SCTransform method with 3,000 gene features. Principal component analysis (PCA) was performed, and clustering used the Louvain algorithm with multi-level refinement. The gene expression level of each cell was normalized using the LogNormalize method in Seurat. Cell types were annotated using the canonical marker genes described in the original paper, see related link below.
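
    The mitochondrial quality filter described above can be sketched in a few lines. This is an illustrative NumPy version, not the Seurat code used for the dataset; the function names, the "MT-" gene-symbol prefix (the usual convention for human mitochondrial genes) and the toy counts are assumptions.

```python
import numpy as np

def mito_fraction(counts, gene_names, prefix="MT-"):
    """Fraction of each cell's counts that comes from mitochondrial genes."""
    counts = np.asarray(counts, dtype=float)      # rows = cells, columns = genes
    is_mito = np.array([name.startswith(prefix) for name in gene_names])
    totals = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with zero total counts get a fraction of 0 instead of NaN.
    return np.divide(mito, totals, out=np.zeros_like(totals), where=totals > 0)

def keep_low_mito(counts, gene_names, max_fraction=0.10):
    """Boolean mask of cells at or below the mitochondrial threshold."""
    return mito_fraction(counts, gene_names) <= max_fraction

# Toy data: three cells, three genes (made-up values).
genes = ["MT-CO1", "ACTB", "GAPDH"]
toy_counts = [[1, 50, 49],
              [30, 40, 30],
              [0, 0, 0]]
mask = keep_low_mito(toy_counts, genes)
print(mask)
```

    In practice an empty cell such as the third toy row would be removed by a separate minimum-count filter; the mask here only encodes the mitochondrial criterion.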

  11. CellSIUS provides sensitive and specific detection of rare cell populations...

    • zenodo.org
    application/gzip
    Updated Jan 24, 2020
    Cite
    Rebekka Wegmann; Marilisa Neri (2020). CellSIUS provides sensitive and specific detection of rare cell populations from complex single cell RNA-seq data: Codes and processed data [Dataset]. http://doi.org/10.5281/zenodo.3238275
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rebekka Wegmann; Marilisa Neri
    License

    http://www.apache.org/licenses/LICENSE-2.0

    Description

    Codes and processed data to reproduce the analysis discussed in:

    Wegmann et al., CellSIUS provides sensitive and specific detection of rare cell populations from complex single cell RNA-seq data, Genome Biology 2019 (Accepted)

  12. NGS-Based RNA-Seq Market Analysis North America, Europe, Asia, Rest of World...

    • technavio.com
    pdf
    Updated Aug 15, 2024
    Cite
    Technavio (2024). Ngs-Based Rna-Seq Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, UK, Germany, Singapore, China - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/ngs-based-rna-seq-market-analysis
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2024 - 2028
    Area covered
    United Kingdom, United States
    Description


    NGS-Based RNA-Seq Market Size 2024-2028

    The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.
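
    The forecast gives an absolute increase and a compound annual growth rate (CAGR) but not the starting market size. As a worked example, the sketch below derives the base and final sizes those two figures would imply. It assumes five compounding years (2023 to 2028) and that the USD 6.66 billion increase is measured over that same window; neither assumption is confirmed by the report, so the result is only a rough consistency check.

```python
def implied_market_sizes(increase, cagr, years):
    """Return (base, final) sizes such that final - base == increase at the given CAGR."""
    growth = (1.0 + cagr) ** years      # total growth factor over the period
    base = increase / (growth - 1.0)    # solve base * (growth - 1) = increase
    return base, base * growth

base, final = implied_market_sizes(increase=6.66, cagr=0.2052, years=5)
print(f"implied 2023 size: USD {base:.2f} billion")
print(f"implied 2028 size: USD {final:.2f} billion")
```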

    The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high-throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation on direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.
    The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.
    

    What will be the Size of the NGS-based RNA-Seq market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.

    The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.

    Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.

    The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.

    How is this NGS-based RNA-Seq industry segmented?

    The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    End-user
      • Academic and research centers
      • Clinical research
      • Pharma companies
      • Hospitals

    Technology
      • Sequencing by synthesis
      • Ion semiconductor sequencing
      • Single-molecule real-time sequencing
      • Others

    Geography
      • North America: US
      • Europe: Germany, UK
      • APAC: China, Singapore
      • Rest of World (ROW)

    By End-user Insights

    The academic and research centers segment is estimated to witness significant growth during the forecast period.

    The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identification of nov

  13. Data_Sheet_1_Integrative Differential Expression Analysis for Multiple...

    • frontiersin.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Verónica Jiménez-Jacinto; Alejandro Sanchez-Flores; Leticia Vega-Alvarado (2023). Data_Sheet_1_Integrative Differential Expression Analysis for Multiple EXperiments (IDEAMEX): A Web Server Tool for Integrated RNA-Seq Data Analysis.CSV [Dataset]. http://doi.org/10.3389/fgene.2019.00279.s001
    Dataset provided by
    Frontiers
    Authors
    Verónica Jiménez-Jacinto; Alejandro Sanchez-Flores; Leticia Vega-Alvarado
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The current DNA sequencing technologies and their high-throughput yield have allowed genomic and transcriptomic experiments to thrive, but they have also generated a big data problem. Due to this exponential growth of sequencing data, the complexity of managing, processing and interpreting it in order to generate results has also risen. Therefore, the demand for easy-to-use, friendly software and websites to run bioinformatic tools is pressing. In particular, RNA-Seq and differential expression analysis have become a popular and useful method to evaluate changes in gene expression in any organism. However, many scientists struggle with the data analysis since most of the available tools are implemented in a UNIX-based environment. Therefore, we have developed the web server IDEAMEX (Integrative Differential Expression Analysis for Multiple EXperiments). The IDEAMEX pipeline needs a raw count table for as many replicates and conditions as desired, allowing the user to select which conditions will be compared instead of doing all-vs.-all comparisons. The whole process consists of three main steps: (1) Data Analysis, which allows a preliminary analysis for quality control based on the data distribution per sample, using different types of graphs; (2) Differential expression, which performs the differential expression analysis with or without batch effect error awareness, using the Bioconductor packages NOISeq, limma-Voom, DESeq2 and edgeR, and generates reports for each method; (3) Result integration, in which the integrated results are reported using different graphical outputs such as correlograms, heatmaps, Venn diagrams and text lists. Our server allows easy and friendly visualization of results, providing easy interaction during the analysis process, as well as error tracking and debugging by providing output log files. The server is currently available and can be accessed at http://www.uusmb.unam.mx/ideamex/ where the documentation and example input files are provided. We consider that this web server can help other researchers with no previous bioinformatic knowledge to perform their analyses in a simple manner.
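
    The result-integration step compares the gene lists produced by several differential expression methods, for instance in Venn diagrams. The sketch below shows one simple way such lists could be combined with set operations. It is not IDEAMEX's actual logic, which is not detailed here; the function name, the "at least two methods" threshold and the gene names are hypothetical.

```python
from collections import Counter

def integrate_de_calls(calls_by_method, min_methods=2):
    """Combine per-method DE gene lists into consensus and all-method sets."""
    # Count in how many methods each gene was called (each method votes once).
    votes = Counter(gene for genes in calls_by_method.values() for gene in set(genes))
    consensus = {gene for gene, n in votes.items() if n >= min_methods}
    gene_sets = [set(genes) for genes in calls_by_method.values()]
    shared_by_all = set.intersection(*gene_sets) if gene_sets else set()
    return consensus, shared_by_all

# Made-up gene lists for three of the methods named above.
calls = {
    "edgeR": ["geneA", "geneB", "geneC"],
    "DESeq2": ["geneB", "geneC", "geneD"],
    "limma-voom": ["geneC", "geneE"],
}
consensus, shared_by_all = integrate_de_calls(calls)
print(sorted(consensus), sorted(shared_by_all))
```

    Genes in `shared_by_all` correspond to the central region of a Venn diagram, while `consensus` relaxes that requirement to a majority-style vote.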

  14. Data from: Transcriptomic analysis reveals pro-inflammatory signatures...

    • figshare.scilifelab.se
    • demo.researchdata.se
    • +2more
    Updated Jan 15, 2025
    Cite
    Linda Holmfeldt; Svea Stratmann (2025). Data from: Transcriptomic analysis reveals pro-inflammatory signatures associated with acute myeloid leukemia progression [Dataset]. http://doi.org/10.17044/scilifelab.13105229.v1
    Dataset provided by
    Uppsala Universitet
    Authors
    Linda Holmfeldt; Svea Stratmann
    License

    https://www.scilifelab.se/data/restricted-access/

    Description

    Data Set Description

    These data are collected from a total of 70 participants (47 adult; 23 pediatric), all of whom had relapsed or primary resistant acute myeloid leukemia. The data, which here are separated into an adult and a pediatric dataset, were generated as part of a study by Stratmann et al. (https://doi.org/10.1182/bloodadvances.2021004962). The Stratmann et al. study is currently pre-published here: https://ashpublications.org/bloodadvances/article/doi/10.1182/bloodadvances.2021004962/477210/Transcriptomic-analysis-reveals-pro-inflammatory Please note that separate applications are necessary for the adult and pediatric datasets. When applying for access, please indicate which of the datasets the application applies to. The adult dataset contains transcriptome sequencing (RNA-seq) data from 25 diagnosis (D), 45 relapse (R1/R2/R3) and five (5) primary resistant (PR) leukemic samples from 47 patients, as well as five (5) normal CD34+ bone marrow control samples. The pediatric dataset contains RNA-seq data from 18 diagnosis (D), 22 relapse (R1/R2), six (6) persistent relapse (R1/2-P) and one (1) primary resistant (PR) leukemic samples from 23 patients, as well as five (5) normal CD34+ bone marrow control samples. The leukemic samples originate from bone marrow or peripheral blood. The normal RNA samples originate from purified CD34+ bone marrow cells from five different healthy individuals. Further details regarding the samples are available in the Supplemental Information part of Stratmann et al. (https://doi.org/10.1182/bloodadvances.2021004962). RNA-seq library preparation and associated next-generation sequencing were carried out by the SNP&SEQ Technology platform, SciLifeLab, National Genomics Infrastructure Uppsala, Sweden. Libraries were prepared using the TruSeq stranded total RNA library preparation kit with ribosomal depletion by RiboZero Gold (Illumina). Sequencing of adult samples was carried out on the Illumina HiSeq2500 platform, generating paired-end 125bp reads using v4 sequencing chemistry. Sequencing of pediatric samples was carried out on the Illumina NovaSeq6000 platform (S2 flowcell), generating paired-end 100bp reads using the v1 sequencing chemistry. The CD34+ bone marrow control samples were sequenced using both platforms (Illumina HiSeq2500 and NovaSeq6000). Further, all of these acute myeloid leukemia samples have also been characterized by whole genome sequencing or whole exome sequencing, with the datasets available under controlled access through doi.org/10.17044/scilifelab.12292778.

    Terms for access

    The adult and pediatric datasets are only to be used for research that is seeking to advance the understanding of the influence of genetic and transcriptomic factors on human acute myeloid leukemia etiology and biology. Use of the protected pediatric dataset is only for research projects that can only be conducted using pediatric acute myeloid leukemia data, and for which the research objectives cannot be accomplished using data from adults. Applications aimed at method development of various kinds would thus not be considered acceptable for use of the pediatric dataset. Further, the pediatric dataset may not be used for research investigating predisposition for acute myeloid leukemia based on germline variants.

    For conditional access to the adult and/or pediatric dataset in this publication, please contact datacentre@scilifelab.se

  15. Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated May 18, 2025
    Cite
    Bianca Habermann; Margaux Haering (2025). Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R shiny RNA-seq data analysis app for visualisation, differential expression analysis, time-series clustering and enrichment analysis [Dataset]. http://doi.org/10.5061/dryad.8pk0p2nnd
    Dataset updated
    May 18, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Bianca Habermann; Margaux Haering
    Time period covered
    Jul 8, 2021
    Description

    Background

    RNA-seq is a widely adopted affordable method for large scale gene expression profiling. However, user-friendly and versatile tools for wet-lab biologists to analyse RNA-seq data beyond standard analyses such as differential expression, are rare. Especially, the analysis of time-series data is difficult for wet-lab biologists lacking advanced computational training. Furthermore, most meta-analysis tools are tailored for model organisms and not easily adaptable to other species.

    Results

    With RNfuzzyApp, we provide a user-friendly, web-based R-shiny app for differential expression analysis, as well as time-series analysis of RNA-seq data. RNfuzzyApp offers several methods for normalization and differential expression analysis of RNA-seq data, providing easy-to-use toolboxes, interactive plots and downloadable results. For time-series analysis, RNfuzzyApp presents the first web-based, automated pipeline for soft clustering with the Mfuzz R package, including methods to...

  16. BioXpress

    • neuinfo.org
    • dknet.org
    • +1 more
    Updated Oct 4, 2024
    Cite
    (2024). BioXpress [Dataset]. http://identifiers.org/RRID:SCR_014191
    Dataset updated
    Oct 4, 2024
    Description

    BioXpress is a gene expression and cancer association database in which the expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, International Cancer Genome Consortium, Expression Atlas and publications. BioXpress can be searched by gene name or cancer type. To search the database by gene name, select the appropriate identifier type from the dropdown menu and type in the corresponding identifier in the adjacent text box. The results are computed and presented to the user with information such as variable expression levels and tumor expression. To search by cancer type, select the desired type from the dropdown menu, such as "Cancer Type", "Significant", "Expression", "Adjusted p-value" and "p-value". Results are shown in a graph displaying the top 10 differentially expressed genes for the specified cancer type in terms of the frequency of significant altered expression between the tumor and normal pairs.

  17. Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

    • data.niaid.nih.gov
    Updated Nov 20, 2023
    Cite
    Hsu, Jonathan; Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
    Dataset updated
    Nov 20, 2023
    Authors
    Hsu, Jonathan; Stoop, Allart
    Description

    Table of Contents

    1. Main Description
    2. File Descriptions
    3. Linked Files
    4. Installation and Instructions

    1. Main Description

    This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.". The code included in the file titled marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required for script execution:

    Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap

    File Descriptions

    The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, which were also used to create the heatmap with DGE and volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.

    Linked Files

    This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset comprises raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Descriptions of the linked datasets are provided below:

    Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

    Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is required to run the code. Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

    Installation and Instructions

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    Ensure you have R version 4.1.2 or higher for compatibility.

    Although it is not essential, you can use RStudio (version 2022.12.0+353) to access and execute the code.

    1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to where the following files are located:

    • marengo_code_for_paper_jan_2023.R
    • Install_Packages.R
    • Marengo_newID_March242023.rds
    • genes_for_heatmap_fig5F.xlsx
    • all_res_deg_for_heat_updated_march2023.txt

    You can use the following code to set the working directory in R:

    setwd(directory)

    4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies in order to set up an environment where the code in "marengo_code_for_paper_jan_2023.R" can be executed.
    5. Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.
    6. Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
    7. Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.

  18. Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jun 23, 2020
    + more versions
    Cite
    Bushel, Pierre R.; Ramaiahgari, Sreenivasa C.; Auerbach, Scott S.; Paules, Richard S.; Ferguson, Stephen S. (2020). Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000579045
    Dataset updated
    Jun 23, 2020
    Authors
    Bushel, Pierre R.; Ramaiahgari, Sreenivasa C.; Auerbach, Scott S.; Paules, Richard S.; Ferguson, Stephen S.
    Description

    Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TCs). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
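
To make two of the normalization methods named above concrete, here is a minimal Python sketch of counts per million (CPM) and a simple upper-quartile (UQ) scaling. It is an illustration only, not the authors' pipeline: the function names, the toy counts, the use of nonzero counts for the quartile, and the rescaling by the mean quartile are all assumptions made for this example.

```python
from statistics import quantiles


def cpm(counts):
    """Scale one sample's gene counts to counts per million (CPM)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * 1_000_000 / total for c in counts]


def upper_quartile(counts):
    """Return the 75th percentile of the nonzero counts of one sample.

    Assumption: zeros are excluded before taking the quartile, a common
    convention in UQ normalization but not confirmed by the description.
    """
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two nonzero counts per sample")
    return quantiles(nonzero, n=4, method="inclusive")[2]


def upper_quartile_normalize(samples):
    """Divide each sample by its upper quartile.

    The result is rescaled by the mean upper quartile across samples so
    that normalized values stay on a count-like scale (an assumption).
    """
    factors = [upper_quartile(s) for s in samples]
    mean_factor = sum(factors) / len(factors)
    return [[c * mean_factor / f for c in s] for s, f in zip(samples, factors)]


# Toy data (hypothetical): the second sample was sequenced twice as deeply.
samples = [[10, 0, 40, 50, 100], [20, 0, 80, 100, 200]]
print([round(v) for v in cpm(samples[0])])
print(upper_quartile_normalize(samples))
```

In this toy case the two samples differ only in sequencing depth, so both methods bring them onto the same scale; the methods diverge when a few highly expressed genes dominate the total count.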

  19. Training material for small RNA-seq data analysis (Galaxy Training Network...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Jan 24, 2020
    Cite
    Freeberg, Mallory (2020). Training material for small RNA-seq data analysis (Galaxy Training Network tutorial) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_826905
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Johns Hopkins University
    Authors
    Freeberg, Mallory
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The data provided here are part of a Galaxy Training Network tutorial that analyzes small RNA-seq (sRNA-seq) data from a study published by Harrington et al. (DOI:10.1186/s12864-017-3692-8) to detect differential abundance of various classes of endogenous short interfering RNAs (esiRNAs). The goal of this study was to investigate "connections between differential retroTn and hp-derived esiRNA processing and cellular location, and to investigate the potential link between mRNA 3' end cleavage and esiRNA biogenesis." To this end, sRNA-seq libraries were constructed from triplicate Drosophila tissue culture samples under conditions of either control RNAi or RNAi knockdown of a factor involved in mRNA 3' end processing, Symplekin. This dataset (GEO Accession: GSE82128) consists of single-end, size-selected, non-rRNA-depleted sRNA-seq libraries. Because of the long processing time for the large original files, we have downsampled the original raw data files to include only reads that align to a subset of interesting transcript features including: (1) transposable elements, (2) Drosophila piRNA clusters, (3) Symplekin, and (4) genes encoding mass spectrometry-defined protein binding partners of Symplekin from Additional File 2 in the indicated paper by Harrington et al. More details on features 1 and 2 can be found here: https://github.com/bowhan/piPipes/blob/master/common/dm3/genomic_features (piRNA_Cluster, Trn). All features are from the Drosophila genome Apr. 2006 (BDGP R5/dm3) release.
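
The downsampling idea described above (keeping only reads that align to a chosen subset of features) can be sketched in Python as an interval-overlap filter. This is a hedged illustration, not the procedure used to build the tutorial data: the tuple layout, the half-open coordinate convention, the function names, and the example coordinates are all hypothetical.

```python
from collections import defaultdict


def build_feature_index(features):
    """Group (chrom, start, end) feature intervals by chromosome."""
    index = defaultdict(list)
    for chrom, start, end in features:
        index[chrom].append((start, end))
    return index


def overlaps_any(read, index):
    """Return True if an aligned read overlaps at least one feature.

    Intervals are treated as half-open, [start, end), which is an
    assumption made for this sketch.
    """
    chrom, start, end = read
    return any(
        start < feat_end and feat_start < end
        for feat_start, feat_end in index.get(chrom, [])
    )


def downsample_to_features(reads, features):
    """Keep only the aligned reads that overlap a feature of interest."""
    index = build_feature_index(features)
    return [read for read in reads if overlaps_any(read, index)]


# Hypothetical coordinates, for illustration only.
features = [("chr2L", 100, 200)]
reads = [("chr2L", 150, 175), ("chr2L", 200, 225), ("chrX", 150, 175)]
print(downsample_to_features(reads, features))
```

A linear scan per read is enough for a small feature set; a sorted index or interval tree would be the usual choice for genome-scale annotations.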

  20. Data for: RNA-seq data analysis of stimulated hepatocellular carcinoma cells...

    • data.mendeley.com
    Updated Oct 29, 2019
    + more versions
    Cite
    Heleni Loutrari (2019). Data for: RNA-seq data analysis of stimulated hepatocellular carcinoma cells treated with epigallocatechin gallate and fisetin reveals target genes and action mechanisms (Part1 - epigallocatechin gallate treatment) [Dataset]. http://doi.org/10.17632/n6bzf2nzj6.1
    Dataset updated
    Oct 29, 2019
    Authors
    Heleni Loutrari
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    In this study, we endeavor to compare gene expression alterations mediated by the flavonoids epigallocatechin gallate (EGCG) and fisetin (FIS) through a comprehensive transcriptome analysis based on RNA-seq in human hepatocellular carcinoma HEP3B cells, upon perturbation with a mixture of prototypical stimuli mimicking conditions of the tumor microenvironment (STIM), or under constitutive state (MEM). HEP3B cells, seeded the day before in a 6-well plate, were serum-starved for 4h and then treated with EGCG (100μM), FIS (10μM) or DMSO (0.1 % v/v) for 2h. Cells were subsequently exposed to a mixture of stimuli consisting of recombinant interleukins IL-6 (0.1μg/ml), IL-1B (0.01μg/ml) and transforming growth factor alpha (TGFA) (0.2μg/ml) and were further incubated for 22h. Samples of all possible treatment combinations of HEP3B cells, i.e. EGCG, FIS, or DMSO at either MEM or STIM state, were subjected to RNA extraction from two independent experiments. Total RNA was isolated using the PureLink RNA Mini Kit (Invitrogen, USA) according to the manufacturer’s instructions. Quantification and quality control of isolated RNA were performed by measuring absorbance at 260nm and 280nm on a NANODROP ONEC spectrophotometer (Thermo Scientific, USA). The RNA-seq run was performed on a NextSeq 500 Illumina platform that provided single-end reads of 85bp length. Quality assessment, library preparation (TruSeqLT) and the actual sequencing run were conducted at the Biomedical Research Foundation of the Academy of Athens (BRFAA) sequencing facility. Herein, we provide (compressed in .bz2 file format) raw sequencing FASTQ files regarding the EGCG treatment, along with a descriptive metadata file (EGCG_metadata.pdf). Due to storage limitations, respective data about FIS treatment are provided in a separate dataset entitled "Data for: RNA-seq data analysis of stimulated hepatocellular carcinoma cells treated with epigallocatechin gallate and fisetin reveals target genes and action mechanisms (Part2 - fisetin treatment)".
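
Since the raw reads are described as FASTQ files compressed in .bz2 format, a quick sanity check on a downloaded file could look like the Python sketch below. It counts reads and tallies read lengths using only the standard library. The function name and the file name in the usage comment are hypothetical, and the sketch assumes the common four-lines-per-record FASTQ layout.

```python
import bz2
from collections import Counter


def read_length_summary(path):
    """Count reads and tally read lengths in a bz2-compressed FASTQ file.

    Assumes four lines per record (header, sequence, '+', qualities),
    i.e. sequences are not wrapped across several lines.
    """
    lengths = Counter()
    with bz2.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The second line of every four-line record is the sequence.
            if line_number % 4 == 1:
                lengths[len(line.strip())] += 1
    return sum(lengths.values()), dict(lengths)


# Usage with a hypothetical file name:
# n_reads, length_counts = read_length_summary("sample_EGCG.fastq.bz2")
# print(n_reads, length_counts)
```

For this dataset, the length tally would be expected to peak at the 85bp single-end read length stated in the description, possibly with shorter reads if any trimming was applied.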

Comparison of alternative approaches for analysing multi-level RNA-seq data

11 scholarly articles cite this dataset (View in Google Scholar)