Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression call in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples, The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments.
For methodological details, see S1 Text, paragraph "RNA-Seq Analysis". (XLSX)
RNA-seq (RNA sequencing) uses high-throughput (HTS) data to reveal the presence and quantity of RNA in a biological sample at a given moment in time. In the training available at http://galaxyproject.github.io/RNA-Seq/tutorials/ref_based, we introduce the bioinformatics methods to analyze RNA-seq data using a reference genome. The toy datasets were extracted from the study of Brooks et al. 2011.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all the Seurat objects that were used for generating all the figures in Pal et al. 2021 (https://doi.org/10.15252/embj.2020107333). All the Seurat objects were created under R v3.6.1 using the Seurat package v3.1.1. The detailed information of each object is listed in a table in Chen et al. 2021.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided here are part of a Galaxy Training Network tutorial that analyzes RNA-Seq data from a study published by Brooks et al. 2011 to identify genes and exons that are regulated by Pasilla gene.
Comprehensive introduction to the processing and analysis of bulk RNA-seq data including basic information about Illumina-based short read sequencing, common file formats (FASTQ, SAM/BAM, BED, ...) and quality controls. Contains ready-to-use UNIX and R code; covers the most common application of bulk RNA-seq to identify genes that are differentially expressed when comparing two conditions.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data repository for the scMappR manuscript:
Abstract from biorXiv (https://www.biorxiv.org/content/10.1101/2020.08.24.265298v1.full).
RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal biological mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples and the resulting DEGs do not necessarily indicate the cell types where the differential expression occurred. While single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit its widespread use. Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by integrating cell-type expression data generated by scRNA-seq and existing deconvolution methods. After benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked if scMappR could reveal known cell-type specific changes that occur during kidney regeneration. We found that scMappR appropriately assigned DEGs to cell-types involved in kidney regeneration, including a relatively small proportion of immune cells. While scMappR can work with any user supplied scRNA-seq data, we curated scRNA-seq expression matrices for ∼100 human and mouse tissues to facilitate its use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package that complements traditional differential expression analysis available at CRAN.
RNA-seq gene count datasets built using the raw data from 18 different studies. The raw sequencing data (.fastq files) were processed with Myrna to obtain tables of counts for each gene. For ease of statistical analysis, they combined each count table with sample phenotype data to form an R object of class ExpressionSet. The count tables, ExpressionSets, and phenotype tables are ready to use and freely available. By taking care of several preprocessing steps and combining many datasets into one easily-accessible website, we make finding and analyzing RNA-seq data considerably more straightforward.
Even though high-throughput transcriptome sequencing is routinely performed in many laboratories, computational analysis of such data remains a cumbersome process often executed manually, hence error-prone and lacking reproducibility. For corresponding data processing, we introduce Curare, an easy-to-use yet versatile workflow builder for analyzing high-throughput RNA-Seq data focusing on differential gene expression experiments. Data analysis with Curare is customizable and subdivided into preprocessing, quality control, mapping, and downstream analysis stages, providing multiple options for each step while ensuring the reproducibility of the workflow. For a fast and straightforward exploration and visualization of differential gene expression results, we provide the gene expression visualizer software GenExVis. GenExVis can create various charts and tables from simple gene expression tables and DESeq2 results without the requirement to upload data or install software packages.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
"*.csv" files contain the single cell gene expression values (log2(tpm+1)) for all genes in each cell from melanoma and squamous cell carcinoma of head and neck (HNSCC) tumors. The cell type and origin of tumor for each cell is also included in "*.csv" files.The "MalignantCellSubtypes.xlsx" defines the tumor subtype."CCLE_RNAseq_rsem_genes_tpm_20180929.zip" is downloaded from CCLE database.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse: 1- raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange Format (MEX) format. 2- 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format. 3- 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation in the HDF5 format.
scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20- TCB. At baseline (before addition of TCB) and assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySepTM red blood cell depletion reagent (Stemcell). Briefly, cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1x106 cells from each sample were re-suspended in 180 μL FBS Stain Buffer (BD, PharMingen) and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then re- suspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40 000 – 50 000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tags libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit. Sequencing was performed on a Novaseq 6000 (Illumina) in paired-end mode (64-8- 58) with Novaseq6000 S2 v1 or Novaseq6000 SP v1.5 reagents kits (100 cycles). scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0 https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics- informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided. After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with less than 200 genes, less than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in less than 30 cells. Normalization, log-transformed UMI counts per 10,000 reads [log(CP10K+1)], was applied before downstream analysis. After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature- based hierarchical cell annotation method. The used signatures, configuration and nomenclature files can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
RNA sequencing (RNA-Seq) is a powerful tool that captures information about how organisms respond to stimuli in their environment at the molecular level. A common RNA-Seq approach involves isolating and sequencing all of the messenger RNA (mRNA) in a tissue sample taken from an organism. Researchers can compare patterns observed in RNA-Seq data to understand how individuals respond to the environment over minutes, hours, or days and how populations evolve in response to the environment over millions of years. The materials in this repository will guide users through an analysis of RNA-Seq data collected from two California populations of a copepod crustacean, Tigriopus californicus, that were exposed to different levels of salinity. Users will examine the contents of a fastq file that contains raw RNA-Seq data, determine the quality of the RNA-Seq data using a web-server, and test for significant differences in gene expression between the copepod populations using the R packages DE...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used to test the robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data, described in Holland et al. 2020.
The folder data contains raw data and the folder output contains intermediate and final results of all analyses.
The associated analyses code and more information are available on GitHub.
Abstract
Background
Many functional analysis tools have been developed to extract functional and mechanistic insight from bulk transcriptome data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to do such an analysis for single cells. However, scRNA-seq data has characteristics such as drop-out events and low library sizes. It is thus not clear if functional TF and pathway analysis tools established for bulk sequencing can be applied to scRNA-seq in a meaningful way.
Results
To address this question, we perform benchmark studies on simulated and real scRNA-seq data. We include the bulk-RNA tools PROGENy, GO enrichment, and DoRothEA that estimate pathway and transcription factor (TF) activities, respectively, and compare them against the tools SCENIC/AUCell and metaVIPER, designed for scRNA-seq. For the in silico study, we simulate single cells from TF/pathway perturbation bulk RNA-seq experiments. We complement the simulated data with real scRNA-seq data upon CRISPR-mediated knock-out. Our benchmarks on simulated and real data reveal comparable performance to the original bulk data. Additionally, we show that the TF and pathway activities preserve cell type-specific variability by analyzing a mixture sample sequenced with 13 scRNA-seq protocols. We also provide the benchmark data for further use by the community.
Conclusions
Our analyses suggest that bulk-based functional analysis tools that use manually curated footprint gene sets can be applied to scRNA-seq data, partially outperforming dedicated single-cell tools. Furthermore, we find that the performance of functional analysis tools is more sensitive to the gene sets than to the statistic used.
For questions related to the data please write an email to christian.holland@bioquant.uni-heidelberg.de or use the GitHub issue system.
BackgroundÂ
RNA-seq is a widely adopted affordable method for large scale gene expression profiling. However, user-friendly and versatile tools for wet-lab biologists to analyse RNA-seq data beyond standard analyses such as differential expression, are rare. Especially, the analysis of time-series data is difficult for wet-lab biologists lacking advanced computational training. Furthermore, most meta-analysis tools are tailored for model organisms and not easily adaptable to other species.
Results
With RNfuzzyApp, we provide a user-friendly, web-based R-shiny app for differential expression analysis, as well as time-series analysis of RNA-seq data. RNfuzzyApp offers several methods for normalization and differential expression analysis of RNA-seq data, providing easy-to-use toolboxes, interactive plots and downloadable results. For time-series analysis, RNfuzzyApp presents the first web-based, automated pipeline for soft clustering with the Mfuzz R package, including methods to...
Background: To determine differentially expressed and spliced RNA transcripts in chronic lymphocytic leukemia specimens a high throughput RNA-sequencing (HTS RNA-seq) analysis was performed. Methods: Ten CLL specimens and five normal peripheral blood CD19+ B cells were analyzed by HTS RNA-seq. The library preparation was performed with Illumina TrueSeq RNA kit and analyzed by Illumina HiSeq 2000 sequencing system. Results: An average of 48.5 million reads for B cells, and 50.6 million reads for CLL specimens were obtained with 10396 and 10448 assembled transcripts for normal B cells and primary CLL specimens respectively. With the Cuffdiff analysis, 2091 differentially expressed genes (DEG) between B cells and CLL specimens based on FPKM (fragments per kilobase of transcript per million reads and false discovery rate, FDR q<0.05, fold change >2) were identified. Expression of selected DEGs (n=32) with up regulated and down regulated expression in CLL from RNA-seq data were also analyzed by qRT-PCR in a test cohort of CLL specimens. Even though there was a variation in fold expression of DEG genes between RNA-seq and qRT-PCR; more than 90% of analyzed genes were validated by qRT-PCR analysis. Analysis of RNA-seq data for splicing alterations in CLL and B cells was performed by Multivariate Analysis of Transcript Splicing (MATS analysis). Skipped exon was the most frequent splicing alteration in CLL specimens with 128 significant events (P-value <0.05, minimum inclusion level difference >0.1). Conclusion: The RNA-seq analysis of CLL specimens identifies novel DEG and alternatively spliced genes that are potential prognostic markers and therapeutic targets. High level of validation by qRT-PCR for a number of DEG genes supports the accuracy of this analysis. Global comparison of transcriptomes of B cells, IGVH non-mutated CLL (U-CLL) and mutated CLL specimens (M-CLL) with multidimensional scaling analysis was able to segregate CLL and B cell transcriptomes but the M-CLL and U-CLL transcriptomes were indistinguishable. The analysis of HTS RNA-seq data to identify alternative splicing events and other genetic abnormalities specific to CLL is an added advantage of RNA-seq that is not feasible with other genome wide analysis. Ten CLL specimens and five normal peripheral blood CD19+ B cells were analyzed by HTS RNA-seq. The library preparation was performed with Illumina TrueSeq RNA kit and analyzed by Illumina HiSeq 2000 sequencing system.
The root apex is an important section of the plant root involved in environmental sensing and cellular development. Analyzing the gene profile of root apex in diverse environments is important and challenging especially when the samples are limiting and precious such as in spaceflight. The feasibility of using tiny root sections for transcriptome analysis was examined in this study. To understand the gene expression profiles of the root apex Arabidopsis thaliana Col-0 roots were sectioned into Zone-I (0.5 mm root cap and meristematic zone) and Zone-II (1.5 mm transition elongation and growth terminating zone). Gene expression was analyzed using microarray and RNA seq. Both the techniques arrays and RNA-Seq identified 4180 common genes as differentially expressed (with > two-fold changes) between the zones. In addition 771 unique genes and 19 novel TARs were identified by RNA-Seq as differentially expressed which were not detected in the arrays. Single root tip zones can be used for full transcriptome analysis; further the root apex zones are functionally very distinct from each other. RNA-Seq provided novel information about the transcripts compared to the arrays. These data will help optimize transcriptome techniques for dealing with small rare samples.
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
The global RNA-Seq market is anticipated to reach a value of XXX million by 2033, expanding at a CAGR of XX% during the forecast period of 2025-2033. The market is primarily driven by the increasing prevalence of cancer and other chronic diseases, coupled with the advancements in RNA sequencing technologies. RNA-Seq is a high-throughput sequencing technique that allows researchers to study the expression of all RNA molecules in a cell or tissue sample. This information can be used to identify biomarkers for diseases, develop new therapies, and understand the mechanisms of gene regulation. The key market trends include the growing adoption of next-generation sequencing (NGS) platforms, the development of new RNA-Seq library preparation methods, and the increasing availability of bioinformatics tools. The major players in the RNA-Seq market include Thermo Fisher Scientific, Illumina, BGI, PacBio, Genewiz, Macrogen, LabCorp, Roche, Qiagen, Eurofins, Novo Gene, Berry Genomics, LC Sciences, Canopy Biosciences, Macrogen, and Hologic. The market is fragmented, with the top players accounting for a significant share. The market is expected to witness significant growth in the coming years, driven by the factors mentioned above.
NGS-Based Rna-Seq Market Size 2024-2028
The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.
The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high-throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation on direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.
The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.
What will be the Size of the NGS-based RNA-Seq market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.
Request Free Sample
The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.
Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.
The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.
How is this NGS-based RNA-Seq industry segmented?
The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
End-user
Acamedic and research centers
Clinical research
Pharma companies
Hospitals
Technology
Sequencing by synthesis
Ion semiconductor sequencing
Single-molecule real-time sequencing
Others
Geography
North America
US
Europe
Germany
UK
APAC
China
Singapore
Rest of World (ROW)
.
By End-user Insights
The acamedic and research centers segment is estimated to witness significant growth during the forecast period.
The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identification
Simulated RNA-seq data shows that histograms from p value sets with around one hundred true effects out of 20,000 features can be classified as 'uniform'. RNA-seq data was simulated with polyester R package (Frazee, 2015) on 20,000 transcripts from human transcriptome using grid of 3, 6, and 10 replicates and 100, 200, 400, and 800 effects for two groups. Fold changes were set to 0.5 and 2. Differential expression was assessed using DESeq2 R package (Love, 2014) using default settings and group 1 versus group 2 contrast. Effects denotes in facet labels the number of true effects and N denotes number of replicates. Red line denotes QC threshold used for dividing p histograms into discrete classes. Workflow and code used to run this simulation is available on rstats-tartu/simulate-rnaseq. Files de_simulation_results.csv -- merged and processed DE analysis results of simulated data. simulate-reads-2021-01-25.tar.gz -- raw DE analysis results on 20,000 transcripts from human transcriptome using grid of 3, 6, and 10 replicates and 100, 200, 400, and 800 effects for two groups. Fold changes were set to 0.5, 1, and 2. Differential expression was assessed using DESeq2 with default settings. simulate-rnaseq.tar.gz -- snakemake workflow and input fasta file to simulate RNA-seq data with polyester and analyse results with DESeq2. Adjust settings in config.yaml to customise simulation. Includes software to run workflow on Linux, given that Conda and snakemake are installed. The simulate-rnaseq.tar.gz archive can be re-executed on a vanilla machine that only has Conda and Snakemake installed via: tar -xf simulate-rnaseq.tar.gz snakemake --use-conda -n
Attribution 1.0 (CC BY 1.0)https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
The aim of this work is to determine whether mycobacteria have enhanced virulence during space travel and what mechanisms they use to adapt to microgravity. M. marinum and LHM4 were grown in high aspect ratio vessels (HARV) in a rotary cell culture system (RCCS) under normal gravity (NG) or low shear simulated microgravity (MG). To determine the effect of MG on the stress responses activated by the growth conditions, we used RNAseq to examine what genes were expressed. For RNAseq, the bacteria are harvested, RNA isolated and converted DNA (cDNA), and the cDNA sequenced. Using bioinformatics, the amount of expression of the different M. marinum genes were compared between the NG and MG samples. To make sure that we were examining only gene expression changes due to MG, only bacteria in early exponential growth were used in the RNAseq studies. Triplicate NG and MG cultures were used to generate samples of bacteria grown for ~40 hrs. We also grew triplicate cultures for 4 days and then diluted them again and grew them for another ~40 hrs so we could examine gene expression from bacteria exposed for a longer time. In summary, this study determined that waterborne mycobacteria alter their growth, expression of stress responses, and their sensitivity to oxidizing conditions when subjected to growth under MG.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression call in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples, The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments.