100+ datasets found

Example RNA-seq analysis of data from GSE119855
zenodo.org
data.niaid.nih.gov
zip
Updated Mar 10, 2023
Cite
Geert van Geest (2023). Example RNA-seq analysis of data from GSE119855 [Dataset]. http://doi.org/10.5281/zenodo.7710786
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7710786
Dataset updated
Mar 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Geert van Geest
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of four samples of GEO accession GSE119855 with the IBU RNA-seq pipeline
Data from: A Robust Analytical Pipeline for Genome-Wide Identification of...
datasetcatalog.nlm.nih.gov
figshare.com
Updated Sep 28, 2016
Cite
Kojima, Takaaki; Kobayashi, Tetsuo; Ihara, Kunio; Nakano, Hideo; Kunitake, Emi (2016). A Robust Analytical Pipeline for Genome-Wide Identification of the Genes Regulated by a Transcription Factor: Combinatorial Analysis Performed Using gSELEX-Seq and RNA-Seq [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001503876
Explore at:
Dataset updated
Sep 28, 2016
Authors
Kojima, Takaaki; Kobayashi, Tetsuo; Ihara, Kunio; Nakano, Hideo; Kunitake, Emi
Description
For identifying the genes that are regulated by a transcription factor (TF), we have established an analytical pipeline that combines genomic systematic evolution of ligands by exponential enrichment (gSELEX)-Seq and RNA-Seq. Here, SELEX was used to select DNA fragments from an Aspergillus nidulans genomic library that bound specifically to AmyR, a TF from A. nidulans. High-throughput sequencing data were obtained for the DNAs enriched through the selection, following which various in silico analyses were performed. Mapping reads to the genome revealed the binding motifs including the canonical AmyR-binding motif, CGGN8CGG, as well as the candidate promoters controlled by AmyR. In parallel, differentially expressed genes related to AmyR were identified by using RNA-Seq analysis with samples from A. nidulans WT and amyR deletant. By obtaining the intersecting set of genes detected using both gSELEX-Seq and RNA-Seq, the genes directly regulated by AmyR in A. nidulans can be identified with high reliability. This analytical pipeline is a robust platform for comprehensive genome-wide identification of the genes that are regulated by a target TF.
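The two in silico steps described above (scanning for the CGGN8CGG motif and intersecting the gSELEX-Seq and RNA-Seq gene sets) can be illustrated with a minimal sketch. The function names, the toy sequence, and the gene labels are hypothetical and not taken from the dataset; only the forward strand is scanned here, which is a simplification.

```python
import re

# CGGN8CGG: two CGG triplets separated by any eight bases (N = A, C, G or T).
# The lookahead lets overlapping occurrences be reported as well.
AMYR_MOTIF = re.compile(r"(?=(CGG[ACGT]{8}CGG))")


def find_motif_starts(sequence: str) -> list[int]:
    """Return 0-based start positions of every forward-strand motif match."""
    return [match.start() for match in AMYR_MOTIF.finditer(sequence.upper())]


def directly_regulated(bound_genes: set[str], de_genes: set[str]) -> set[str]:
    """Genes supported by both binding evidence and differential expression."""
    return bound_genes & de_genes


# Toy inputs invented for illustration only.
toy_promoter = "TTCGGAAAATTTTCGGAT"
toy_bound = {"geneA", "geneB"}
toy_differential = {"geneB", "geneC"}

print(find_motif_starts(toy_promoter))
print(directly_regulated(toy_bound, toy_differential))
```

Taking the set intersection keeps only genes with both kinds of evidence, which is the reliability argument the description makes.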
Additional file 3 of SPEAQeasy: a scalable pipeline for expression analysis...
datasetcatalog.nlm.nih.gov
Updated May 2, 2021
Cite
Burke, Emily E.; Eagles, Nicholas J.; Aguilar-Ordoñez, Israel; Leonard, Jacob; Serrato, Violeta Larios; Jaffe, Andrew E.; Phan, BaDoi N.; Barry, Brianna K.; Collado-Torres, Leonardo; Gutiérrez-Millán, Everardo; Stolz, Joshua M.; Huuki, Louise (2021). Additional file 3 of SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000798202
Explore at:
Dataset updated
May 2, 2021
Authors
Burke, Emily E.; Eagles, Nicholas J.; Aguilar-Ordoñez, Israel; Leonard, Jacob; Serrato, Violeta Larios; Jaffe, Andrew E.; Phan, BaDoi N.; Barry, Brianna K.; Collado-Torres, Leonardo; Gutiérrez-Millán, Everardo; Stolz, Joshua M.; Huuki, Louise
Description
Additional file 3:

Table S1. Available configuration profiles. Configuration files exist under the SPEAQeasy/conf directory. Configuration profiles exist for SGE and SLURM clusters, as well as local execution on a Linux machine. These profiles can be customized for specific clusters, such as the JHPCE configuration file jhpce.config, which runs on an SGE cluster. The file a user chooses also depends on whether software dependencies are managed with Docker or are installed locally.

Table S2. SPEAQeasy output files. Table of intermediary outputs generated by SPEAQeasy. These do not include the major output files of interest (Fig. 2), but other miscellaneous outputs from each processing step. In the Filename column, brackets denote one or more values dependent on a relevant variable; for example, the files [sample_name]_process_trace.log refer to a set of several files, each named distinctly according to the sample associated with the particular file. An asterisk represents a wildcard matching more than one file, when individual file names may depend on the experiment. For example, [sample_name]_trimmed*.fastq could refer to sample1_trimmed_1.fastq and sample1_trimmed_2.fastq. The next columns provide the directory containing each given file, relative to the output folder, and a description of the files’ content, respectively.

Table S3. Quality metrics recorded in SPEAQeasy outputs. One of the major pipeline outputs is a comma-separated values (CSV) file where fields (columns) are different quality metrics, and each line (row) is associated with one sample. A list of the exact field names and their descriptions is given above.

Table S4. SPEAQeasy-example differential expression and gene ontology results. (A) Differential expression results using the subset of BipSeq data analyzed in http://research.libd.org/SPEAQeasy-example/. (B) Gene ontology enrichment results from the genes with a p-value < 0.005 in the differential expression results between bipolar disorder affected individuals and neurotypical controls.

Table S5. Pipeline comparison. A comparison of usage-related features among several publicly available RNA-seq pipelines.
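The filename convention described for Table S2 (a bracketed placeholder plus an asterisk wildcard) can be resolved with a short sketch like the one below. The helper name and the file list are hypothetical; the placeholder is substituted before matching because shell-style matchers would otherwise read the brackets as a character class.

```python
from fnmatch import fnmatchcase


def matching_outputs(template: str, sample_name: str, filenames: list[str]) -> list[str]:
    """Expand [sample_name] in a template, then apply the * wildcard."""
    # Substitute first: fnmatch would treat "[sample_name]" as a set of characters.
    pattern = template.replace("[sample_name]", sample_name)
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Illustrative file names following the examples in the description.
files = [
    "sample1_trimmed_1.fastq",
    "sample1_trimmed_2.fastq",
    "sample2_trimmed_1.fastq",
    "sample1_process_trace.log",
]

print(matching_outputs("[sample_name]_trimmed*.fastq", "sample1", files))
```

This assumes sample names contain no wildcard characters of their own.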
MOUtbhR RNA-seq quantitative analysis
figshare.com
pdf
Updated Apr 3, 2017
+ more versions
Cite
Laurent Manchon; Jamal Tazi (2017). MOUtbhR RNA-seq quantitative analysis [Dataset]. http://doi.org/10.6084/m9.figshare.4811035.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.4811035.v1
Dataset updated
Apr 3, 2017
Dataset provided by
Figshare (http://figshare.com/)
Authors
Laurent Manchon; Jamal Tazi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RNA-seq pipeline analysis of mouse single-end libraries.
Efficient Identification of Multiple Pathways: RNA-Seq Analysis of Livers...
data.nasa.gov
Updated Apr 23, 2025
+ more versions
Cite
nasa.gov (2025). Efficient Identification of Multiple Pathways: RNA-Seq Analysis of Livers from 56Fe Ion Irradiated Mice [Dataset]. https://data.nasa.gov/dataset/efficient-identification-of-multiple-pathways-rna-seq-analysis-of-livers-from-56fe-ion-irr
Explore at:
Dataset updated
Apr 23, 2025
Dataset provided by
NASA (http://nasa.gov/)
License
U.S. Government Works, https://www.usa.gov/government-works
License information was derived automatically
Description
Background: mRNA interactions with each other and with other signaling molecules define different biological pathways and functions. Researchers have been investigating various tools to analyze these types of interactions. In particular, gene co-expression network methods have proved useful in finding and analyzing these molecular interactions. Many different analytical pipelines to identify these interaction networks have been proposed, with the aim of identifying an optimal partition of the network in which the individual modules are neither too small to support any general inference nor too large to be biologically interpretable.

Results: In this study we propose a new pipeline to perform gene co-expression network analysis. The proposed pipeline uses WGCNA, a widely used software package for different aspects of gene co-expression network analysis, together with a modularity maximization algorithm to analyze novel RNA-Seq data and understand the effects of low-dose 56Fe ion irradiation on the formation of hepatocellular carcinoma in mice. The network results, along with experimental validation, show that using WGCNA combined with Modularity provides a more biologically interpretable network in our dataset. Our pipeline showed better performance than the existing clustering algorithm in WGCNA in finding modules and identified a module with mitochondrial subunits that are supported by mitochondrial complex assay.

Conclusions: We present a pipeline that can reduce the problem of parameter selection with the existing algorithm in WGCNA for comparable RNA-Seq datasets, which may assist future research to discover novel mRNA interactions and their downstream molecular effects.

C57BL16 males were placed into 2 treatment groups and received the following irradiation treatments at Brookhaven National Laboratories (Long Island, NY): 600 MeV/n 56Fe (0.2 Gy) and no irradiation. Left liver lobes were collected at 30, 60, 120, 270, and 360 days post-irradiation, flash frozen, and stored at -80 °C until they could be processed for RNA-Seq. Livers were sampled by taking two 40-micron thick slices using a cryotome at -20 °C. This allowed multiple sampling of the tissue without the tissue going through multiple freeze/thaw cycles. Total RNA was isolated from the liver slices using the RNAqueous™ Total RNA Isolation Kit (ThermoFisher Scientific, Waltham, MA), and rRNA was removed via the Ribo-Zero™ rRNA Removal Kit (Illumina, San Diego, CA) prior to library preparation with the Illumina TruSeq RNA Library kit. Samples were sequenced in a paired-end 50 base format on an Illumina HiSeq 1500. Reads were aligned to the mouse GRCm38 reference genome using the STAR alignment program version 2.5.3a with the recommended ENCODE options. The --quantMode GeneCounts option was used to obtain read counts per gene based on the Gencode release M14 annotation file. The total number of reads used in the analysis varies between 23 and 35 million.
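As a rough illustration of what the GeneCounts quantification mode mentioned above produces, the sketch below parses a STAR ReadsPerGene.out.tab table: a tab-separated file whose first four rows are summary counters (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) and whose remaining rows hold a gene ID followed by unstranded, first-strand, and second-strand counts. The function name and the example rows are invented for illustration and are not taken from this dataset.

```python
import csv
import io


def read_gene_counts(handle, column: int = 1):
    """Split a ReadsPerGene table into summary counters and per-gene counts.

    column: 1 = unstranded, 2 = first-strand, 3 = second-strand counts.
    """
    summary, counts = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        name, value = row[0], int(row[column])
        if name.startswith("N_"):
            summary[name] = value  # N_unmapped, N_multimapping, ...
        else:
            counts[name] = value
    return summary, counts


# Made-up rows in the expected layout (hypothetical gene IDs).
EXAMPLE = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t30\t400\t35\n"
    "N_ambiguous\t5\t1\t2\n"
    "geneA\t10\t0\t10\n"
    "geneB\t7\t1\t6\n"
)

summary, counts = read_gene_counts(io.StringIO(EXAMPLE))
print(summary)
print(counts)
```

Which column to use depends on the strandedness of the library preparation, which the description does not state, so the default here is only a placeholder choice.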
Raw and processed (filtered and annotated) scRNAseq data
figshare.com
zip
Updated Jun 12, 2023
Cite
Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac (2023). Raw and processed (filtered and annotated) scRNAseq data [Dataset]. http://doi.org/10.6084/m9.figshare.23499192.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.23499192.v1
Dataset updated
Jun 12, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al. 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects, which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:

1- raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange (MEX) format.
2- 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format.
3- 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation in the HDF5 format.

scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20-TCB. At baseline (before addition of TCB) and assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySep™ red blood cell depletion reagent (Stemcell). Briefly, cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1×10^6 cells from each sample were re-suspended in 180 μL FBS Stain Buffer (BD, PharMingen) and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then re-suspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40 000 – 50 000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tags libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit. Sequencing was performed on a Novaseq 6000 (Illumina) in paired-end mode (64-8-58) with Novaseq6000 S2 v1 or Novaseq6000 SP v1.5 reagents kits (100 cycles).

scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0 https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics-informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided. After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with fewer than 200 genes, fewer than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in fewer than 30 cells. Normalization, log-transformed UMI counts per 10,000 reads [log(CP10K+1)], was applied before downstream analysis. After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature-based hierarchical cell annotation method. The signatures, configuration and nomenclature files used can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
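The cell-level thresholds and the log(CP10K+1) normalization stated in the description can be written out as a small dependency-free sketch. This is not the besca implementation: the function names are hypothetical, the natural logarithm is an assumption (the description does not name the base), and the per-gene filter (genes expressed in at least 30 cells) is omitted.

```python
import math

# Thresholds as stated in the description.
MIN_GENES, MIN_COUNTS, MAX_PCT_MITO = 200, 500, 30.0
MAX_GENES, MAX_COUNTS = 5_000, 20_000


def keep_cell(n_genes: int, n_counts: int, pct_mito: float) -> bool:
    """Apply the low-quality and potential-multiplet filters to one cell."""
    low_quality = (
        n_genes < MIN_GENES or n_counts < MIN_COUNTS or pct_mito > MAX_PCT_MITO
    )
    multiplet = n_genes > MAX_GENES or n_counts > MAX_COUNTS
    return not (low_quality or multiplet)


def log_cp10k(gene_counts: dict[str, int]) -> dict[str, float]:
    """Scale one cell's UMI counts to 10,000 and take log(x + 1).

    Assumption: natural logarithm.
    """
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("cell has no counts")
    return {
        gene: math.log(count / total * 10_000 + 1)
        for gene, count in gene_counts.items()
    }


print(keep_cell(n_genes=1_500, n_counts=4_000, pct_mito=5.0))
print(keep_cell(n_genes=150, n_counts=4_000, pct_mito=5.0))
```

Keeping the thresholds as named constants makes the permissive lower bounds, chosen in the description to preserve neutrophils, easy to adjust.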
Scanpy Pipeline GSE145926 HDF5 Ingestion Plotly
kaggle.com
zip
Updated Dec 4, 2025
Cite
Dr. Nagendra (2025). Scanpy Pipeline GSE145926 HDF5 Ingestion Plotly [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/scanpy-pipeline-gse145926-hdf5-ingestion-plotly
Explore at:
Available download formats: zip (4663836 bytes)
Dataset updated
Dec 4, 2025
Authors
Dr. Nagendra
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains single-cell RNA sequencing (scRNA-seq) data processed using the Scanpy pipeline.

It focuses on the GSE145926 dataset from publicly available sources.

The data has been ingested and stored in HDF5 format for easy access and manipulation.

It includes pre-processed expression matrices suitable for downstream analysis.

The dataset enables exploratory analysis using Plotly interactive visualizations.

It allows researchers to examine gene expression patterns at single-cell resolution.

Includes metadata annotations for cell types and experimental conditions.

Facilitates differential expression analysis and cell clustering investigations.

Supports visualization of key immune markers such as CD3E across cell populations.

Designed for bioinformaticians, computational biologists, and immunology researchers.

Provides an end-to-end demonstration of Scanpy workflow in Python.

Enables reproducibility and further expansion for custom analyses.
Data from: GeoTyper: Automated Pipeline from Raw scRNA-Seq Data to Cell Type...
resodate.org
service.tib.eu
Updated Jan 3, 2025
Cite
Cecily Wolfe; Yayi Feng; David Chen; Edwin Purcell; Anne Talkington; Sepideh Dolatshahi; Heman Shakeri (2025). GeoTyper: Automated Pipeline from Raw scRNA-Seq Data to Cell Type Identification [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvZ2VvdHlwZXItLWF1dG9tYXRlZC1waXBlbGluZS1mcm9tLXJhdy1zY3JuYS1zZXEtZGF0YS10by1jZWxsLXR5cGUtaWRlbnRpZmljYXRpb24=
Explore at:
Dataset updated
Jan 3, 2025
Dataset provided by
Leibniz Data Manager
Authors
Cecily Wolfe; Yayi Feng; David Chen; Edwin Purcell; Anne Talkington; Sepideh Dolatshahi; Heman Shakeri
Description
A standardized pipeline for processing scRNA-seq data, integrating multiple tools for data wrangling, visualization, cell type identification, and analysis of changes in cellular activity.
CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object)
zenodo.org
explore.openaire.eu
+3 more
bin, zip
Updated Jan 24, 2020
Cite
Farah Zaib Khan; Stian Soiland-Reyes (2020). CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/xnwncxpw42.1
Explore at:
Available download formats: zip, bin
Unique identifier
https://doi.org/10.17632/xnwncxpw42.1
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Farah Zaib Khan; Stian Soiland-Reyes
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This workflow adapts the approach and parameter settings of Trans-Omics for Precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. The workflow has five steps in total:

Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.

The Genome BAM file is processed using Picard MarkDuplicates, producing an updated BAM file containing information on duplicate reads (such reads can indicate biased interpretation).

SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.

The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.

In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.
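The dependencies among the five steps above can be made explicit with a small sketch using the standard library's graphlib (Python 3.9+). The step labels are shorthand chosen here, not identifiers from the CWL files, and the batching only illustrates which steps could run side by side.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it needs.
DEPENDENCIES = {
    "STAR": set(),
    "MarkDuplicates": {"STAR"},
    "samtools_index": {"MarkDuplicates"},
    "RNA-SeQC": {"samtools_index"},
    "RSEM": {"STAR"},  # depends only on the STAR output
}


def execution_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches; steps in one batch do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in execution_batches(DEPENDENCIES):
    print(batch)
```

RSEM lands in the same batch as MarkDuplicates, reflecting the statement that isoform quantification needs only the STAR output.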

For testing and analysis, the workflow author provided example data created by down-sampling the read files of a TOPMed public access dataset. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and detailed documentation are important factors in choosing this specific CWL workflow for CWLProv evaluation.

This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl

Steps to reproduce

To build the research object again, use Python 3 on macOS. Built with:

Processor: 2.8GHz Intel Core i7

Memory: 16GB

OS: macOS High Sierra, Version 10.13.3

Storage: 250GB

Install cwltool

pip3 install cwltool==1.0.20180912090223

Install git lfs
The data download with the git repository requires the installation of Git lfs:
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs

Get the data and make the analysis environment ready:

git clone https://github.com/FarahZKhan/cwl_workflows.git
cd cwl_workflows/
git checkout CWLProvTesting
./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh

Run the following commands to create the CWLProv Research Object:

cwltool --provenance rnaseqwf_0.6.0_linux --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json
zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac_mac.zip.sha256

The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
Data from: Informatics for RNA Sequencing: A Web Resource for Analysis on...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Aug 6, 2015
Cite
Spies, Nicholas C.; Walker, Jason R.; Griffith, Malachi; Griffith, Obi L.; Ainscough, Benjamin J. (2015). Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001851447
Explore at:
Dataset updated
Aug 6, 2015
Authors
Spies, Nicholas C.; Walker, Jason R.; Griffith, Malachi; Griffith, Obi L.; Ainscough, Benjamin J.
Description
Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.
Single Cell RNA Seq Analysis QC Clustering PBMC 3k
kaggle.com
zip
Updated Dec 4, 2025
Cite
Dr. Nagendra (2025). Single Cell RNA Seq Analysis QC Clustering PBMC 3k [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/single-cell-rna-seq-analysis-qc-clustering-pbmc-3k
Explore at:
Available download formats: zip (29203448 bytes)
Dataset updated
Dec 4, 2025
Authors
Dr. Nagendra
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains processed single-cell RNA-sequencing (scRNA-seq) data from the PBMC 3K experiment.

It includes quality-control (QC) visualizations, cell-level metrics, clustering outputs, and exploratory analysis plots.

The dataset is designed to guide beginners and intermediate users through the essential steps of scRNA-seq preprocessing and analysis.

The PBMC 3K dataset represents human peripheral blood mononuclear cells sequenced using the 10x Genomics platform.

Included QC metrics help identify low-quality cells, doublets, stressed cells, and outliers based on standard thresholds.

The dataset covers filtering based on mitochondrial gene percentage, total UMIs, and number of detected genes.

All plots follow widely accepted scRNA-seq workflows commonly used in tools like Seurat, Scanpy, and SingleCellExperiment.

The QC violin plots illustrate distributions of nFeature_RNA, nCount_RNA, percent.mt, and other metrics used to assess cell quality.

The data also highlights the effect of filtering on overall dataset structure and variability.

Clustering-related files provide a visual understanding of how cells segregate into biologically meaningful groups.

Dimensionality-reduction plots also show patterns such as immune-cell diversity present in PBMC populations.

This dataset is suitable for hands-on learning, tutorial creation, classroom instruction, or benchmarking workflows.

It serves as a ready reference for researchers who wish to practice QC interpretation and cluster inspection.

The dataset allows quick reproduction of PBMC 3K quality-control visualizations without running the entire analysis pipeline.

It provides an accessible introduction to scRNA-seq analysis concepts for students, data scientists, and bioinformaticians.
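The three per-cell QC metrics named in the description (nFeature_RNA, nCount_RNA, percent.mt) can be computed from raw counts as in the sketch below. It is not the code used to build this dataset: the function name is hypothetical, and treating genes whose symbol starts with "MT-" as mitochondrial is an assumption based on the common convention for human data.

```python
def qc_metrics(gene_counts: dict[str, int], mito_prefix: str = "MT-") -> dict[str, float]:
    """Compute per-cell QC metrics from a gene -> UMI count mapping."""
    n_count = sum(gene_counts.values())  # total UMIs in the cell
    n_feature = sum(1 for count in gene_counts.values() if count > 0)  # detected genes
    mito = sum(
        count for gene, count in gene_counts.items() if gene.startswith(mito_prefix)
    )
    percent_mt = 100.0 * mito / n_count if n_count else 0.0
    return {
        "nFeature_RNA": n_feature,
        "nCount_RNA": n_count,
        "percent.mt": percent_mt,
    }


# One toy cell with invented counts.
print(qc_metrics({"MT-CO1": 5, "CD3E": 15, "MS4A1": 0}))
```

Cells are then typically filtered on these three values, which is the step the violin plots in the dataset are meant to inform.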
Global Ngs-Based Rna-Seq Growth Analysis - Size and Forecast 2024 - 2028 |...
technavio.com
pdf
Updated Aug 15, 2024
Cite
Technavio (2024). Global Ngs-Based Rna-Seq Growth Analysis - Size and Forecast 2024 - 2028 | Technavio [Dataset]. https://www.technavio.com/report/ngs-based-rna-seq-market-analysis
Explore at:
Available download formats: pdf
Dataset updated
Aug 15, 2024
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-notice
Time period covered
2024 - 2028
Description
NGS-Based RNA-Seq Market Size 2024-2028

The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028. The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high-throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation on direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial. The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.

What will be the Size of the NGS-based RNA-Seq market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.

The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.

Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians. The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.

How is this NGS-based RNA-Seq industry segmented?

The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in "USD million" for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

End-user: Academic and research centers; Clinical research; Pharma companies; Hospitals
Technology: Sequencing by synthesis; Ion semiconductor sequencing; Single-molecule real-time sequencing; Others
Geography: North America (US); Europe (Germany, UK); APAC (China, Singapore); Rest of World (ROW)

By End-user Insights

The academic and research centers segment is estimated to witness significant growth during the forecast period. The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identificati
Results of "Curare and GenExVis: A versatile toolkit for analyzing and...
zenodo.org
data.niaid.nih.gov
zip
Updated 2024
+ more versions
Cite
Patrick Blumenkamp; Max Pfister; Sonja Diedrich; Karina Brinkrolf; Sebastian Jaenicke; Alexander Goesmann (2024). Results of "Curare and GenExVis: A versatile toolkit for analyzing and visualizing RNA-Seq data" [Dataset]. http://doi.org/10.5281/zenodo.10362480
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10362480
Dataset updated
2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Patrick Blumenkamp; Max Pfister; Sonja Diedrich; Karina Brinkrolf; Sebastian Jaenicke; Alexander Goesmann
Description
Even though high-throughput transcriptome sequencing is routinely performed in many laboratories, computational analysis of such data remains a cumbersome process often executed manually, hence error-prone and lacking reproducibility. For corresponding data processing, we introduce Curare, an easy-to-use yet versatile workflow builder for analyzing high-throughput RNA-Seq data focusing on differential gene expression experiments. Data analysis with Curare is customizable and subdivided into preprocessing, quality control, mapping, and downstream analysis stages, providing multiple options for each step while ensuring the reproducibility of the workflow. For a fast and straightforward exploration and visualization of differential gene expression results, we provide the gene expression visualizer software GenExVis. GenExVis can create various charts and tables from simple gene expression tables and DESeq2 results without the requirement to upload data or install software packages.
DESeq2 DGE Analysis Pasilla RNA-Seq Dataset
kaggle.com
zip
Updated Nov 29, 2025
Cite
Dr. Nagendra (2025). DESeq2 DGE Analysis Pasilla RNA-Seq Dataset [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/deseq2-dge-analysis-pasilla-rna-seq-dataset
Available download formats: zip (43449 bytes)
Dataset updated
Nov 29, 2025
Authors
Dr. Nagendra
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset contains RNA-Seq differential gene expression (DGE) analysis data.

It is derived from the Pasilla fruit fly dataset.

The data is processed using DESeq2, a widely-used tool for DGE analysis in R.

It includes gene counts, normalized counts, and statistical test results.

Users can explore differentially expressed genes between experimental conditions.

The dataset is suitable for transcriptomics, bioinformatics, and genomics research.

It can be used for benchmarking DGE analysis pipelines.

The dataset provides reproducible examples for learning DESeq2 workflows.

The source data is publicly available from the original Pasilla RNA-Seq study.

The dataset can be used to visualize and interpret RNA-Seq results in R.

It is ideal for researchers, students, and data scientists interested in genomics.

The dataset helps understand gene expression changes under experimental conditions.
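
As a rough illustration of exploring differentially expressed genes in a DESeq2 results table, the sketch below filters rows by adjusted p-value and fold change. The column names baseMean, log2FoldChange, lfcSE, stat, pvalue, and padj are the standard DESeq2 result columns; the gene column, the gene IDs, the numeric values, and both cutoffs are assumptions made for this example and are not taken from the dataset.

```python
import csv
import io

# Hypothetical excerpt of a DESeq2 results table exported as CSV.
# Gene IDs and values are invented for illustration only.
EXAMPLE_RESULTS = """gene,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
geneA,1500.0,2.5,0.20,12.5,1e-30,1e-28
geneB,800.0,-1.8,0.30,-6.0,2e-9,1e-7
geneC,50.0,0.3,0.40,0.75,0.45,0.80
geneD,5.0,3.0,1.50,2.0,0.045,NA
"""


def significant_genes(table_text, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return (gene, log2FoldChange, padj) tuples passing both cutoffs."""
    hits = []
    for row in csv.DictReader(io.StringIO(table_text)):
        # DESeq2 can report NA for padj (for example after independent
        # filtering); such rows cannot be tested against the cutoff.
        if row["padj"] in ("NA", ""):
            continue
        lfc = float(row["log2FoldChange"])
        padj = float(row["padj"])
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            hits.append((row["gene"], lfc, padj))
    # Smallest adjusted p-value first.
    return sorted(hits, key=lambda hit: hit[2])


for gene, lfc, padj in significant_genes(EXAMPLE_RESULTS):
    direction = "up" if lfc > 0 else "down"
    print(f"{gene}\t{direction}\tlog2FC={lfc:+.2f}\tpadj={padj:.2e}")
```

The cutoffs are parameters rather than constants because suitable thresholds depend on the experiment; the values used here are common defaults, not a recommendation from the dataset author.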
Additional file 2: of VIPER: Visualization Pipeline for RNA-seq, a Snakemake...
springernature.figshare.com
datasetcatalog.nlm.nih.gov
txt
Updated Jun 2, 2023
+ more versions
Cite
MacIntosh Cornwell; Mahesh Vangala; Len Taing; Zachary Herbert; Johannes Köster; Bo Li; Hanfei Sun; Taiwen Li; Jian Zhang; Xintao Qiu; Matthew Pun; Rinath Jeselsohn; Myles Brown; X. Liu; Henry Long (2023). Additional file 2: of VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6138290.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.6138290.v1
Dataset updated
Jun 2, 2023
Dataset provided by
figshare
Authors
MacIntosh Cornwell; Mahesh Vangala; Len Taing; Zachary Herbert; Johannes Köster; Bo Li; Hanfei Sun; Taiwen Li; Jian Zhang; Xintao Qiu; Matthew Pun; Rinath Jeselsohn; Myles Brown; X. Liu; Henry Long
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Metasheet Example (CSV 600 bytes)
Additional file 2 of SPEAQeasy: a scalable pipeline for expression analysis...
datasetcatalog.nlm.nih.gov
Updated May 2, 2021
Cite
Aguilar-Ordoñez, Israel; Eagles, Nicholas J.; Stolz, Joshua M.; Serrato, Violeta Larios; Burke, Emily E.; Barry, Brianna K.; Gutiérrez-Millán, Everardo; Jaffe, Andrew E.; Leonard, Jacob; Phan, BaDoi N.; Huuki, Louise; Collado-Torres, Leonardo (2021). Additional file 2 of SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000798200
Dataset updated
May 2, 2021
Authors
Aguilar-Ordoñez, Israel; Eagles, Nicholas J.; Stolz, Joshua M.; Serrato, Violeta Larios; Burke, Emily E.; Barry, Brianna K.; Gutiérrez-Millán, Everardo; Jaffe, Andrew E.; Leonard, Jacob; Phan, BaDoi N.; Huuki, Louise; Collado-Torres, Leonardo
Description
Additional file 2: Figure S2: SPEAQeasy logs tracing computational steps by sample. To aid transparency and greatly simplify tracing the source of execution errors, SPEAQeasy automatically generates logs with several pieces of information for every sample. In order of submission, the name of each Nextflow process is printed, along with (1) the working directory, where all relevant files are present; (2) the exit code, a standard indication of whether the process succeeded or how it failed; and (3) a list of the specific commands run during the given process. Above is a snapshot of the top of an example log.
Development of a pipeline for analyzing the gene expression profile of...
zenodo.org
data.niaid.nih.gov
zip
Updated Aug 8, 2023
Cite
Julia de Pietro Bigi (2023). Development of a pipeline for analyzing the gene expression profile of transcripts [Dataset]. http://doi.org/10.5281/zenodo.8216030
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.8216030
Dataset updated
Aug 8, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julia de Pietro Bigi
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Data corresponding to the scientific initiation project in bioinformatics.

The data in the parcial_report_input folder comprise raw counts, metadata, and FPKM values of genes from the analysis performed during the first six months of the project. They correspond to the counts and metadata from a previous study of renal cell carcinoma. The Final_report_input folder also contains counts and metadata, but from a previous study of osteosarcoma. The metadata and raw data files can be found under the accession number hs000699.v1.p1 in dbGaP. The scripts written to perform pre-processing of samples, differential expression analysis, network analysis, and functional annotation can be found in the GitHub repository.
RNAseq analysis of the response of Arabidopsis thaliana to fractional...
data.nasa.gov
Updated Mar 31, 2025
+ more versions
Cite
nasa.gov (2025). RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/rnaseq-analysis-of-the-response-of-arabidopsis-thaliana-to-fractional-gravity-under-blue-l
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
Traveling to nearby extraterrestrial objects having a reduced gravity level (partial gravity) compared to Earth's gravity is becoming a realistic objective for space agencies. The use of plants as part of life support systems will require a better understanding of the interactions among plant growth responses including tropisms, under partial gravity conditions. Here, we present results from our latest space experiments on the ISS, in which seeds of Arabidopsis thaliana were germinated, and seedlings grew for six days under different gravity levels, namely micro-g, several intermediate partial-g levels, and 1g, and were subjected to irradiation with blue light for the last 48 hours. RNA was extracted from 20 samples for subsequent RNAseq analysis. Transcriptomic analysis was performed using the HISAT2-Stringtie-DESeq pipeline. Differentially expressed genes were further characterized for global responses using the GEDI tool, gene networks and for Gene Ontology (GO) enrichment.
DataSheet1_Fusion InPipe, an integrative pipeline for gene fusion detection...
datasetcatalog.nlm.nih.gov
Updated Jun 9, 2023
Cite
Vega-García, Nerea; Vicente-Garcés, Clara; Maynou, Joan; Rives, Susana; Català, Albert; Camós, Mireia; Torrebadell, Montserrat; Esperanza-Cebollada, Elena; Fernández, Guerau (2023). DataSheet1_Fusion InPipe, an integrative pipeline for gene fusion detection from RNA-seq data in acute pediatric leukemia.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000942983
Dataset updated
Jun 9, 2023
Authors
Vega-García, Nerea; Vicente-Garcés, Clara; Maynou, Joan; Rives, Susana; Català, Albert; Camós, Mireia; Torrebadell, Montserrat; Esperanza-Cebollada, Elena; Fernández, Guerau
Description
RNA sequencing (RNA-seq) is a reliable tool for detecting gene fusions in acute leukemia. Multiple bioinformatics pipelines have been developed to analyze RNA-seq data, but an agreed gold standard has not been established. This study aimed to compare the applicability of 5 fusion calling pipelines (Arriba, deFuse, CICERO, FusionCatcher, and STAR-Fusion), as well as to define and develop an integrative bioinformatics pipeline (Fusion InPipe) to detect clinically relevant gene fusions in acute pediatric leukemia. We analyzed RNA-seq data by each pipeline individually and by Fusion InPipe. Each algorithm individually called most of the fusions with similar sensitivity and precision. However, not all rearrangements were called, suggesting that choosing a single pipeline might miss important fusions. To improve this, we integrated the results of the five algorithms in one pipeline, Fusion InPipe, comparing the output from the agreement of 5/5, 4/5, and 3/5 algorithms. The maximum sensitivity was achieved with the agreement of 3/5 algorithms, with a global sensitivity of 95%, reaching 100% in patient data. Furthermore, we showed the necessity of filtering steps to reduce the false positive detection rate. Here, we demonstrate that Fusion InPipe is an excellent tool for fusion detection in pediatric acute leukemia, with the best performance when selecting those fusions called by at least 3/5 pipelines.
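
The agreement rule described above (keep a fusion when at least 3 of 5 callers report it) can be sketched as a simple vote count. This is not the Fusion InPipe implementation: the function names, the tuple representation of a fusion, and the example calls are all assumptions made for illustration, and the filtering steps mentioned in the description are not modelled.

```python
from collections import Counter


def normalize(fusion):
    """Represent a fusion as an upper-case (5' gene, 3' gene) tuple."""
    five_prime, three_prime = fusion
    return (five_prime.upper(), three_prime.upper())


def consensus_fusions(calls_by_caller, min_callers=3):
    """Return {fusion: supporting caller count} for fusions that are
    reported by at least `min_callers` distinct callers."""
    support = Counter()
    for fusions in calls_by_caller.values():
        # Using a set means each caller votes at most once per fusion.
        for fusion in {normalize(f) for f in fusions}:
            support[fusion] += 1
    return {fusion: n for fusion, n in support.items() if n >= min_callers}


# Hypothetical per-caller output; the calls are invented for this example.
EXAMPLE_CALLS = {
    "Arriba": [("ETV6", "RUNX1"), ("BCR", "ABL1")],
    "deFuse": [("ETV6", "RUNX1")],
    "CICERO": [("ETV6", "RUNX1"), ("KMT2A", "AFF1")],
    "FusionCatcher": [("BCR", "ABL1"), ("etv6", "runx1")],
    "STAR-Fusion": [("BCR", "ABL1"), ("KMT2A", "AFF1")],
}

for threshold in (5, 4, 3):
    kept = consensus_fusions(EXAMPLE_CALLS, min_callers=threshold)
    names = sorted("-".join(fusion) for fusion in kept)
    print(f"agreement >= {threshold}/5: {names}")
```

Lowering the threshold trades precision for sensitivity, which is why the description pairs the 3/5 rule with additional filtering to control false positives.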
Data from: Base editing strategies to convert CAG to CAA diminish the...
datadryad.org
data.niaid.nih.gov
+2 more
zip
Updated May 31, 2024
Cite
Jong-Min Lee (2024). Base editing strategies to convert CAG to CAA diminish the disease-causing mutation in Huntington's disease [Dataset]. http://doi.org/10.5061/dryad.k3j9kd5cb
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.k3j9kd5cb
Dataset updated
May 31, 2024
Dataset provided by
Dryad
Authors
Jong-Min Lee
Time period covered
May 2, 2024
Description
Base editing strategies to convert CAG to CAA in Huntington's disease

HD.BE.RNAseq.Meta.Data.230116.csv: Sample characteristics and meta-data

Description of columns in the metadata file

Sample: Name of the sample for each RNAseq sample

Cell: HEK293 cells that were used for RNAseq analysis

Group: experimental group, including empty vector-treated controls (n=4), gRNA1-treated samples (n=4), and gRNA2-treated samples (n=4)

Replicate: replicate number

PC1: principal component 1 value

PC2: principal component 2 value

PC3: principal component 3 value

PC4: principal component 4 value

PC5: principal component 5 value

PC6: principal component 6 value

PC7: principal component 7 value

PC8: principal component 8 value

PC9: principal component 9 value

PC10: principal component 10 value

PC11: principal component 11 value

PC12: principal component 12 value
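
A quick way to check a metadata file against the column list above is sketched below. It assumes the file is comma-separated with a header row that uses exactly the column names listed; the group labels, sample names, and PC values in the demo table are invented, since the actual file contents are not shown here.

```python
import csv
import io
from collections import Counter

# Column names as listed in the description above.
EXPECTED_COLUMNS = ["Sample", "Cell", "Group", "Replicate"] + [
    f"PC{i}" for i in range(1, 13)
]


def check_metadata(handle):
    """Read a metadata CSV, verify the expected columns are present,
    and return (rows, number of samples per Group)."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [column for column in EXPECTED_COLUMNS if column not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    return rows, Counter(row["Group"] for row in rows)


def make_demo_csv():
    """Build a stand-in table with hypothetical labels and zeroed PCs."""
    lines = [",".join(EXPECTED_COLUMNS)]
    for group in ("control", "gRNA1", "gRNA2"):  # hypothetical labels
        for replicate in range(1, 5):
            pcs = ",".join("0.0" for _ in range(12))
            lines.append(f"{group}_{replicate},HEK293,{group},{replicate},{pcs}")
    return "\n".join(lines) + "\n"


rows, per_group = check_metadata(io.StringIO(make_demo_csv()))
print(len(rows), dict(per_group))
```

To run the same check on the real file, the in-memory table could be replaced with `open("HD.BE.RNAseq.Meta.Data.230116.csv", newline="")`, provided the file matches the assumptions stated above.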

HD.BE.RNAseq.12.Sample.230116.txt: RNAseq expression data