100+ datasets found
  1. RNA-seq example data

    • kaggle.com
    zip
    Updated Jun 16, 2023
    Cite: Tuhin Rana (2023). RNA-seq example data [Dataset]. https://www.kaggle.com/datasets/rana2hin/rna-seq-example-data
    Available download formats: zip (2,193,914,798 bytes)
    Authors: Tuhin Rana
    License: CDLA-Sharing-1.0 (https://cdla.io/sharing-1-0/)

    Description

    Dataset Description

    This dataset contains RNA-seq data from human cells. The data was collected using the Illumina HiSeq 2500 platform. The data includes raw sequencing reads, gene annotations, and phenotypic data for the samples.

    Files and Folders

    Files can be downloaded using the following command:

    wget ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz
    

    Once the file has been downloaded, it can be extracted using the following command:

    tar xvzf chrX_data.tar.gz
    

    This will create a directory called chrX_data containing the following files:

    genes/chrX.gtf
    genome/chrX.fa
    geuvadis_phenodata.csv
    indexes/
    mergelist.txt
    samples/
    

    Here are some additional details about the files in the chrX_data directory:

    • genes/chrX.gtf - Gene annotations for the human X chromosome in GTF (Gene Transfer Format), a standard tab-delimited format for gene annotation. Each record gives the genomic coordinates of a feature (such as an exon) together with the gene and transcript it belongs to.
    • genome/chrX.fa - The reference genome sequence for the human X chromosome in FASTA format, the standard format for storing DNA sequences.
    • geuvadis_phenodata.csv - Phenotypic data for the samples: one row per sample, giving the sample ID and covariates (sex and population) that can be used to define comparison groups.
    • indexes/ - Prebuilt HISAT2 index files for chrX. The index lets HISAT2 quickly find where reads match the reference, so reads can be aligned without building an index first.
    • mergelist.txt - Lists the per-sample transcript assemblies (GTF files produced by StringTie) that `stringtie --merge` combines into a single, non-redundant set of transcripts for all samples.
    • samples/ - The raw sequencing reads in FASTQ format (gzip-compressed, paired-end), the standard format for storing sequencing reads together with their per-base quality scores.
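
    The GTF description above can be made concrete with a small Python sketch that reads the attribute column (the ninth tab-separated field) and counts distinct transcripts per gene. It assumes the usual `key "value";` attribute syntax with `gene_id` and `transcript_id` present; the function name `transcripts_per_gene` is hypothetical and not part of the protocol's tools.

```python
import re
from collections import defaultdict

# Attribute column entries look like: gene_id "G1"; transcript_id "T1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(gtf_lines):
    """Count distinct transcript IDs per gene ID in an iterable of GTF lines."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed 9-column GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: len(txs) for gene, txs in genes.items()}
```

    For example, `with open("chrX_data/genes/chrX.gtf") as fh: transcripts_per_gene(fh)` would return a mapping from each chrX gene ID to its number of annotated transcripts.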

    Usage

    This dataset can be used to perform RNA-seq analysis using a variety of tools, such as HISAT2, StringTie, and Ballgown.
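
    HISAT2, StringTie and Ballgown are typically chained as follows: align and sort each sample, assemble transcripts per sample, merge the assemblies listed in mergelist.txt, then re-estimate abundances against the merged set for Ballgown. The Python sketch below only builds the command lines. It assumes FASTQ names of the form `<ID>_chrX_1.fastq.gz`/`<ID>_chrX_2.fastq.gz`, a HISAT2 index prefix of `indexes/chrX_tran`, and per-sample GTF names that match the entries in mergelist.txt. Check all of these against the extracted files before running anything.

```python
from pathlib import Path

# Assumed layout (verify after extracting chrX_data.tar.gz).
DATA = Path("chrX_data")
INDEX_PREFIX = DATA / "indexes" / "chrX_tran"  # hypothetical index prefix
ANNOTATION = DATA / "genes" / "chrX.gtf"


def sample_commands(sample_id, threads=8):
    """Return argv lists to align, sort and assemble one paired-end sample."""
    r1 = DATA / "samples" / f"{sample_id}_chrX_1.fastq.gz"
    r2 = DATA / "samples" / f"{sample_id}_chrX_2.fastq.gz"
    sam, bam = f"{sample_id}_chrX.sam", f"{sample_id}_chrX.bam"
    return [
        # --dta reports alignments tailored for transcript assemblers such as StringTie
        ["hisat2", "-p", str(threads), "--dta", "-x", str(INDEX_PREFIX),
         "-1", str(r1), "-2", str(r2), "-S", sam],
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # per-sample assembly guided by the reference annotation; -l sets the transcript name prefix
        ["stringtie", "-p", str(threads), "-G", str(ANNOTATION),
         "-o", f"{sample_id}_chrX.gtf", "-l", sample_id, bam],
    ]


def merge_command(threads=8):
    """Merge the per-sample assemblies listed in mergelist.txt into one transcript set."""
    return ["stringtie", "--merge", "-p", str(threads), "-G", str(ANNOTATION),
            "-o", "stringtie_merged.gtf", str(DATA / "mergelist.txt")]


def quantify_command(sample_id, threads=8):
    """Re-estimate abundances against the merged set; -B writes Ballgown input tables."""
    return ["stringtie", "-e", "-B", "-p", str(threads), "-G", "stringtie_merged.gtf",
            "-o", f"ballgown/{sample_id}/{sample_id}_chrX.gtf", f"{sample_id}_chrX.bam"]
```

    Each list can be passed to `subprocess.run(cmd, check=True)`. Building argument lists rather than shell strings avoids quoting problems with paths.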

    Here are some examples of how this dataset can be used:

    • To identify differentially expressed genes between two groups of samples.
    • To build a gene expression atlas for a particular tissue or cell type.
    • To study the expression of genes involved in a particular disease.
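
    For the first use case, the samples must be split into groups using the phenotype table. A minimal sketch, assuming geuvadis_phenodata.csv has a header row with an `ids` column and covariate columns such as `sex` and `population` (verify the header in your copy); `group_samples` is a hypothetical helper name:

```python
import csv
from collections import defaultdict


def group_samples(handle, id_column="ids", covariate="sex"):
    """Map each value of a phenotype column to the list of sample IDs that carry it.

    handle: an open text file (or any iterable of CSV lines) with a header row.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row[covariate]].append(row[id_column])
    return dict(groups)
```

    Calling `group_samples(open("chrX_data/geuvadis_phenodata.csv", newline=""))` would then return the sample IDs for each sex, which define the two groups to compare.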

    source: ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz

  2. RNA_Seq_Data_Preprocessing_DGE analysis

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite: Dr. Nagendra (2025). RNA_Seq_Data_Preprocessing_DGE analysis [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/rna-seq-data-preprocessing-dge-analysis
    Available download formats: zip (75,256 bytes)
    Authors: Dr. Nagendra
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains an RNA-Seq data preprocessing and differential gene expression (DGE) analysis workflow.

    It is designed for researchers, bioinformaticians, and students interested in transcriptomics.

    The dataset includes raw count data and step-by-step preprocessing instructions.

    It demonstrates quality control, normalization, and filtering of RNA-Seq data.

    Differential expression analysis using popular tools and methods is included.

    Results include differentially expressed genes with statistical significance.

    It provides visualizations like PCA plots, heatmaps, and volcano plots.

    The dataset is suitable for learning and reproducing RNA-Seq workflows.

    Both human-readable explanations and code snippets are included for guidance.

    It can serve as a reference for new RNA-Seq projects and research pipelines.
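
    Normalization and filtering, as mentioned above, are often done by converting raw counts to counts per million (CPM) and dropping genes that are expressed in too few samples. A minimal Python sketch follows. The thresholds (at least 1 CPM in at least 2 samples) are a common rule of thumb assumed here, not values taken from this dataset.

```python
def cpm(counts):
    """Counts per million for a gene-by-sample table given as {gene: [count per sample]}."""
    n_samples = len(next(iter(counts.values())))
    # library size = total reads assigned to genes in each sample
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM is at least min_cpm in at least min_samples samples."""
    norm = cpm(counts)
    return {
        gene: counts[gene]
        for gene, values in norm.items()
        if sum(v >= min_cpm for v in values) >= min_samples
    }
```

    Filtering on CPM rather than raw counts keeps the threshold comparable between samples sequenced to different depths.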

  3. ReCount - A multi-experiment resource of analysis-ready RNA-seq gene count...

    • dknet.org
    Updated Jan 29, 2022
    Cite: (2022). ReCount - A multi-experiment resource of analysis-ready RNA-seq gene count datasets [Dataset]. http://identifiers.org/RRID:SCR_001774
    Description

    RNA-seq gene count datasets built from the raw data of 18 different studies. The raw sequencing data (.fastq files) were processed with Myrna to obtain tables of per-gene counts. For ease of statistical analysis, each count table was combined with sample phenotype data to form an R object of class ExpressionSet. The count tables, ExpressionSets, and phenotype tables are ready to use and freely available. By taking care of several preprocessing steps and combining many datasets on one easily accessible website, ReCount makes finding and analyzing RNA-seq data considerably more straightforward.
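
    The ExpressionSet idea above (a count matrix whose sample columns are tied to rows of phenotype data) can be illustrated outside R. The sketch below is a hypothetical Python stand-in, not the Bioconductor `ExpressionSet` API. It checks that every sample has phenotype data and keeps counts and phenotypes aligned when subsetting.

```python
from dataclasses import dataclass


@dataclass
class CountSet:
    """Hypothetical minimal stand-in for an ExpressionSet: counts plus per-sample phenotypes."""
    genes: list    # row names of the count table
    samples: list  # column names of the count table
    counts: list   # counts[i][j] = reads for genes[i] in samples[j]
    pheno: dict    # pheno[sample] = {variable: value}

    def __post_init__(self):
        # every count column must have a matching phenotype entry
        missing = [s for s in self.samples if s not in self.pheno]
        if missing:
            raise ValueError(f"no phenotype data for samples: {missing}")
        if len(self.counts) != len(self.genes):
            raise ValueError("need exactly one count row per gene")
        if any(len(row) != len(self.samples) for row in self.counts):
            raise ValueError("every count row must have one entry per sample")

    def subset(self, variable, value):
        """Return a new CountSet with only the samples whose phenotype matches."""
        keep = [j for j, s in enumerate(self.samples) if self.pheno[s].get(variable) == value]
        kept_samples = [self.samples[j] for j in keep]
        return CountSet(
            genes=list(self.genes),
            samples=kept_samples,
            counts=[[row[j] for j in keep] for row in self.counts],
            pheno={s: self.pheno[s] for s in kept_samples},
        )
```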

  4. RNA-seq data analysis summary.

    • datasetcatalog.nlm.nih.gov
    Updated Oct 26, 2021
    Cite: Klemm, Paul; Becker, Stephan; Biedenkopf, Nadine; Lechner, Marcus; Weber, Friedemann; Schlereth, Julia; Hartmann, Roland K.; Schoen, Andreas; Kämper, Lennart; Bach, Simone; Demper, Jana-Christin (2021). RNA-seq data analysis summary. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000808954
    Authors: Klemm, Paul; Becker, Stephan; Biedenkopf, Nadine; Lechner, Marcus; Weber, Friedemann; Schlereth, Julia; Hartmann, Roland K.; Schoen, Andreas; Kämper, Lennart; Bach, Simone; Demper, Jana-Christin
    Description

    For methodological details, see S1 Text, paragraph "RNA-Seq Analysis". (XLSX)

  5. DESeq2 DGE Analysis Pasilla RNA-Seq Dataset

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite: Dr. Nagendra (2025). DESeq2 DGE Analysis Pasilla RNA-Seq Dataset [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/deseq2-dge-analysis-pasilla-rna-seq-dataset
    Available download formats: zip (43,449 bytes)
    Authors: Dr. Nagendra
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains RNA-Seq differential gene expression (DGE) analysis data.

    It is derived from the Pasilla fruit fly dataset.

    The data were processed using DESeq2, a widely used R package for DGE analysis.

    It includes gene counts, normalized counts, and statistical test results.

    Users can explore differentially expressed genes between experimental conditions.

    The dataset is suitable for transcriptomics, bioinformatics, and genomics research.

    It can be used for benchmarking DGE analysis pipelines.

    The dataset provides reproducible examples for learning DESeq2 workflows.

    The source data is publicly available from the original Pasilla RNA-Seq study.

    The dataset can be used to visualize and interpret RNA-Seq results in R.

    It is ideal for researchers, students, and data scientists interested in genomics.

    The dataset helps users understand gene expression changes under experimental conditions.
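
    DESeq2's normalized counts are obtained by dividing each sample's counts by a size factor estimated with the median-of-ratios method. The following Python sketch re-implements that calculation for illustration only. It is not the DESeq2 API; in R the same values come from `estimateSizeFactors()` and `counts(dds, normalized = TRUE)`.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per gene, each a list of raw counts per sample.
    Genes with a zero in any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every gene has a zero count in at least one sample")
    # median of the log ratios, back-transformed, gives each sample's factor
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]
```

    Taking the median over many genes makes the factor robust to the few genes that really are differentially expressed, which is why this works better than scaling by total read count alone.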

  6. Additional file 2 of Choice of library size normalization and statistical...

    • springernature.figshare.com
    zip
    Updated Feb 14, 2024
    Cite: Xiaohong Li; Nigel Cooper; Timothy O’Toole; Eric Rouchka (2024). Additional file 2 of Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies [Dataset]. http://doi.org/10.6084/m9.figshare.11759040.v1
    Available download formats: zip
    Dataset provided by: figshare (http://figshare.com/)
    Authors: Xiaohong Li; Nigel Cooper; Timothy O’Toole; Eric Rouchka
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Additional file 2. Contains the datasets used for the analysis. This zipped folder contains the MAQC2 and MAQC3 raw read counts, the raw data files for the cancer datasets (AdLC, OC and TNBC) filtered to remove zero counts, and a description of these data files (Supplementary Material.docx).

  7. DGE Analysis GSE153089DESeq2PCA,HeatmapVolcanoPlot

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite: Dr. Nagendra (2025). DGE Analysis GSE153089DESeq2PCA,HeatmapVolcanoPlot [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/dge-analysis-gse153089deseq2pcaheatmapvolcanoplot
    Available download formats: zip (60,250 bytes)
    Authors: Dr. Nagendra
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains a comprehensive Differential Gene Expression (DGE) analysis.

    The analysis is based on the publicly available RNA-seq dataset GSE153089.

    Data processing and statistical analysis were performed using the DESeq2 package in R.

    The dataset includes normalized gene expression values, statistical significance metrics, and fold-change calculations.

    Principal Component Analysis (PCA) was performed to visualize sample clustering and variance.

    Heatmaps were generated to show patterns of differentially expressed genes across samples.

    Volcano plots were created to visualize significantly upregulated and downregulated genes.

    The analysis identifies potential biomarkers and genes of interest for further research.

    This dataset is suitable for bioinformatics, transcriptomics, and molecular biology studies.

    Researchers can use this dataset for reproducible workflows and comparative studies.

    The dataset can aid in understanding gene regulation and expression changes in specific conditions.

    All code used for analysis is provided in a reproducible R script format.

    This resource supports teaching, learning, and benchmarking in RNA-seq analysis.
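
    The volcano plot classification described above can be sketched as follows. The input rows use DESeq2's result column names (`log2FoldChange`, `padj`) plus a hypothetical `gene` key. The cut-offs (|log2 fold change| ≥ 1, adjusted p < 0.05) are common defaults assumed here, not necessarily those used in the provided R script.

```python
import math


def classify_genes(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label DESeq2-style result rows as 'up', 'down' or 'ns' for a volcano plot.

    results: iterable of dicts with 'gene', 'log2FoldChange' and 'padj' keys.
    padj may be None (DESeq2 reports NA for genes removed by independent filtering).
    Returns (gene, x, y, label) tuples with x = log2 fold change, y = -log10(padj).
    """
    points = []
    for row in results:
        lfc, padj = row["log2FoldChange"], row["padj"]
        if padj is None:
            continue  # no adjusted p-value, so the gene cannot be placed on the y-axis
        y = -math.log10(padj) if padj > 0 else math.inf
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            label = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"
        points.append((row["gene"], lfc, y, label))
    return points
```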

  8. Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

    • data.niaid.nih.gov
    Updated Nov 20, 2023
    Cite: Hsu, Jonathan; Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
    Authors: Hsu, Jonathan; Stoop, Allart
    Description

    Table of Contents

    • Main Description
    • File Descriptions
    • Linked Files
    • Installation and Instructions

    1. Main Description

    This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity." The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required to run the script:

    Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap

    File Descriptions

    The code can be downloaded and opened in RStudio. The file "marengo_code_for_paper_jan_2023.R" contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available from Zenodo at https://doi.org/10.5281/zenodo.7566113 (Zenodo DOI: 10.5281/zenodo.7566113). The file "all_res_deg_for_heat_updated_march2023.txt" contains the unfiltered results of the DGE analysis, which were also used to create the DGE heatmap and volcano plots. The file "genes_for_heatmap_fig5F.xlsx" lists the genes included in the heatmap in Figure 5F.

    Linked Files

    This repository contains the code for analyzing a single-cell RNA-seq dataset. The raw FASTQ files were deposited in SRA and the aligned files in GEO, while the "Rdata" or "Rds" file was deposited in Zenodo. The linked datasets are described below:

    Gene Expression Omnibus (GEO) ID: GSE223311(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the raw sequencing reads as .fastq.gz files (gzip-compressed FASTQ text files). Submission type: Private. To gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Zenodo DOI: 10.5281/zenodo.7566113(https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

    Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, an R object file that is required to run the code. Submission type: Restricted Access. To gain access to the repository, you must contact the author.

    Installation and Instructions

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    Ensure you have R version 4.1.2 or higher for compatibility.

    Although it is not essential, you can use RStudio (version 2022.12.0+353) to access and execute the code.

    1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to where the following files are located:

    • marengo_code_for_paper_jan_2023.R
    • Install_Packages.R
    • Marengo_newID_March242023.rds
    • genes_for_heatmap_fig5F.xlsx
    • all_res_deg_for_heat_updated_march2023.txt

    You can use the following code to set the working directory in R:

    setwd("path/to/your/files")  # replace with the folder that contains the files listed above

    4. Open the file "Install_Packages.R" and execute it in your R IDE. This script attempts to install all the necessary packages and their dependencies, setting up an environment in which "marengo_code_for_paper_jan_2023.R" can be executed.
    5. Once "Install_Packages.R" has run successfully, restart RStudio or your IDE of choice.
    6. Open "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
    7. Execute the commands in "marengo_code_for_paper_jan_2023.R" to generate the plots.

  9. Data from: The power and promise of RNA-seq in ecology and evolution

    • data.niaid.nih.gov
    zip
    Updated Jan 6, 2016
    Cite: Erica V. Todd; Michael A. Black; Neil J. Gemmell (2016). The power and promise of RNA-seq in ecology and evolution [Dataset]. http://doi.org/10.5061/dryad.vp42s
    Available download formats: zip
    Dataset provided by: University of Otago
    Authors: Erica V. Todd; Michael A. Black; Neil J. Gemmell
    License: CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Reference is regularly made to the power of new genomic sequencing approaches. Using powerful technology, however, is not the same as having the necessary power to address a research question with statistical robustness. In the rush to adopt new and improved genomic research methods, limitations of technology and experimental design may be initially neglected. Here, we review these issues with regard to RNA sequencing (RNA-seq). RNA-seq adds large-scale transcriptomics to the toolkit of ecological and evolutionary biologists, enabling differential gene expression (DE) studies in non-model species without the need for prior genomic resources. High biological variance is typical of field-based gene expression studies and means that larger sample sizes are often needed to achieve the same degree of statistical power as clinical studies based on data from cell lines or inbred animal models. Sequencing costs have plummeted, yet RNA-seq studies still underutilise biological replication. Finite research budgets force a trade-off between sequencing effort and replication in RNA-seq experimental design. However, clear guidelines for negotiating this trade-off, while taking into account study-specific factors affecting power, are currently lacking. Study designs that prioritise sequencing depth over replication fail to capitalise on the power of RNA-seq technology for DE inference. Significant recent research effort has gone into developing statistical frameworks and software tools for power analysis and sample size calculation in the context of RNA-seq DE analysis. We synthesise progress in this area and derive an accessible rule-of-thumb guide for designing powerful RNA-seq experiments relevant in eco-evolutionary and clinical settings alike.
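
    To illustrate the depth-versus-replication trade-off, the sketch below uses a common normal-approximation sample-size formula for one gene with negative-binomial counts: n = 2 (z(1 - α/2) + z(power))² (1/μ + φ) / (ln Δ)², where μ is the mean count, φ the biological dispersion and Δ the fold change to detect. The term 1/μ + φ is the squared coefficient of variation of a count. This is an assumed textbook approximation, not the rule-of-thumb guide derived in the review.

```python
import math
from statistics import NormalDist


def replicates_per_group(mean_count, dispersion, fold_change=2.0, alpha=0.05, power=0.8):
    """Approximate replicates per group needed to detect fold_change for one gene.

    mean_count: expected reads for the gene (grows with sequencing depth).
    dispersion: negative-binomial dispersion (biological variability).
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    cv2 = 1.0 / mean_count + dispersion  # technical (Poisson) part + biological part
    return math.ceil(2 * z_total ** 2 * cv2 / math.log(fold_change) ** 2)
```

    With φ = 0.4 (an illustrative value for variable field samples), raising the mean count from 10 to 1000 only lowers the estimate from 17 to 14 replicates per group. At φ = 0.1 and a mean of 1000, it falls to 4. Beyond modest depth, the 1/μ term that sequencing controls is small next to φ, so only more replicates reduce n substantially.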

  10. GSE227516_DGE_Analysis_Visualization_DESeq2

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite: Dr. Nagendra (2025). GSE227516_DGE_Analysis_Visualization_DESeq2 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/gse227516-dge-analysis-visualization-deseq2
    Available download formats: zip (1,956,403 bytes)
    Authors: Dr. Nagendra
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains a comprehensive analysis of differential gene expression (DGE) data.

    The data is processed and visualized using DESeq2, a widely used R package for RNA-seq analysis.

    It includes normalized counts, statistical results, and visualization plots.

    It provides insights into gene expression changes across different experimental conditions.

    It facilitates downstream bioinformatics analysis and interpretation.

    It includes ready-to-use scripts for performing DGE analysis and generating publication-quality plots.

    It is designed for researchers, bioinformaticians, and students working on transcriptomics.

    It supports reproducible research practices with fully documented code.

    The dataset is derived from GSE227516, a public RNA-seq dataset.

    It is suitable for learning, demonstration, and comparative analysis of gene expression workflows.
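
    Heatmaps of differentially expressed genes are usually drawn from row-scaled values, so that color shows each gene's relative change across samples rather than its absolute level. A minimal Python sketch of that scaling follows. It assumes the input is already log-transformed normalized expression, and it uses the sample standard deviation, as R's `scale()` does.

```python
from statistics import mean, stdev


def row_zscores(matrix):
    """Scale each gene (row) to mean 0 and standard deviation 1 across samples.

    Needs at least two samples per row. Rows with zero variance become all zeros
    instead of dividing by zero.
    """
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        scaled.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return scaled
```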

  11. Data from: RNA-seq differential expression studies: more sequence, or more...

    • ebi.ac.uk
    Updated Jan 6, 2014
    Cite: Jie Zhou; Yuwen Liu; Kevin White (2014). RNA-seq differential expression studies: more sequence, or more replication? [Dataset]. https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-GEOD-51403
    Authors: Jie Zhou; Yuwen Liu; Kevin White
    Description

    Motivation: RNA-seq is replacing microarrays as the primary tool for gene expression studies. Many RNA-seq studies have used insufficient biological replicates, resulting in low statistical power and inefficient use of sequencing resources. Results: We show the explicit trade-off between more biological replicates and deeper sequencing in increasing power to detect differentially expressed (DE) genes. In the human cell line MCF-7, adding more sequencing depth beyond 10M reads gives diminishing returns on power to detect DE genes, while adding biological replicates improves power significantly regardless of sequencing depth. We also propose a cost-effectiveness metric for guiding the design of large-scale RNA-seq DE studies. Our analysis showed that sequencing fewer reads and performing more biological replication is an effective strategy to increase power and accuracy in large-scale differential expression RNA-seq studies, and it provides new insights into efficient experimental design of RNA-seq studies. Treated (10nM E2 for 24h) and control MCF-7 cells were each replicated 7 times and collected for mRNA-seq. Reads were then subsampled for statistical analysis.
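
    The read subsampling in this design can be approximated on a count table by drawing reads uniformly at random without replacement, which mimics sequencing the same library to a lower depth. The sketch below is a generic illustration under that assumption. The study subsampled the reads themselves, and its exact procedure is not described here; `subsample_counts` is a hypothetical helper name.

```python
import random


def subsample_counts(counts, target_reads, seed=0):
    """Draw target_reads reads without replacement from one library's gene counts.

    counts: {gene: read count}. Each read is identified by its position
    0..total-1, and each gene owns a contiguous block of positions.
    """
    total = sum(counts.values())
    if target_reads > total:
        raise ValueError("target_reads exceeds the library size")
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(total), target_reads))
    result, start, i = {}, 0, 0
    for gene, c in counts.items():
        end = start + c
        n = 0
        # count the sampled read positions that fall inside this gene's block
        while i < len(picked) and picked[i] < end:
            n += 1
            i += 1
        result[gene] = n
        start = end
    return result
```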

  12. Additional file 3 of Choice of library size normalization and statistical...

    • figshare.com
    zip
    Updated Feb 14, 2024
    Cite: Xiaohong Li; Nigel Cooper; Timothy O’Toole; Eric Rouchka (2024). Additional file 3 of Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies [Dataset]. http://doi.org/10.6084/m9.figshare.11759052.v1
    Available download formats: zip
    Dataset provided by: figshare (http://figshare.com/)
    Authors: Xiaohong Li; Nigel Cooper; Timothy O’Toole; Eric Rouchka
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Additional file 3. R code for the analysis.

  13. Raw data bulkRNAseq.csv

    • figshare.com
    txt
    Updated Mar 28, 2022
    Cite: Florian Grünschläger; Dominik Vonficht (2022). Raw data bulkRNAseq.csv [Dataset]. http://doi.org/10.6084/m9.figshare.19425302.v1
    Available download formats: txt
    Dataset provided by: figshare (http://figshare.com/)
    Authors: Florian Grünschläger; Dominik Vonficht
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Bulk RNA-seq data (Smart-seq2; raw feature counts) of naive murine CD4+ T cells co-cultured with murine HSPCs (THSPC), murine DCs (TDC), or murine LSKs as the control condition, in the presence or absence of antigen (ova, ctrl).

  14. RNA-sequencing data.

    • datasetcatalog.nlm.nih.gov
    Updated Jan 25, 2024
    Cite: Davidson, Bradley; Pickett, C. J.; Gruner, Hannah N. (2024). RNA-sequencing data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001449690
    Authors: Davidson, Bradley; Pickett, C. J.; Gruner, Hannah N.
    Description

    (Tab 1) DESeq2 statistics for all genes. (Tabs 2–6) Lists of up- and down-regulated genes from Fig 6. The first column lists the KH gene IDs of all genes selected for the analysis shown in each figure panel. The KH gene IDs of significantly up- or down-regulated genes are shown in blue or red font. Boxes indicate the most highly impacted subset of genes, as shown in the Fig 6 tables; for these genes, KH IDs were paired with KY IDs and human homologs. (Tab 7) Referenced Foxf target genes. (XLSX)

  15. Data from: Transcriptomic analysis reveals pro-inflammatory signatures...

    • figshare.scilifelab.se
    Updated Jan 15, 2025
    Cite: Linda Holmfeldt; Svea Stratmann (2025). Data from: Transcriptomic analysis reveals pro-inflammatory signatures associated with acute myeloid leukemia progression [Dataset]. http://doi.org/10.17044/scilifelab.13105229.v1
    Dataset provided by: Uppsala Universitet
    Authors: Linda Holmfeldt; Svea Stratmann
    License: Restricted access (https://www.scilifelab.se/data/restricted-access/)

    Description

    Data Set Description

    These data were collected from a total of 70 participants (47 adult; 23 pediatric), all of whom had relapsed or primary resistant acute myeloid leukemia. The data, which are separated here into an adult and a pediatric dataset, were generated as part of a study by Stratmann et al. (https://doi.org/10.1182/bloodadvances.2021004962). The Stratmann et al. study is currently available as a pre-publication here: https://ashpublications.org/bloodadvances/article/doi/10.1182/bloodadvances.2021004962/477210/Transcriptomic-analysis-reveals-pro-inflammatory. Please note that separate applications are necessary for the adult and pediatric datasets. When applying for access, please indicate which dataset the application applies to.

    The adult dataset contains transcriptome sequencing (RNA-seq) data from 25 diagnosis (D), 45 relapse (R1/R2/R3) and five (5) primary resistant (PR) leukemic samples from 47 patients, as well as five (5) normal CD34+ bone marrow control samples. The pediatric dataset contains RNA-seq data from 18 diagnosis (D), 22 relapse (R1/R2), six (6) persistent relapse (R1/2-P) and one (1) primary resistant (PR) leukemic samples from 23 patients, as well as five (5) normal CD34+ bone marrow control samples. The leukemic samples originate from bone marrow or peripheral blood. The normal RNA samples originate from purified CD34+ bone marrow cells from five different healthy individuals. Further details regarding the samples are available in the Supplemental Information of Stratmann et al. (https://doi.org/10.1182/bloodadvances.2021004962).

    RNA-seq libraries and the associated next-generation sequencing were produced by the SNP&SEQ Technology Platform, SciLifeLab, National Genomics Infrastructure Uppsala, Sweden. Libraries were prepared using the TruSeq stranded total RNA library preparation kit with ribosomal depletion by RiboZero Gold (Illumina). Adult samples were sequenced on the Illumina HiSeq2500 platform, generating paired-end 125 bp reads with v4 sequencing chemistry. Pediatric samples were sequenced on the Illumina NovaSeq6000 platform (S2 flowcell), generating paired-end 100 bp reads with v1 sequencing chemistry. The CD34+ bone marrow control samples were sequenced on both platforms (Illumina HiSeq2500 and NovaSeq6000). In addition, all of these acute myeloid leukemia samples have been characterized by whole genome sequencing or whole exome sequencing, with the datasets available under controlled access through doi.org/10.17044/scilifelab.12292778.

    Terms for access

    The adult and pediatric datasets may only be used for research that seeks to advance the understanding of how genetic and transcriptomic factors influence the etiology and biology of human acute myeloid leukemia. The protected pediatric dataset may only be used for research projects that can only be conducted using pediatric acute myeloid leukemia data, and whose objectives cannot be accomplished using data from adults. Applications intended for method development are therefore not considered acceptable for use of the pediatric dataset. Further, the pediatric dataset may not be used for research investigating predisposition to acute myeloid leukemia based on germline variants.

    For conditional access to the adult and/or pediatric dataset in this publication, please contact datacentre@scilifelab.se

  16. BioXpress

    • neuinfo.org
    Updated Oct 4, 2024
    Cite: (2024). BioXpress [Dataset]. http://identifiers.org/RRID:SCR_014191
    Description

    BioXpress is a gene expression and cancer association database in which expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, the International Cancer Genome Consortium, Expression Atlas and publications. BioXpress can be searched by gene name or by cancer type. To search by gene name, select the appropriate identifier type from the dropdown menu and enter the corresponding identifier in the adjacent text box. The results are computed and presented with information such as expression levels and tumor expression. To search by cancer type, select the desired type from the dropdown menu; results include fields such as "Cancer Type", "Significant", "Expression", "Adjusted p-value" and "p-value". Results are shown in a graph displaying the top 10 differentially expressed genes for the specified cancer type, ranked by how often their expression is significantly altered between tumor and normal pairs.

  17. Aligned RNA-seq expression data and differential gene expression analyses...

    • datasetcatalog.nlm.nih.gov
    Updated Mar 20, 2025
    Cite: Milinkovitch, Michel C.; Cooper, Rory L. (2025). Aligned RNA-seq expression data and differential gene expression analyses for each developmental stage are provided in supporting information file S1 Data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002038947
    Authors: Milinkovitch, Michel C.; Cooper, Rory L.
    Description

    The entire raw RNA-seq dataset is provided at https://doi.org/10.5061/dryad.8pk0p2nzj. (ZIP)

  18. Data from: Reference transcriptomics of porcine peripheral immune cells...

    • agdatacommons.nal.usda.gov
    zip
    Updated Nov 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. http://doi.org/10.15482/USDA.ADC/1522411
    Available download formats: zip
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains files reconstructing the single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of the single-cell data identified 36 cell clusters, which were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. The files may be used to reconstruct the data as presented in the manuscript, allowing other users to query it individually. Scripts for the original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format
    File Name: PBMC7_AllCells.zip
    Resource Description: Zipped folder containing the PBMC counts matrix, gene names, and cell IDs. Files are as follows:

    • matrix of gene counts* (matrix.mtx.gz)
    • gene names (features.tsv.gz)
    • cell IDs (barcodes.tsv.gz)

    *The ‘raw’ count matrix actually contains gene counts obtained after ambient RNA removal. Non-integer count estimation was enabled during ambient RNA removal, so most gene counts in this matrix are non-integer values; they should nevertheless be treated as raw, unnormalized data that require further normalization/transformation. Data can be read into R using the function Read10X().

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata
    File Name: PBMC7_AllCells_meta.csv
    Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:

    • nCount_RNA = the number of transcripts detected in a cell
    • nFeature_RNA = the number of genes detected in a cell
    • Loupe = cell barcodes, corresponding to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
    • prcntMito = percent mitochondrial reads in a cell
    • Scrublet = doublet probability score assigned to a cell
    • seurat_clusters = cluster ID assigned to a cell
    • PaperIDs = sample ID for a cell
    • celltypes = cell type ID assigned to a cell

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates
    File Name: PBMC7_AllCells_PCAcoord.csv
    Resource Description: .csv file containing the first 100 PCA coordinates for cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates
    File Name: PBMC7_AllCells_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for all cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates
    File Name: PBMC7_AllCells_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for all cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates
    File Name: PBMC7_CD4only_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from PBMC7_AllCells.h5Seurat, and the t-SNE coordinates used in the publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates
    File Name: PBMC7_CD4only_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from PBMC7_AllCells.h5Seurat, and the UMAP coordinates used in the publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates
    File Name: PBMC7_GDonly_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from PBMC7_AllCells.h5Seurat, and the UMAP coordinates used in the publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates
    File Name: PBMC7_GDonly_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from PBMC7_AllCells.h5Seurat, and the t-SNE coordinates used in the publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information
    File Name: UnfilteredGeneInfo.txt
    Resource Description: .txt file containing the gene nomenclature information used to assign gene names in the dataset. The 'Name' column corresponds to the name assigned to a feature in the dataset.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat
    File Name: PBMC7.tar
    Resource Description: .h5Seurat object of all cells in the PBMC dataset. The file needs to be untarred, then read into R using the function LoadH5Seurat().
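    The metadata table can also be explored without R. The shell sketch below is illustrative only: it counts cells per value of a named column (for example celltypes) and counts cells whose seurat_clusters value falls in a given list, such as the CD4 T-cell clusters 0, 3, 4 and 28 described above. It assumes PBMC7_AllCells_meta.csv is a plain comma-separated file with a header row and no commas inside fields (double quotes, as written by R's write.csv, are stripped); the function names count_by_column and count_in_clusters are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "<value><TAB><number of cells>" for one metadata column, found by
# its header name. ASSUMPTION: no commas inside fields; double quotes removed.
count_by_column() {
  local csv=$1 col=$2
  awk -F',' -v col="$col" '
    { gsub(/"/, "") }                      # strip quotes, fields are re-split
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    c { n[$c]++ }
    END {
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      for (k in n) print k "\t" n[k]
    }' "$csv" | sort
}

# Count cells whose seurat_clusters value is in a comma-separated list,
# e.g. "0,3,4,28" (CD4 T cells) or "6,21,24,31" (gamma delta T cells).
count_in_clusters() {
  local csv=$1 clusters=$2
  awk -F',' -v want="$clusters" '
    BEGIN { m = split(want, w, ","); for (i = 1; i <= m; i++) keep[w[i]] = 1 }
    { gsub(/"/, "") }
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "seurat_clusters") c = i; next }
    c && ($c in keep) { n++ }
    END { print n + 0 }' "$csv"
}

# Example usage (file name from the resource list above):
#   count_by_column PBMC7_AllCells_meta.csv celltypes
#   count_in_clusters PBMC7_AllCells_meta.csv 0,3,4,28
```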

  19. RNA sequencing data.

    • datasetcatalog.nlm.nih.gov
    Updated Jun 24, 2022
    Nagano, Keiji; Iijima, Masahiro; Miyakawa, Hiroshi; Fujita, Mari; Yokogawa, Tadaharu (2022). RNA sequencing data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000320483
    Dataset updated
    Jun 24, 2022
    Authors
    Nagano, Keiji; Iijima, Masahiro; Miyakawa, Hiroshi; Fujita, Mari; Yokogawa, Tadaharu
    Description

    Results of two independent RNA sequencing experiments of OG and MT and their statistical analyses. (XLSX)

  20. Data from: Differential analyses for RNA-seq: transcript-level estimates...

    • ebi.ac.uk
    Updated Dec 9, 2015
    Mark Robinson (2015). Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences [Dataset]. https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-4119
    Dataset updated
    Dec 9, 2015
    Authors
    Mark Robinson
    Description

    High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomics studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Several different quantification approaches have been proposed, ranging from simple counting of reads overlapping given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of both performance and interpretability. We also illustrate that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices, and incorporation of transcript-level abundance estimates improves the performance in simulated data, the difference is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
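    tximport itself is an R package. The awk sketch below only illustrates the simplest idea behind moving from transcript-level to gene-level estimates: summing transcript-level estimated counts into per-gene totals using a transcript-to-gene table. It is not tximport's implementation, which also carries abundance and average transcript length information (for example as offsets for count-based tools). The two-column tab-separated input formats and the function name sum_to_gene are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sum transcript-level estimated counts to gene level (illustration only).
#   $1: tx2gene table, tab-separated: transcript_id <TAB> gene_id (no header)
#   $2: quantification table, tab-separated: transcript_id <TAB> est_count (no header)
# Transcripts missing from tx2gene are reported on stderr and skipped.
sum_to_gene() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { gene[$1] = $2; next }      # first file: build transcript -> gene map
    !($1 in gene) { print "unmapped: " $1 > "/dev/stderr"; next }
    { total[gene[$1]] += $2 }              # second file: accumulate per gene
    END { for (g in total) print g, total[g] }' "$1" "$2" | sort -k1,1
}

# Example usage (hypothetical file names):
#   sum_to_gene tx2gene.tsv transcript_counts.tsv > gene_counts.tsv
```

The NR == FNR idiom relies on the first file being non-empty; an empty tx2gene table would make every quantification line be read as mapping data.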

Tuhin Rana (2023). RNA-seq example data [Dataset]. https://www.kaggle.com/datasets/rana2hin/rna-seq-example-data

RNA-seq example data

expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

9 scholarly articles cite this dataset
Available download formats: zip (2193914798 bytes)
Dataset updated
Jun 16, 2023
Authors
Tuhin Rana
License

https://cdla.io/sharing-1-0/

Description

Dataset Description

This dataset contains RNA-seq data from human cells. The data was collected using the Illumina HiSeq 2500 platform. The data includes raw sequencing reads, gene annotations, and phenotypic data for the samples.

Files and Folders

Files can be downloaded using the following command:

wget ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz

Once the file has been downloaded, it can be extracted using the following command:

tar xvzf chrX_data.tar.gz

This will create a directory called chrX_data containing the following files:

genes/chrX.gtf
genome/chrX.fa
geuvadis_phenodata.csv
indexes/
mergelist.txt
samples/
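The listing above can double as a quick sanity check after extraction. The sketch below (the function name check_layout is hypothetical) reports any of those six entries that are missing under the extracted directory; it only tests that each path exists, not that the files are complete or uncorrupted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Report every expected entry missing under the extracted directory.
# Returns 0 if all entries exist, 1 otherwise.
check_layout() {
  local root=${1:-chrX_data} missing=0 entry
  for entry in genes/chrX.gtf genome/chrX.fa geuvadis_phenodata.csv \
               indexes mergelist.txt samples; do
    if [[ ! -e "$root/$entry" ]]; then
      echo "missing: $root/$entry" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage, after running tar xvzf chrX_data.tar.gz:
#   check_layout chrX_data && echo "layout OK"
```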

Here are some additional details about the files in the chrX_data directory:

  • genes/chrX.gtf - This file contains gene annotations for the human X chromosome. It is in the GTF format, which is a standard format for gene annotations. The GTF file contains information about the start and end positions of genes, as well as their transcripts.
  • genome/chrX.fa - This file contains the reference genome sequence for the human X chromosome. It is in the FASTA format, which is a standard format for storing DNA sequences.
  • geuvadis_phenodata.csv - This file contains phenotypic data for the samples in the dataset, such as the sex and population of each sample.
  • indexes/ - This directory contains index files for HISAT2. Index files are used to speed up the alignment of sequencing reads to a reference genome.
  • mergelist.txt - This file lists the per-sample transcript assemblies (GTF files produced by StringTie) that are merged into a single, non-redundant set of transcripts using StringTie's --merge mode.
  • samples/ - This directory contains the raw sequencing data. The raw sequencing data is in the FASTQ format, which is a standard format for storing sequencing reads.
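A few one-line checks help confirm what each of these files contains. The sketch below assumes standard formats (four-line FASTQ records in gzip-compressed files, one '>' header per FASTA sequence, gene_id attributes in the GTF's ninth column); the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Number of reads in a gzip-compressed FASTQ file (4 lines per record).
fastq_reads() { echo $(( $(gzip -dc "$1" | wc -l) / 4 )); }

# Number of sequences in a FASTA file (one '>' header line per sequence).
fasta_seqs() { grep -c '^>' "$1"; }

# Number of distinct gene_id values in a GTF file's attribute column.
gtf_genes() { grep -o 'gene_id "[^"]*"' "$1" | sort -u | awk 'END { print NR }'; }

# Example usage (assumes the FASTQ files sit directly under samples/):
#   fasta_seqs chrX_data/genome/chrX.fa
#   gtf_genes chrX_data/genes/chrX.gtf
#   for f in chrX_data/samples/*.fastq.gz; do echo "$f $(fastq_reads "$f")"; done
```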

Usage

This dataset can be used to perform RNA-seq analysis using a variety of tools, such as HISAT2, StringTie, and Ballgown.

Here are some examples of how this dataset can be used:

  • To identify differentially expressed genes between two groups of samples.
  • To build a gene expression atlas for a particular tissue or cell type.
  • To study the expression of genes involved in a particular disease.
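The HISAT2, StringTie and Ballgown tools named above are usually chained as: align each sample, assemble transcripts per sample, merge the assemblies, then re-estimate abundances against the merged set to produce Ballgown input. The sketch below only prints those commands so they can be reviewed before running. It assumes the HISAT2 index basename is indexes/chrX_tran and that paired FASTQ files are named <id>_chrX_1.fastq.gz and <id>_chrX_2.fastq.gz; check both against the extracted files. It also assumes mergelist.txt names the per-sample GTFs relative to the working directory, so the printed commands should be run from the directory that contains chrX_data. The function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print alignment, sorting and per-sample assembly commands for one sample.
# ASSUMPTIONS: index basename "chrX_tran"; FASTQ names <id>_chrX_{1,2}.fastq.gz.
sample_cmds() {
  local data=$1 id=$2 t=${3:-8}
  echo "hisat2 -p $t --dta -x $data/indexes/chrX_tran" \
       "-1 $data/samples/${id}_chrX_1.fastq.gz -2 $data/samples/${id}_chrX_2.fastq.gz" \
       "-S ${id}_chrX.sam"
  echo "samtools sort -@ $t -o ${id}_chrX.bam ${id}_chrX.sam"
  echo "stringtie -p $t -G $data/genes/chrX.gtf -o ${id}_chrX.gtf -l $id ${id}_chrX.bam"
}

# Print the whole workflow for every sample found under samples/.
pipeline_cmds() {
  local data=${1:-chrX_data} t=${2:-8} f id
  local -a ids=()
  for f in "$data"/samples/*_chrX_1.fastq.gz; do
    [[ -e $f ]] || continue                 # glob matched nothing
    id=$(basename "$f" _chrX_1.fastq.gz)
    ids+=("$id")
    sample_cmds "$data" "$id" "$t"
  done
  # Merge the per-sample assemblies listed in mergelist.txt.
  echo "stringtie --merge -p $t -G $data/genes/chrX.gtf -o stringtie_merged.gtf $data/mergelist.txt"
  # Re-estimate against the merged transcripts; -B writes Ballgown input tables.
  for id in ${ids[@]+"${ids[@]}"}; do
    echo "stringtie -e -B -p $t -G stringtie_merged.gtf -o ballgown/$id/${id}_chrX.gtf ${id}_chrX.bam"
  done
}

# Example usage:
#   pipeline_cmds chrX_data 8 > run_pipeline.sh   # review, then: bash run_pipeline.sh
```

Printing rather than executing keeps the sketch safe to try; the per-sample tables written under ballgown/ are what Ballgown (in R) then reads for differential expression testing.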

source: ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz
