100+ datasets found
  1. Bgee gene expression data

    • bgee.org
    Updated May 21, 2024
    Cite
    The Bgee Team (2024). Bgee gene expression data [Dataset]. https://www.bgee.org
    Dataset updated
    May 21, 2024
    Dataset authored and provided by
    The Bgee Team
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Bgee is a database for retrieval and comparison of gene expression patterns across multiple animal species. It provides an intuitive answer to the question "Where is a gene expressed?" and supports research in cancer and agriculture, as well as evolutionary biology.

  2. Human Protein Atlas - Tissue

    • proteinatlas.org
    • v25.proteinatlas.org
    Cite
    Human Protein Atlas, Human Protein Atlas - Tissue [Dataset]. https://www.proteinatlas.org/humanproteome/tissue
    Dataset authored and provided by
    Human Protein Atlas
    License

    https://www.proteinatlas.org/about/licence

    Description

    Tissue methods

    This resource of the Human Protein Atlas focuses on the expression profiles of genes in human tissues at both the mRNA and protein levels. The protein expression data from 45 normal human tissue types is derived from antibody-based protein profiling using conventional and multiplex immunohistochemistry. All underlying images of immunohistochemistry stained normal tissues are available together with knowledge-based annotation of protein expression levels. The protein data covers 15312 genes (76%) for which there are available antibodies. The mRNA expression data is derived from deep sequencing of RNA (RNA-seq) from 51 different normal tissue types.
    

    More information about the specific content and the generation and analysis of the data in the resource can be found on the Methods Summary. Learn about:

    • protein localization in tissues at a single-cell level
    • if a gene is enriched in a particular tissue (specificity)
    • which genes have a similar expression profile across tissues (expression cluster)

  3. Human Gene Expression Database Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Human Gene Expression Database Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/human-gene-expression-database-data-package/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    This data package contains expression profiles for proteins in normal and cancer tissues. It also contains data on sequence-based RNA levels in human tissues and cell lines.

  4. Bgee: dataBase for Gene Expression Evolution

    • dknet.org
    • neuinfo.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). Bgee: dataBase for Gene Expression Evolution [Dataset]. http://identifiers.org/RRID:SCR_002028
    Dataset updated
    Jan 29, 2022
    Description

    Database to retrieve and compare gene expression patterns between animal species. Bgee first maps heterogeneous expression data (currently bulk RNA-Seq, scRNA-Seq, Affymetrix, in situ hybridization, and EST data) to anatomy and development of different species. Bgee is based exclusively on curated healthy wild-type expression data (e.g., no gene knock-out, no treatment, no disease), to provide a comparable reference of gene expression.

  5. Human Protein Atlas - Brain

    • proteinatlas.org
    • v25.proteinatlas.org
    Updated Sep 18, 2017
    Cite
    Human Protein Atlas (2017). Human Protein Atlas - Brain [Dataset]. https://www.proteinatlas.org/humanproteome/brain
    Dataset updated
    Sep 18, 2017
    Dataset authored and provided by
    Human Protein Atlas
    License

    https://www.proteinatlas.org/about/licence

    Description

    Brain methods

    This resource provides comprehensive spatial profiling of the Brain, including overview of protein expression in the mammalian brain based on integration of data from human, pig and mouse. Transcriptomics data combined with affinity-based protein in situ localization down to single cell detail is available in this brain-centric sub atlas of the Human Protein Atlas. The data presented are for human genes and their one-to-one orthologues in pig and mouse. Gene summary pages provide the hierarchical expression landscape from 13 main regions of the brain to individual nuclei and subfields for every protein coding gene. For selected proteins, high content images are available to explore the cellular and subcellular protein distribution. In addition, the Brain resource contains lists of genes with elevated expression in one or a group of regions to help the user identify unique protein expression profiles linked to physiology and function.
    

    More information about the specific content and the generation and analysis of the data in this resource can be found on the Methods Summary. Learn about:

    • Expression levels for all human proteins in regions and subregions of the human brain
    • Expression levels for all proteins with human orthologs in regions and subregions of the pig and mouse brain
    • Brain enriched genes with higher expression in any of the regions of the brain compared to peripheral organs
    • Regional enriched genes with higher expression in a single or few regions of the brain
    • Cell-type and cell-compartment distribution of selected proteins in the human and mouse brain
    • Differences in gene expression between mammalian species

    Additional information: In addition to the data provided in the brain resource there is also data on human retina and single cell data containing information on protein expression in human neuronal and non-neuronal cell-types in the central nervous system.

  6. Data from: Plant Expression Database

    • agdatacommons.nal.usda.gov
    • datasetcatalog.nlm.nih.gov
    • +2more
    bin
    Updated Feb 9, 2024
    Cite
    Sudhansu S. Dash; John Van Hemert; Lu Hong; Roger P. Wise; Julie A. Dickerson (2024). Plant Expression Database [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Plant_Expression_Database/24661179
    Available download formats: bin
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    PLEXdb
    Authors
    Sudhansu S. Dash; John Van Hemert; Lu Hong; Roger P. Wise; Julie A. Dickerson
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    [NOTE: PLEXdb is no longer available online. Oct 2019.] PLEXdb (Plant Expression Database) is a unified gene expression resource for plants and plant pathogens. PLEXdb is a genotype to phenotype, hypothesis building information warehouse, leveraging highly parallel expression data with seamless portals to related genetic, physical, and pathway data. PLEXdb (http://www.plexdb.org), in partnership with community databases, supports comparisons of gene expression across multiple plant and pathogen species, promoting individuals and/or consortia to upload genome-scale data sets to contrast them to previously archived data. These analyses facilitate the interpretation of structure, function and regulation of genes in economically important plants. A list of Gene Atlas experiments highlights data sets that give responses across different developmental stages, conditions and tissues. Tools at PLEXdb allow users to perform complex analyses quickly and easily. The Model Genome Interrogator (MGI) tool supports mapping gene lists onto corresponding genes from model plant organisms, including rice and Arabidopsis. MGI predicts homologies, displays gene structures and supporting information for annotated genes and full-length cDNAs. The gene list-processing wizard guides users through PLEXdb functions for creating, analyzing, annotating and managing gene lists. Users can upload their own lists or create them from the output of PLEXdb tools, and then apply diverse higher level analyses, such as ANOVA and clustering. PLEXdb also provides methods for users to track how gene expression changes across many different experiments using the Gene OscilloScope. This tool can identify interesting expression patterns, such as up-regulation under diverse conditions or checking any gene’s suitability as a steady-state control.

    Resources in this dataset:

    • Resource Title: Website Pointer for Plant Expression Database, Iowa State University. File Name: Web Page, url: https://www.bcb.iastate.edu/plant-expression-database [NOTE: PLEXdb is no longer available online. Oct 2019.] Project description for the Plant Expression Database (PLEXdb) and integrated tools.

  7. Data from: Comparing RNA-Seq and microarray gene expression data in two...

    • data.nasa.gov
    • osdr.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). Comparing RNA-Seq and microarray gene expression data in two zones of the Arabidopsis root apex relevant to spaceflight. [Dataset]. https://data.nasa.gov/dataset/comparing-rna-seq-and-microarray-gene-expression-data-in-two-zones-of-the-arabidopsis-root
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Premise of the study: The root apex is an important region involved in environmental sensing, but comprises a very small part of the root. Obtaining root apex transcriptomes is therefore challenging when the samples are limited. The feasibility of using tiny root sections for transcriptome analysis was examined, comparing RNA sequencing (RNA-Seq) to microarrays in characterizing genes that are relevant to spaceflight.

    Methods: Arabidopsis thaliana Columbia ecotype (Col-0) roots were sectioned into Zone 1 (0.5 mm; root cap and meristematic zone) and Zone 2 (1.5 mm; transition, elongation, and growth-terminating zone). Differential gene expression in each was compared.

    Results: Both microarrays and RNA-Seq proved applicable to the small samples. A total of 4180 genes were differentially expressed (with fold changes of 2 or greater) between Zone 1 and Zone 2. In addition, 771 unique genes and 19 novel transcriptionally active regions were identified by RNA-Seq that were not detected in microarrays. However, microarrays detected spaceflight-relevant genes that were missed in RNA-Seq.

    Discussion: Single root tip subsections can be used for transcriptome analysis using either RNA-Seq or microarrays. Both RNA-Seq and microarrays provided novel information. These data suggest that techniques for dealing with small, rare samples from spaceflight can be further enhanced, and that RNA-Seq may miss some spaceflight-relevant changes in gene expression.
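The two-fold threshold reported in the Results can be illustrated with a minimal sketch. The function name, the pseudocount, and the toy expression values are assumptions made for illustration and are not the study's pipeline; a real analysis would typically also apply a statistical test.

```python
import math

def genes_with_fold_change(zone1, zone2, min_fold=2.0, pseudocount=1.0):
    """Return genes whose expression differs by at least min_fold between zones.

    zone1, zone2: dicts mapping gene ID -> mean expression (hypothetical inputs).
    pseudocount: added to both values to avoid division by zero (an assumption).
    """
    threshold = math.log2(min_fold)
    selected = {}
    # Only genes measured in both zones can be compared.
    for gene in zone1.keys() & zone2.keys():
        ratio = (zone2[gene] + pseudocount) / (zone1[gene] + pseudocount)
        log2_fc = math.log2(ratio)
        # Keep changes in either direction (up or down in Zone 2).
        if abs(log2_fc) >= threshold:
            selected[gene] = log2_fc
    return selected

# Toy values, not data from the study.
zone1 = {"geneA": 10.0, "geneB": 50.0, "geneC": 7.0}
zone2 = {"geneA": 43.0, "geneB": 55.0, "geneC": 1.0}
hits = genes_with_fold_change(zone1, zone2)
print(sorted(hits))
```

Working on the log2 scale makes a two-fold increase and a two-fold decrease symmetric around zero, which is why the comparison uses the absolute value.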

  8. Gene Expression Database

    • rrid.site
    • dknet.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). Gene Expression Database [Dataset]. http://identifiers.org/RRID:SCR_006539
    Dataset updated
    Jan 29, 2022
    Description

    Community database that collects and integrates the gene expression information in MGI with a primary emphasis on endogenous gene expression during mouse development. The data in GXD are obtained from the literature, from individual laboratories, and from large-scale data providers. All data are annotated and reviewed by GXD curators. GXD stores and integrates different types of expression data (RNA in situ hybridization; Immunohistochemistry; in situ reporter (knock in); RT-PCR; Northern and Western blots; and RNase and Nuclease s1 protection assays) and makes these data freely available in formats appropriate for comprehensive analysis. There is particular emphasis on endogenous gene expression during mouse development. GXD also maintains an index of the literature examining gene expression in the embryonic mouse. It is comprehensive and up-to-date, containing all pertinent journal articles from 1993 to the present and articles from major developmental journals from 1990 to the present. GXD stores primary data from different types of expression assays and by integrating these data, as data accumulate, GXD provides increasingly complete information about the expression profiles of transcripts and proteins in different mouse strains and mutants. GXD describes expression patterns using an extensive, hierarchically-structured dictionary of anatomical terms. In this way, expression results from assays with differing spatial resolution are recorded in a standardized and integrated manner and expression patterns can be queried at different levels of detail. The records are complemented with digitized images of the original expression data. The Anatomical Dictionary for Mouse Development has been developed by our Edinburgh colleagues, as part of the joint Mouse Gene Expression Information Resource project. GXD places the gene expression data in the larger biological context by establishing and maintaining interconnections with many other resources. 
    Integration with MGD enables a combined analysis of genotype, sequence, expression, and phenotype data. Links to PubMed, Online Mendelian Inheritance in Man (OMIM), sequence databases, and databases from other species further enhance the utility of GXD. GXD accepts both published and unpublished data.

  9. ncRNA Expression Database

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Oct 15, 2024
    Cite
    (2024). ncRNA Expression Database [Dataset]. http://identifiers.org/RRID:SCR_008630
    Dataset updated
    Oct 15, 2024
    Description

    Database of long noncoding RNA expression that integrates annotated expression data from various sources in human and mouse. The database contains both microarray and in situ hybridization data, and supplies a rich tapestry of ancillary information for featured ncRNAs, including evolutionary conservation, secondary structure evidence, genomic context links and antisense relationships.

  10. Table1_Preclinical species gene expression database: Development and...

    • datasetcatalog.nlm.nih.gov
    Updated Jan 17, 2023
    Cite
    Vo, Andy; Krause, Caitlin; Liguori, Michael J.; Kowalkowski, Kenneth; Van Vleet, Terry R.; Suwada, Kinga; Mittelstadt, Scott; Rendino, Lauren; Mahalingaiah, Prathap Kumar; Peterson, Richard; Blomme, Eric A. G. (2023). Table1_Preclinical species gene expression database: Development and meta-analysis.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001025224
    Dataset updated
    Jan 17, 2023
    Authors
    Vo, Andy; Krause, Caitlin; Liguori, Michael J.; Kowalkowski, Kenneth; Van Vleet, Terry R.; Suwada, Kinga; Mittelstadt, Scott; Rendino, Lauren; Mahalingaiah, Prathap Kumar; Peterson, Richard; Blomme, Eric A. G.
    Description

    The evaluation of toxicity in preclinical species is important for identifying potential safety liabilities of experimental medicines. Toxicology studies provide translational insight into potential adverse clinical findings, but data interpretation may be limited due to our understanding of cross-species biological differences. With the recent technological advances in sequencing and analyzing omics data, gene expression data can be used to predict cross species biological differences and improve experimental design and toxicology data interpretation. However, interpreting the translational significance of toxicogenomics analyses can pose a challenge due to the lack of comprehensive preclinical gene expression datasets. In this work, we performed RNA-sequencing across four preclinical species/strains widely used for safety assessment (CD1 mouse, Sprague Dawley rat, Beagle dog, and Cynomolgus monkey) in ∼50 relevant tissues/organs to establish a comprehensive preclinical gene expression body atlas for both males and females. In addition, we performed a meta-analysis across the large dataset to highlight species and tissue differences that may be relevant for drug safety analyses. Further, we made these databases available to the scientific community. This multi-species, tissue-, and sex-specific transcriptomic database should serve as a valuable resource to enable informed safety decision-making not only during drug development, but also in a variety of disciplines that use these preclinical species.

  11. RNA-seq example data

    • kaggle.com
    zip
    Updated Jun 16, 2023
    Cite
    Tuhin Rana (2023). RNA-seq example data [Dataset]. https://www.kaggle.com/datasets/rana2hin/rna-seq-example-data
    Available download formats: zip (2193914798 bytes)
    Dataset updated
    Jun 16, 2023
    Authors
    Tuhin Rana
    License

    https://cdla.io/sharing-1-0/

    Description

    Dataset Description

    This dataset contains RNA-seq data from human cells. The data was collected using the Illumina HiSeq 2500 platform. The data includes raw sequencing reads, gene annotations, and phenotypic data for the samples.

    Files and Folders

    Files can be downloaded using the following command:

    wget ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz
    

    Once the file has been downloaded, it can be extracted using the following command:

    tar xvzf chrX_data.tar.gz
    

    This will create a directory called chrX_data containing the following files:

    genes/chrX.gtf
    genome/chrX.fa
    geuvadis_phenodata.csv
    indexes/
    mergelist.txt
    samples/
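As a cross-check on the listing above, the extraction step can also be done from Python. This is a minimal sketch using only the standard library; the function name and the `EXPECTED` list are illustrative, and the list simply mirrors the entries named in this description.

```python
import tarfile
from pathlib import Path

# Entries the description says the archive contains.
EXPECTED = [
    "genes/chrX.gtf",
    "genome/chrX.fa",
    "geuvadis_phenodata.csv",
    "indexes",
    "mergelist.txt",
    "samples",
]

def extract_and_check(archive_path, dest_dir, top_level="chrX_data"):
    """Extract a .tar.gz archive and return the expected entries that are missing.

    Only extract archives from a source you trust.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    root = dest / top_level
    return [name for name in EXPECTED if not (root / name).exists()]
```

Calling `extract_and_check("chrX_data.tar.gz", ".")` after the download should return an empty list if the archive matches the description.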
    

    Here are some additional details about the files in the chrX_data directory:

    • genes/chrX.gtf - This file contains gene annotations for the human X chromosome. It is in the GTF format, which is a standard format for gene annotations. The GTF file contains information about the start and end positions of genes, as well as their transcripts.
    • genome/chrX.fa - This file contains the reference genome sequence for the human X chromosome. It is in the FASTA format, which is a standard format for storing DNA sequences.
    • geuvadis_phenodata.csv - This file contains phenotypic data for the samples in the dataset, such as each sample's ID, sex, and population.
    • indexes/ - This directory contains index files for HISAT2. Index files are used to speed up the alignment of sequencing reads to a reference genome.
    • mergelist.txt - This file lists the files to be merged. In the HISAT2, StringTie, and Ballgown workflow named under Usage, it holds the per-sample transcript assembly (GTF) files that StringTie's merge step combines.
    • samples/ - This directory contains the raw sequencing data. The raw sequencing data is in the FASTQ format, which is a standard format for storing sequencing reads.
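To make the GTF description above concrete, here is a minimal sketch of parsing one GTF feature line into its nine tab-separated fields. The function name and the example line are made up for illustration; splitting attributes on semicolons is a simplification that would break on attribute values containing a semicolon.

```python
def parse_gtf_line(line):
    """Parse one GTF feature line into a dict; return None for comments or blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields
    attributes = {}
    # Attributes look like: key "value"; key "value";
    for item in attr_text.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }

# Made-up line in GTF layout (not taken from chrX.gtf).
example = 'chrX\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
record = parse_gtf_line(example)
print(record["feature"], record["start"], record["end"], record["attributes"]["gene_id"])
```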

    Usage

    This dataset can be used to perform RNA-seq analysis using a variety of tools, such as HISAT2, StringTie, and Ballgown.

    Here are some examples of how this dataset can be used:

    • To identify differentially expressed genes between two groups of samples.
    • To build a gene expression atlas for a particular tissue or cell type.
    • To study the expression of genes involved in a particular disease.

    source: ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz

  12. DataSheet2_Preclinical species gene expression database: Development and...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 3, 2023
    Cite
    Caitlin Krause; Kinga Suwada; Eric A. G. Blomme; Kenneth Kowalkowski; Michael J. Liguori; Prathap Kumar Mahalingaiah; Scott Mittelstadt; Richard Peterson; Lauren Rendino; Andy Vo; Terry R. Van Vleet (2023). DataSheet2_Preclinical species gene expression database: Development and meta-analysis.xlsx [Dataset]. http://doi.org/10.3389/fgene.2022.1078050.s002
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Caitlin Krause; Kinga Suwada; Eric A. G. Blomme; Kenneth Kowalkowski; Michael J. Liguori; Prathap Kumar Mahalingaiah; Scott Mittelstadt; Richard Peterson; Lauren Rendino; Andy Vo; Terry R. Van Vleet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The evaluation of toxicity in preclinical species is important for identifying potential safety liabilities of experimental medicines. Toxicology studies provide translational insight into potential adverse clinical findings, but data interpretation may be limited due to our understanding of cross-species biological differences. With the recent technological advances in sequencing and analyzing omics data, gene expression data can be used to predict cross species biological differences and improve experimental design and toxicology data interpretation. However, interpreting the translational significance of toxicogenomics analyses can pose a challenge due to the lack of comprehensive preclinical gene expression datasets. In this work, we performed RNA-sequencing across four preclinical species/strains widely used for safety assessment (CD1 mouse, Sprague Dawley rat, Beagle dog, and Cynomolgus monkey) in ∼50 relevant tissues/organs to establish a comprehensive preclinical gene expression body atlas for both males and females. In addition, we performed a meta-analysis across the large dataset to highlight species and tissue differences that may be relevant for drug safety analyses. Further, we made these databases available to the scientific community. This multi-species, tissue-, and sex-specific transcriptomic database should serve as a valuable resource to enable informed safety decision-making not only during drug development, but also in a variety of disciplines that use these preclinical species.

  13. HUDSEN Human Gene Expression Spatial Database

    • rrid.site
    Updated Jan 29, 2022
    Cite
    (2022). HUDSEN Human Gene Expression Spatial Database [Dataset]. http://identifiers.org/RRID:SCR_006325
    Dataset updated
    Jan 29, 2022
    Description

    Database of a set of standard 3D virtual models at different stages of development from Carnegie Stages (CS) 12-23 (approximately 26-56 days post conception) in which various anatomical regions have been defined with a set of anatomical terms at various stages of development (known as an ontology). Experimental data is captured and converted to digital format and then mapped to the appropriate 3D model. The ontology is used to define sites of gene expression using a set of standard descriptions and to link the expression data to an "anatomical tree". Human data from stages CS12 to CS23 can be submitted to the HUDSEN Gene Expression Database. The anatomy ontology currently being used is based on the Edinburgh Human Developmental Anatomy Database which encompasses all developing structures from CS1 to CS20 but is not detailed for developing brain structures. The ontology is being extended and refined (by Prof Luis Puelles, University of Murcia, Spain) and will be incorporated into the HUDSEN database as it is developed. Expression data is annotated using two methods to denote sites of expression in the embryo: spatial annotation and text annotation. Additionally, many aspects of the detection reagent and specimen are also annotated during this process (assignment of IDs, nucleotide sequences for probes etc). There are currently two main ways to search HUDSEN - using a gene/protein name or a named anatomical structure as the query term. The entire contents of the database can be browsed using the data browser. Results may be saved. The data in HUDSEN is generated both by researchers within the HUDSEN project and by the wider scientific community.

    The HUDSEN human gene expression spatial database is a collaboration between the Institute of Human Genetics in Newcastle, UK, and the MRC Human Genetics Unit in Edinburgh, UK, and was developed as part of the Electronic Atlas of the Developing Human Brain (EADHB) project (funded by the NIH Human Brain Project). The database is based on the Edinburgh Mouse Atlas gene expression database (EMAGE), and is designed to be an openly available resource to the research community holding gene expression patterns during early human development.

  14. Data, R code and output Seurat Objects for single cell RNA-seq analysis of...

    • figshare.com
    application/gzip
    Updated May 31, 2023
    Cite
    Yunshun Chen; Gordon Smyth (2023). Data, R code and output Seurat Objects for single cell RNA-seq analysis of human breast tissues [Dataset]. http://doi.org/10.6084/m9.figshare.17058077.v1
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yunshun Chen; Gordon Smyth
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all the Seurat objects that were used for generating all the figures in Pal et al. 2021 (https://doi.org/10.15252/embj.2020107333). All the Seurat objects were created under R v3.6.1 using the Seurat package v3.1.1. The detailed information of each object is listed in a table in Chen et al. 2021.

  15. R data set: The Cancer Genome Atlas Gene Expression data

    • zenodo.org
    bin
    Updated Jan 24, 2020
    Cite
    Christian Diener (2020). R data set: The Cancer Genome Atlas Gene Expression data [Dataset]. http://doi.org/10.5281/zenodo.61982
    Available download formats: bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christian Diener
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This compound data set comprises the following information from The Cancer Genome Atlas:

    • RNA-Seq counts for 60483 genes across 11093 samples
    • HuEx 1.0 ST gene expression data for 18632 genes across 1211 samples
    • clinical indicators for 11160 patients

    All gene expression data is annotated across ENSEMBL, ENTREZ and symbols. Samples are annotated by TCGA barcodes.

    To read the data set into R (requires 6 GB of RAM) use:

    tcga <- readRDS("tcga.rds")
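Because samples are annotated by TCGA barcodes, it can help to split a barcode into its fields, for example to link samples to patients or to separate tumor from normal samples. The sketch below assumes the standard TCGA barcode layout (project, tissue source site, participant, then a sample field whose first two digits encode the sample type); the description does not say whether this data set stores full or truncated barcodes, and the function name is illustrative.

```python
def parse_tcga_barcode(barcode):
    """Split a TCGA barcode into its leading fields.

    Assumes the standard layout TCGA-TSS-Participant-SampleVial-...; only the
    first four fields are interpreted, so truncated barcodes also work.
    """
    parts = barcode.strip().upper().split("-")
    if len(parts) < 4 or parts[0] != "TCGA" or len(parts[3]) < 2:
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    sample_code = int(parts[3][:2])
    # Sample type codes: 01-09 tumor, 10-19 normal, 20-29 control.
    if 1 <= sample_code <= 9:
        sample_class = "tumor"
    elif 10 <= sample_code <= 19:
        sample_class = "normal"
    elif 20 <= sample_code <= 29:
        sample_class = "control"
    else:
        sample_class = "unknown"
    return {
        "patient_id": "-".join(parts[:3]),
        "tissue_source_site": parts[1],
        "participant": parts[2],
        "sample_code": parts[3][:2],
        "vial": parts[3][2:],
        "sample_class": sample_class,
    }

# Illustrative barcode in the standard layout.
info = parse_tcga_barcode("TCGA-02-0001-01C-01D-0182-01")
print(info["patient_id"], info["sample_class"])
```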

  16. Breast Cancer Gene Expression Dataset

    • kaggle.com
    • mubashirali.vercel.app
    zip
    Updated Jan 23, 2026
    Cite
    Mubashir Ali (2026). Breast Cancer Gene Expression Dataset [Dataset]. https://www.kaggle.com/datasets/mubashir1837/breast-cancer-gene-expression-dataset
    Available download formats: zip (1911603 bytes)
    Dataset updated
    Jan 23, 2026
    Authors
    Mubashir Ali
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Breast Cancer Gene Expression Dataset

    This dataset contains RNA-seq gene expression data from 58 breast cancer patients treated with neoadjuvant chemotherapy (NAC). The data is derived from GSE280902 on NCBI GEO.

    Files

    • cleaned_expression.csv: Gene expression matrix with 58 samples (rows) and 28,278 genes (columns). The last column is 'Response' (1 for responder, 0 for non-responder).
    • labels.csv: Sample labels with response to NAC.

    Data Description

    • Samples: 58 breast cancer patients (29 responders, 29 non-responders to NAC).
    • Genes: 28,278 protein-coding genes.
    • Response: 1 = Pathological Complete Response (pCR), 0 = No Response.

    Source

    • GEO Accession: GSE280902
    • Paper: Guevara-Nieto HM et al. Identification of predictive pretreatment biomarkers for neoadjuvant chemotherapy response in Latino invasive breast cancer patients. Mol Med 2025.
    • GitHub Repository: Breast Cancer Gene Expression Processed Data

    Usage

    This dataset can be used for machine learning models to predict NAC response in breast cancer based on gene expression profiles.
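As a starting point for such models, the sketch below splits a samples-by-genes table into a feature matrix and a label vector. It assumes a header row, numeric gene columns, and 'Response' as the last column, as described for cleaned_expression.csv; whether the real file also carries a sample ID column is not stated, so the tiny in-memory table here is made up.

```python
import csv
import io

def load_expression(handle, label_column="Response"):
    """Read a samples-by-genes CSV and split it into gene names, features, labels."""
    reader = csv.reader(handle)
    header = next(reader)
    if header[-1] != label_column:
        raise ValueError(f"expected last column {label_column!r}, got {header[-1]!r}")
    genes = header[:-1]
    features, labels = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        features.append([float(value) for value in row[:-1]])
        labels.append(int(float(row[-1])))  # 1 = responder, 0 = non-responder
    return genes, features, labels

# Made-up table with two genes and two samples.
sample = io.StringIO("GENE1,GENE2,Response\n1.5,0.0,1\n2.0,3.25,0\n")
genes, X, y = load_expression(sample)
print(genes, X, y)
```

For the real file, pass an open file handle instead, for example `open("cleaned_expression.csv", newline="")`.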

    License

    This project is licensed under the MIT License - see the LICENSE file for details.

  17. Data from: Gene Expression Omnibus (GEO)

    • catalog.data.gov
    • data.virginia.gov
    • +2more
    Updated Jul 26, 2023
    Cite
    National Institutes of Health (NIH) (2023). Gene Expression Omnibus (GEO) [Dataset]. https://catalog.data.gov/dataset/gene-expression-omnibus-geo
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    National Institutes of Health (NIH)
    Description

    Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.
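GEO records are addressed by accession. A minimal sketch of turning an accession into a link to its record page follows; it assumes the public `acc.cgi` query page on the NCBI site, the function name is illustrative, and the example accession is the series cited elsewhere in this listing.

```python
import re
from urllib.parse import urlencode

# Public GEO accession viewer.
GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

def geo_accession_url(accession):
    """Return the GEO record URL for a series, sample, platform, or dataset accession."""
    accession = accession.strip().upper()
    # GSE = series, GSM = sample, GPL = platform, GDS = curated dataset.
    if not re.fullmatch(r"(GSE|GSM|GPL|GDS)\d+", accession):
        raise ValueError(f"not a GEO accession: {accession!r}")
    return f"{GEO_QUERY_URL}?{urlencode({'acc': accession})}"

url = geo_accession_url("gse280902")
print(url)
```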

  18. Comparative gene expression analysis in the Arabidopsis thaliana root apex...

    • catalog.data.gov
    • datasets.ai
    Updated Apr 24, 2025
    Cite
    National Aeronautics and Space Administration (2025). Comparative gene expression analysis in the Arabidopsis thaliana root apex using RNA-seq and microarray transcriptome profiles [Dataset]. https://catalog.data.gov/dataset/comparative-gene-expression-analysis-in-the-arabidopsis-thaliana-root-apex-using-rna-seq-a-b73a6
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The root apex is an important section of the plant root involved in environmental sensing and cellular development. Analyzing the gene profile of the root apex in diverse environments is important and challenging, especially when the samples are limiting and precious, such as in spaceflight. The feasibility of using tiny root sections for transcriptome analysis was examined in this study. To understand the gene expression profiles of the root apex, Arabidopsis thaliana Col-0 roots were sectioned into Zone-I (0.5 mm; root cap and meristematic zone) and Zone-II (1.5 mm; transition, elongation, and growth-terminating zone). Gene expression was analyzed using microarray and RNA-Seq. Both techniques, arrays and RNA-Seq, identified 4180 common genes as differentially expressed (with > two-fold changes) between the zones. In addition, 771 unique genes and 19 novel TARs were identified by RNA-Seq as differentially expressed which were not detected in the arrays. Single root tip zones can be used for full transcriptome analysis; further, the root apex zones are functionally very distinct from each other. RNA-Seq provided novel information about the transcripts compared to the arrays. These data will help optimize transcriptome techniques for dealing with small, rare samples.

  19. Gene Expression Data

    • kaggle.com
    zip
    Updated May 3, 2021
    Cite
    Salehin (2021). Gene Expression Data [Dataset]. https://www.kaggle.com/salehin321/gene-expression-data
    Explore at:
    Available download formats: zip (6624691 bytes)
    Dataset updated
    May 3, 2021
    Authors
    Salehin
    Description

    Context

    Gene expression datasets are used in biomedical engineering and survival analysis. This dataset was collected from The Cancer Genome Atlas (TCGA) portal for my master's project.

    Content

    The dataset contains cell counts for each gene for each patient. There are 4571 columns (the features), and each row represents a sample (patient).

    Acknowledgements

    TCGA portal

    Inspiration

    Can we predict the survival of the patients using these gene expression data?

  20. RNA-sequencing data from: The AML cellular state space unveils NPM1 immune...

    • researchdata.se
    • figshare.scilifelab.se
    Updated Oct 7, 2025
    Cite
    Henrik Lilljebjörn; Thoas Fioretos (2025). RNA-sequencing data from: The AML cellular state space unveils NPM1 immune evasion subtypes with distinct clinical outcomes, and: The complement receptor C3AR constitutes a novel therapeutic target in NPM1-mutated AML [Dataset]. http://doi.org/10.17044/SCILIFELAB.21557163
    Explore at:
    Dataset updated
    Oct 7, 2025
    Dataset provided by
    Lund University
    Authors
    Henrik Lilljebjörn; Thoas Fioretos
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset contains bulk RNA-sequencing (RNA-seq) gene expression data from 120 AML samples from the subtypes NPM1 (n=33), AML-MR (n=30), TP53 (n=18), PML::RARA (n=8), CBFB::MYH11 (n=8), AML without class defining mutations (n=8), RUNX1::RUNX1T1 (n=3), KMT2A fusion genes (n=3), AML meeting the criteria for two subtypes (n=2), DEK-NUP214 (n=2), GATA2::MECOM (n=1), and biallelic CEBPA mutation (n=1). The single cell libraries were constructed from bone marrow (n=102) or peripheral blood (n=18) using the TruSeq RNA Library Prep Kit v2 (Illumina) and sequenced on a NextSeq 500. Reads were aligned against human reference genome hg19, and read counts were determined using RSEM v1.2.30 (https://github.com/deweylab/RSEM) with gencode v19 as gene reference. Data are available as FPKM values as determined by RSEM. Raw sequencing reads (fastq) are available at the European Genome-Phenome Archive (EGA) under accession ID EGAD50000001576: https://ega-archive.org/datasets/EGAD50000001576.
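
    For readers unfamiliar with the FPKM unit mentioned above, the textbook definition (fragments per kilobase of transcript per million mapped fragments) can be written as a short function. This is a sketch of the general formula only; how RSEM derives its estimated counts and effective lengths is not reproduced here, and the function name and example numbers are hypothetical.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Textbook form: count * 1e9 / (length in bp * total mapped fragments).
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2000 bp long, library of 20 million fragments.
example = fpkm(500, 2000, 20_000_000)
print(example)
```

    The two normalizations (by gene length and by library size) make values comparable across genes within a sample and, roughly, across samples of different sequencing depth.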

Cite
The Bgee Team (2024). Bgee gene expression data [Dataset]. https://www.bgee.org

Bgee gene expression data

The Bgee Team

Related Article
Explore at:
258 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
May 21, 2024
Dataset authored and provided by
The Bgee Team
License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

Bgee is a database for retrieval and comparison of gene expression patterns across multiple animal species. It provides an intuitive answer to the question -where is a gene expressed?- and supports research in cancer and agriculture, as well as evolutionary biology.
