CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Bgee is a database for retrieval and comparison of gene expression patterns across multiple animal species. It provides an intuitive answer to the question "where is a gene expressed?" and supports research in cancer and agriculture, as well as evolutionary biology.
https://www.proteinatlas.org/about/licence
Tissue methods
This resource of the Human Protein Atlas focuses on the expression profiles of genes in human tissues at both the mRNA and protein level. The protein expression data from 45 normal human tissue types is derived from antibody-based protein profiling using conventional and multiplex immunohistochemistry. All underlying images of immunohistochemistry-stained normal tissues are available together with knowledge-based annotation of protein expression levels. The protein data covers 15312 genes (76%) for which there are available antibodies. The mRNA expression data is derived from deep sequencing of RNA (RNA-seq) from 51 different normal tissue types.
More information about the specific content and the generation and analysis of the data in the resource can be found on the Methods Summary. Learn about:
protein localization in tissues at a single-cell level
if a gene is enriched in a particular tissue (specificity)
which genes have a similar expression profile across tissues (expression cluster)
This data package contains expression profiles for proteins in normal and cancer tissues. It also contains data on sequence-based RNA levels in human tissues and cell lines.
Database to retrieve and compare gene expression patterns between animal species. Bgee first maps heterogeneous expression data (currently bulk RNA-Seq, scRNA-Seq, Affymetrix, in situ hybridization, and EST data) to anatomy and development of different species. Bgee is based exclusively on curated healthy wild-type expression data (e.g., no gene knock-out, no treatment, no disease), to provide a comparable reference of gene expression.
https://www.proteinatlas.org/about/licence
Brain methods
This resource provides comprehensive spatial profiling of the brain, including an overview of protein expression in the mammalian brain based on integration of data from human, pig and mouse. Transcriptomics data combined with affinity-based protein in situ localization down to single cell detail is available in this brain-centric sub atlas of the Human Protein Atlas. The data presented are for human genes and their one-to-one orthologues in pig and mouse. Gene summary pages provide the hierarchical expression landscape from 13 main regions of the brain to individual nuclei and subfields for every protein coding gene. For selected proteins, high content images are available to explore the cellular and subcellular protein distribution. In addition, the Brain resource contains lists of genes with elevated expression in one or a group of regions to help the user identify unique protein expression profiles linked to physiology and function.
More information about the specific content and the generation and analysis of the data in this resource can be found on the Methods Summary. Learn about:
Expression levels for all human proteins in regions and subregions of the human brain
Expression levels for all proteins with human orthologs in regions and subregions of the pig and mouse brain
Brain enriched genes with higher expression in any of the regions of the brain compared to peripheral organs
Regional enriched genes with higher expression in a single or few regions of the brain
Cell-type and cell-compartment distribution of selected proteins in the human and mouse brain
Differences in gene expression between mammalian species
Additional information: In addition to the data provided in the brain resource there is also data on human retina and single cell data containing information on protein expression in human neuronal and non-neuronal cell-types in the central nervous system.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
[NOTE: PLEXdb is no longer available online. Oct 2019.] PLEXdb (Plant Expression Database) is a unified gene expression resource for plants and plant pathogens. PLEXdb is a genotype to phenotype, hypothesis building information warehouse, leveraging highly parallel expression data with seamless portals to related genetic, physical, and pathway data. PLEXdb (http://www.plexdb.org), in partnership with community databases, supports comparisons of gene expression across multiple plant and pathogen species, encouraging individuals and/or consortia to upload genome-scale data sets to contrast them to previously archived data. These analyses facilitate the interpretation of structure, function and regulation of genes in economically important plants. A list of Gene Atlas experiments highlights data sets that give responses across different developmental stages, conditions and tissues. Tools at PLEXdb allow users to perform complex analyses quickly and easily. The Model Genome Interrogator (MGI) tool supports mapping gene lists onto corresponding genes from model plant organisms, including rice and Arabidopsis. MGI predicts homologies, displays gene structures and supporting information for annotated genes and full-length cDNAs. The gene list-processing wizard guides users through PLEXdb functions for creating, analyzing, annotating and managing gene lists. Users can upload their own lists or create them from the output of PLEXdb tools, and then apply diverse higher level analyses, such as ANOVA and clustering. PLEXdb also provides methods for users to track how gene expression changes across many different experiments using the Gene OscilloScope. This tool can identify interesting expression patterns, such as up-regulation under diverse conditions, and can check any gene's suitability as a steady-state control.
Resources in this dataset:
Resource Title: Website Pointer for Plant Expression Database, Iowa State University.
File Name: Web Page, url: https://www.bcb.iastate.edu/plant-expression-database
Project description for the Plant Expression Database (PLEXdb) and integrated tools.
Premise of the study: The root apex is an important region involved in environmental sensing, but comprises a very small part of the root. Obtaining root apex transcriptomes is therefore challenging when the samples are limited. The feasibility of using tiny root sections for transcriptome analysis was examined, comparing RNA sequencing (RNA-Seq) to microarrays in characterizing genes that are relevant to spaceflight. Methods: Arabidopsis thaliana Columbia ecotype (Col-0) roots were sectioned into Zone 1 (0.5 mm; root cap and meristematic zone) and Zone 2 (1.5 mm; transition, elongation, and growth-terminating zone). Differential gene expression in each was compared. Results: Both microarrays and RNA-Seq proved applicable to the small samples. A total of 4180 genes were differentially expressed (with fold changes of 2 or greater) between Zone 1 and Zone 2. In addition, 771 unique genes and 19 novel transcriptionally active regions were identified by RNA-Seq that were not detected in microarrays. However, microarrays detected spaceflight-relevant genes that were missed in RNA-Seq. Discussion: Single root tip subsections can be used for transcriptome analysis using either RNA-Seq or microarrays. Both RNA-Seq and microarrays provided novel information. These data suggest that techniques for dealing with small, rare samples from spaceflight can be further enhanced, and that RNA-Seq may miss some spaceflight-relevant changes in gene expression.
Community database that collects and integrates the gene expression information in MGI with a primary emphasis on endogenous gene expression during mouse development. The data in GXD are obtained from the literature, from individual laboratories, and from large-scale data providers. All data are annotated and reviewed by GXD curators. GXD stores and integrates different types of expression data (RNA in situ hybridization; Immunohistochemistry; in situ reporter (knock in); RT-PCR; Northern and Western blots; and RNase and Nuclease s1 protection assays) and makes these data freely available in formats appropriate for comprehensive analysis. There is particular emphasis on endogenous gene expression during mouse development. GXD also maintains an index of the literature examining gene expression in the embryonic mouse. It is comprehensive and up-to-date, containing all pertinent journal articles from 1993 to the present and articles from major developmental journals from 1990 to the present. GXD stores primary data from different types of expression assays and by integrating these data, as data accumulate, GXD provides increasingly complete information about the expression profiles of transcripts and proteins in different mouse strains and mutants. GXD describes expression patterns using an extensive, hierarchically-structured dictionary of anatomical terms. In this way, expression results from assays with differing spatial resolution are recorded in a standardized and integrated manner and expression patterns can be queried at different levels of detail. The records are complemented with digitized images of the original expression data. The Anatomical Dictionary for Mouse Development has been developed by our Edinburgh colleagues, as part of the joint Mouse Gene Expression Information Resource project. GXD places the gene expression data in the larger biological context by establishing and maintaining interconnections with many other resources.
Integration with MGD enables a combined analysis of genotype, sequence, expression, and phenotype data. Links to PubMed, Online Mendelian Inheritance in Man (OMIM), sequence databases, and databases from other species further enhance the utility of GXD. GXD accepts both published and unpublished data.
Database of long noncoding RNA expression that integrates annotated expression data from various sources in human and mouse. The database contains both microarray and in situ hybridization data, and supplies a rich tapestry of ancillary information for featured ncRNAs, including evolutionary conservation, secondary structure evidence, genomic context links and antisense relationships.
https://cdla.io/sharing-1-0/
Dataset Description
This dataset contains RNA-seq data from human cells. The data was collected using the Illumina HiSeq 2500 platform. The data includes raw sequencing reads, gene annotations, and phenotypic data for the samples.
Files and Folders
Files can be downloaded using the following command:
wget ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz
Once the file has been downloaded, it can be extracted using the following command:
tar xvzf chrX_data.tar.gz
This will create a directory called chrX_data containing the following files:
genes/chrX.gtf
genome/chrX.fa
geuvadis_phenodata.csv
indexes/
mergelist.txt
samples/
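After extraction, it can be useful to confirm that the archive unpacked into the layout listed above. The following is a minimal Python sketch under that assumption; the helper name `missing_entries` is hypothetical and is not part of the dataset or of any tool named here.

```python
from pathlib import Path

# Entries listed above; a trailing slash marks a directory.
EXPECTED = [
    "genes/chrX.gtf",
    "genome/chrX.fa",
    "geuvadis_phenodata.csv",
    "indexes/",
    "mergelist.txt",
    "samples/",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = []
    for entry in expected:
        path = root / entry
        # Directories and regular files are checked differently.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

Calling `missing_entries("chrX_data")` from the directory where the archive was extracted should return an empty list if every listed file and directory is present.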
Here are some additional details about the files in the chrX_data directory:
genes/chrX.gtf - This file contains gene annotations for the human X chromosome. It is in the GTF format, which is a standard format for gene annotations. The GTF file contains information about the start and end positions of genes, as well as their transcripts.
genome/chrX.fa - This file contains the reference genome sequence for the human X chromosome. It is in the FASTA format, which is a standard format for storing DNA sequences.
geuvadis_phenodata.csv - This file contains phenotypic data for the samples in the dataset. The phenotypic data includes information such as the age, sex, and disease status of the samples.
indexes/ - This directory contains index files for HISAT2. Index files are used to speed up the alignment of sequencing reads to a reference genome.
mergelist.txt - This file lists the per-sample transcript assemblies to be merged. In a HISAT2/StringTie/Ballgown workflow, this merge is typically performed with StringTie's merge mode (stringtie --merge).
samples/ - This directory contains the raw sequencing data. The raw sequencing data is in the FASTQ format, which is a standard format for storing sequencing reads.
Usage
This dataset can be used to perform RNA-seq analysis using a variety of tools, such as HISAT2, StringTie, and Ballgown.
Here are some examples of how this dataset can be used:
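As one possible illustration, the Python sketch below only builds and prints the command lines for a typical HISAT2, samtools and StringTie run; it does not execute them. The flags used (`hisat2 -p/--dta/-x/-1/-2/-S`, `samtools sort -@/-o`, `stringtie -p/-G/-o/-l` and `stringtie --merge`) are standard options of those tools. The index base name `chrX_tran`, the sample ID `ERR188044`, the `<sample>_chrX_1.fastq.gz` naming pattern and the output file names are assumptions that are not stated in this description, so check them against the extracted `indexes/` and `samples/` directories.

```python
import shlex

DATA = "chrX_data"  # directory created by extracting the archive


def commands_for_sample(sample, index_base="chrX_tran", threads=8):
    """Build align, sort and assemble commands for one paired-end sample.

    `index_base` and the FASTQ naming pattern are assumptions; adjust to the
    actual contents of indexes/ and samples/.
    """
    reads = f"{DATA}/samples/{sample}_chrX"
    prefix = f"{sample}_chrX"
    n = str(threads)
    return [
        # Spliced alignment; --dta reports alignments tailored for assemblers.
        ["hisat2", "-p", n, "--dta", "-x", f"{DATA}/indexes/{index_base}",
         "-1", f"{reads}_1.fastq.gz", "-2", f"{reads}_2.fastq.gz",
         "-S", f"{prefix}.sam"],
        # Coordinate-sort the alignments and write BAM.
        ["samtools", "sort", "-@", n, "-o", f"{prefix}.bam", f"{prefix}.sam"],
        # Assemble transcripts guided by the reference annotation.
        ["stringtie", "-p", n, "-G", f"{DATA}/genes/chrX.gtf",
         "-o", f"{prefix}.gtf", "-l", sample, f"{prefix}.bam"],
    ]


def merge_cmd(threads=8):
    """Build the StringTie merge command that reads the list in mergelist.txt."""
    return ["stringtie", "--merge", "-p", str(threads),
            "-G", f"{DATA}/genes/chrX.gtf",
            "-o", "stringtie_merged.gtf", f"{DATA}/mergelist.txt"]


for cmd in commands_for_sample("ERR188044") + [merge_cmd()]:
    print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try without the tools installed; each list can be passed to `subprocess.run` once the paths have been verified.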
source: ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The evaluation of toxicity in preclinical species is important for identifying potential safety liabilities of experimental medicines. Toxicology studies provide translational insight into potential adverse clinical findings, but data interpretation may be limited due to our understanding of cross-species biological differences. With the recent technological advances in sequencing and analyzing omics data, gene expression data can be used to predict cross species biological differences and improve experimental design and toxicology data interpretation. However, interpreting the translational significance of toxicogenomics analyses can pose a challenge due to the lack of comprehensive preclinical gene expression datasets. In this work, we performed RNA-sequencing across four preclinical species/strains widely used for safety assessment (CD1 mouse, Sprague Dawley rat, Beagle dog, and Cynomolgus monkey) in ∼50 relevant tissues/organs to establish a comprehensive preclinical gene expression body atlas for both males and females. In addition, we performed a meta-analysis across the large dataset to highlight species and tissue differences that may be relevant for drug safety analyses. Further, we made these databases available to the scientific community. This multi-species, tissue-, and sex-specific transcriptomic database should serve as a valuable resource to enable informed safety decision-making not only during drug development, but also in a variety of disciplines that use these preclinical species.
Database of a set of standard 3D virtual models at different stages of development from Carnegie Stages (CS) 12-23 (approximately 26-56 days post conception) in which various anatomical regions have been defined with a set of anatomical terms at various stages of development (known as an ontology). Experimental data is captured and converted to digital format and then mapped to the appropriate 3D model. The ontology is used to define sites of gene expression using a set of standard descriptions and to link the expression data to an "anatomical tree". Human data from stages CS12 to CS23 can be submitted to the HUDSEN Gene Expression Database. The anatomy ontology currently being used is based on the Edinburgh Human Developmental Anatomy Database which encompasses all developing structures from CS1 to CS20 but is not detailed for developing brain structures. The ontology is being extended and refined (by Prof Luis Puelles, University of Murcia, Spain) and will be incorporated into the HUDSEN database as it is developed. Expression data is annotated using two methods to denote sites of expression in the embryo: spatial annotation and text annotation. Additionally, many aspects of the detection reagent and specimen are also annotated during this process (assignment of IDs, nucleotide sequences for probes etc). There are currently two main ways to search HUDSEN - using a gene/protein name or a named anatomical structure as the query term. The entire contents of the database can be browsed using the data browser. Results may be saved. The data in HUDSEN is generated both by researchers within the HUDSEN project and by the wider scientific community.
The HUDSEN human gene expression spatial database is a collaboration between the Institute of Human Genetics in Newcastle, UK, and the MRC Human Genetics Unit in Edinburgh, UK, and was developed as part of the Electronic Atlas of the Developing Human Brain (EADHB) project (funded by the NIH Human Brain Project). The database is based on the Edinburgh Mouse Atlas gene expression database (EMAGE), and is designed to be an openly available resource to the research community holding gene expression patterns during early human development.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all the Seurat objects that were used for generating all the figures in Pal et al. 2021 (https://doi.org/10.15252/embj.2020107333). All the Seurat objects were created under R v3.6.1 using the Seurat package v3.1.1. The detailed information of each object is listed in a table in Chen et al. 2021.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This compound data set comprises the following information from The Cancer Genome Atlas:
All gene expression data is annotated across ENSEMBL, ENTREZ and symbols. Samples are annotated by TCGA barcodes.
To read the data set into R (requires 6 GB of RAM) use:
tcga <- readRDS("tcga.rds")
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains RNA-seq gene expression data from 58 breast cancer patients treated with neoadjuvant chemotherapy (NAC). The data is derived from GSE280902 on NCBI GEO.
cleaned_expression.csv: Gene expression matrix with 58 samples (rows) and 28,278 genes (columns). The last column is 'Response' (1 for responder, 0 for non-responder).
labels.csv: Sample labels with response to NAC.
This dataset can be used for machine learning models to predict NAC response in breast cancer based on gene expression profiles.
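As a starting point for such models, the sketch below separates the gene columns from the 'Response' label using only the Python standard library. It assumes the layout described above (one row per sample, numeric gene columns, 'Response' as the label column); the function name, the optional `id_column` argument and the toy gene names are hypothetical, and the toy values are illustrative rather than taken from the dataset.

```python
import csv
import io


def split_features_and_labels(csv_file, label_column="Response", id_column=None):
    """Split an expression table into gene names, feature rows and labels.

    `id_column` is an optional non-numeric sample-ID column to skip; whether
    the real file has one is not stated, so it defaults to None.
    """
    reader = csv.DictReader(csv_file)
    skip = {label_column, id_column}
    gene_names = [c for c in reader.fieldnames if c not in skip]
    features, labels = [], []
    for row in reader:
        features.append([float(row[g]) for g in gene_names])
        # Accept labels written as "1" or "1.0".
        labels.append(int(float(row[label_column])))
    return gene_names, features, labels


# Tiny made-up table with the same shape as described, for illustration only.
toy = io.StringIO("GENE_A,GENE_B,Response\n1.5,0.0,1\n2.0,3.5,0\n")
genes, X, y = split_features_and_labels(toy)
print(genes, X, y)
```

For the real file, open it with `open("cleaned_expression.csv", newline="")` and pass the file object in place of `toy`. With 58 samples and tens of thousands of genes, any model fitted on this split would likely need feature selection or regularization and cross-validation.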
This project is licensed under the MIT License - see the LICENSE file for details.
Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.
The root apex is an important section of the plant root involved in environmental sensing and cellular development. Analyzing the gene profile of the root apex in diverse environments is important and challenging, especially when the samples are limiting and precious, such as in spaceflight. The feasibility of using tiny root sections for transcriptome analysis was examined in this study. To understand the gene expression profiles of the root apex, Arabidopsis thaliana Col-0 roots were sectioned into Zone-I (0.5 mm; root cap and meristematic zone) and Zone-II (1.5 mm; transition, elongation and growth-terminating zone). Gene expression was analyzed using microarrays and RNA-Seq. Both techniques, arrays and RNA-Seq, identified 4180 common genes as differentially expressed (with > two-fold changes) between the zones. In addition, 771 unique genes and 19 novel TARs were identified by RNA-Seq as differentially expressed which were not detected in the arrays. Single root tip zones can be used for full transcriptome analysis; further, the root apex zones are functionally very distinct from each other. RNA-Seq provided novel information about the transcripts compared to the arrays. These data will help optimize transcriptome techniques for dealing with small, rare samples.
Gene expression datasets are useful in biomedical engineering and survival analysis. This dataset was collected from The Cancer Genome Atlas portal for my master's project.
The dataset contains cell counts for each gene for each patient. There are 4571 columns (the features), and each row represents a sample (patient).
TCGA portal
Can we predict the survival of the patients using these gene expression data?
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains bulk RNA-sequencing (RNA-seq) gene expression data from 120 AML samples from the subtypes NPM1 (n=33), AML-MR (n=30), TP53 (n=18), PML::RARA (n=8), CBFB::MYH11 (n=8), AML without class defining mutations (n=8), RUNX1::RUNX1T1 (n=3), KMT2A fusion genes (n=3), AML meeting the criteria for two subtypes (n=2), DEK::NUP214 (n=2), GATA2::MECOM (n=1), and biallelic CEBPA mutation (n=1). The libraries were constructed from bone marrow (n=102) or peripheral blood (n=18) using the TruSeq RNA Library Prep Kit v2 (Illumina) and sequenced on a NextSeq 500. Reads were aligned against human reference genome hg19 and read counts were determined using RSEM v1.2.30 (https://github.com/deweylab/RSEM) with gencode v19 as gene reference. Data is available as FPKM values as determined by RSEM. Raw sequencing reads (fastq) are available at the European Genome-Phenome Archive (EGA) under accession ID EGAD50000001576: https://ega-archive.org/datasets/EGAD50000001576.
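Because the values are distributed as FPKM, it may help to recall how FPKM relates to counts and how it can be rescaled to TPM for comparison across samples. The Python sketch below uses the textbook definitions, FPKM = 10^9 x fragments / (length in bp x total mapped fragments) and TPM = 10^6 x FPKM / sum of FPKM. It is an illustration only: RSEM itself works with effective transcript lengths and expected counts, so its reported values will not be reproduced exactly by this simplified formula, and the toy numbers are made up.

```python
def fpkm(counts, lengths_bp):
    """Textbook FPKM from per-gene fragment counts and gene lengths in bp."""
    total = sum(counts)
    return [1e9 * c / (length * total) for c, length in zip(counts, lengths_bp)]


def fpkm_to_tpm(fpkm_values):
    """Rescale FPKM values so that they sum to one million within a sample."""
    total = sum(fpkm_values)
    return [1e6 * v / total for v in fpkm_values]


# Made-up example: two genes whose counts are proportional to their lengths.
toy_fpkm = fpkm(counts=[100, 300], lengths_bp=[1000, 3000])
toy_tpm = fpkm_to_tpm(toy_fpkm)
print(toy_fpkm, toy_tpm)
```

In the toy example both genes receive the same FPKM, since the longer gene has proportionally more fragments, and TPM values always sum to one million per sample, which is what makes them easier to compare between samples than FPKM.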