The GEO Profiles database stores gene expression profiles derived from curated GEO DataSets. Each Profile is presented as a chart that displays the expression level of one gene across all Samples within a DataSet. Experimental context is provided in the bars along the bottom of the charts making it possible to see at a glance whether a gene is differentially expressed across different experimental conditions. Profiles have various types of links including internal links that connect genes that exhibit similar behaviour, and external links to relevant records in other NCBI databases. GEO Profiles can be searched using many different attributes including keywords, gene symbols, gene names, GenBank accession numbers, or Profiles flagged as being differentially expressed.
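A keyword search of the kind described above can also be issued programmatically. The sketch below is an illustration rather than an official client: it only builds a query URL for the NCBI E-utilities ESearch endpoint against the `geoprofiles` database and sends no request. The helper name `geo_profiles_search_url` and the example search term are assumptions made for this example.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_profiles_search_url(term, max_results=20):
    """Build (but do not send) an ESearch query URL for GEO Profiles.

    `term` is a free-text query such as a keyword or gene symbol;
    this helper is hypothetical and not part of any NCBI library.
    """
    params = {
        "db": "geoprofiles",    # Entrez database name for GEO Profiles
        "term": term,           # query string, URL-encoded below
        "retmax": max_results,  # maximum number of record IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Illustrative query: a gene symbol combined with a keyword.
url = geo_profiles_search_url("TP53 AND leukemia", max_results=5)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the response lists the identifiers of matching Profiles.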
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the three datasets used to introduce the Gene Mover's Distance in [1]; they are described below. The datasets are exported in a basic text-based format (.csv files), like other public datasets widely used in the Machine Learning community.
The three datasets are extracted from the Gene Expression Omnibus (GEO) database [2], where they appear, respectively, under accession numbers GSE116256 (blood leukemia, [3]), GSE84133 (human pancreas, [4]), and GSE67835 (human brain, [5]). In GEO, the datasets are split across several files, which contain many more details than those reported in this version.
However, the simplified format proposed here should make it easier for other researchers to use the data.
The Gene Mover's Distance is a measure of similarity between a pair of cells based on their gene expression profiles obtained via single-cell RNA sequencing. The underlying idea of GMD is to interpret the gene expression array of a single cell as a discrete probability measure. The distance between two cells is hence computed by solving an Optimal Transport problem between the two corresponding discrete measures. The Gene Mover's Distance can be used, for instance, to solve two classification problems: the classification of cells according to their condition and according to their type.
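The idea of treating each cell as a discrete probability measure and comparing two cells by Optimal Transport can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from [1]: the ground cost between genes used here (`ground_cost`, a toy |i - j| matrix) and the three-gene count vectors are hypothetical, and the transport problem is solved as a plain linear program with SciPy.

```python
import numpy as np
from scipy.optimize import linprog


def to_measure(counts):
    """Normalize a non-negative expression vector so that it sums to 1."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("expression vector must have positive total mass")
    return counts / total


def ot_distance(counts_a, counts_b, ground_cost):
    """Optimal Transport cost between two cells (illustrative sketch).

    ground_cost[i, j] is an assumed cost of moving mass from gene i to gene j.
    """
    a = to_measure(counts_a)
    b = to_measure(counts_b)
    n = a.size
    cost = np.asarray(ground_cost, dtype=float)
    # Unknowns: transport plan P flattened row-major, P[i, j] >= 0.
    row_sums = np.kron(np.eye(n), np.ones((1, n)))  # sum_j P[i, j] = a[i]
    col_sums = np.kron(np.ones((1, n)), np.eye(n))  # sum_i P[i, j] = b[j]
    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun


# Toy example with three genes and a hypothetical ground cost |i - j|.
ground_cost = np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
cell_a = [10, 0, 0]  # all expression on gene 0
cell_b = [0, 0, 5]   # all expression on gene 2
cell_c = [2, 2, 0]   # expression split between genes 0 and 1

print(ot_distance(cell_a, cell_b, ground_cost))
print(ot_distance(cell_a, cell_c, ground_cost))
```

Because the vectors are normalized first, the distance depends only on the relative expression of each gene, not on the total count of a cell. A linear program is enough for a handful of genes; realistic gene panels would call for a dedicated Optimal Transport solver.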
The repository contains a Python script to check the basic statistics of the data.
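A check of this kind might look like the sketch below. It is not the script shipped with the repository: the CSV layout (one row per cell, one column per gene, plus a `label` column) and the gene names in the demo data are assumptions made purely for illustration.

```python
import csv
import io
import statistics


def basic_statistics(csv_file, label_column="label"):
    """Summarize a cell-by-gene CSV table (assumed layout, see above)."""
    reader = csv.DictReader(csv_file)
    gene_columns = [name for name in reader.fieldnames if name != label_column]
    n_cells = 0
    cells_per_label = {}
    totals = []  # total expression per cell
    for row in reader:
        n_cells += 1
        label = row[label_column]
        cells_per_label[label] = cells_per_label.get(label, 0) + 1
        totals.append(sum(float(row[gene]) for gene in gene_columns))
    return {
        "cells": n_cells,
        "genes": len(gene_columns),
        "cells_per_label": cells_per_label,
        "mean_total_expression": statistics.mean(totals) if totals else 0.0,
    }


# Tiny in-memory table standing in for one of the .csv files (hypothetical data).
demo = io.StringIO("label,GENE_A,GENE_B\nhealthy,1,3\ndisease,0,2\nhealthy,4,0\n")
stats = basic_statistics(demo)
print(stats)
```

To run it on a real file, pass an open file handle, for example `open(path, newline="")`, after adjusting `label_column` to the actual header.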
[1] Bellazzi, R., Codegoni, A., Gualandi, S., Nicora, G., Vercesi, E. The Gene Mover's Distance: Single-cell similarity via Optimal Transport. https://arxiv.org/abs/2102.01218
[2] Gene Expression Omnibus (GEO) database, http://www.ncbi.nlm.nih.gov/geo
[3] van Galen, P., Hovestadt, V., Wadsworth II, M.H., Hughes, T.K., Griffin, G.K., Battaglia, S., Verga, J.A., Stephansky, J., Pastika, T.J., Story, J.L. and Pinkus, G.S., 2019. Single-cell RNA-seq reveals AML hierarchies relevant to disease progression and immunity. Cell, 176(6), pp.1265-1281.
[4] Baron, M., Veres, A., Wolock, S.L., Faust, A.L., Gaujoux, R., Vetere, A., Ryu, J.H., Wagner, B.K., Shen-Orr, S.S., Klein, A.M. and Melton, D.A., 2016. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Systems, 3(4), pp.346-360.
[5] Darmanis, S., Sloan, S.A., Zhang, Y., Enge, M., Caneda, C., Shuer, L.M., Gephart, M.G.H., Barres, B.A. and Quake, S.R., 2015. A survey of human brain transcriptome diversity at the single cell level. Proceedings of the National Academy of Sciences, 112(23), pp.7285-7290.
Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.
All processed gene expression profiles obtained from the GEO database, together with the R code for the scRNA-seq and BayesPrism analyses, have been deposited on the figshare platform.
SMD: Stanford Microarray Database (http://genome-www5.stanford.edu); GEO: Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/); Broad Institute (http://www.broad.mit.edu/egi-bin/cancer/datasets.cgi); Oncomine (www.oncomine.org)
OT-I naïve T cells, central memory T cells, effector memory T cells, and skin-infiltrating T cells were sorted from mice at different time points post infection. OT-I cells from 15-20 mice were pooled for each microarray dataset. By comparing gene expression profiles among the different T cell subtypes, we aimed to identify core regulatory genes in the differentiation and maintenance of skin CD8+ tissue-resident memory T cells in epithelial tissue.
Network topological parameters derived from gene expression data in GEO datasets for adult and paediatric patients.
To characterize gene expression in mouse primary oligodendrocytes during differentiation into oligodendrocytes, we generated RNA-seq data in a time-course manner. We then used the RNA-seq data to assess the expression profiles of genes of interest.
The testis has been identified as an organ in which a large number of tissue-enriched genes are expressed. However, a large portion of the transcripts related to each stage or cell type in the testis remains unknown. In this study, databases combined with confirmatory measurements were used to investigate testis-enriched genes, their localization in the testis, developmental regulation, gene expression profiles in testicular disease, and signaling pathways. Our comparative analysis of GEO DataSets showed that 24 genes are predominantly expressed in the testis. The cellular locations of 15 testis-enriched proteins in the human testis were identified, and most of them were located in spermatocytes and round spermatids. Real-time PCR revealed that expression of these 15 genes is significantly increased during testis development. In addition, an analysis of GEO DataSets indicated that expression of these 15 genes was significantly decreased in teratozoospermic patients and polyubiquitin knockout mice, suggesting their involvement in normal testis development. Pathway analysis revealed that most of these 15 genes are implicated in various sperm-related cell processes and disease conditions. This approach provides effective strategies for discovering novel testis-enriched genes and their expression patterns, paving the way for future characterization of their functions in infertility and providing new biomarkers for specific stages of spermatogenesis.
This dataset tracks the total annual student enrollment from 2017 to 2023 for Geo International High School
mRNA expression profiles for cell lines or tissues following genetic perturbation (knockdown, knockout, over-expression, mutation)
This dataset tracks annual distribution of students across grade levels in Geo International High School
Background: Prostate cancer (PCa) is a high-incidence malignancy of the urinary system and the second most common cancer in men worldwide. Its treatment still poses major challenges, and potential key biomarkers for the pathogenesis and prognosis of PCa urgently need to be identified. Methods: Multiple differential gene expression datasets of PCa tissues and normal prostate tissues were integrated and analyzed with R software. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of the overlapping differentially expressed genes (DEGs) were performed. The STRING online database was used in conjunction with Cytoscape software for protein-protein interaction (PPI) network analysis to define hub genes. The relative mRNA expression of the hub genes was examined in the Gene Expression Profiling Interactive Analysis (GEPIA) database. A prognostic gene signature was identified by univariate and multivariate Cox regression analysis. Results: Three hundred twelve up-regulated genes and 85 down-regulated genes were identified from three gene expression profiles (GSE69223, GSE3325, GSE55945) and The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) dataset. Seven hub genes (FGF2, FLNA, FLNC, VCL, CAV1, ACTC1, and MYLK) related to the pathogenesis of PCa were further identified. Seven prognostic genes (BCO1, BAIAP2L2, C7, AP000844.2, ASB9, MKI67P1, and TMEM272) were screened to construct a prognostic gene signature, which showed good predictive power for survival in ROC curve analysis. Conclusions: We identified a robust set of new potential key genes in PCa, which would provide reliable biomarkers for early diagnosis and prognosis and would promote molecularly targeted therapy for PCa.
Number of cases for each cancer type and GEO series used for gene expression profiles.
This dataset tracks annual distribution of students across grade levels in Geo Next Generation Academy
CD8 T cells play a crucial role in controlling HIV infection. We employed single-cell RNA sequencing (scRNAseq) to analyze HIV-1 specific CD8 T cells after years of treated infection. Additionally, HIV-2 specific CD8 T cells were studied to serve as a control for an effective anti-HIV response. HIV-specific CD8 T cells from all patients at each time point were index-sorted by FACS into 96-well microtiter plates for scRNAseq analysis.
LAPC4 cells were starved for 2 days in phenol red-free, serum-free medium and then stimulated with 1 µM 5α-Abi or 0.1 nM DHT for 48 h. Gene expression profiles were measured to determine the effect of 5α-Abi on this prostate cancer cell line.
This dataset tracks the total annual student enrollment from 2021 to 2023 for Geo Next Generation Academy
GEO-ZM02 - BioCentury Product Profiles for the biopharma industry