MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset provides a comprehensive pipeline for automated retrieval of gene expression data.
Supports both Microarray and RNA-Seq datasets from the NCBI GEO database.
Designed for researchers and bioinformaticians to streamline GEO data analysis.
Includes R scripts to download, process, and export GEO datasets efficiently.
Handles metadata extraction, sample annotation, and expression matrix generation.
Facilitates downstream analyses such as differential gene expression and visualization.
Compatible with various GEO platforms, reducing manual data curation efforts.
Enables reproducible research by standardizing data retrieval and processing.
Useful for comparative studies, functional genomics, and biomarker discovery.
Reduces the technical barrier for users unfamiliar with GEO data structures.
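The metadata-extraction and expression-matrix steps described above can be illustrated with a small parser for GEO series-matrix text. This is a minimal sketch in Python, not the dataset's own R scripts; the function name `parse_series_matrix`, the `SAMPLE` text, and the accession `GSE0000` are hypothetical and exist only for illustration. It assumes the usual series-matrix layout: metadata lines start with `!`, and the expression table sits between `!series_matrix_table_begin` and `!series_matrix_table_end`.

```python
import csv
import io


def _to_float(value):
    """Convert a table cell to float; empty or non-numeric cells become NaN."""
    try:
        return float(value)
    except ValueError:
        return float("nan")


def parse_series_matrix(text):
    """Split series-matrix text into metadata, table header, and probe rows."""
    metadata = {}    # e.g. "Sample_title" -> list of values, one per sample
    header = None    # "ID_REF" followed by the sample (GSM) accessions
    rows = {}        # probe ID -> list of expression values
    in_table = False
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or not fields[0]:
            continue  # skip blank lines
        key = fields[0]
        if key == "!series_matrix_table_begin":
            in_table = True
        elif key == "!series_matrix_table_end":
            in_table = False
        elif in_table:
            if header is None:
                header = fields  # first table line holds the column names
            else:
                rows[key] = [_to_float(v) for v in fields[1:]]
        elif key.startswith("!"):
            # Metadata line: strip the leading "!" and collect its values.
            metadata.setdefault(key[1:], []).extend(fields[1:])
    return metadata, header, rows


# Hypothetical two-sample, two-probe file used only to exercise the parser.
SAMPLE = (
    '!Series_geo_accession\t"GSE0000"\n'
    '!Sample_title\t"control 1"\t"treated 1"\n'
    '!series_matrix_table_begin\n'
    '"ID_REF"\t"GSM1"\t"GSM2"\n'
    '"probe_a"\t7.5\t8.25\n'
    '"probe_b"\t\t3.0\n'
    '!series_matrix_table_end\n'
)

metadata, header, rows = parse_series_matrix(SAMPLE)
print(header)
print(metadata["Sample_title"])
print(rows["probe_a"])
```

Keeping metadata and expression values in separate structures mirrors the split between sample annotation and the expression matrix that downstream analyses expect.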
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This dataset contains data files and identifiers for original data sources for 39 gene expression datasets from over 7,000 individuals with estrogen receptor positive (ER-positive) Breast Cancer (BC).
Background
The related study developed a novel in silico approach to assess activation of different signalling pathways. The phosphatidylinositol 3-kinase (PI3K)/AKT/mTOR signalling pathway mediates key cellular functions, including growth, proliferation and survival, and is frequently involved in carcinogenesis, tumor progression and metastases. This research seeks to target the relative contribution of AKT and mTOR (downstream of PI3K) in BC outcomes using the in silico approach via integrated reverse phase protein array (RPPA) and matched gene expression.
Methods and sample size
The methodology includes the development of gene signatures that reflect the level of expression of pAKT and p-mTOR separately. Pooled analysis of gene expression data from over 7,000 patients with ER-positive BC was then performed. This data record holds links to the repositories holding these data, as well as the R-data files for each data record used in the analysis. All gene signatures developed are captured in Supplementary Data Sonnenblick.pdf.xlsx
Data sources
The dataset name, relevant DOI, accession number or access requirements are listed alongside the file type and repository name or other source where applicable.
GEO = Gene Expression Omnibus
EGA = European Genome-phenome Archive
This data table is available to download as NPJBCANCER-00304R1-data-sources.xlsx, including more detailed information and web urls to each data source. data_db.tab contains more detailed technical metadata for each data source.
Dataset     Data location  Permanent identifier/url
NKI         CCB NKI        http://ccb.nki.nl/data/van-t-Veer_Nature_2002/
UCSF        GEO            GSE123833
STNO2       GEO            GSE4335
NCI         Research Article (Supplementary files)  10.1073/pnas.1732912100
UNC4        GEO            GSE18229
CAL         Array Express  E-TABM-158
MDA4        GEO            GSE123832
KOO         GEO            GSE123831
HLP         Array Express  E-TABM-543
EXPO        GEO            GSE2109
VDX         GEO            GSE2034/GSE5327
MSK         GEO            GSE2603
UPP         GEO            GSE3494
STK         GEO            GSE1456
UNT         GEO            GSE2990
DUKE        GEO            GSE3143
TRANSBIG    GEO            GSE7390
DUKE2       GEO            GSE6961
MAINZ       GEO            GSE11121
LUND2       GEO            GSE5325
LUND        GEO            GSE5325
FNCLCC      GEO            GSE7017
EMC2        GEO            GSE12276
MUG         GEO            GSE10510
NCCS        GEO            GSE5364
MCCC        GEO            GSE19177
EORTC10994  GEO            GSE1561
DFHCC       GEO            GSE19615
DFHCC2      GEO            GSE18864
DFHCC3      GEO            GSE3744
DFHCC4      GEO            GSE5460
MAQC2       GEO            GSE20194
TAM         GEO            GSE6532/GSE9195
MDA5        GEO            GSE17705
VDX3        GEO            GSE12093
METABRIC    EGA            EGAS00000000083
TCGA        TCGA           https://tcga-data.nci.nih.gov/docs/publications/brca_2012/
DNA methylation (Dedeurwaerder et al. 2011)  GEO  https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20713
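For the GEO entries in the table, an accession can be turned into a download location mechanically. The sketch below is illustrative only and is not part of the data record: the helper names `split_accessions` and `series_matrix_dir` are hypothetical, and the directory layout (series grouped into folders such as `GSE18nnn`) reflects the GEO FTP convention as commonly documented, which should be checked against the GEO site before relying on it.

```python
import re

GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def split_accessions(cell):
    """Split a table cell such as 'GSE2034/GSE5327' into single accessions."""
    return [part.strip() for part in cell.split("/") if part.strip()]


def series_matrix_dir(accession):
    """Return the assumed matrix/ directory URL for one GSE accession."""
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = accession[3:]
    # Folder name: drop the last three digits and append 'nnn',
    # e.g. GSE18229 -> GSE18nnn and GSE123 -> GSEnnn.
    stub = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_SERIES_ROOT}/{stub}/{accession}/matrix/"


# Two cells copied from the table above.
for cell in ["GSE18229", "GSE2034/GSE5327"]:
    for acc in split_accessions(cell):
        print(acc, series_matrix_dir(acc))
```

Entries hosted elsewhere (Array Express, EGA, TCGA, or a DOI) are rejected by the accession check rather than silently mapped to a GEO path.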
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset is based on National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) DataSet accession GDS2778.
The dataset originates from a microarray experiment measuring global gene expression under specific experimental conditions.
Raw and processed expression data (for all probes/genes) are included, enabling downstream analysis such as normalization, differential expression, and clustering.
The dataset has been used to perform differential gene expression (DGE) analysis to identify genes that are up- or down-regulated under the experimental condition compared to control.
Data processing steps typically include normalization (e.g., log-transformation), quality control, probe-to-gene mapping, and statistical testing for significance (e.g., using packages such as limma or other DGE tools).
Resulting differentially expressed genes (DEGs) include statistics such as log fold change (logFC), adjusted p‑values (adj.P.Val), and possibly other metrics (e.g., B-statistic), allowing assessment of both magnitude and significance of changes.
The dataset also includes a visualization file (heatmap image) that displays expression patterns of DEGs (or top variable genes) across samples, enabling clustering and pattern recognition across samples and genes.
The heatmap helps illustrate sample-wise and gene-wise expression variation: clustering groups together samples (e.g. control vs treatment) and genes with similar expression dynamics.
This dataset is suitable for further bioinformatics analysis: e.g. functional enrichment (GO/Pathway), co‑expression analysis, gene signature identification, or integration with other datasets.
Users who download this dataset can reproduce or extend analyses, such as re-normalization, alternative clustering, custom DEG thresholds, or downstream biological interpretation (pathway, network analysis).
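The two statistics named above, logFC and adj.P.Val, can be sketched in plain Python to show what they measure. This is a simplified illustration, not the analysis behind this dataset and not limma's moderated test: `log2_fold_change` takes the difference of group means on the log2 scale, and `benjamini_hochberg` applies the Benjamini-Hochberg adjustment, which is the default method behind limma's adjusted p-values. Both function names, the pseudocount, and the input numbers are assumptions made for the example.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Difference of mean log2 expression (treated minus control)."""
    log_t = [math.log2(v + pseudocount) for v in treated]
    log_c = [math.log2(v + pseudocount) for v in control]
    return sum(log_t) / len(log_t) - sum(log_c) / len(log_c)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # the minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up intensities for one gene and made-up raw p-values for four genes.
print(log2_fold_change(treated=[3.0, 3.0], control=[1.0, 1.0]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

A custom DEG threshold is then a filter on both values, for example keeping genes with an absolute logFC of at least 1 and an adjusted p-value below 0.05.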
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Background: Osteoporosis (OS) and fractures are common in patients with end-stage renal disease (ESRD) and in patients on maintenance dialysis. However, diagnosing osteoporosis in this population is challenging. The aim of this research is to explore the common genetic profile and potential molecular mechanisms of ESRD and OS.
Methods and results: Microarray data for ESRD and OS were downloaded from the Gene Expression Omnibus (GEO) database. Weighted correlation network analysis (WGCNA) was used to identify co-expression modules associated with ESRD and OS. Random Forest (RF) and Lasso Regression were performed to identify candidate genes, and consensus clustering was used for hierarchical analysis. In addition, miRNAs shared in ESRD and OS were identified by differential analysis, and their target genes were predicted by TargetScan. Finally, we constructed a common miRNAs-mRNAs network with candidate genes and shared miRNAs. By WGCNA, two important modules of ESRD and one important module of OS were identified, and the functions of three major clusters were identified, including ribosome, RAS pathway, and MAPK pathway. Eight gene signatures obtained with the RF and Lasso machine learning methods had area under curve (AUC) values greater than 0.7 in both ESRD and OS, confirming their diagnostic performance. Consensus clustering successfully stratified ESRD patients, and C1 patients with more severe ESRD phenotype and OS phenotype were defined as “OS-prone group”.
Conclusion: Our work identifies biological processes and underlying mechanisms shared by ESRD and OS, and identifies new candidate genes that can be used as biomarkers or potential therapeutic targets, revealing molecular alterations in susceptibility to OS in ESRD patients.
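The AUC criterion used above (values greater than 0.7) can be made concrete with a short sketch. It computes the area under the ROC curve as the fraction of case/control pairs in which the case receives the higher score, with ties counted as half. This is an illustration under assumed inputs, not the study's code: the function name `roc_auc` and the scores and labels are made up for the example.

```python
def roc_auc(scores, labels):
    """AUC as the share of (case, control) pairs ranked correctly."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case scored above control
            elif c == k:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))


# Made-up expression scores for one candidate gene; 1 = disease, 0 = control.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc, auc > 0.7)
```

The pairwise form is quadratic in sample count, which is acceptable for microarray cohorts of this size; a rank-based formula gives the same value for larger inputs.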
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28146
"Microarray analyses of laser-captured hippocampus reveal distinct gray and white matter signatures associated with incipient Alzheimer’s disease" Blalock et al.
[Image: GSE28146 value distribution]
Disease state: control, incipient, moderate, and severe. Subjects are male and female, around 80-90 years old.
Samples: 8 control, 7 incipient, 8 moderate, 7 severe.
"Alzheimer's disease (AD) is a devastating neurodegenerative disorder that threatens to reach epidemic proportions as our population ages. Although much research has examined molecular pathways associated with AD, relatively few studies have focused on critical early stages. Our prior microarray study correlated gene expression in human hippocampus with AD markers. Results suggested a new model of early-stage AD in which pathology spreads along myelinated axons, orchestrated by upregulated transcription and epigenetic factors related to growth and tumor suppression (Blalock et al., 2004). However, the microarray analyses were performed on RNA from fresh frozen hippocampal tissue blocks containing both gray and white matter, potentially obscuring region-specific changes. In the present study, we used laser capture microdissection to exclude major white matter tracts and selectively collect CA1 hippocampal gray matter from formalin-fixed, paraffin-embedded (FFPE) hippocampal sections of the same subjects assessed in our prior study. Microarray analyses of this gray matter-enriched tissue revealed many correlations similar to those seen in our prior study, particularly for neuron-related genes. Nonetheless, in the laser-captured tissue, we found a striking paucity of the AD-associated epigenetic and transcription factor genes that had been strongly overrepresented in the prior tissue block study. In addition, we identified novel pathway alterations that may have considerable mechanistic implications, including downregulation of genes stabilizing ryanodine receptor Ca2+ release and upregulation of vascular development genes.
We conclude that FFPE tissue can be a reliable resource for microarray studies, that upregulation of growth-related epigenetic/transcription factors with incipient AD is predominantly localized to white matter, further supporting our prior findings and model, and that alterations in vascular and ryanodine receptor-related pathways in gray matter are closely associated with incipient AD."
Enjoy!