Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The refine.bio survey list generator required a CSV file, which was tediously exported by hand from the GEO web interface.
Example:
$ head accessions/Illumina\ HiSeq\ 2000.csv
"Experiment Accession","Experiment Title","Organism Name","Instrument","Submitter","Study Accession","Study Title","Sample Accession","Sample Title","Total Size, Mb","Total RUNs","Total Spots","Total Bases","Library Name","Library Strategy","Library Source","Library Selection"
"SRX4195895","4","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406604","","370.5","1","15916120","795806000","4","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
"SRX4195894","3","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406603","","362.43","1","16021366","801068300","3","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
"SRX4195893","6","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406602","","407.58","1","18432342","921617100","6","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
"SRX4195892","5","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406605","","347.33","1","16162471","808123550","5","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
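As a minimal sketch of how such an export could be read, the Python snippet below parses a quoted CSV of this shape and collects the distinct study accessions. The sample text is a trimmed copy of the rows above (fewer columns, for brevity), and the function name unique_study_accessions is a hypothetical helper, not part of refine.bio. What the survey list generator itself does with the file is not stated here.

```python
import csv
import io

# Two rows copied from the export above, with most columns dropped for brevity.
SAMPLE = '''"Experiment Accession","Organism Name","Instrument","Study Accession","Library Strategy"
"SRX4195895","Homo sapiens","Illumina HiSeq 2000","SRP150290","miRNA-Seq"
"SRX4195894","Homo sapiens","Illumina HiSeq 2000","SRP150290","miRNA-Seq"
'''


def unique_study_accessions(handle):
    """Return study accessions in first-seen order, without duplicates."""
    seen = {}
    # csv.DictReader handles the quoted fields, including headers that
    # contain commas such as "Total Size, Mb" in the full export.
    for row in csv.DictReader(handle):
        seen.setdefault(row["Study Accession"], None)
    return list(seen)


studies = unique_study_accessions(io.StringIO(SAMPLE))
print(studies)
```

For a file on disk, the same function can be given a handle from open(path, newline="", encoding="utf-8").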

This dataset, available on the Gene Expression Omnibus (GEO) platform, provides valuable insights into cancer diagnostics through the analysis of tumor-educated platelets (TEPs). It highlights the potential of liquid biopsies for non-invasive cancer detection across multiple cancer types.
Cancer Types Included:
- Non-small cell lung cancer
- Colorectal cancer
- Pancreatic cancer
- Glioblastoma
- Breast cancer
- Hepatobiliary carcinomas
This dataset offers significant potential for advancing cancer diagnostics by leveraging tumor-educated platelets as biomarkers for early detection and classification of various cancer types. It represents a promising approach to non-invasive, blood-based cancer screening using gene expression profiles.
Citation: Best MG, Sol N, Kooi I, Tannous J, et al. RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics. Cancer Cell, 2015 Nov 9;28(5):666-676. PMID: 26525104

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.
All uploaded files are UTF-8 encoded, CSV-formatted matrices. The *_expected_count.csv.gz files are unlogged, raw expression counts as reported by rsem-quantify-expression (see details below). The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix.
Some metadata files may have more rows than the associated expression matrix has columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples in the same GEO accession ID).
Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.
Each recomputed expression matrix has at least the gene_id column, holding Ensembl Gene IDs. The remaining columns are the ENA run accession IDs of the specific recomputed samples.
Each associated metadata file has at least the following columns:
geo_accession: The GEO sample ID of the sample.
ena_sample: The ENA sample ID of the sample.
ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.
The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information.
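The cross-referencing described above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of x.FASTQ: the tiny in-memory files, all accession values in them, and the function name match_metadata are hypothetical; only the column names gene_id, geo_accession, ena_sample and ena_run come from the description.

```python
import csv
import io

# Hypothetical miniature files; the real ones are gzipped and far larger.
COUNTS = "gene_id,ERR0000001,ERR0000002\nENSG00000000003,10.0,12.5\n"
METADATA = (
    "geo_accession,ena_sample,ena_run\n"
    "GSM0000001,SAMEA0000001,ERR0000001\n"
    "GSM0000002,SAMEA0000002,ERR0000002\n"
    "GSM0000003,SAMEA0000003,ERR0000003\n"  # e.g. a sample that was not recomputed
)


def match_metadata(counts_handle, metadata_handle):
    """Pair each expression-matrix column with its metadata row via ena_run."""
    header = next(csv.reader(counts_handle))
    run_ids = [name for name in header if name != "gene_id"]
    by_run = {row["ena_run"]: row for row in csv.DictReader(metadata_handle)}
    matched = {run: by_run[run] for run in run_ids if run in by_run}
    # Metadata rows with no matching column (extra rows are expected for
    # series that are only partially RNA-Seq based).
    unmatched = sorted(set(by_run) - set(run_ids))
    return matched, unmatched


matched, unmatched = match_metadata(io.StringIO(COUNTS), io.StringIO(METADATA))
print(sorted(matched), unmatched)
```

For the actual files, handles can be obtained with gzip.open(path, "rt", encoding="utf-8", newline="") from the standard library.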
Pipeline Details
The alignment and quantification were performed with the x.FASTQ tool, available on GitHub, installed locally on an Arch Linux machine at commit 3a93dd77a70df59c74f7b15216c26f12cd918e81 running the Linux 6.7.8-zen1-1-zen kernel with an 11th Gen Intel i7-1185G7 (8) CPU and an Intel TigerLake-LP GT2 [Iris Xe Graphics] GPU. Please note that no samples were filtered or omitted on the basis of sample quality or sequencing depth. However, sensible trimming (e.g. of low-quality bases and common adapters) was performed on all samples.
The reference genome was downloaded from Ensembl, version hg38. STAR was used to build the genome index with the overhang set to 149.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We analysed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.
- This release includes GEO series published up to Dec-31, 2020;
The geo-htseq.tar.gz archive contains the following files:
- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated proportion of true null hypotheses (pi0).
- output/document_summaries.csv, document summaries of NCBI GEO series.
- output/suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions.
- output/suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO.
- output/publications.csv, publication info of NCBI GEO series.
- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series.
- output/spots.csv, NCBI SRA sequencing run metadata.
- output/cancer.csv, cancer related experiment accessions.
- output/transcription_factor.csv, TF related experiment accessions.
- output/single-cell.csv, single cell experiment accessions.
- blacklist.txt, list of supplementary files that were either too large to import or were causing computing environment crash during import.
The workflow used to produce this dataset is available on GitHub at rstats-tartu/geo-htseq.
The geo-htseq-updates.tar.gz archive contains the following files:
- results/detools_from_pmc.csv, differential expression analysis programs inferred from published articles.
- results/n_data.csv, manually curated sample size info for NCBI GEO HT-seq series.
- results/simres_df_parsed.csv, pi0 values estimated from differential expression results obtained from simulated RNA-seq data.
- results/data/parsed_suppfiles_rerun.csv, pi0 values estimated using the smoother method from anti-conservative p-value sets.
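To make the pi0 quantity in these files concrete, here is a minimal Python sketch of a Storey-type estimator evaluated at a single threshold. This is an assumption made for illustration: it is a simpler relative of the "smoother method" named above, not a reproduction of the workflow's own code, and the p-values are invented. The idea is that p-values from true nulls are uniform on [0, 1], so the share of p-values above a threshold lam, divided by (1 - lam), approximates the proportion of true nulls.

```python
def estimate_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    pi0(lam) = #{p > lam} / (m * (1 - lam)), capped at 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Invented p-values: an excess of small values plus a roughly flat tail.
example_p = [0.001, 0.004, 0.01, 0.02, 0.03, 0.2, 0.45, 0.6, 0.75, 0.9]
pi0 = estimate_pi0(example_p)
print(pi0)
```

With lam = 0.5, three of the ten example values exceed the threshold, so the estimate is 3 / (10 * 0.5).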

Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains data files and identifiers for original data sources for 39 gene expression datasets from over 7,000 individuals with estrogen receptor positive (ER-positive) Breast Cancer (BC).
Background
The related study developed a novel in silico approach to assess activation of different signalling pathways. The phosphatidylinositol 3-kinase (PI3K)/AKT/mTOR signalling pathway mediates key cellular functions, including growth, proliferation and survival and is frequently involved in carcinogenesis, tumor progression and metastases. This research seeks to target relative contribution of AKT and mTOR (downstream of PI3K) in BC outcomes using the in silico approach via integrated reverse phase protein array (RPPA) and matched gene expression.
Methods and sample size
The methodology includes the development of gene signatures that reflect level of expression of pAKT and p-mTOR separately. Pooled analysis of gene expression data from over 7,000 patients with ER-positive BC was then performed. This data record holds links to the repositories holding these data, as well as the R-data files for each data record used in the analysis. All gene signatures developed are captured in Supplementary Data Sonnenblick.pdf.xlsx
Data sources
The dataset name, relevant DOI, accession number or access requirements are listed alongside the file type and repository name or other source where applicable.
GEO=Gene Expression Omnibus
EGA=European Genome-phenome Archive
This data table is available to download as NPJBCANCER-00304R1-data-sources.xlsx including more detailed information and web urls to each data source. data_db.tab contains more detailed technical metadata for each data source.
Dataset | Data location | Permanent identifier/url
NKI | CCB NKI | http://ccb.nki.nl/data/van-t-Veer_Nature_2002/
UCSF | GEO | GSE123833
STNO2 | GEO | GSE4335
NCI | Research Article (Supplementary files) | 10.1073/pnas.1732912100
UNC4 | GEO | GSE18229
CAL | Array Express | E-TABM-158
MDA4 | GEO | GSE123832
KOO | GEO | GSE123831
HLP | Array Express | E-TABM-543
EXPO | GEO | GSE2109
VDX | GEO | GSE2034/GSE5327
MSK | GEO | GSE2603
UPP | GEO | GSE3494
STK | GEO | GSE1456
UNT | GEO | GSE2990
DUKE | GEO | GSE3143
TRANSBIG | GEO | GSE7390
DUKE2 | GEO | GSE6961
MAINZ | GEO | GSE11121
LUND2 | GEO | GSE5325
LUND | GEO | GSE5325
FNCLCC | GEO | GSE7017
EMC2 | GEO | GSE12276
MUG | GEO | GSE10510
NCCS | GEO | GSE5364
MCCC | GEO | GSE19177
EORTC10994 | GEO | GSE1561
DFHCC | GEO | GSE19615
DFHCC2 | GEO | GSE18864
DFHCC3 | GEO | GSE3744
DFHCC4 | GEO | GSE5460
MAQC2 | GEO | GSE20194
TAM | GEO | GSE6532/GSE9195
MDA5 | GEO | GSE17705
VDX3 | GEO | GSE12093
METABRIC | EGA | EGAS00000000083
TCGA | TCGA | https://tcga-data.nci.nih.gov/docs/publications/brca_2012/
DNA methylation (Dedeurwaerder et al. 2011) | GEO | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20713
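The GEO identifiers in this table can be turned into record links by following the URL pattern of the last row. The Python sketch below is a hypothetical helper (geo_urls is not part of the data record). It assumes that a slash separates multiple series accessions, as in GSE2034/GSE5327, and it skips anything that is not a plain GSE accession, so ArrayExpress, EGA, DOI and full-URL entries return an empty list.

```python
# URL pattern taken from the GSE20713 row of the table.
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={}"


def geo_urls(identifier):
    """Return one GEO record URL per GSE accession found in the identifier."""
    urls = []
    for part in identifier.split("/"):
        accession = part.strip()
        # Accept only the form "GSE" followed by digits.
        if accession.startswith("GSE") and accession[3:].isdigit():
            urls.append(GEO_ACC_URL.format(accession))
    return urls


print(geo_urls("GSE2034/GSE5327"))
print(geo_urls("E-TABM-158"))
```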

The GEO Profiles database stores gene expression profiles derived from curated GEO DataSets. Each Profile is presented as a chart that displays the expression level of one gene across all Samples within a DataSet. Experimental context is provided in the bars along the bottom of the charts making it possible to see at a glance whether a gene is differentially expressed across different experimental conditions. Profiles have various types of links including internal links that connect genes that exhibit similar behaviour, and external links to relevant records in other NCBI databases. GEO Profiles can be searched using many different attributes including keywords, gene symbols, gene names, GenBank accession numbers, or Profiles flagged as being differentially expressed.

https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains microarray-based gene expression profiles of granulosa cells collected from women diagnosed with Polycystic Ovary Syndrome (PCOS) and from healthy controls. It originates from the NCBI GEO DataSet GDS4399, which was generated to study the molecular mechanisms underlying PCOS pathogenesis and its relationship to insulin resistance, steroidogenesis, and oocyte maturation.
The data were collected using the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570 platform). Each sample corresponds to an RNA expression profile of granulosa cells isolated from ovarian aspirates of PCOS and non-PCOS women undergoing in-vitro fertilization (IVF).
Key Details
Source: Gene Expression Omnibus (GEO), NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine)
GEO Accession: GDS4399
Title: Polycystic ovary syndrome: granulosa cells
Platform: Affymetrix Human Genome U133 Plus 2.0 Array (GPL570)
Authors: Wood JR, et al. (original study contributors)
Available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399
Recommended citation style (IEEE): [1] J. R. Wood et al., “Polycystic ovary syndrome: granulosa cells,” Gene Expression Omnibus (GEO), GDS4399, NCBI, Bethesda, MD, USA. [Online]. Available: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399
License: This dataset is part of the public NCBI GEO database and is distributed under the Public Domain / CC0 License for research and educational use. Please cite the original GEO entry when reusing this dataset.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*NCBI Gene Expression Omnibus accession number; it can be used to retrieve the microarray experiment data via http://www.ncbi.nlm.nih.gov/geo/.

Dataset containing gene expression levels from breast cancer tissue samples of TNBC and non-TNBC patients. NCBI GEO accession: GSE52194. Normalized counts in FPKM.

https://creativecommons.org/publicdomain/zero/1.0/
The Gene Expression Omnibus (GEO) dataset GSE68086 provides crucial insights into cancer diagnostics by analyzing tumor-educated platelets (TEPs), offering a unique approach to non-invasive cancer detection across multiple cancer types. This dataset is centered on RNA-seq analysis, which focuses on the gene expression profiles of platelets from cancer patients. Tumor-educated platelets, which are altered by the presence of tumors, represent a promising biomarker for liquid biopsies, a method that allows for cancer detection without the need for invasive tissue sampling.
The dataset titled "RNA-seq of tumor-educated platelets enables blood-based pan-cancer, multiclass, and molecular pathway cancer diagnostics" focuses on Homo sapiens and utilizes expression profiling by high-throughput sequencing. It includes 283 samples of blood platelets, of which 228 are tumor-educated platelets from patients with six types of malignant tumors: non-small cell lung cancer, colorectal cancer, pancreatic cancer, glioblastoma, breast cancer, and hepatobiliary carcinomas. The remaining 55 samples are from healthy individuals, serving as control samples.
The methodology for generating this dataset involved collecting blood samples using EDTA as an anticoagulant, isolating platelets, and extracting RNA using the mirVana RNA isolation kit. Following RNA extraction, cDNA synthesis and amplification were performed using the SMARTer Ultra Low RNA Kit, and sequencing was conducted using the Illumina HiSeq 2500 platform. Quality control was rigorously ensured by employing the Bioanalyzer 2100 system. Data processing steps involved the use of various bioinformatics tools, including Trimmomatic for quality control, STAR for mapping reads to the hg19 reference genome, Picard-tools for selecting intron-spanning reads, and HTseq for read summarization.
The dataset's structure includes 285 columns representing samples (both TEP and healthy controls) and 57,736 rows corresponding to Ensembl gene IDs. The primary data format is intron-spanning read counts, and files available for download include both gzipped text files (such as GSE68086_TEP_data_matrix.txt.gz) and CSV files for easy access and manipulation. Detailed sample information is provided in the series matrix files, both in text and CSV formats.
This dataset has several potential applications. It can be used to explore liquid biopsy techniques for non-invasive cancer diagnostics, identify cancer-specific biomarkers, and study cancer-induced changes in platelet RNA profiles. Researchers can perform comparative analyses across different cancer types and apply machine learning models for both binary classification (distinguishing between healthy individuals and cancer patients) and multiclass classification (differentiating between various cancer types). Molecular pathway analysis could also be employed to identify pathways specific to different cancers.
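The two classification setups mentioned above differ only in how sample groups become target labels. The Python sketch below shows that step alone. The group strings, including the "Healthy" label, are hypothetical stand-ins rather than the exact values in the GEO series matrix, and no model is trained.

```python
HEALTHY = "Healthy"  # hypothetical label for control samples


def binary_labels(groups):
    """0 for healthy controls, 1 for any cancer type."""
    return [0 if group == HEALTHY else 1 for group in groups]


def multiclass_labels(groups):
    """Integer label per distinct group; returns (labels, class_names)."""
    class_names = sorted(set(groups))
    index = {name: i for i, name in enumerate(class_names)}
    return [index[group] for group in groups], class_names


groups = ["Healthy", "Glioblastoma", "Breast cancer", "Healthy"]
y_binary = binary_labels(groups)
y_multi, class_names = multiclass_labels(groups)
print(y_binary, y_multi, class_names)
```

Whether healthy controls are kept as a separate class in the multiclass setting is a design choice; here they are, simply because they appear among the groups.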
The importance of this dataset lies in its potential to significantly advance cancer diagnostics by leveraging TEPs as biomarkers. This approach could enable early detection and more precise classification of cancers, offering a novel method of blood-based screening using gene expression profiles. The data can be accessed through the GEO platform under accession number GSE68086, and online analysis tools such as GEO2R and the GEOquery R package facilitate further analysis. This research was published by Best MG et al. in the Cancer Cell journal in 2015, where it was recognized for demonstrating the efficacy of tumor-educated platelets in pan-cancer diagnostics.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary table with information about the predictive power of genes and signatures identified from metastatic clones, and the GEO accession numbers of the patient cohorts.

GEO accession number of the microarray study. This dataset is associated with the following publication: Mesnage, R., A. Phedonos, M. Biserni, M. Arno, S. Balu, C. Corton, R. Ugarte, and M. Antoniou. Evaluation of estrogen receptor alpha activation by glyphosate-based herbicide constituents. FOOD AND CHEMICAL TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 108: 30-42, (2017).

MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains RNA-seq gene expression data from 58 breast cancer patients treated with neoadjuvant chemotherapy (NAC). The data is derived from GSE280902 on NCBI GEO.
- cleaned_expression.csv: Gene expression matrix with 58 samples (rows) and 28,278 genes (columns). The last column is 'Response' (1 for responder, 0 for non-responder).
- labels.csv: Sample labels with response to NAC.
This dataset can be used for machine learning models to predict NAC response in breast cancer based on gene expression profiles.
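Before any model is fitted, the 'Response' column has to be separated from the gene columns. The Python sketch below shows one way to do that with the standard library. It assumes a header row, purely numeric gene columns and no separate sample-ID column, none of which is confirmed by the description. The gene names and values in the sample text are invented.

```python
import csv
import io

# Invented two-gene miniature of cleaned_expression.csv.
SAMPLE = "GENE_A,GENE_B,Response\n1.5,0.0,1\n0.2,3.1,0\n"


def load_features_and_labels(handle, label_column="Response"):
    """Split a samples-by-genes CSV into gene names, feature rows and labels."""
    reader = csv.reader(handle)
    header = next(reader)
    if header[-1] != label_column:
        raise ValueError(f"expected last column to be {label_column!r}")
    features, labels = [], []
    for row in reader:
        features.append([float(value) for value in row[:-1]])
        labels.append(int(row[-1]))  # 1 = responder, 0 = non-responder
    return header[:-1], features, labels


genes, X, y = load_features_and_labels(io.StringIO(SAMPLE))
print(genes, X, y)
```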
This project is licensed under the MIT License - see the LICENSE file for details.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NCBI accession numbers and related metadata from a study of transcriptomic response of Emiliania huxleyi to 2-heptyl-4-quinolone (HHQ). Sequences from this study are available at the NCBI GEO under accession series GSE131846 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?&acc=GSE131846

Gene Expression Omnibus (GEO) accession numbers of studies used in the analysis. This dataset is associated with the following publication: Rooney, J., K. Oshida, R. Kumar, W. Baldwin, and C. Corton. Chemical Activation of the Constitutive Androstane Receptor Leads to Activation of Oxidant-Induced Nrf2. TOXICOLOGICAL SCIENCES. Society of Toxicology, RESTON, VA, 167(1): 172-189, (2019).

MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides the raw data associated with the NCBI GEO accession number GSE183947. The underlying data is an RNA-Sequencing (RNA-Seq) expression matrix derived from matched normal and malignant breast cancer tissue samples. The primary goal of this resource is to teach the complete workflow of:
- Downloading and importing high-throughput genomics data from public repositories.
- Cleaning and normalizing the raw expression values (e.g., FPKM/TPM).
- Preparing the data structure for downstream Differential Gene Expression (DEG) analysis.
This resource is essential for anyone practicing translational bioinformatics and cancer research.
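As one illustration of the normalization step, FPKM values for a single sample can be rescaled to TPM by dividing each value by the sample's FPKM total and multiplying by one million. The Python sketch below applies that standard conversion to invented values. Whether the teaching workflow itself performs this particular conversion is an assumption.

```python
def fpkm_to_tpm(fpkm_values):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm_values)
    if total == 0:
        # No expression at all: return zeros rather than dividing by zero.
        return [0.0] * len(fpkm_values)
    return [value / total * 1_000_000 for value in fpkm_values]


# Invented FPKM values for three genes in one sample.
tpm = fpkm_to_tpm([10.0, 30.0, 60.0])
print(tpm)
```

Because every sample is rescaled to the same total, TPM values are easier to compare across samples than FPKM values.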

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Complete objects from "Fibroblast STAT3 activation drives organ-specific premetastatic niche formation".
Please cite: Lasse Opsahl EL, Espinoza CE, Olivei AC, Okoye JO, Watkoske H, Hoffman MT, Avritt FR, Elhossiny AM, Bischoff AC, Donahue KL, Poggi M, Kadiyala P, Arya N, Shi J, Lee KE, Zhang Y, Carpenter ES, Szczepanski JM, Frankel TL, Pasca di Magliano M. Fibroblast STAT3 Activation Drives Organ-Specific Premetastatic Niche Formation. Cancer Res. 2025 Oct 17. doi: 10.1158/0008-5472.CAN-25-3472. Epub ahead of print. PMID: 41105672.
Code used for data processing and visualization of single cell RNA sequencing data from the manuscript "Fibroblast STAT3 activation drives organ-specific premetastatic niche formation" can be found here.
Raw data files for the novel datasets generated in this manuscript are available through the NIH Gene Expression Omnibus (GEO), accession number GSE292712.

MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is based on National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) DataSet accession GDS2778.
The dataset originates from a microarray experiment measuring global gene expression under specific experimental conditions.
Raw and processed expression data (for all probes/genes) are included, enabling downstream analysis such as normalization, differential expression, and clustering.
The dataset has been used to perform differential gene expression (DGE) analysis to identify genes that are up- or down-regulated under the experimental condition compared to control.
Data processing steps typically include normalization (e.g., log-transformation), quality control, probe-to-gene mapping, and statistical testing for significance (e.g., using packages such as limma or other DGE tools).
Resulting differentially expressed genes (DEGs) include statistics such as log fold change (logFC), adjusted p‑values (adj.P.Val), and possibly other metrics (e.g., B-statistic), allowing assessment of both magnitude and significance of changes.
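A table with these statistics is usually reduced to a DEG list by thresholding both columns. The Python sketch below does this for rows that carry logFC and adj.P.Val fields, the column names used in the text. The cut-offs (absolute logFC of at least 1, adjusted p-value below 0.05), the gene names and the values are illustrative assumptions, not settings taken from this dataset.

```python
def filter_degs(rows, min_abs_logfc=1.0, max_adj_p=0.05):
    """Split significant rows into up- and down-regulated gene lists."""
    up, down = [], []
    for row in rows:
        significant = row["adj.P.Val"] < max_adj_p
        large_change = abs(row["logFC"]) >= min_abs_logfc
        if not (significant and large_change):
            continue
        (up if row["logFC"] > 0 else down).append(row["gene"])
    return up, down


# Invented results table.
results = [
    {"gene": "GENE_A", "logFC": 2.3, "adj.P.Val": 0.001},
    {"gene": "GENE_B", "logFC": -1.7, "adj.P.Val": 0.020},
    {"gene": "GENE_C", "logFC": 0.4, "adj.P.Val": 0.001},  # change too small
    {"gene": "GENE_D", "logFC": 3.0, "adj.P.Val": 0.300},  # not significant
]
up_genes, down_genes = filter_degs(results)
print(up_genes, down_genes)
```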
The dataset also includes a visualization file (heatmap image) that displays expression patterns of DEGs (or top variable genes) across samples — enabling clustering and pattern recognition across samples and genes.
The heatmap helps illustrate sample-wise and gene-wise expression variation: clustering groups together samples (e.g. control vs treatment) and genes with similar expression dynamics.
This dataset is suitable for further bioinformatics analysis: e.g. functional enrichment (GO/Pathway), co‑expression analysis, gene signature identification, or integration with other datasets.
Users who download this dataset can reproduce or extend analyses, such as re-normalization, alternative clustering, custom DEG thresholds, or downstream biological interpretation (pathway, network analysis).

We performed genome-wide gene expression analysis on high-grade osteosarcoma cell lines, as well as on mesenchymal stem cells and osteoblasts, and applied global test analysis to determine the most significantly affected KEGG pathways. Genome-wide gene expression analysis was performed on 19 high-grade osteosarcoma cell lines. Significantly differentially expressed genes were determined between osteosarcoma cells and two different sets of control samples: osteoblasts [n=3, GEO accession number GSE33382] and mesenchymal stem cells [n=12, GEO accession number GSE28974]. The global test was applied to the different analyses to determine the most affected signaling pathways in osteosarcoma cells.