100+ datasets found

GEO Accession Lists by Platform

zenodo.org

text/x-python

Updated Jan 24, 2020

Cite

Rich Jones (2020). GEO Accession Lists by Platform [Dataset]. http://doi.org/10.5281/zenodo.1297670

Explore at:

Available download formats: text/x-python

Unique identifier

https://doi.org/10.5281/zenodo.1297670

Dataset updated

Jan 24, 2020

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Rich Jones

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The refine.bio survey list generator requires CSV input, which had to be tediously exported by hand from the GEO web interface.

Example:

$ head accessions/Illumina\ HiSeq\ 2000.csv

"Experiment Accession","Experiment Title","Organism Name","Instrument","Submitter","Study Accession","Study Title","Sample Accession","Sample Title","Total Size, Mb","Total RUNs","Total Spots","Total Bases","Library Name","Library Strategy","Library Source","Library Selection"
"SRX4195895","4","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406604","","370.5","1","15916120","795806000","4","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
"SRX4195894","3","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406603","","362.43","1","16021366","801068300","3","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
"SRX4195893","6","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406602","","407.58","1","18432342","921617100","6","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
"SRX4195892","5","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406605","","347.33","1","16162471","808123550","5","miRNA-Seq","TRANSCRIPTOMIC","unspecified"

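A file in this layout can be read with any standard CSV parser. The sketch below is only an illustration and is not part of the dataset: it uses Python's csv module on two rows copied from the sample above, trimmed to five of the original columns for brevity. The helper name experiments_by_study is hypothetical.

```python
import csv
import io

# Two rows from the sample above, trimmed to five of the original columns.
SAMPLE = '''"Experiment Accession","Organism Name","Instrument","Study Accession","Library Strategy"
"SRX4195895","Homo sapiens","Illumina HiSeq 2000","SRP150290","miRNA-Seq"
"SRX4195894","Homo sapiens","Illumina HiSeq 2000","SRP150290","miRNA-Seq"
'''


def experiments_by_study(handle):
    """Group experiment accessions by their study accession."""
    grouped = {}
    for row in csv.DictReader(handle):
        grouped.setdefault(row["Study Accession"], []).append(
            row["Experiment Accession"]
        )
    return grouped


grouped = experiments_by_study(io.StringIO(SAMPLE))
print(grouped)
```

For a real export, the same function could be given an open file handle, for example open("accessions/Illumina HiSeq 2000.csv", newline="").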
Gene Expression Omnibus (GEO) Dataset: GSE68086
kaggle.com
zip
Updated Sep 16, 2024
Cite
Samira Alipour (2024). Gene Expression Omnibus (GEO) Dataset: GSE68086 [Dataset]. https://www.kaggle.com/datasets/samiraalipour/gene-expression-omnibus-geo-dataset-gse68086/code
Explore at:
Available download formats: zip (7850064 bytes)
Dataset updated
Sep 16, 2024
Authors
Samira Alipour
Description
Gene Expression Omnibus (GEO) Dataset: GSE68086

This dataset, available on the Gene Expression Omnibus (GEO) platform, provides valuable insights into cancer diagnostics through the analysis of tumor-educated platelets (TEPs). It highlights the potential of liquid biopsies for non-invasive cancer detection across multiple cancer types.

Dataset Overview:

Title: RNA-seq of tumor-educated platelets enables blood-based pan-cancer, multiclass, and molecular pathway cancer diagnostics.

Organism: Homo sapiens

Experiment Type: Expression profiling by high-throughput sequencing

Sample Size: 283 blood platelet samples

228 tumor-educated platelet (TEP) samples from patients with six different malignant tumors.

55 samples from healthy individuals.

Cancer Types Included:

- Non-small cell lung cancer
- Colorectal cancer
- Pancreatic cancer
- Glioblastoma
- Breast cancer
- Hepatobiliary carcinomas

Methodology:

Sample Collection: Blood platelets were isolated from whole blood using EDTA anti-coagulant.

RNA Extraction: Total RNA was extracted from platelet pellets using the mirVana RNA isolation kit.

Sequencing: cDNA synthesis and amplification were performed using the SMARTer Ultra Low RNA Kit, followed by Covaris shearing and Illumina HiSeq 2500 sequencing.

Quality Control: Performed using Bioanalyzer 2100 with RNA 6000 Picochip, DNA 7500, and DNA High Sensitivity chips.

Data Processing:

Quality control using Trimmomatic

Mapping to the hg19 reference genome using STAR (version 2.3.0)

Intron-spanning reads selected using Picard-tools (version 1.115)

Read summarization using HTseq (version 0.6.1)

Data Structure:

Samples: 285 columns (including controls)

Features: 57,736 Ensembl gene IDs (rows)

Data Type: Intron-spanning read counts

Files Included:

GSE68086_TEP_data_matrix.txt.gz (3.6 MB): Original gzipped text file containing intron-spanning RNA-seq read counts.

GSE68086_TEP_data_matrix.csv: Converted CSV file of the original data.

GSE68086_series_matrix.txt: Series matrix file containing detailed sample information.

GSE68086_series_matrix.csv: Converted CSV version of the series matrix file.
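As a rough illustration of how the gzipped count matrix might be loaded, the sketch below uses only the Python standard library. It assumes a tab-delimited layout with a header row of sample names and the gene ID in the first column of each row; that layout is an assumption, not something the listing states. The demo bytes are synthetic and the function name read_count_matrix is hypothetical.

```python
import csv
import gzip
import io


def read_count_matrix(raw_gz_bytes):
    """Return (sample_names, {gene_id: [counts]}) from a gzipped, tab-delimited matrix.

    Assumed layout: a header row of sample names, then one row per gene
    with the gene ID in the first column and integer counts after it.
    """
    with gzip.open(io.BytesIO(raw_gz_bytes), mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]
        counts = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return samples, counts


# Synthetic stand-in for the real file; gene IDs and counts are made up.
demo = gzip.compress(
    b"gene_id\tsample_A\tsample_B\n"
    b"ENSG00000000001\t12\t0\n"
    b"ENSG00000000002\t3\t7\n"
)
samples, counts = read_count_matrix(demo)
print(samples, counts)
```

For the downloaded file, the bytes could be obtained with open("GSE68086_TEP_data_matrix.txt.gz", "rb").read(), provided the file really follows the assumed layout.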

Potential Applications:

Non-invasive cancer diagnostics: Exploring liquid biopsies for cancer detection.

Identification of cancer-specific biomarkers.

Study of cancer-induced changes in platelet RNA profiles.

Comparative analysis across different cancer types.

Machine Learning Models for:

Binary classification: Healthy vs. cancer patients.

Multiclass classification: Distinguishing between different cancer types.

Molecular pathway analysis for identifying cancer-specific pathways.

Importance:

This dataset offers significant potential for advancing cancer diagnostics by leveraging tumor-educated platelets as biomarkers for early detection and classification of various cancer types. It represents a promising approach to non-invasive, blood-based cancer screening using gene expression profiles.

Data Access and Analysis:

GEO Accession: GSE68086

Online Analysis: Available through GEO2R

R Package: Data can be accessed and analyzed using the GEOquery package.

Citation: Best MG, Sol N, Kooi I, Tannous J, et al. RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics. Cancer Cell, 2015 Nov 9;28(5):666-676. PMID: 26525104
GEO gene expression dataset recompute for selected tumor samples
data.niaid.nih.gov
Updated May 13, 2024
Cite
Visentin, Luca (2024). GEO gene expression dataset recompute for selected tumor samples [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10817923
Explore at:
Dataset updated
May 13, 2024
Dataset provided by
University of Turin
Authors
Visentin, Luca
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.

All uploaded files are UTF-8, .csv-formatted matrices. The *_expected_count.csv.gz files are unlogged, raw expression counts as reported by rsem-quantify-expression (see details below). The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix. Some metadata files may have more rows than the corresponding expression matrix has sample columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples in the same GEO accession ID).

Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.

Each recompute has at least the gene_id column holding Ensembl Gene IDs. The remaining columns are ENA run accession IDs of the specific recomputed samples.

Each associated metadata file has at least the following columns:

geo_accession: The GEO sample ID of the sample.

ena_sample: The ENA sample ID of the sample.

ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.

The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information.
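The cross-referencing described above can be sketched as follows. This is a minimal illustration using the standard library, not code from the x.FASTQ package. The miniature CSV contents and all IDs in them are placeholders, and the helper name map_runs_to_geo is hypothetical. The third metadata row mimics the case where the metadata has more rows than the matrix has sample columns.

```python
import csv
import io

# Made-up miniature files; every ID below is a placeholder.
EXPRESSION_CSV = """gene_id,ERR0000001,ERR0000002
ENSG_PLACEHOLDER_1,10,20
"""
METADATA_CSV = """geo_accession,ena_sample,ena_run
GSM_PLACEHOLDER_1,SAM_PLACEHOLDER_1,ERR0000001
GSM_PLACEHOLDER_2,SAM_PLACEHOLDER_2,ERR0000002
GSM_PLACEHOLDER_3,SAM_PLACEHOLDER_3,ERR0000003
"""


def map_runs_to_geo(expression_handle, metadata_handle):
    """Map each expression-matrix column (ENA run ID) to its GEO sample ID."""
    header = next(csv.reader(expression_handle))
    runs = [column for column in header if column != "gene_id"]
    lookup = {
        row["ena_run"]: row["geo_accession"]
        for row in csv.DictReader(metadata_handle)
    }
    # Runs without a metadata entry map to None rather than raising.
    return {run: lookup.get(run) for run in runs}


mapping = map_runs_to_geo(io.StringIO(EXPRESSION_CSV), io.StringIO(METADATA_CSV))
print(mapping)
```

Only runs that appear as matrix columns end up in the result, so extra metadata rows are ignored.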

Pipeline Details

The alignment and quantification were performed with the x.FASTQ tool, available on GitHub, installed locally on an Arch Linux machine at commit 3a93dd77a70df59c74f7b15216c26f12cd918e81, running the Linux 6.7.8-zen1-1-zen kernel with an 11th Gen Intel i7-1185G7 (8) CPU and an Intel TigerLake-LP GT2 [Iris Xe Graphics] GPU. Please note that no samples were filtered or omitted based on sample quality or sequencing depth. However, sensible trimming (e.g. low-quality bases and common adapters) was performed on all the samples.

The reference genome (version hg38) was downloaded from Ensembl. STAR was used to create the genome index, with the overhang set to 149.
Field-wide assessment of differential HT-seq from NCBI GEO database
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jan 13, 2023
Cite
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli (2023). Field-wide assessment of differential HT-seq from NCBI GEO database [Dataset]. http://doi.org/10.5281/zenodo.7529832
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.7529832
Dataset updated
Jan 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We analysed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.

- This release includes GEO series published up to Dec-31, 2020;

The geo-htseq.tar.gz archive contains the following files:

- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated proportion of true null hypotheses (pi0).

- output/document_summaries.csv, document summaries of NCBI GEO series.

- output/suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions.

- output/suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO.

- output/publications.csv, publication info of NCBI GEO series.

- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series

- output/spots.csv, NCBI SRA sequencing run metadata.

- output/cancer.csv, cancer related experiment accessions.

- output/transcription_factor.csv, TF related experiment accessions.

- output/single-cell.csv, single cell experiment accessions.

- blacklist.txt, list of supplementary files that were either too large to import or were causing computing environment crash during import.

The workflow used to produce this dataset is available on GitHub at rstats-tartu/geo-htseq.

The geo-htseq-updates.tar.gz archive contains the following files:

- results/detools_from_pmc.csv, differential expression analysis programs inferred from published articles

- results/n_data.csv, manually curated sample size info for NCBI GEO HT-seq series

- results/simres_df_parsed.csv, pi0 values estimated from differential expression results obtained from simulated RNA-seq data

- results/data/parsed_suppfiles_rerun.csv, pi0 values estimated using smoother method from anti-conservative p-value sets
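For readers unfamiliar with pi0, the sketch below shows one common way to estimate it from a set of p-values: the fixed-threshold form of Storey's estimator, pi0 = #{p > lambda} / (m * (1 - lambda)). This is an illustration of the general idea only. It is not the workflow used to build this dataset, which the listing describes as using a smoother method, and the function name and example p-values are made up.

```python
def estimate_pi0(p_values, lam=0.5):
    """Fixed-threshold Storey estimate of the proportion of true nulls.

    P-values from true null hypotheses are roughly uniform on [0, 1], so the
    share of p-values above `lam`, rescaled by 1 / (1 - lam), approximates pi0.
    The result is capped at 1.0.
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Made-up p-values: six small ones (likely effects) and two large ones.
example_p = [0.001, 0.004, 0.01, 0.02, 0.03, 0.04, 0.6, 0.9]
pi0 = estimate_pi0(example_p)
print(pi0)
```

A value near 1 suggests that few tested genes show an effect, while a lower value suggests a larger share of non-null results.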
Data from: Gene Expression Omnibus (GEO)
catalog.data.gov
data.virginia.gov
Updated Jul 26, 2023
Cite
National Institutes of Health (NIH) (2023). Gene Expression Omnibus (GEO) [Dataset]. https://catalog.data.gov/dataset/gene-expression-omnibus-geo
Explore at:
Dataset updated
Jul 26, 2023
Dataset provided by
National Institutes of Health (NIH)
Description
Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.
Gene expression data sources for in silico approach to assessing activation...
springernature.figshare.com
application/gzip
Updated May 31, 2023
Cite
Sylvain Brohee; Amir Sonnenblick; David Venet (2023). Gene expression data sources for in silico approach to assessing activation of AKT/mTOR signalling pathway in ER-positive early Breast Cancer [Dataset]. http://doi.org/10.6084/m9.figshare.7461776.v1
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.7461776.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Sylvain Brohee; Amir Sonnenblick; David Venet
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains data files and identifiers for original data sources for 39 gene expression datasets from over 7,000 individuals with estrogen receptor positive (ER-positive) Breast Cancer (BC).

Background

The related study developed a novel in silico approach to assess activation of different signalling pathways. The phosphatidylinositol 3-kinase (PI3K)/AKT/mTOR signalling pathway mediates key cellular functions, including growth, proliferation and survival and is frequently involved in carcinogenesis, tumor progression and metastases. This research seeks to target relative contribution of AKT and mTOR (downstream of PI3K) in BC outcomes using the in silico approach via integrated reverse phase protein array (RPPA) and matched gene expression.

Methods and sample size

The methodology includes the development of gene signatures that reflect level of expression of pAKT and p-mTOR separately. Pooled analysis of gene expression data from over 7,000 patients with ER-positive BC was then performed. This data record holds links to the repositories holding these data, as well as the R-data files for each data record used in the analysis. All gene signatures developed are captured in Supplementary Data Sonnenblick.pdf.xlsx

Data sources

The dataset name, relevant DOI, accession number or access requirements are listed alongside the file type and repository name or other source where applicable.

GEO=Gene Expression Omnibus

EGA=European Genome-phenome Archive

This data table is available to download as NPJBCANCER-00304R1-data-sources.xlsx including more detailed information and web urls to each data source. data_db.tab contains more detailed technical metadata for each data source.

Dataset | Data location | Permanent identifier/url
NKI | CCB NKI | http://ccb.nki.nl/data/van-t-Veer_Nature_2002/
UCSF | GEO | GSE123833
STNO2 | GEO | GSE4335
NCI | Research Article (Supplementary files) | 10.1073/pnas.1732912100
UNC4 | GEO | GSE18229
CAL | Array Express | E-TABM-158
MDA4 | GEO | GSE123832
KOO | GEO | GSE123831
HLP | Array Express | E-TABM-543
EXPO | GEO | GSE2109
VDX | GEO | GSE2034/GSE5327
MSK | GEO | GSE2603
UPP | GEO | GSE3494
STK | GEO | GSE1456
UNT | GEO | GSE2990
DUKE | GEO | GSE3143
TRANSBIG | GEO | GSE7390
DUKE2 | GEO | GSE6961
MAINZ | GEO | GSE11121
LUND2 | GEO | GSE5325
LUND | GEO | GSE5325
FNCLCC | GEO | GSE7017
EMC2 | GEO | GSE12276
MUG | GEO | GSE10510
NCCS | GEO | GSE5364
MCCC | GEO | GSE19177
EORTC10994 | GEO | GSE1561
DFHCC | GEO | GSE19615
DFHCC2 | GEO | GSE18864
DFHCC3 | GEO | GSE3744
DFHCC4 | GEO | GSE5460
MAQC2 | GEO | GSE20194
TAM | GEO | GSE6532/GSE9195
MDA5 | GEO | GSE17705
VDX3 | GEO | GSE12093
METABRIC | EGA | EGAS00000000083
TCGA | TCGA | https://tcga-data.nci.nih.gov/docs/publications/brca_2012/
DNA methylation (Dedeurwaerder et al. 2011) | GEO | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20713
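Most GEO rows in the table above give bare series accessions, while the last row shows the full GEO query URL form. The sketch below turns an accession field into URLs of that form, splitting combined entries such as GSE2034/GSE5327. It is a small convenience sketch, the function name geo_urls is hypothetical, and it is meant only for GEO rows that hold accessions rather than full URLs.

```python
from urllib.parse import urlencode

# URL pattern as it appears in the last row of the table above.
GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_urls(identifier_field):
    """Build one GEO query URL per accession in a 'GSE1/GSE2' style field."""
    accessions = [part.strip() for part in identifier_field.split("/") if part.strip()]
    return [f"{GEO_QUERY_URL}?{urlencode({'acc': acc})}" for acc in accessions]


urls = geo_urls("GSE2034/GSE5327")
for url in urls:
    print(url)
```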
Entrez GEO Profiles
dknet.org
scicrunch.org
Updated Sep 9, 2024
Cite
(2024). Entrez GEO Profiles [Dataset]. http://identifiers.org/RRID:SCR_004584
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_004584 https://identifiers.org/RRID:SCR_004584/resolver/mentions
Dataset updated
Sep 9, 2024
Description
The GEO Profiles database stores gene expression profiles derived from curated GEO DataSets. Each Profile is presented as a chart that displays the expression level of one gene across all Samples within a DataSet. Experimental context is provided in the bars along the bottom of the charts making it possible to see at a glance whether a gene is differentially expressed across different experimental conditions. Profiles have various types of links including internal links that connect genes that exhibit similar behaviour, and external links to relevant records in other NCBI databases. GEO Profiles can be searched using many different attributes including keywords, gene symbols, gene names, GenBank accession numbers, or Profiles flagged as being differentially expressed.
GDS4399
kaggle.com
zip
Updated Oct 26, 2025
Cite
Bassam165 (2025). GDS4399 [Dataset]. https://www.kaggle.com/datasets/bassam165/gds4399
Explore at:
Available download formats: zip (11496559 bytes)
Dataset updated
Oct 26, 2025
Authors
Bassam165
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains microarray-based gene expression profiles of granulosa cells collected from women diagnosed with Polycystic Ovary Syndrome (PCOS) and from healthy controls. It originates from the NCBI GEO DataSet GDS4399, which was generated to study the molecular mechanisms underlying PCOS pathogenesis and its relationship to insulin resistance, steroidogenesis, and oocyte maturation.

The data were collected using the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570 platform). Each sample corresponds to an RNA expression profile of granulosa cells isolated from ovarian aspirates of PCOS and non-PCOS women undergoing in-vitro fertilization (IVF).

Key Details

NCBI GEO Accession: GDS4399

Source: Gene Expression Omnibus (GEO), NCBI.

GEO Accession: GDS4399

Title: Polycystic ovary syndrome: granulosa cells

Platform: Affymetrix Human Genome U133 Plus 2.0 Array (GPL570)

Authors: Wood JR, et al. (Original study contributors)

National Center for Biotechnology Information, U.S. National Library of Medicine. Available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399

Recommended citation style (IEEE): [1] J. R. Wood et al., “Polycystic ovary syndrome: granulosa cells,” Gene Expression Omnibus (GEO), GDS4399, NCBI, Bethesda, MD, USA. [Online]. Available: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399

License: This dataset is part of the public NCBI GEO database and is distributed under the Public Domain / CC0 License for research and educational use. Please cite the original GEO entry when reusing this dataset.
List of GEO accession number, published year and expression platforms of...
plos.figshare.com
xls
Updated May 31, 2023
Cite
Limin Zhou; Wei Zheng; Majing Luo; Jing Feng; Zhichun Jin; Yan Wang; Dunlan Zhang; Qiongxiu Tang; Yan He (2023). List of GEO accession number, published year and expression platforms of microarray experiments and RNA-Seq data used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0099834.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0099834.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Limin Zhou; Wei Zheng; Majing Luo; Jing Feng; Zhichun Jin; Yan Wang; Dunlan Zhang; Qiongxiu Tang; Yan He
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
*NCBI Gene Expression Omnibus accession number; it can be used to retrieve the microarray experiment data via http://www.ncbi.nlm.nih.gov/geo/.
GSE52194: Breast Cancer RNA-Seq Dataset Overview
datasetcatalog.nlm.nih.gov
search.dataone.org
Updated Apr 21, 2025
Cite
Selvaraj, Varshini (2025). GSE52194: Breast Cancer RNA-Seq Dataset Overview [Dataset]. http://doi.org/10.7910/DVN/IVTPNW
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/IVTPNW
Dataset updated
Apr 21, 2025
Authors
Selvaraj, Varshini
Description
Gene expression levels from breast cancer tissue samples of TNBC and non-TNBC patients (NCBI GEO accession GSE52194). Counts are normalized as FPKM.
Gene Expression V2
kaggle.com
zip
Updated Sep 25, 2024
Cite
willian oliveira (2024). Gene Expression V2 [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/gene-expression-v2/suggestions
Explore at:
Available download formats: zip (18128 bytes)
Dataset updated
Sep 25, 2024
Authors
willian oliveira
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Gene Expression Omnibus (GEO) dataset GSE68086 provides crucial insights into cancer diagnostics by analyzing tumor-educated platelets (TEPs), offering a unique approach to non-invasive cancer detection across multiple cancer types. This dataset is centered on RNA-seq analysis, which focuses on the gene expression profiles of platelets from cancer patients. Tumor-educated platelets, which are altered by the presence of tumors, represent a promising biomarker for liquid biopsies, a method that allows for cancer detection without the need for invasive tissue sampling.

The dataset titled "RNA-seq of tumor-educated platelets enables blood-based pan-cancer, multiclass, and molecular pathway cancer diagnostics" focuses on Homo sapiens and utilizes expression profiling by high-throughput sequencing. It includes 283 samples of blood platelets, of which 228 are tumor-educated platelets from patients with six types of malignant tumors: non-small cell lung cancer, colorectal cancer, pancreatic cancer, glioblastoma, breast cancer, and hepatobiliary carcinomas. The remaining 55 samples are from healthy individuals, serving as control samples.

The methodology for generating this dataset involved collecting blood samples using EDTA as an anticoagulant, isolating platelets, and extracting RNA using the mirVana RNA isolation kit. Following RNA extraction, cDNA synthesis and amplification were performed using the SMARTer Ultra Low RNA Kit, and sequencing was conducted using the Illumina HiSeq 2500 platform. Quality control was rigorously ensured by employing the Bioanalyzer 2100 system. Data processing steps involved the use of various bioinformatics tools, including Trimmomatic for quality control, STAR for mapping reads to the hg19 reference genome, Picard-tools for selecting intron-spanning reads, and HTseq for read summarization.

The dataset's structure includes 285 columns representing samples (both TEP and healthy controls) and 57,736 rows corresponding to Ensembl gene IDs. The primary data format is intron-spanning read counts, and files available for download include both gzipped text files (such as GSE68086_TEP_data_matrix.txt.gz) and CSV files for easy access and manipulation. Detailed sample information is provided in the series matrix files, both in text and CSV formats.

This dataset has several potential applications. It can be used to explore liquid biopsy techniques for non-invasive cancer diagnostics, identify cancer-specific biomarkers, and study cancer-induced changes in platelet RNA profiles. Researchers can perform comparative analyses across different cancer types and apply machine learning models for both binary classification (distinguishing between healthy individuals and cancer patients) and multiclass classification (differentiating between various cancer types). Molecular pathway analysis could also be employed to identify pathways specific to different cancers.

The importance of this dataset lies in its potential to significantly advance cancer diagnostics by leveraging TEPs as biomarkers. This approach could enable early detection and more precise classification of cancers, offering a novel method of blood-based screening using gene expression profiles. The data can be accessed through the GEO platform under accession number GSE68086, and online analysis tools such as GEO2R and the GEOquery R package facilitate further analysis. This research was published by Best MG et al. in the Cancer Cell journal in 2015, where it was recognized for demonstrating the efficacy of tumor-educated platelets in pan-cancer diagnostics.
Supplementary Table 2_Predictive power of genes and signatures_Patient...
aacr.figshare.com
xlsx
Updated Jun 23, 2023
Cite
Niccolò Roda; Andrea Cossa; Roman Hillje; Andrea Tirelli; Federica Ruscitto; Stefano Cheloni; Chiara Priami; Alberto Dalmasso; Valentina Gambino; Giada Blandano; Andrea Polazzi; Paolo Falvo; Elena Gatti; Luca Mazzarella; Lucilla Luzi; Enrica Migliaccio; Pier Giuseppe Pelicci (2023). Supplementary Table 2_Predictive power of genes and signatures_Patient cohorts GEO accession number from A Rare Subset of Primary Tumor Cells with Concomitant Hyperactivation of Extracellular Matrix Remodeling and dsRNA-IFN1 Signaling Metastasizes in Breast Cancer [Dataset]. http://doi.org/10.1158/0008-5472.23569617.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1158/0008-5472.23569617.v1
Dataset updated
Jun 23, 2023
Dataset provided by
American Association for Cancer Research (http://www.aacr.org/)
Authors
Niccolò Roda; Andrea Cossa; Roman Hillje; Andrea Tirelli; Federica Ruscitto; Stefano Cheloni; Chiara Priami; Alberto Dalmasso; Valentina Gambino; Giada Blandano; Andrea Polazzi; Paolo Falvo; Elena Gatti; Luca Mazzarella; Lucilla Luzi; Enrica Migliaccio; Pier Giuseppe Pelicci
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplementary Table including information about the predictive power of genes and signatures identified from metastatic clones and patient cohorts GEO accession numbers
Datasets in Gene Expression Omnibus used in the study ORD-020382: Evaluation...
catalog.data.gov
data.wu.ac.at
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Datasets in Gene Expression Omnibus used in the study ORD-020382: Evaluation of estrogen receptor alpha activation by glyphosate-based herbicide constituents [Dataset]. https://catalog.data.gov/dataset/datasets-in-gene-expression-omnibus-used-in-the-study-ord-020382-evaluation-of-estrogen-re
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
GEO accession number of the microarray study. This dataset is associated with the following publication: Mesnage, R., A. Phedonos, M. Biserni, M. Arno, S. Balu, C. Corton, R. Ugarte, and M. Antoniou. Evaluation of estrogen receptor alpha activation by glyphosate-based herbicide constituents. FOOD AND CHEMICAL TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 108: 30-42, (2017).
Breast Cancer Gene Expression Dataset
kaggle.com
mubashirali.vercel.app
zip
Updated Dec 23, 2025
Cite
Mubashir Ali (2025). Breast Cancer Gene Expression Dataset [Dataset]. https://www.kaggle.com/datasets/mubashir1837/breast-cancer-gene-expression-dataset
Explore at:
Available download formats: zip (1843885 bytes)
Dataset updated
Dec 23, 2025
Authors
Mubashir Ali
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Breast Cancer Gene Expression Dataset

This dataset contains RNA-seq gene expression data from 58 breast cancer patients treated with neoadjuvant chemotherapy (NAC). The data is derived from GSE280902 on NCBI GEO.

Files

cleaned_expression.csv: Gene expression matrix with 58 samples (rows) and 28,278 genes (columns). The last column is 'Response' (1 for responder, 0 for non-responder).

labels.csv: Sample labels with response to NAC.

Data Description

Samples: 58 breast cancer patients (29 responders, 29 non-responders to NAC).

Genes: 28,278 protein-coding genes.

Response: 1 = Pathological Complete Response (pCR), 0 = No Response.
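A file laid out as described above (one row per sample, gene columns, then a final Response column) could be split into features and labels roughly as follows. This is a sketch under assumptions: it assumes every non-label column is numeric, which the listing does not confirm (a sample ID column, if present, would need to be dropped first). The miniature data and the function name are made up.

```python
import csv
import io

# Made-up miniature stand-in for cleaned_expression.csv;
# gene names and values are placeholders.
DEMO_CSV = """GENE_A,GENE_B,Response
1.5,0.0,1
0.2,3.1,0
"""


def split_features_and_labels(handle, label_column="Response"):
    """Return (gene_names, feature_rows, labels) from a samples-by-genes CSV."""
    reader = csv.DictReader(handle)
    genes = [name for name in reader.fieldnames if name != label_column]
    features, labels = [], []
    for row in reader:
        features.append([float(row[gene]) for gene in genes])
        labels.append(int(row[label_column]))
    return genes, features, labels


genes, features, labels = split_features_and_labels(io.StringIO(DEMO_CSV))
print(genes, features, labels)
```

The resulting lists can be handed to whichever modelling library is preferred; with 58 samples and tens of thousands of genes, some form of feature selection or regularization would likely be needed.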

Source

GEO Accession: GSE280902

Paper: Guevara-Nieto HM et al. Identification of predictive pretreatment biomarkers for neoadjuvant chemotherapy response in Latino invasive breast cancer patients. Mol Med 2025.

GitHub Repository: Breast Cancer Gene Expression Processed Data

Usage

This dataset can be used for machine learning models to predict NAC response in breast cancer based on gene expression profiles.

License

This project is licensed under the MIT License - see the LICENSE file for details.
NCBI accession numbers and related metadata from a study of transcriptomic...
search.datacite.org
bco-dmo.org
Updated Jul 31, 2020
Cite
Kristen Whalen; Elizabeth Harvey (2020). NCBI accession numbers and related metadata from a study of transcriptomic response of Emiliania huxleyi to 2-heptyl-4-quinolone (HHQ) [Dataset]. http://doi.org/10.26008/1912/bco-dmo.773272.1
Explore at:
Unique identifier
https://doi.org/10.26008/1912/bco-dmo.773272.1
Dataset updated
Jul 31, 2020
Dataset provided by
DataCite
Biological and Chemical Oceanography Data Management Office (BCO-DMO)
Authors
Kristen Whalen; Elizabeth Harvey
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
NSF Division of Ocean Sciences
Description
NCBI accession numbers and related metadata from a study of transcriptomic response of Emiliania huxleyi to 2-heptyl-4-quinolone (HHQ). Sequences from this study are available at the NCBI GEO under accession series GSE131846 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?&acc=GSE131846
Datasets in Gene Expression Omnibus used in the study ORD-022075: Chemical...
catalog.data.gov
data.amerigeoss.org
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Datasets in Gene Expression Omnibus used in the study ORD-022075: Chemical Activation of the Constitutive Activated Receptor (CAR) Leads to Activation of Oxidant-Induced Nrf2 [Dataset]. https://catalog.data.gov/dataset/datasets-in-gene-expression-omnibus-used-in-the-study-ord-022075-chemical-activation-of-th
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Gene Expression Omnibus (GEO) accession numbers of studies used in the analysis. This dataset is associated with the following publication: Rooney, J., K. Oshida, R. Kumar, W. Baldwin, and C. Corton. Chemical Activation of the Constitutive Androstane Receptor Leads to Activation of Oxidant-Induced Nrf2. TOXICOLOGICAL SCIENCES. Society of Toxicology, RESTON, VA, 167(1): 172-189, (2019).
DATA IMPORT GSE183947
kaggle.com
zip
Updated Nov 28, 2025
Cite
Dr. Nagendra (2025). DATA IMPORT GSE183947 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/data-import-gse183947
Explore at:
Available download formats: zip (2579505 bytes)
Dataset updated
Nov 28, 2025
Authors
Dr. Nagendra
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset provides the raw data associated with NCBI GEO accession number GSE183947. The underlying data is an RNA-Sequencing (RNA-Seq) expression matrix derived from matched normal and malignant breast cancer tissue samples. The primary goal of this resource is to teach the complete workflow of:
- Downloading and importing high-throughput genomics data from public repositories.
- Cleaning and normalizing the raw expression values (e.g., FPKM/TPM).
- Preparing the data structure for downstream Differential Gene Expression (DEG) analysis.
This resource is essential for anyone practicing translational bioinformatics and cancer research.
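As one concrete example of the normalization step mentioned above, FPKM values for a single sample can be rescaled to TPM by dividing each value by the sample's FPKM total and multiplying by one million. The sketch below shows that standard conversion. The input values are made up and the function name is hypothetical; it is not taken from the dataset's own materials.

```python
def fpkm_to_tpm(fpkm_values):
    """Convert one sample's FPKM values to TPM.

    TPM_i = FPKM_i / sum(FPKM) * 1e6, so the TPM values of a sample
    always sum to one million.
    """
    total = sum(fpkm_values)
    if total <= 0:
        raise ValueError("FPKM values must sum to a positive number")
    return [value / total * 1_000_000 for value in fpkm_values]


# Made-up FPKM values for three genes in one sample.
tpm = fpkm_to_tpm([10.0, 30.0, 60.0])
print(tpm)
```

Because TPM values sum to the same total in every sample, they are often easier to compare across samples than FPKM values.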
Data from: Fibroblast STAT3 activation drives organ-specific premetastatic...
zenodo.org
bin
Updated Dec 22, 2025
Cite
Emily Lasse Opsahl; Emily Lasse Opsahl; Marina Pasca di Magliano; Marina Pasca di Magliano (2025). Fibroblast STAT3 activation drives organ-specific premetastatic niche formation [Dataset]. http://doi.org/10.5281/zenodo.17102186
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.17102186
Dataset updated
Dec 22, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Emily Lasse Opsahl; Emily Lasse Opsahl; Marina Pasca di Magliano; Marina Pasca di Magliano
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Complete objects from "Fibroblast STAT3 activation drives organ-specific premetastatic niche formation".

Please cite: Lasse Opsahl EL, Espinoza CE, Olivei AC, Okoye JO, Watkoske H, Hoffman MT, Avritt FR, Elhossiny AM, Bischoff AC, Donahue KL, Poggi M, Kadiyala P, Arya N, Shi J, Lee KE, Zhang Y, Carpenter ES, Szczepanski JM, Frankel TL, Pasca di Magliano M. Fibroblast STAT3 Activation Drives Organ-Specific Premetastatic Niche Formation. Cancer Res. 2025 Oct 17. doi: 10.1158/0008-5472.CAN-25-3472. Epub ahead of print. PMID: 41105672.

Code used for data processing and visualization of single cell RNA sequencing data from the manuscript "Fibroblast STAT3 activation drives organ-specific premetastatic niche formation" can be found here.

Raw data files for the novel datasets generated in this manuscript are available through the NIH Gene Expression Omnibus (GEO), accession number GSE292712.
DGE GO Enrichment Analysis Microarray Data GDS2778
kaggle.com
zip
Updated Nov 29, 2025
Cite
Dr. Nagendra (2025). DGE GO Enrichment Analysis Microarray Data GDS2778 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/dge-go-enrichment-analysis-microarray-data-gds2778
Explore at:
Available download formats: zip (6820264 bytes)
Dataset updated
Nov 29, 2025
Authors
Dr. Nagendra
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset is based on National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) DataSet accession GDS2778.

The dataset originates from a microarray experiment measuring global gene expression under specific experimental conditions.

Raw and processed expression data (for all probes/genes) are included, enabling downstream analysis such as normalization, differential expression, and clustering.

The dataset has been used to perform differential gene expression (DGE) analysis to identify genes that are up- or down-regulated under the experimental condition compared to control.

Data processing steps typically include normalization (e.g., log-transformation), quality control, probe-to-gene mapping, and statistical testing for significance (e.g., using packages such as limma or other DGE tools).

Resulting differentially expressed genes (DEGs) include statistics such as log fold change (logFC), adjusted p‑values (adj.P.Val), and possibly other metrics (e.g., B-statistic), allowing assessment of both magnitude and significance of changes.

The dataset also includes a visualization file (heatmap image) that displays expression patterns of DEGs (or top variable genes) across samples — enabling clustering and pattern recognition across samples and genes.

The heatmap helps illustrate sample-wise and gene-wise expression variation: clustering groups together samples (e.g., control vs. treatment) and genes with similar expression dynamics.

This dataset is suitable for further bioinformatics analysis: e.g. functional enrichment (GO/Pathway), co‑expression analysis, gene signature identification, or integration with other datasets.

Users who download this dataset can reproduce or extend analyses, such as re-normalization, alternative clustering, custom DEG thresholds, or downstream biological interpretation (pathway, network analysis).
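The DEG statistics named in this description (logFC and adj.P.Val) and the idea of custom DEG thresholds can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis shipped with the dataset: the adjustment shown is Benjamini-Hochberg, which is limma's default method, the gene names, p-values, and cut-offs are hypothetical, and limma's moderated statistics are not reproduced.

```python
def log_fold_change(control_log2, treatment_log2):
    """Difference of group means on the log2 scale (treatment minus control)."""
    mean_control = sum(control_log2) / len(control_log2)
    mean_treatment = sum(treatment_log2) / len(treatment_log2)
    return mean_treatment - mean_control


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def select_degs(genes, logfc, adj_p, min_abs_logfc=1.0, max_adj_p=0.05):
    """Keep genes passing both thresholds (hypothetical default cut-offs)."""
    return [
        gene
        for gene, fold_change, p in zip(genes, logfc, adj_p)
        if abs(fold_change) >= min_abs_logfc and p < max_adj_p
    ]


# Hypothetical toy results for four genes.
genes = ["geneA", "geneB", "geneC", "geneD"]
logfc = [2.1, -0.3, -1.5, 0.8]
raw_p = [0.01, 0.04, 0.03, 0.005]

adj_p = benjamini_hochberg(raw_p)
degs = select_degs(genes, logfc, adj_p)
```

Changing `min_abs_logfc` or `max_adj_p` is one way to explore the custom thresholds mentioned above; the raw p-values themselves would come from whatever statistical test the user runs.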
Genome-wide gene expression profiling of high-grade osteosarcoma cell lines
ebi.ac.uk
Updated Jun 5, 2013
Cite
Marieke Kuijjer; Elisabeth Peterse; Brendy van den Akker; Inge Briaire-deBruijn; Massimo Serra; Leonardo Meza-Zepeda; Ola Myklebost; Bass Hassan; Pancras Hogendoorn; Anne-Marie Cleton-Jansen (2013). Genome-wide gene expression profiling of high-grade osteosarcoma cell lines [Dataset]. https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-42351/
Explore at:
Dataset updated
Jun 5, 2013
Authors
Marieke Kuijjer; Elisabeth Peterse; Brendy van den Akker; Inge Briaire-deBruijn; Massimo Serra; Leonardo Meza-Zepeda; Ola Myklebost; Bass Hassan; Pancras Hogendoorn; Anne-Marie Cleton-Jansen
Description
We performed genome-wide gene expression profiling of high-grade osteosarcoma cell lines, as well as of mesenchymal stem cells and osteoblasts, and applied global test analysis to determine the most significantly affected KEGG pathways. Genome-wide gene expression analysis was performed on 19 high-grade osteosarcoma cell lines. Significantly differentially expressed genes were determined between osteosarcoma cells and two different sets of control samples: osteoblasts [n=3, GEO accession number GSE33382] and mesenchymal stem cells [n=12, GEO accession number GSE28974]. The global test was applied to each analysis to determine the most affected signaling pathways in osteosarcoma cells.