100+ datasets found
  1. GEO Accession Lists by Platform

    • zenodo.org
    text/x-python
    Updated Jan 24, 2020
    Cite
    Rich Jones (2020). GEO Accession Lists by Platform [Dataset]. http://doi.org/10.5281/zenodo.1297670
    Available download formats: text/x-python
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rich Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Refine.bio survey list generator requires CSV input. These files were tediously exported by hand from the GEO web interface.

    Ex:

    $ head accessions/Illumina\ HiSeq\ 2000.csv
    "Experiment Accession","Experiment Title","Organism Name","Instrument","Submitter","Study Accession","Study Title","Sample Accession","Sample Title","Total Size, Mb","Total RUNs","Total Spots","Total Bases","Library Name","Library Strategy","Library Source","Library Selection"
    "SRX4195895","4","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406604","","370.5","1","15916120","795806000","4","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
    "SRX4195894","3","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406603","","362.43","1","16021366","801068300","3","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
    "SRX4195893","6","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406602","","407.58","1","18432342","921617100","6","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
    "SRX4195892","5","Homo sapiens","Illumina HiSeq 2000","Kolling Institute, The University of Sydney","SRP150290","RET-altered microRNAs in MTC","SRS3406605","","347.33","1","16162471","808123550","5","miRNA-Seq","TRANSCRIPTOMIC","unspecified"
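
    A minimal sketch of how an export like the one above could be read in Python. It uses only the standard library and an inline sample trimmed to four of the columns shown; the function name `experiments_by_study` is hypothetical and not part of Refine.bio.

```python
import csv
import io
from collections import defaultdict

# Inline stand-in for an exported file, trimmed to four of the columns
# shown in the sample above (a real export has more columns).
SAMPLE = '''"Experiment Accession","Study Accession","Sample Accession","Library Strategy"
"SRX4195895","SRP150290","SRS3406604","miRNA-Seq"
"SRX4195894","SRP150290","SRS3406603","miRNA-Seq"
'''


def experiments_by_study(handle):
    """Group experiment accessions by their study accession."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["Study Accession"]].append(row["Experiment Accession"])
    return dict(grouped)


result = experiments_by_study(io.StringIO(SAMPLE))
print(result)
```

    For a file on disk, the same function can be given `open(path, newline="")` instead of the in-memory sample.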

  2. Gene Expression Omnibus (GEO) Dataset: GSE68086

    • kaggle.com
    zip
    Updated Sep 16, 2024
    Cite
    Samira Alipour (2024). Gene Expression Omnibus (GEO) Dataset: GSE68086 [Dataset]. https://www.kaggle.com/datasets/samiraalipour/gene-expression-omnibus-geo-dataset-gse68086/code
    Available download formats: zip (7850064 bytes)
    Dataset updated
    Sep 16, 2024
    Authors
    Samira Alipour
    Description

    Gene Expression Omnibus (GEO) Dataset: GSE68086

    This dataset, available on the Gene Expression Omnibus (GEO) platform, provides valuable insights into cancer diagnostics through the analysis of tumor-educated platelets (TEPs). It highlights the potential of liquid biopsies for non-invasive cancer detection across multiple cancer types.

    Dataset Overview:

    • Title: RNA-seq of tumor-educated platelets enables blood-based pan-cancer, multiclass, and molecular pathway cancer diagnostics.
    • Organism: Homo sapiens
    • Experiment Type: Expression profiling by high-throughput sequencing
    • Sample Size: 283 blood platelet samples
      • 228 tumor-educated platelet (TEP) samples from patients with six different malignant tumors.
      • 55 samples from healthy individuals.

    Cancer Types Included:

    • Non-small cell lung cancer
    • Colorectal cancer
    • Pancreatic cancer
    • Glioblastoma
    • Breast cancer
    • Hepatobiliary carcinomas

    Methodology:

    • Sample Collection: Blood platelets were isolated from whole blood using EDTA anti-coagulant.
    • RNA Extraction: Total RNA was extracted from platelet pellets using the mirVana RNA isolation kit.
    • Sequencing: cDNA synthesis and amplification were performed using the SMARTer Ultra Low RNA Kit, followed by Covaris shearing and Illumina HiSeq 2500 sequencing.
    • Quality Control: Performed using Bioanalyzer 2100 with RNA 6000 Picochip, DNA 7500, and DNA High Sensitivity chips.

    Data Processing:

    • Quality control using Trimmomatic
    • Mapping to the hg19 reference genome using STAR (version 2.3.0)
    • Intron-spanning reads selected using Picard-tools (version 1.115)
    • Read summarization using HTseq (version 0.6.1)

    Data Structure:

    • Samples: 285 columns (including controls)
    • Features: 57,736 Ensembl gene IDs (rows)
    • Data Type: Intron-spanning read counts

    Files Included:

    1. GSE68086_TEP_data_matrix.txt.gz (3.6 MB): Original gzipped text file containing intron-spanning RNA-seq read counts.
    2. GSE68086_TEP_data_matrix.csv: Converted CSV file of the original data.
    3. GSE68086_series_matrix.txt: Series matrix file containing detailed sample information.
    4. GSE68086_series_matrix.csv: Converted CSV version of the series matrix file.
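
    As a rough illustration, a gzipped count matrix such as the first file listed could be parsed as follows. This is a sketch under assumptions: the file is taken to be tab-delimited with a header row whose first cell labels the gene column, which is not confirmed by the description. The helper `read_count_matrix` and the demo sample names are hypothetical.

```python
import csv
import gzip
import io


def read_count_matrix(raw_bytes):
    """Parse gzipped, tab-delimited counts into (sample_ids, {gene_id: counts}).

    Assumes the first header cell labels the gene ID column; adjust if the
    real file omits that cell.
    """
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        sample_ids = header[1:]
        counts = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return sample_ids, counts


# Tiny in-memory stand-in for the real file (values are made up).
demo = gzip.compress(
    b"gene_id\tS1\tS2\nENSG00000000003\t0\t12\nENSG00000000005\t3\t0\n"
)
samples, matrix = read_count_matrix(demo)
print(samples, matrix)
```

    With the real download, the bytes would come from reading the `.txt.gz` file in binary mode.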

    Potential Applications:

    • Non-invasive cancer diagnostics: Exploring liquid biopsies for cancer detection.
    • Identification of cancer-specific biomarkers.
    • Study of cancer-induced changes in platelet RNA profiles.
    • Comparative analysis across different cancer types.

    Machine Learning Models for:

    • Binary classification: Healthy vs. cancer patients.
    • Multiclass classification: Distinguishing between different cancer types.
    • Molecular pathway analysis for identifying cancer-specific pathways.

    Importance:

    This dataset offers significant potential for advancing cancer diagnostics by leveraging tumor-educated platelets as biomarkers for early detection and classification of various cancer types. It represents a promising approach to non-invasive, blood-based cancer screening using gene expression profiles.

    Data Access and Analysis:

    • GEO Accession: GSE68086
    • Online Analysis: Available through GEO2R
    • R Package: Data can be accessed and analyzed using the GEOquery package.

    Citation: Best MG, Sol N, Kooi I, Tannous J, et al. RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics. Cancer Cell, 2015 Nov 9;28(5):666-676. PMID: 26525104

  3. GEO gene expression dataset recompute for selected tumor samples

    • data.niaid.nih.gov
    Updated May 13, 2024
    Cite
    Visentin, Luca (2024). GEO gene expression dataset recompute for selected tumor samples [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10817923
    Dataset updated
    May 13, 2024
    Dataset provided by
    University of Turin
    Authors
    Visentin, Luca
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.

    All uploaded files are UTF-8, .csv-formatted matrices. The *_expected_count.csv.gz files are unlogged, raw expression counts as reported by rsem-quantify-expression (see details below). The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix. Some metadata files may have more rows than the corresponding expression matrix has sample columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples in the same GEO accession ID).

    Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.

    Each recompute has at least the gene_id column holding Ensembl Gene IDs. The remaining columns are ENA run accession IDs of the specific recomputed samples. Each associated metadata file has at least the following columns:

    geo_accession: The GEO sample ID of the sample.

    ena_sample: The ENA sample ID of the sample.

    ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.

    The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information.
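
    The cross-referencing described above can be sketched as follows. This is an illustrative sketch, not part of x.FASTQ: the function `runs_to_geo` and all accession values in the demo are hypothetical. It also reports metadata rows with no matching matrix column, which the description says can occur for partially RNA-Seq series.

```python
import csv
import io


def runs_to_geo(matrix_header, metadata_rows):
    """Map ENA run columns of an expression matrix to GEO sample IDs.

    Returns (mapping, unmatched), where unmatched lists metadata runs that
    have no column in the matrix.
    """
    runs = [col for col in matrix_header if col != "gene_id"]
    lookup = {row["ena_run"]: row["geo_accession"] for row in metadata_rows}
    mapping = {run: lookup[run] for run in runs if run in lookup}
    unmatched = sorted(set(lookup) - set(runs))
    return mapping, unmatched


# Hypothetical IDs for illustration only.
header = ["gene_id", "SRR0000001", "SRR0000002"]
metadata_csv = (
    "geo_accession,ena_sample,ena_run\n"
    "GSM0000001,SAMN00000001,SRR0000001\n"
    "GSM0000002,SAMN00000002,SRR0000002\n"
    "GSM0000003,SAMN00000003,SRR0000003\n"
)
rows = list(csv.DictReader(io.StringIO(metadata_csv)))
mapping, unmatched = runs_to_geo(header, rows)
print(mapping, unmatched)
```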

    Pipeline Details

    Alignment and quantification were performed with the x.FASTQ tool (available on GitHub) at commit 3a93dd77a70df59c74f7b15216c26f12cd918e81. The tool was installed locally on an Arch Linux machine running the Linux 6.7.8-zen1-1-zen kernel with an 11th Gen Intel i7-1185G7 (8) CPU and an Intel TigerLake-LP GT2 [Iris Xe Graphics] GPU. Please note that no sample filtering or omissions were done based on sample quality or sequencing depth. However, sensible trimming (e.g. low-quality bases and common adapters) was performed on all the samples.

    The reference genome (version hg38) was downloaded from Ensembl. STAR was used to build the genome index with the overhang set to 149.

  4. Field-wide assessment of differential HT-seq from NCBI GEO database

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 13, 2023
    Cite
    Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli (2023). Field-wide assessment of differential HT-seq from NCBI GEO database [Dataset]. http://doi.org/10.5281/zenodo.7529832
    Available download formats: application/gzip
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We analysed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.

    - This release includes GEO series published up to Dec-31, 2020;

    The geo-htseq.tar.gz archive contains the following files:

    - output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0).

    - output/document_summaries.csv, document summaries of NCBI GEO series.

    - output/suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions.

    - output/suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO.

    - output/publications.csv, publication info of NCBI GEO series.

    - output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series

    - output/spots.csv, NCBI SRA sequencing run metadata.

    - output/cancer.csv, cancer related experiment accessions.

    - output/transcription_factor.csv, TF related experiment accessions.

    - output/single-cell.csv, single cell experiment accessions.

    - blacklist.txt, list of supplementary files that were either too large to import or crashed the computing environment during import.

    The workflow that produced this dataset is available on GitHub at rstats-tartu/geo-htseq.

    The geo-htseq-updates.tar.gz archive contains the following files:

    - results/detools_from_pmc.csv, differential expression analysis programs inferred from published articles

    - results/n_data.csv, manually curated sample size info for NCBI GEO HT-seq series

    - results/simres_df_parsed.csv, pi0 values estimated from differential expression results obtained from simulated RNA-seq data

    - results/data/parsed_suppfiles_rerun.csv, pi0 values estimated using smoother method from anti-conservative p-value sets
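
    For readers unfamiliar with pi0, the sketch below shows one common way to estimate the proportion of true null hypotheses from a set of p-values: Storey's estimator at a fixed threshold lambda, pi0 = #{p > lambda} / (m * (1 - lambda)). This is an assumption chosen for illustration. The description above only names a "smoother method", and this code is not taken from the geo-htseq workflow.

```python
def storey_pi0(p_values, lam=0.5):
    """Estimate pi0 as the share of p-values above lam, rescaled by 1 - lam.

    Under the null, p-values are uniform, so the region (lam, 1] should hold
    a fraction (1 - lam) * pi0 of all p-values.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    n_above = sum(1 for p in p_values if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


# Evenly spread p-values: consistent with every hypothesis being null.
uniform = [(i + 0.5) / 100 for i in range(100)]
# Half the p-values piled near zero, half evenly spread.
enriched = [0.001] * 50 + [(i + 0.5) / 50 for i in range(50)]

print(storey_pi0(uniform), storey_pi0(enriched))
```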

  5. Data from: Gene Expression Omnibus (GEO)

    • catalog.data.gov
    • data.virginia.gov
    • +2more
    Updated Jul 26, 2023
    Cite
    National Institutes of Health (NIH) (2023). Gene Expression Omnibus (GEO) [Dataset]. https://catalog.data.gov/dataset/gene-expression-omnibus-geo
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    National Institutes of Health (NIH)
    Description

    Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.

  6. Gene expression data sources for in silico approach to assessing activation...

    • springernature.figshare.com
    application/gzip
    Updated May 31, 2023
    Cite
    Sylvain Brohee; Amir Sonnenblick; David Venet (2023). Gene expression data sources for in silico approach to assessing activation of AKT/mTOR signalling pathway in ER-positive early Breast Cancer [Dataset]. http://doi.org/10.6084/m9.figshare.7461776.v1
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sylvain Brohee; Amir Sonnenblick; David Venet
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains data files and identifiers for original data sources for 39 gene expression datasets from over 7,000 individuals with estrogen receptor positive (ER-positive) Breast Cancer (BC).

    Background

    The related study developed a novel in silico approach to assess activation of different signalling pathways. The phosphatidylinositol 3-kinase (PI3K)/AKT/mTOR signalling pathway mediates key cellular functions, including growth, proliferation and survival and is frequently involved in carcinogenesis, tumor progression and metastases. This research seeks to target relative contribution of AKT and mTOR (downstream of PI3K) in BC outcomes using the in silico approach via integrated reverse phase protein array (RPPA) and matched gene expression.

    Methods and sample size

    The methodology includes the development of gene signatures that reflect level of expression of pAKT and p-mTOR separately. Pooled analysis of gene expression data from over 7,000 patients with ER-positive BC was then performed. This data record holds links to the repositories holding these data, as well as the R-data files for each data record used in the analysis. All gene signatures developed are captured in Supplementary Data Sonnenblick.pdf.xlsx

    Data sources

    The dataset name, relevant DOI, accession number or access requirements are listed alongside the file type and repository name or other source where applicable.

    GEO = Gene Expression Omnibus
    EGA = European Genome-phenome Archive

    This data table is available to download as NPJBCANCER-00304R1-data-sources.xlsx including more detailed information and web urls to each data source. data_db.tab contains more detailed technical metadata for each data source.

    Dataset | Data location | Permanent identifier/url
    NKI | CCB NKI | http://ccb.nki.nl/data/van-t-Veer_Nature_2002/
    UCSF | GEO | GSE123833
    STNO2 | GEO | GSE4335
    NCI | Research Article (Supplementary files) | 10.1073/pnas.1732912100
    UNC4 | GEO | GSE18229
    CAL | Array Express | E-TABM-158
    MDA4 | GEO | GSE123832
    KOO | GEO | GSE123831
    HLP | Array Express | E-TABM-543
    EXPO | GEO | GSE2109
    VDX | GEO | GSE2034/GSE5327
    MSK | GEO | GSE2603
    UPP | GEO | GSE3494
    STK | GEO | GSE1456
    UNT | GEO | GSE2990
    DUKE | GEO | GSE3143
    TRANSBIG | GEO | GSE7390
    DUKE2 | GEO | GSE6961
    MAINZ | GEO | GSE11121
    LUND2 | GEO | GSE5325
    LUND | GEO | GSE5325
    FNCLCC | GEO | GSE7017
    EMC2 | GEO | GSE12276
    MUG | GEO | GSE10510
    NCCS | GEO | GSE5364
    MCCC | GEO | GSE19177
    EORTC10994 | GEO | GSE1561
    DFHCC | GEO | GSE19615
    DFHCC2 | GEO | GSE18864
    DFHCC3 | GEO | GSE3744
    DFHCC4 | GEO | GSE5460
    MAQC2 | GEO | GSE20194
    TAM | GEO | GSE6532/GSE9195
    MDA5 | GEO | GSE17705
    VDX3 | GEO | GSE12093
    METABRIC | EGA | EGAS00000000083
    TCGA | TCGA | https://tcga-data.nci.nih.gov/docs/publications/brca_2012/
    DNA methylation (Dedeurwaerder et al. 2011) | GEO | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20713
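
    GEO identifiers in this table can be turned into record URLs using the same acc.cgi pattern that appears in the DNA methylation row. The sketch below is illustrative: the helper name `geo_urls` is hypothetical, and it assumes that slash-separated entries such as GSE2034/GSE5327 denote two separate series.

```python
from urllib.parse import urlencode

# URL pattern as it appears in the DNA methylation row of the table.
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_urls(identifier):
    """Return one GEO record URL per accession in a slash-separated entry."""
    accessions = [acc.strip() for acc in identifier.split("/") if acc.strip()]
    return [f"{GEO_ACC_URL}?{urlencode({'acc': acc})}" for acc in accessions]


urls = geo_urls("GSE2034/GSE5327")
for url in urls:
    print(url)
```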

  7. Entrez GEO Profiles

    • dknet.org
    • scicrunch.org
    • +1more
    Updated Sep 9, 2024
    Cite
    (2024). Entrez GEO Profiles [Dataset]. http://identifiers.org/RRID:SCR_004584
    Dataset updated
    Sep 9, 2024
    Description

    The GEO Profiles database stores gene expression profiles derived from curated GEO DataSets. Each Profile is presented as a chart that displays the expression level of one gene across all Samples within a DataSet. Experimental context is provided in the bars along the bottom of the charts making it possible to see at a glance whether a gene is differentially expressed across different experimental conditions. Profiles have various types of links including internal links that connect genes that exhibit similar behaviour, and external links to relevant records in other NCBI databases. GEO Profiles can be searched using many different attributes including keywords, gene symbols, gene names, GenBank accession numbers, or Profiles flagged as being differentially expressed.

  8. GDS4399

    • kaggle.com
    zip
    Updated Oct 26, 2025
    Cite
    Bassam165 (2025). GDS4399 [Dataset]. https://www.kaggle.com/datasets/bassam165/gds4399
    Available download formats: zip (11496559 bytes)
    Dataset updated
    Oct 26, 2025
    Authors
    Bassam165
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains microarray-based gene expression profiles of granulosa cells collected from women diagnosed with Polycystic Ovary Syndrome (PCOS) and from healthy controls. It originates from the NCBI GEO DataSet GDS4399, which was generated to study the molecular mechanisms underlying PCOS pathogenesis and its relationship to insulin resistance, steroidogenesis, and oocyte maturation.

    The data were collected using the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570 platform). Each sample corresponds to an RNA expression profile of granulosa cells isolated from ovarian aspirates of PCOS and non-PCOS women undergoing in-vitro fertilization (IVF).

    Key Details

    • Source: Gene Expression Omnibus (GEO), NCBI
    • GEO Accession: GDS4399
    • Title: Polycystic ovary syndrome: granulosa cells
    • Platform: Affymetrix Human Genome U133 Plus 2.0 Array (GPL570)
    • Authors: Wood JR, et al. (Original study contributors)
    • National Center for Biotechnology Information, U.S. National Library of Medicine
    • Available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399

    Recommended citation style (IEEE): [1] J. R. Wood et al., “Polycystic ovary syndrome: granulosa cells,” Gene Expression Omnibus (GEO), GDS4399, NCBI, Bethesda, MD, USA. [Online]. Available: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399

    License: This dataset is part of the public NCBI GEO database and is distributed under the Public Domain / CC0 License for research and educational use. Please cite the original GEO entry when reusing this dataset.

  9. List of GEO accession number, published year and expression platforms of...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Limin Zhou; Wei Zheng; Majing Luo; Jing Feng; Zhichun Jin; Yan Wang; Dunlan Zhang; Qiongxiu Tang; Yan He (2023). List of GEO accession number, published year and expression platforms of microarray experiments and RNA-Seq data used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0099834.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Limin Zhou; Wei Zheng; Majing Luo; Jing Feng; Zhichun Jin; Yan Wang; Dunlan Zhang; Qiongxiu Tang; Yan He
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    *NCBI Gene Expression Omnibus accession number. It can be used to retrieve the microarray experiment data via http://www.ncbi.nlm.nih.gov/geo/.

  10. GSE52194: Breast Cancer RNA-Seq Dataset Overview

    • datasetcatalog.nlm.nih.gov
    • search.dataone.org
    Updated Apr 21, 2025
    Cite
    Selvaraj, Varshini (2025). GSE52194: Breast Cancer RNA-Seq Dataset Overview [Dataset]. http://doi.org/10.7910/DVN/IVTPNW
    Dataset updated
    Apr 21, 2025
    Authors
    Selvaraj, Varshini
    Description

    Dataset containing gene expression levels from breast cancer tissue samples of TNBC and non-TNBC patients. NCBI GEO accession: GSE52194. Values are normalized counts in FPKM.

  11. Gene Expression V2

    • kaggle.com
    zip
    Updated Sep 25, 2024
    Cite
    willian oliveira (2024). Gene Expression V2 [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/gene-expression-v2/suggestions
    Available download formats: zip (18128 bytes)
    Dataset updated
    Sep 25, 2024
    Authors
    willian oliveira
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Gene Expression Omnibus (GEO) dataset GSE68086 provides crucial insights into cancer diagnostics by analyzing tumor-educated platelets (TEPs), offering a unique approach to non-invasive cancer detection across multiple cancer types. This dataset is centered on RNA-seq analysis, which focuses on the gene expression profiles of platelets from cancer patients. Tumor-educated platelets, which are altered by the presence of tumors, represent a promising biomarker for liquid biopsies, a method that allows for cancer detection without the need for invasive tissue sampling.

    The dataset titled "RNA-seq of tumor-educated platelets enables blood-based pan-cancer, multiclass, and molecular pathway cancer diagnostics" focuses on Homo sapiens and utilizes expression profiling by high-throughput sequencing. It includes 283 samples of blood platelets, of which 228 are tumor-educated platelets from patients with six types of malignant tumors: non-small cell lung cancer, colorectal cancer, pancreatic cancer, glioblastoma, breast cancer, and hepatobiliary carcinomas. The remaining 55 samples are from healthy individuals, serving as control samples.

    The methodology for generating this dataset involved collecting blood samples using EDTA as an anticoagulant, isolating platelets, and extracting RNA using the mirVana RNA isolation kit. Following RNA extraction, cDNA synthesis and amplification were performed using the SMARTer Ultra Low RNA Kit, and sequencing was conducted using the Illumina HiSeq 2500 platform. Quality control was rigorously ensured by employing the Bioanalyzer 2100 system. Data processing steps involved the use of various bioinformatics tools, including Trimmomatic for quality control, STAR for mapping reads to the hg19 reference genome, Picard-tools for selecting intron-spanning reads, and HTseq for read summarization.

    The dataset's structure includes 285 columns representing samples (both TEP and healthy controls) and 57,736 rows corresponding to Ensembl gene IDs. The primary data format is intron-spanning read counts, and files available for download include both gzipped text files (such as GSE68086_TEP_data_matrix.txt.gz) and CSV files for easy access and manipulation. Detailed sample information is provided in the series matrix files, both in text and CSV formats.

    This dataset has several potential applications. It can be used to explore liquid biopsy techniques for non-invasive cancer diagnostics, identify cancer-specific biomarkers, and study cancer-induced changes in platelet RNA profiles. Researchers can perform comparative analyses across different cancer types and apply machine learning models for both binary classification (distinguishing between healthy individuals and cancer patients) and multiclass classification (differentiating between various cancer types). Molecular pathway analysis could also be employed to identify pathways specific to different cancers.

    The importance of this dataset lies in its potential to significantly advance cancer diagnostics by leveraging TEPs as biomarkers. This approach could enable early detection and more precise classification of cancers, offering a novel method of blood-based screening using gene expression profiles. The data can be accessed through the GEO platform under accession number GSE68086, and online analysis tools such as GEO2R and the GEOquery R package facilitate further analysis. This research was published by Best MG et al. in the Cancer Cell journal in 2015, where it was recognized for demonstrating the efficacy of tumor-educated platelets in pan-cancer diagnostics.

  12. Supplementary Table 2_Predictive power of genes and signatures_Patient...

    • aacr.figshare.com
    xlsx
    Updated Jun 23, 2023
    Cite
    Niccolò Roda; Andrea Cossa; Roman Hillje; Andrea Tirelli; Federica Ruscitto; Stefano Cheloni; Chiara Priami; Alberto Dalmasso; Valentina Gambino; Giada Blandano; Andrea Polazzi; Paolo Falvo; Elena Gatti; Luca Mazzarella; Lucilla Luzi; Enrica Migliaccio; Pier Giuseppe Pelicci (2023). Supplementary Table 2_Predictive power of genes and signatures_Patient cohorts GEO accession number from A Rare Subset of Primary Tumor Cells with Concomitant Hyperactivation of Extracellular Matrix Remodeling and dsRNA-IFN1 Signaling Metastasizes in Breast Cancer [Dataset]. http://doi.org/10.1158/0008-5472.23569617.v1
    Available download formats: xlsx
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    American Association for Cancer Research (http://www.aacr.org/)
    Authors
    Niccolò Roda; Andrea Cossa; Roman Hillje; Andrea Tirelli; Federica Ruscitto; Stefano Cheloni; Chiara Priami; Alberto Dalmasso; Valentina Gambino; Giada Blandano; Andrea Polazzi; Paolo Falvo; Elena Gatti; Luca Mazzarella; Lucilla Luzi; Enrica Migliaccio; Pier Giuseppe Pelicci
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary Table including information about the predictive power of genes and signatures identified from metastatic clones and patient cohorts GEO accession numbers

  13. Datasets in Gene Expression Omnibus used in the study ORD-020382: Evaluation...

    • catalog.data.gov
    • data.wu.ac.at
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Datasets in Gene Expression Omnibus used in the study ORD-020382: Evaluation of estrogen receptor alpha activation by glyphosate-based herbicide constituents [Dataset]. https://catalog.data.gov/dataset/datasets-in-gene-expression-omnibus-used-in-the-study-ord-020382-evaluation-of-estrogen-re
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    GEO accession number of the microarray study. This dataset is associated with the following publication: Mesnage, R., A. Phedonos, M. Biserni, M. Arno, S. Balu, C. Corton, R. Ugarte, and M. Antoniou. Evaluation of estrogen receptor alpha activation by glyphosate-based herbicide constituents. FOOD AND CHEMICAL TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 108: 30-42, (2017).

  14. Breast Cancer Gene Expression Dataset

    • kaggle.com
    • mubashirali.vercel.app
    zip
    Updated Dec 23, 2025
    Cite
    Mubashir Ali (2025). Breast Cancer Gene Expression Dataset [Dataset]. https://www.kaggle.com/datasets/mubashir1837/breast-cancer-gene-expression-dataset
    Available download formats: zip (1843885 bytes)
    Dataset updated
    Dec 23, 2025
    Authors
    Mubashir Ali
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Breast Cancer Gene Expression Dataset

    This dataset contains RNA-seq gene expression data from 58 breast cancer patients treated with neoadjuvant chemotherapy (NAC). The data is derived from GSE280902 on NCBI GEO.

    Files

    • cleaned_expression.csv: Gene expression matrix with 58 samples (rows) and 28,278 genes (columns). The last column is 'Response' (1 for responder, 0 for non-responder).
    • labels.csv: Sample labels with response to NAC.

    Data Description

    • Samples: 58 breast cancer patients (29 responders, 29 non-responders to NAC).
    • Genes: 28,278 protein-coding genes.
    • Response: 1 = Pathological Complete Response (pCR), 0 = No Response.

    Source

    • GEO Accession: GSE280902
    • Paper: Guevara-Nieto HM et al. Identification of predictive pretreatment biomarkers for neoadjuvant chemotherapy response in Latino invasive breast cancer patients. Mol Med 2025.
    • GitHub Repository: Breast Cancer Gene Expression Processed Data

    Usage

    This dataset can be used for machine learning models to predict NAC response in breast cancer based on gene expression profiles.
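
    As a starting point for such models, the expression matrix can be split into features and labels. The sketch below is a minimal illustration that assumes every column other than 'Response' is a numeric gene column and that there is no sample ID column, which the description does not confirm. The gene names and values in the demo are made up, and `split_features_labels` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def split_features_labels(handle, label_column="Response"):
    """Split rows into per-sample feature dicts and integer labels."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        labels.append(int(row.pop(label_column)))
        # Assumes all remaining columns hold numeric expression values.
        features.append({gene: float(value) for gene, value in row.items()})
    return features, labels


# Made-up two-sample stand-in for cleaned_expression.csv.
demo = "GENE_A,GENE_B,Response\n1.5,0.0,1\n2.0,3.5,0\n"
features, labels = split_features_labels(io.StringIO(demo))
balance = Counter(labels)
print(balance)
```

    Checking the class balance first is useful here because the description reports an even split of responders and non-responders.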

    License

    This project is licensed under the MIT License - see the LICENSE file for details.

  15. NCBI accession numbers and related metadata from a study of transcriptomic...

    • search.datacite.org
    • bco-dmo.org
    • +1more
    Updated Jul 31, 2020
    Cite
    Kristen Whalen; Elizabeth Harvey (2020). NCBI accession numbers and related metadata from a study of transcriptomic response of Emiliania huxleyi to 2-heptyl-4-quinolone (HHQ) [Dataset]. http://doi.org/10.26008/1912/bco-dmo.773272.1
    Dataset updated
    Jul 31, 2020
    Dataset provided by
    DataCite
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    Kristen Whalen; Elizabeth Harvey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    NSF Division of Ocean Sciences
    Description

    NCBI accession numbers and related metadata from a study of transcriptomic response of Emiliania huxleyi to 2-heptyl-4-quinolone (HHQ). Sequences from this study are available at the NCBI GEO under accession series GSE131846 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?&acc=GSE131846

  16. Datasets in Gene Expression Omnibus used in the study ORD-022075: Chemical...

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Datasets in Gene Expression Omnibus used in the study ORD-022075: Chemical Activation of the Constitutive Activated Receptor (CAR) Leads to Activation of Oxidant-Induced Nrf2 [Dataset]. https://catalog.data.gov/dataset/datasets-in-gene-expression-omnibus-used-in-the-study-ord-022075-chemical-activation-of-th
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Description

    Gene Expression Omnibus (GEO) accession numbers of studies used in the analysis. This dataset is associated with the following publication: Rooney, J., K. Oshida, R. Kumar, W. Baldwin, and C. Corton. Chemical Activation of the Constitutive Androstane Receptor Leads to Activation of Oxidant-Induced Nrf2. TOXICOLOGICAL SCIENCES. Society of Toxicology, RESTON, VA, 167(1): 172-189, (2019).

  17. DATA IMPORT GSE183947

    • kaggle.com
    zip
    Updated Nov 28, 2025
    Cite
    Dr. Nagendra (2025). DATA IMPORT GSE183947 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/data-import-gse183947
    Available download formats: zip (2579505 bytes)
    Dataset updated
    Nov 28, 2025
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset provides the raw data associated with NCBI GEO accession GSE183947. The underlying data is an RNA-Sequencing (RNA-Seq) expression matrix derived from matched normal and malignant breast cancer tissue samples. The primary goal of this resource is to teach the complete workflow of:

    • Downloading and importing high-throughput genomics data from public repositories.
    • Cleaning and normalizing the raw expression values (e.g., FPKM/TPM).
    • Preparing the data structure for downstream Differential Gene Expression (DEG) analysis.

    This resource is essential for anyone practicing translational bioinformatics and cancer research.
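
    The normalization step in that workflow can be sketched as follows, assuming the matrix holds FPKM values and one sample is handled at a time. The function names, the pseudocount of 1, and the sample values are illustrative assumptions and are not taken from the dataset. The conversion uses the standard relation TPM_i = FPKM_i / sum(FPKM) × 10^6.

```python
from math import log2


def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm)
    if total <= 0:
        raise ValueError("sample has no expression signal")
    return [value / total * 1_000_000 for value in fpkm]


def log2_transform(values, pseudocount=1.0):
    """Apply log2(x + pseudocount) so that zero counts stay finite."""
    return [log2(v + pseudocount) for v in values]


# Made-up FPKM values for three genes in a single sample.
sample_fpkm = [10.0, 30.0, 60.0]

tpm = fpkm_to_tpm(sample_fpkm)
log_tpm = log2_transform(tpm)
print(tpm)
print(log_tpm)
```

    Differential expression tools that model raw counts expect unnormalized input, so this kind of rescaling is mainly useful for exploration and visualization.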

  18. Data from: Fibroblast STAT3 activation drives organ-specific premetastatic...

    • zenodo.org
    bin
    Updated Dec 22, 2025
    Cite
    Emily Lasse Opsahl; Emily Lasse Opsahl; Marina Pasca di Magliano; Marina Pasca di Magliano (2025). Fibroblast STAT3 activation drives organ-specific premetastatic niche formation [Dataset]. http://doi.org/10.5281/zenodo.17102186
    Available download formats: bin
    Dataset updated
    Dec 22, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Emily Lasse Opsahl; Emily Lasse Opsahl; Marina Pasca di Magliano; Marina Pasca di Magliano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Complete objects from "Fibroblast STAT3 activation drives organ-specific premetastatic niche formation".

    Please cite: Lasse Opsahl EL, Espinoza CE, Olivei AC, Okoye JO, Watkoske H, Hoffman MT, Avritt FR, Elhossiny AM, Bischoff AC, Donahue KL, Poggi M, Kadiyala P, Arya N, Shi J, Lee KE, Zhang Y, Carpenter ES, Szczepanski JM, Frankel TL, Pasca di Magliano M. Fibroblast STAT3 Activation Drives Organ-Specific Premetastatic Niche Formation. Cancer Res. 2025 Oct 17. doi: 10.1158/0008-5472.CAN-25-3472. Epub ahead of print. PMID: 41105672.

    Code used for data processing and visualization of single cell RNA sequencing data from the manuscript "Fibroblast STAT3 activation drives organ-specific premetastatic niche formation" can be found here.

    Raw data files for the novel datasets generated in this manuscript are available through the NIH Gene Expression Omnibus (GEO), accession number GSE292712.

  19. DGE GO Enrichment Analysis Microarray Data GDS2778

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Dr. Nagendra (2025). DGE GO Enrichment Analysis Microarray Data GDS2778 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/dge-go-enrichment-analysis-microarray-data-gds2778
    Available download formats: zip (6820264 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset is based on National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) DataSet accession GDS2778.

    The dataset originates from a microarray experiment measuring global gene expression under specific experimental conditions.

    Raw and processed expression data (for all probes/genes) are included, enabling downstream analysis such as normalization, differential expression, and clustering.

    The dataset has been used to perform differential gene expression (DGE) analysis to identify genes that are up- or down-regulated under the experimental condition compared to control.

    Data processing steps typically include normalization (e.g., log-transformation), quality control, probe-to-gene mapping, and statistical testing for significance (e.g., using packages such as limma or other DGE tools).

    Resulting differentially expressed genes (DEGs) include statistics such as log fold change (logFC), adjusted p‑values (adj.P.Val), and possibly other metrics (e.g., B-statistic), allowing assessment of both magnitude and significance of changes.

    The dataset also includes a visualization file (heatmap image) that displays expression patterns of DEGs (or top variable genes) across samples — enabling clustering and pattern recognition across samples and genes.

    The heatmap helps illustrate sample-wise and gene-wise expression variation: clustering groups together samples (e.g. control vs treatment) and genes with similar expression dynamics.

    This dataset is suitable for further bioinformatics analysis: e.g. functional enrichment (GO/Pathway), co‑expression analysis, gene signature identification, or integration with other datasets.

    Users who download this dataset can reproduce or extend analyses, such as re-normalization, alternative clustering, custom DEG thresholds, or downstream biological interpretation (pathway, network analysis).
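
    To make the custom DEG thresholds mentioned above concrete, here is a minimal sketch that computes Benjamini-Hochberg adjusted p-values and keeps genes passing a log fold change cutoff. The gene names, the values, the cutoffs, and the function names are hypothetical. Benjamini-Hochberg is assumed here as a common correction and is not confirmed as the method used for this dataset.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(rows, lfc_cutoff=1.0, alpha=0.05):
    """Keep rows with |logFC| >= lfc_cutoff and adjusted p-value < alpha."""
    adj = bh_adjust([row["p"] for row in rows])
    return [
        {**row, "adj_p": a}
        for row, a in zip(rows, adj)
        if abs(row["logFC"]) >= lfc_cutoff and a < alpha
    ]


# Made-up results table for four genes.
toy_results = [
    {"gene": "geneA", "logFC": 2.3, "p": 0.001},
    {"gene": "geneB", "logFC": -1.8, "p": 0.01},
    {"gene": "geneC", "logFC": 0.2, "p": 0.02},
    {"gene": "geneD", "logFC": 1.5, "p": 0.6},
]

for deg in select_degs(toy_results):
    print(deg["gene"], deg["logFC"], round(deg["adj_p"], 4))
```

    Raising `lfc_cutoff` or lowering `alpha` gives a stricter gene list. Both are analysis choices, not properties of the data.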

  20. Genome-wide gene expression profiling of high-grade osteosarcoma cell lines

    • ebi.ac.uk
    Updated Jun 5, 2013
    Cite
    Marieke Kuijjer; Elisabeth Peterse; Brendy van den Akker; Inge Briaire-deBruijn; Massimo Serra; Leonardo Meza-Zepeda; Ola Myklebost; Bass Hassan; Pancras Hogendoorn; Anne-Marie Cleton-Jansen (2013). Genome-wide gene expression profiling of high-grade osteosarcoma cell lines [Dataset]. https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-42351/
    Dataset updated
    Jun 5, 2013
    Authors
    Marieke Kuijjer; Elisabeth Peterse; Brendy van den Akker; Inge Briaire-deBruijn; Massimo Serra; Leonardo Meza-Zepeda; Ola Myklebost; Bass Hassan; Pancras Hogendoorn; Anne-Marie Cleton-Jansen
    Description

    We performed genome-wide gene expression profiling of high-grade osteosarcoma cell lines, as well as of mesenchymal stem cells and osteoblasts, and applied global test analysis to determine the most significantly affected KEGG pathways. Genome-wide gene expression analysis was performed on 19 high-grade osteosarcoma cell lines. Significantly differentially expressed genes were determined between osteosarcoma cells and two different sets of control samples: osteoblasts [n=3, GEO accession number GSE33382] and mesenchymal stem cells [n=12, GEO accession number GSE28974]. The global test was applied to each comparison to determine the most affected signaling pathways in osteosarcoma cells.
