100+ datasets found
  1. Gene expression data from Gene Expression Omnibus (GEO) database.

    • datasetcatalog.nlm.nih.gov
    Updated Mar 1, 2023
    Cite
    Dai, Minchen; Xie, Ningning; Fu, Leyi; Zhang, Songying; Jiang, Zhou; Wang, Fangfang; Zhou, Jue; Qu, Fan (2023). Gene expression data from Gene Expression Omnibus (GEO) database. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001040576
    Explore at:
    Dataset updated
    Mar 1, 2023
    Authors
    Dai, Minchen; Xie, Ningning; Fu, Leyi; Zhang, Songying; Jiang, Zhou; Wang, Fangfang; Zhou, Jue; Qu, Fan
    Description

    Gene expression data from Gene Expression Omnibus (GEO) database.

  2. Data from: Gene Expression Omnibus (GEO)

    • catalog.data.gov
    Updated Jul 26, 2023
    Cite
    National Institutes of Health (NIH) (2023). Gene Expression Omnibus (GEO) [Dataset]. https://catalog.data.gov/dataset/gene-expression-omnibus-geo
    Explore at:
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    National Institutes of Health (NIH)
    Description

    Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.

  3. GEO (Gene Expression Omnibus)

    • healthdata.gov
    csv, xlsx, xml
    Updated Jul 2, 2021
    + more versions
    Cite
    datadiscovery.nlm.nih.gov (2021). GEO (Gene Expression Omnibus) [Dataset]. https://healthdata.gov/NIH/GEO-Gene-Expression-Omnibus-/ypwa-g5v3
    Explore at:
    Available download formats: csv, xml, xlsx
    Dataset updated
    Jul 2, 2021
    Dataset provided by
    datadiscovery.nlm.nih.gov
    Description

    GEO (Gene Expression Omnibus) is a public functional genomics data repository supporting MIAME-compliant data submissions. There are also tools provided to help users query and download experiments and curated gene expression profiles.

  4. Field-wide assessment of differential HT-seq from NCBI GEO database

    • data.niaid.nih.gov
    Updated Jan 13, 2023
    + more versions
    Cite
    Päll, Taavi; Luidalepp, Hannes; Tenson, Tanel; Maiväli, Ülo (2023). Field-wide assessment of differential HT-seq from NCBI GEO database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3747112
    Explore at:
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    University of Tartu
    Authors
    Päll, Taavi; Luidalepp, Hannes; Tenson, Tanel; Maiväli, Ülo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We analysed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.

    • This release includes GEO series published up to Dec-31, 2020;

    The geo-htseq.tar.gz archive contains the following files:

    • output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0).

    • output/document_summaries.csv, document summaries of NCBI GEO series.

    • output/suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions.

    • output/suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO.

    • output/publications.csv, publication info of NCBI GEO series.

    • output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series

    • output/spots.csv, NCBI SRA sequencing run metadata.

    • output/cancer.csv, cancer related experiment accessions.

    • output/transcription_factor.csv, TF related experiment accessions.

    • output/single-cell.csv, single cell experiment accessions.

    • blacklist.txt, list of supplementary files that were either too large to import or crashed the computing environment during import.

    The workflow used to produce this dataset is available on GitHub at rstats-tartu/geo-htseq.

    The geo-htseq-updates.tar.gz archive contains the following files:

    • results/detools_from_pmc.csv, differential expression analysis programs inferred from published articles

    • results/n_data.csv, manually curated sample size info for NCBI GEO HT-seq series

    • results/simres_df_parsed.csv, pi0 values estimated from differential expression results obtained from simulated RNA-seq data

    • results/data/parsed_suppfiles_rerun.csv, pi0 values estimated using smoother method from anti-conservative p-value sets
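
The file lists above can be checked against a downloaded archive before anything is unpacked. The following is a minimal sketch using only the Python standard library; the function name `list_csv_members` and the local archive path are assumptions made for illustration and are not part of the dataset or its workflow.

```python
import tarfile


def list_csv_members(archive_path):
    """Return the sorted names of all regular .csv files in a .tar.gz archive."""
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".csv")
        )


if __name__ == "__main__":
    # Hypothetical local path to a downloaded copy of the archive.
    for name in list_csv_members("geo-htseq.tar.gz"):
        print(name)
```

Matching on the `.csv` suffix rather than on full paths keeps the sketch independent of whether the archive has a top-level directory.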

  5. Details of GEO datasets used in the study.

    • datasetcatalog.nlm.nih.gov
    Updated Jun 2, 2025
    Cite
    Lai, Xiaodong; Yang, Yan; Wang, Meng; Yan, Yan; Zhang, Chong; Zhang, Haini; Chen, Wanxin; Wang, Baoxi (2025). Details of GEO datasets used in the study. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002063634
    Explore at:
    Dataset updated
    Jun 2, 2025
    Authors
    Lai, Xiaodong; Yang, Yan; Wang, Meng; Yan, Yan; Zhang, Chong; Zhang, Haini; Chen, Wanxin; Wang, Baoxi
    Description

    Hidradenitis suppurativa (HS) is a chronic inflammatory skin disorder, affecting the pilosebaceous unit in apocrine gland-rich areas, characterized by painful nodules, abscesses and draining tunnels. The underlying molecular and immunological mechanisms remain poorly understood. This study aimed to identify key gene expression patterns, hub genes, and analyze the potential role of the CCL19/CCL21-CCR7 axis in HS lesions and peripheral blood using bulk and single-cell RNA sequencing analyses. By employing an integrative approach that included three machine learning methods and subsequent validation on an independent dataset, we successfully identified AKR1B10, IGFL2, WNK2, SLAMF7, and CCR7 as potential hub genes and therapeutic targets for HS treatment. Furthermore, our study found that CCL19 and CCL21 may originate from various cells such as fibroblasts and dendritic cells, playing a crucial role in recruiting CCR7-associated immune cells, particularly Treg cells. The involvement of the CCL19/CCL21-CCR7 axis in HS pathogenesis suggests that other CCR7-expressing cells may also be recruited, contributing to disease progression. These findings significantly advance our understanding of HS pathogenesis and offer promising avenues for future CCR7-targeted therapeutic interventions.

  6. Gene expression data sources for in silico approach to assessing activation...

    • springernature.figshare.com
    application/gzip
    Updated May 31, 2023
    Cite
    Sylvain Brohee; Amir Sonnenblick; David Venet (2023). Gene expression data sources for in silico approach to assessing activation of AKT/mTOR signalling pathway in ER-positive early Breast Cancer [Dataset]. http://doi.org/10.6084/m9.figshare.7461776.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sylvain Brohee; Amir Sonnenblick; David Venet
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains data files and identifiers for original data sources for 39 gene expression datasets from over 7,000 individuals with estrogen receptor positive (ER-positive) Breast Cancer (BC).

    Background

    The related study developed a novel in silico approach to assess activation of different signalling pathways. The phosphatidylinositol 3-kinase (PI3K)/AKT/mTOR signalling pathway mediates key cellular functions, including growth, proliferation and survival and is frequently involved in carcinogenesis, tumor progression and metastases. This research seeks to target relative contribution of AKT and mTOR (downstream of PI3K) in BC outcomes using the in silico approach via integrated reverse phase protein array (RPPA) and matched gene expression.

    Methods and sample size

    The methodology includes the development of gene signatures that reflect level of expression of pAKT and p-mTOR separately. Pooled analysis of gene expression data from over 7,000 patients with ER-positive BC was then performed. This data record holds links to the repositories holding these data, as well as the R-data files for each data record used in the analysis. All gene signatures developed are captured in Supplementary Data Sonnenblick.pdf.xlsx.

    Data sources

    The dataset name, relevant DOI, accession number or access requirements are listed alongside the file type and repository name or other source where applicable.

    GEO=Gene Expression Omnibus
    EGA=European Genome-phenome Archive

    This data table is available to download as NPJBCANCER-00304R1-data-sources.xlsx including more detailed information and web urls to each data source. data_db.tab contains more detailed technical metadata for each data source.

    Dataset | Data location | Permanent identifier/url
    NKI | CCB NKI | http://ccb.nki.nl/data/van-t-Veer_Nature_2002/
    UCSF | GEO | GSE123833
    STNO2 | GEO | GSE4335
    NCI | Research Article (Supplementary files) | 10.1073/pnas.1732912100
    UNC4 | GEO | GSE18229
    CAL | Array Express | E-TABM-158
    MDA4 | GEO | GSE123832
    KOO | GEO | GSE123831
    HLP | Array Express | E-TABM-543
    EXPO | GEO | GSE2109
    VDX | GEO | GSE2034/GSE5327
    MSK | GEO | GSE2603
    UPP | GEO | GSE3494
    STK | GEO | GSE1456
    UNT | GEO | GSE2990
    DUKE | GEO | GSE3143
    TRANSBIG | GEO | GSE7390
    DUKE2 | GEO | GSE6961
    MAINZ | GEO | GSE11121
    LUND2 | GEO | GSE5325
    LUND | GEO | GSE5325
    FNCLCC | GEO | GSE7017
    EMC2 | GEO | GSE12276
    MUG | GEO | GSE10510
    NCCS | GEO | GSE5364
    MCCC | GEO | GSE19177
    EORTC10994 | GEO | GSE1561
    DFHCC | GEO | GSE19615
    DFHCC2 | GEO | GSE18864
    DFHCC3 | GEO | GSE3744
    DFHCC4 | GEO | GSE5460
    MAQC2 | GEO | GSE20194
    TAM | GEO | GSE6532/GSE9195
    MDA5 | GEO | GSE17705
    VDX3 | GEO | GSE12093
    METABRIC | EGA | EGAS00000000083
    TCGA | TCGA | https://tcga-data.nci.nih.gov/docs/publications/brca_2012/
    DNA methylation (Dedeurwaerder et al. 2011) | GEO | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20713
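
Most rows above give a bare GSE accession, while the last row gives a full GEO URL. As a rough sketch, the same URL pattern can be applied to the bare accessions, including combined entries such as GSE2034/GSE5327. The helper name `geo_urls` is hypothetical, and the sketch assumes the `acc.cgi?acc=` pattern seen in the last row applies to every GSE accession.

```python
import re

# URL prefix taken from the GSE20713 link in the table above.
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc="


def geo_urls(identifier):
    """Return one GEO URL for each GSE accession found in an identifier string."""
    # re.findall picks out every "GSE" followed by digits, so "A/B" entries yield two URLs.
    return [GEO_ACC_URL + accession for accession in re.findall(r"GSE\d+", identifier)]


if __name__ == "__main__":
    for url in geo_urls("GSE2034/GSE5327"):
        print(url)
```

Identifiers from other repositories (for example E-TABM-158 or EGAS00000000083) contain no GSE accession, so the function returns an empty list for them.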

  7. GPL570

    • huggingface.co
    Updated Jan 5, 2024
    + more versions
    Cite
    Michal Winnicki (2024). GPL570 [Dataset]. https://huggingface.co/datasets/mwinn99/GPL570
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 5, 2024
    Authors
    Michal Winnicki
    License

    https://choosealicense.com/licenses/odbl/

    Description

    Original, raw data can be found in Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo/

    Citation

    Winnicki MJ, Brown CA, Porter HL, Giles CB, Wren JD, BioVDB: biological vector database for high-throughput gene expression meta-analysis, Frontiers in Artificial Intelligence 7 (2024) https://www.frontiersin.org/articles/10.3389/frai.2024.1366273

  8. Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

    • data.niaid.nih.gov
    Updated Nov 20, 2023
    Cite
    Hsu, Jonathan; Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
    Explore at:
    Dataset updated
    Nov 20, 2023
    Authors
    Hsu, Jonathan; Stoop, Allart
    Description

    Table of Contents

    • Main Description
    • File Descriptions
    • Linked Files
    • Installation and Instructions

    1. Main Description

    This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.". The code included in the file titled marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required for script execution:

    Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap

    File Descriptions

    The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, which were also used to create the heatmap with DGE and volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.

    Linked Files

    This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:

    Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
    Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    Description: This submission contains the raw sequencing or .fastq.gz files, which are tab delimited text files.
    Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

    Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
    Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This is a necessary file to use the code.
    Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

    Installation and Instructions

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    Ensure you have R version 4.1.2 or higher for compatibility.

    Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

    1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to where the following files are located:

    • marengo_code_for_paper_jan_2023.R
    • Install_Packages.R
    • Marengo_newID_March242023.rds
    • genes_for_heatmap_fig5F.xlsx
    • all_res_deg_for_heat_updated_march2023.txt

    You can use the following code to set the working directory in R:

    setwd(directory)

    4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies, setting up an environment in which the code in "marengo_code_for_paper_jan_2023.R" can be executed.
    5. Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.
    6. Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
    7. Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.
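
Because the instructions above depend on five files sitting in one working directory, a quick check before opening R can save a failed run. This is a minimal sketch in Python (chosen only for illustration; the repository itself is R-based). The function name `missing_files` is hypothetical, and the file names are copied from the list in the instructions.

```python
from pathlib import Path

# File names as listed in the working-directory step of the instructions.
REQUIRED_FILES = [
    "marengo_code_for_paper_jan_2023.R",
    "Install_Packages.R",
    "Marengo_newID_March242023.rds",
    "genes_for_heatmap_fig5F.xlsx",
    "all_res_deg_for_heat_updated_march2023.txt",
]


def missing_files(directory):
    """Return the required file names that are not present in the directory."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # "." stands for the directory that will be passed to setwd() in R.
    missing = missing_files(".")
    print("All files present" if not missing else "Missing: " + ", ".join(missing))
```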
  9. GEO gene expression dataset recompute for selected tumor samples

    • data.niaid.nih.gov
    Updated May 13, 2024
    Cite
    Visentin, Luca (2024). GEO gene expression dataset recompute for selected tumor samples [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10817923
    Explore at:
    Dataset updated
    May 13, 2024
    Dataset provided by
    University of Turin
    Authors
    Visentin, Luca
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.

    All uploaded files are UTF-8, .csv-formatted matrices. The *_expected_count.csv.gz files are unlogged, raw expression counts as reported by rsem-quantify-expression (see details below). The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix. Some metadata files may have more rows than the associated number of columns. This is for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples in the same GEO accession ID).

    Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.

    Each recompute has at least the gene_id column holding Ensembl Gene IDs. The remaining columns are ENA run accession IDs of the specific recomputed samples. Each associated metadata file has at least the following columns:

    geo_accession: The GEO sample ID of the sample.

    ena_sample: The ENA sample ID of the sample.

    ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.

    The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information.
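
The cross-referencing described above (run columns in the expression matrix matched against `ena_run` in the metadata) can be sketched with the Python standard library alone. The function name `map_runs_to_geo` and the example file names are hypothetical. The sketch assumes both files are comma-delimited with a header row and use the column names listed above.

```python
import csv
import gzip


def map_runs_to_geo(counts_path, metadata_path):
    """Map each ENA run column of an expression matrix to its GEO sample ID."""
    # Only the header row of the (potentially large) count matrix is needed.
    with gzip.open(counts_path, "rt", encoding="utf-8", newline="") as handle:
        header = next(csv.reader(handle))
    run_columns = [name for name in header if name != "gene_id"]

    # Build an ena_run -> geo_accession lookup from the metadata file.
    with gzip.open(metadata_path, "rt", encoding="utf-8", newline="") as handle:
        lookup = {row["ena_run"]: row["geo_accession"] for row in csv.DictReader(handle)}

    # Metadata rows without a matching matrix column are ignored;
    # runs without metadata map to None.
    return {run: lookup.get(run) for run in run_columns}


if __name__ == "__main__":
    # Hypothetical file names following the *_expected_count / *_metadata pattern.
    mapping = map_runs_to_geo("example_expected_count.csv.gz", "example_metadata.csv.gz")
    for run, sample in mapping.items():
        print(run, sample)
```

Starting from the matrix columns rather than the metadata rows handles the case where a metadata file has more rows than the matrix has columns.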

    Pipeline Details

    Alignment and quantification were performed with the x.FASTQ tool, available on GitHub, installed locally on an Arch Linux machine at commit 3a93dd77a70df59c74f7b15216c26f12cd918e81, running the Linux 6.7.8-zen1-1-zen kernel with an 11th Gen Intel i7-1185G7 (8) CPU and an Intel TigerLake-LP GT2 [Iris Xe Graphics] GPU. Please note that no sample filtering or omissions were done based on sample quality or sequencing depth. However, sensible trimming (e.g. low-quality bases and common adapters) was performed on all the samples.

    The reference genome (version hg38) was downloaded from Ensembl. STAR was used to create the genome index with overhang set to 149.

  10. Public gene expression profile datasets used in this study.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chad J. Creighton (2023). Public gene expression profile datasets used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0001816.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Chad J. Creighton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SMD – Stanford Microarray Database (http://genome-www5.stanford.edu)
    GEO – Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/)
    Broad Institute (http://www.broad.mit.edu/egi-bin/cancer/datasets.cgi)
    Oncomine (www.oncomine.org)

  11. GDS4399

    • kaggle.com
    zip
    Updated Oct 26, 2025
    Cite
    Bassam165 (2025). GDS4399 [Dataset]. https://www.kaggle.com/datasets/bassam165/gds4399
    Explore at:
    Available download formats: zip (11496559 bytes)
    Dataset updated
    Oct 26, 2025
    Authors
    Bassam165
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains microarray-based gene expression profiles of granulosa cells collected from women diagnosed with Polycystic Ovary Syndrome (PCOS) and from healthy controls. It originates from the NCBI GEO DataSet GDS4399, which was generated to study the molecular mechanisms underlying PCOS pathogenesis and its relationship to insulin resistance, steroidogenesis, and oocyte maturation.

    The data were collected using the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570 platform). Each sample corresponds to an RNA expression profile of granulosa cells isolated from ovarian aspirates of PCOS and non-PCOS women undergoing in-vitro fertilization (IVF).

    Key Details

    NCBI GEO Accession: GDS4399

    Source: Gene Expression Omnibus (GEO), NCBI.
    GEO Accession: GDS4399
    Title: Polycystic ovary syndrome: granulosa cells
    Platform: Affymetrix Human Genome U133 Plus 2.0 Array (GPL570)
    Authors: Wood JR, et al. (Original study contributors)
    National Center for Biotechnology Information, U.S. National Library of Medicine.
    Available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399

    Recommended citation style (IEEE): [1] J. R. Wood et al., “Polycystic ovary syndrome: granulosa cells,” Gene Expression Omnibus (GEO), GDS4399, NCBI, Bethesda, MD, USA. [Online]. Available: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399

    License: This dataset is part of the public NCBI GEO database and is distributed under the Public Domain / CC0 License for research and educational use. Please cite the original GEO entry when reusing this dataset.

  12. Details of the data sources from Gene Expression Omnibus(GEO) for this...

    • datasetcatalog.nlm.nih.gov
    Updated Aug 4, 2021
    Cite
    Luo, Huan-Min; Xie, Li-Min; Ma, Yu-Wen; Guo, Xu-Guang; Su, Jian-Wen; Liu, Ye-Ling; Yin, Xin; Bi, Jie; Cao, Xun-Jie; Lin, Geng-Ling (2021). Details of the data sources from Gene Expression Omnibus(GEO) for this study. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000833218
    Explore at:
    Dataset updated
    Aug 4, 2021
    Authors
    Luo, Huan-Min; Xie, Li-Min; Ma, Yu-Wen; Guo, Xu-Guang; Su, Jian-Wen; Liu, Ye-Ling; Yin, Xin; Bi, Jie; Cao, Xun-Jie; Lin, Geng-Ling
    Description

    Details of the data sources from Gene Expression Omnibus(GEO) for this study.

  13. Repository for the single cell RNA sequencing data analysis for the human...

    • explore.openaire.eu
    Updated Aug 26, 2023
    Cite
    Jonathan; Andrew; Pierre; Allart; Adrian (2023). Repository for the single cell RNA sequencing data analysis for the human manuscript. [Dataset]. http://doi.org/10.5281/zenodo.8286134
    Explore at:
    Dataset updated
    Aug 26, 2023
    Authors
    Jonathan; Andrew; Pierre; Allart; Adrian
    Description

    This is the GitHub repository for the single cell RNA sequencing data analysis for the human manuscript. The following essential libraries are required for script execution:

    Seurat, scRepertoire, ggplot2, dplyr, ggridges, ggrepel, ComplexHeatmap

    Linked File

    This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. Provided below are descriptions of the linked datasets:

    1. Gene Expression Omnibus (GEO) ID: GSE229626
    - Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
    - Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
    - Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
    2. Sequence read archive (SRA) repository
    - Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
    - Description: This submission contains the "raw sequencing" or .fastq.gz files, which are tab delimited text files.
    - Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.

    Installation and Instructions

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    > Ensure you have R version 4.1.2 or higher for compatibility.
    > Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

    The following code can be used to set the working directory in R:

    > setwd(directory)

    Steps:

    1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to where the following files are located:
    - Human_code_April2023.R
    - Install_Packages.R
    4. Open the file titled Install_Packages.R and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies.
    5. Open the Human_code_April2023.R R script and execute commands as necessary.

  14. Dataset information from the GEO database.

    • datasetcatalog.nlm.nih.gov
    Updated Dec 21, 2023
    + more versions
    Cite
    Fu, Yanchi; Tian, Qinghua; Kong, Xiaotong; Wang, Jianjian; He, Yijie; Wang, Lihua; Chen, Lixia; Xin, Guanghao; Zhang, Huixue (2023). Dataset information from the GEO database. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001099020
    Explore at:
    Dataset updated
    Dec 21, 2023
    Authors
    Fu, Yanchi; Tian, Qinghua; Kong, Xiaotong; Wang, Jianjian; He, Yijie; Wang, Lihua; Chen, Lixia; Xin, Guanghao; Zhang, Huixue
    Description

    Parkinson’s disease is the second most common neurodegenerative disease in the world. We downloaded data on Parkinson’s disease and Ferroptosis-related genes from the GEO and FerrDb databases. We used WCGAN and Random Forest algorithm to screen out five Parkinson’s disease ferroptosis-related hub genes. Two genes were identified for the first time as possibly playing a role in Braak staging progression. Unsupervised clustering analysis based on hub genes yielded ferroptosis isoforms, and immune infiltration analysis indicated that these isoforms are associated with immune cells and may represent different immune patterns. FRHGs scores were obtained to quantify the level of ferroptosis modifications in each individual. In addition, differences in interleukin expression were found between the two ferroptosis subtypes. The biological functions involved in the hub gene are analyzed. The ceRNA regulatory network of hub genes was mapped. The disease classification diagnosis model and risk prediction model were also constructed by applying hub genes based on logistic regression. Multiple external datasets validated the hub gene and classification diagnostic model with some accuracy. This study explored hub genes associated with ferroptosis in Parkinson’s disease and their molecular patterns and immune signatures to provide new ideas for finding new targets for intervention and predictive biomarkers.

  15. Multiple Single Cell RNA Expressions ARCHS4

    • kaggle.com
    zip
    Updated Jul 25, 2021
    Cite
    Alexander Chervov (2021). Multiple Single Cell RNA Expressions ARCHS4 [Dataset]. https://www.kaggle.com/alexandervc/multiple-single-cell-rna-expressions-archs4
    Explore at:
    Available download formats: zip (23319014182 bytes)
    Dataset updated
    Jul 25, 2021
    Authors
    Alexander Chervov
    Description

    Remark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

    Context

    Dataset is downloaded from https://amp.pharm.mssm.edu/archs4/download.html The methods are described in Nature Communications paper: https://www.nature.com/articles/s41467-018-03751-6

    The ARCHS4 data provides user-friendly access to multiple gene expression datasets from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). While most data in GEO is stored in raw formats, ARCHS4 provides prepared count-matrix expression data. While GEO stores data separately for each research paper, ARCHS4 collects all the information in one single matrix. One may consult the main site for further information.

    Main data files are in H5 (HDF5, Hierarchical Data Format) file format: https://en.wikipedia.org/wiki/Hierarchical_Data_Format They contain expression data, as well as annotation data and further meta-information. There are several other auxiliary files, such as a TSNE 3D projection (in CSV format) and gene correlation matrices for human and mouse in feather format.

    Content

    The main file (for human), human_matrix.h5, contains the data matrix (238522 samples by 35238 genes) as well as various meta-information: gene names, sample information (tissue, etc.), and references to the GEO database IDs where all the details can be found.

    There is also similar data for mouse, csv files with TSNE images, correlation matrices for genes.

    Acknowledgements

    The ARCHS4 project is by :

    'Alexander Lachmann', 'alexander.lachmann@mssm.edu', update: '2020-02-06'

  16. Taxol Drug Resistance cell lines in Breast Cancer

    • kaggle.com
    zip
    Updated Apr 12, 2023
    Cite
    Ali Abedi Madiseh (2023). Taxol Drug Resistance cell lines in Breast Cancer [Dataset]. https://www.kaggle.com/datasets/aliabedimadiseh/taxol-drug-resistance-cell-lines-in-breast-cancer/discussion
    Explore at:
    Available download formats: zip (247688 bytes)
    Dataset updated
    Apr 12, 2023
    Authors
    Ali Abedi Madiseh
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset was collected from the following NCBI GEO datasets:
    - GSE144113 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE144113)
    - GSE76200 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76200)
    - GSE12791 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12791)

    These datasets cover four paclitaxel-resistant cell lines: BAS, HS578T, MCF7 and MDA-MB-231.

    In each dataset, gene expression was compared between control cells and drug-resistant cells using R. Identifiers were then converted to gene symbols using several bioinformatics databases. Genes with a p-value of less than 0.05 were also removed.

  17. d

    Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R...

    • datadryad.org
    zip
    Updated Jul 8, 2021
    Cite
    Bianca Habermann; Margaux Haering (2021). Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R shiny RNA-seq data analysis app for visualisation, differential expression analysis, time-series clustering and enrichment analysis [Dataset]. http://doi.org/10.5061/dryad.8pk0p2nnd
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 8, 2021
    Dataset provided by
    Dryad
    Authors
    Bianca Habermann; Margaux Haering
    Time period covered
    Jul 6, 2021
    Description

    Details on data processing and analysis can be found in the associated article.

  18. f

    Screening of the GEO database led to selection of five gene expression...

    • datasetcatalog.nlm.nih.gov
    Updated Oct 28, 2016
    Cite
    Huang, Guanli; Shen, Xian; Xue, Xiangyang; Mao, Chenchen; Ye, Sisi; Guo, Gangqiang; Hu, Yingying; Hu, Changyuan; Guo, Aizhen; Zhang, Liang; Sun, Xiangwei; Xu, Jianfeng (2016). Screening of the GEO database led to selection of five gene expression microarrays for colorectal cancer (requirements: cancer tissues and their adjacent normal tissues, at least 10 samples per group). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001577704
    Explore at:
    Dataset updated
    Oct 28, 2016
    Authors
    Huang, Guanli; Shen, Xian; Xue, Xiangyang; Mao, Chenchen; Ye, Sisi; Guo, Gangqiang; Hu, Yingying; Hu, Changyuan; Guo, Aizhen; Zhang, Liang; Sun, Xiangwei; Xu, Jianfeng
    Description

    Screening of the GEO database led to selection of five gene expression microarrays for colorectal cancer (requirements: cancer tissues and their adjacent normal tissues, at least 10 samples per group).

  19. Medulloblastoma omics data

    • kaggle.com
    zip
    Updated Feb 22, 2023
    Cite
    Alexander Chervov (2023). Medulloblastoma omics data [Dataset]. https://www.kaggle.com/alexandervc/medulloblastoma-omics-data
    Explore at:
    Available download formats: zip (2278448493 bytes)
    Dataset updated
    Feb 22, 2023
    Authors
    Alexander Chervov
    Description

    A collection of gene expression and similar datasets related to brain tumors, in particular medulloblastoma, the most common malignant brain tumor in childhood. Files are typically CSV matrices of genes × samples.

    GSE124814 WOW! Integration of many (all?) medulloblastoma datasets(!): 1641 samples, of which 1350 samples represent primary medulloblastomas and 291 samples represent normal brain

    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE124814
    Weishaupt H, Johansson P, Sundström A, Lubovac-Pilav Z et al. Batch-normalization of cerebellar and medulloblastoma gene expression datasets utilizing empirically defined negative control genes. Bioinformatics 2019 Sep 15;35(18):3357-3364. PMID: 30715209 https://doi.org/10.1093/bioinformatics/btz066
    We downloaded a total of 1796 CEL files from previously published GEO or ArrayExpress records: GSE85217(n=763), GSE25219(n=154), GSE60862(n=130), GSE12992(n=40), GSE67850(n=22), GSE10327(n=62), GSE30074(n=30), E-MTAB-292(n=19), GSE74195(n=30), GSE37418(n=76), GSE4036(n=14), GSE62803(n=52), GSE21140(n=103), GSE37382(n=50), GSE22569(n=24), GSE35974(n=50), GSE73038(n=46), GSE50161(n=24), GSE3526(n=9), GSE50765(n=12), GSE49243(n=58), GSE41842(n=19), GSE44971(n=9). After preprocessing of all CEL files, we averaged the expression profiles of samples that mapped to the same patient in a single dataset, producing a final expression array comprising 1641 samples, of which 1350 samples represent primary medulloblastomas and 291 samples represent normal brain (cerebellum/upper rhombic lip).
    Also discussed in the paper: A transcriptome-based classifier to determine molecular subtypes in medulloblastoma https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008263
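
    The per-record counts quoted above can be checked against the stated totals. A small Python sketch follows; the regular expression and the variable names are illustrative choices, not part of the original analysis.

```python
import re

# Record list copied from the GSE124814 description.
SOURCES = (
    "GSE85217(n=763), GSE25219(n=154), GSE60862(n=130), GSE12992(n=40), "
    "GSE67850(n=22), GSE10327(n=62), GSE30074(n=30), E-MTAB-292(n=19), "
    "GSE74195(n=30), GSE37418(n=76), GSE4036(n=14), GSE62803(n=52), "
    "GSE21140(n=103), GSE37382(n=50), GSE22569(n=24), GSE35974(n=50), "
    "GSE73038(n=46), GSE50161(n=24), GSE3526(n=9), GSE50765(n=12), "
    "GSE49243(n=58), GSE41842(n=19), GSE44971(n=9)"
)

# Matches GEO series (GSE...) and ArrayExpress (E-MTAB-...) accessions
# together with the sample count given in parentheses.
PATTERN = re.compile(r"(GSE\d+|E-MTAB-\d+)\(n=(\d+)\)")

counts = {accession: int(n) for accession, n in PATTERN.findall(SOURCES)}
total_cel_files = sum(counts.values())
print(f"{len(counts)} source records, {total_cel_files} CEL files")

# Sizes of the final expression array as quoted in the description.
final_samples = 1350 + 291
print(f"final array: {final_samples} samples")
print(f"reduction after per-patient averaging: {total_cel_files - final_samples}")
```

    The counts add up to the 1796 CEL files stated in the description, and 1350 + 291 gives the 1641 samples of the final array.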

    GSE85217 (Cavalli ... Taylor) 768 samples, 2016 (Affymetrix Human Gene 1.1 ST Array) Cavalli FMG, Remke M, Rampasek L, Peacock J et al. Intertumoral Heterogeneity within Medulloblastoma Subgroups. Cancer Cell 2017 Jun 12;31(6):737-754.e6. PMID: 28609654 Ramaswamy V, Taylor MD. Bioinformatic Strategies for the Genomic and Epigenomic Characterization of Brain Tumors. Methods Mol Biol 2019;1869:37-56. PMID: 30324512 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85217

    GSE202043 (Pomeroy) 214 samples, 2011 (Expression profiling by array) Cho YJ, Tsherniak A, Tamayo P, Santagata S et al. Integrative genomic analysis of medulloblastoma identifies a molecular subgroup that drives poor clinical outcome. J Clin Oncol 2011 Apr 10;29(11):1424-30. PMID: 21098324 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE202043

    GSE12992 (Fattet ... Delattre) 72 samples, 2009 (Expression profiling by array) Fattet S, Haberler C, Legoix P, Varlet P et al. Beta-catenin status in paediatric medulloblastomas: correlation of immunohistochemical expression with mutational status, genetic profiles, and clinical characteristics. J Pathol 2009 May;218(1):86-94. PMID: 19197950 A series of 72 pediatric medulloblastoma tumors has been studied at the genomic level (array-CGH), screened for CTNNB1 mutations and beta-catenin expression (immunohistochemistry). A subset of 40 tumor samples has been analyzed at the RNA expression level (Affymetrix HG U133 Plus 2.0). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12992

    GSE37382 (Northcott ... Taylor) 2012 (Expression profiling by array, Affymetrix Human Gene 1.1 ST Array profiling of 285 primary medulloblastoma samples.) Northcott PA, Shih DJ, Peacock J, Garzia L et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 2012 Aug 2;488(7409):49-56. PMID: 22832581 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37382

    GSE10327 (M. Kool) 62 samples, 2008 (Expression profiling by array). Beware: the original paper and several references sometimes cite it as GSE10237, which is an erroneous accession. Kool M, Koster J, Bunt J, Hasselt NE et al. Integrated genomics identifies five medulloblastoma subtypes with distinct genetic profiles, pathway signatures and clinicopathological features. PLoS One 2008 Aug 28;3(8):e3088. PMID: 18769486 Rack PG, Ni J, Payumo AY, Nguyen V et al. Arhgap36-dependent activation of Gli transcription factors. Proc Natl Acad Sci U S A 2014 Jul 29;111(30):11061-6. PMID: 25024229 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10327

    Other datasets (not yet loaded):

    (47.1 Gb, 2012) (Expression profiling by array, Genome variation profiling by SNP array, SNP genotyping by SNP array ) Northcott PA, Shih DJ, Peacock J, Garzia L et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 2012 Aug 2;488(7409):49-56. PMID: 22832581 Here we report somatic copy number aberrations (SCNAs) in 1087 unique medulloblastomas. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37385

  20. Lee2020 GSE132465 Primary Colorectal Cancer Dataset for Besca

    • zenodo.org
    bin
    Updated Jul 31, 2020
    Cite
    Petra Schwalie; Petra Schwalie (2020). Lee2020 GSE132465 Primary Colorectal Cancer Dataset for Besca [Dataset]. http://doi.org/10.5281/zenodo.3967538
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 31, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Petra Schwalie; Petra Schwalie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The gene expression matrix was downloaded from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132465), originally published by Lee HO, Hong Y, Etlioglu HE, et al. Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nat Genet. 2020;52(6):594-603. doi:10.1038/s41588-020-0636-z. We reprocessed the dataset using the Besca package (https://github.com/bedapub/besca).
