40 datasets found

Data from: Gene Expression Omnibus (GEO)
dknet.org
rrid.site
Updated Jul 31, 2025
Cite
(2025). Gene Expression Omnibus (GEO) [Dataset]. http://identifiers.org/RRID:SCR_005012
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_005012
Dataset updated
Jul 31, 2025
Description
Functional genomics data repository supporting MIAME-compliant data submissions. Includes microarray-based experiments measuring the abundance of mRNA, genomic DNA, and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. Array- and sequence-based data are accepted. Collection of curated gene expression DataSets, as well as original Series and Platform records. The database can be searched using keywords, organism, DataSet type and authors. DataSet records contain additional resources including cluster tools and differential expression queries.
GEO DataSets
ebi.ac.uk
Updated Dec 1, 2015
Cite
(2015). GEO DataSets [Dataset]. https://www.ebi.ac.uk/ebisearch/data-coverage
Explore at:
Dataset updated
Dec 1, 2015
Description
Gene Expression Omnibus. GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. The GEO DataSets database stores original submitter-supplied records (Series, Samples and Platforms) as well as curated DataSets.
Field-wide assessment of differential HT-seq from NCBI GEO database
zenodo.org
application/gzip
Updated Jan 13, 2023
Cite
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli (2023). Field-wide assessment of differential HT-seq from NCBI GEO database [Dataset]. http://doi.org/10.5281/zenodo.5356064
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5356064
Dataset updated
Jan 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We analysed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.

- This release includes GEO series up to Dec-31, 2020;

- Fixed the missing optional xlrd dependency, which affected the import of some xls files; previously only openpyxl was used (thanks to an anonymous reviewer);

- All files in supplementary _RAW.tar files were checked for p values; previously, _RAW.tar files were omitted entirely (thanks to an anonymous reviewer).

The archived dataset contains the following files:

- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0).

- output/document_summaries.csv, document summaries of NCBI GEO series

- output/publications.csv, publication info of NCBI GEO series

- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series

- output/single-cell.csv, single cell experiments

- spots.csv, NCBI SRA sequencing run metadata

- suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions. One filename per row.

- suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO. One filename per row.
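
The pi0 value listed for output/parsed_suppfiles.csv conventionally denotes the proportion of true null hypotheses behind a set of p values. As a minimal sketch of how such a value can be estimated from raw p values, the snippet below bins p values into a histogram and applies a Storey-style estimator with a fixed threshold. This is an illustrative assumption and not necessarily the method used by the dataset authors; the function names `p_value_histogram` and `estimate_pi0` and the default threshold `lam=0.5` are hypothetical choices.

```python
def p_value_histogram(p_values, bins=20):
    """Count p values in equal-width bins covering [0, 1]."""
    counts = [0] * bins
    for p in p_values:
        # p == 1.0 would index past the end, so clamp it into the last bin.
        index = min(int(p * bins), bins - 1)
        counts[index] += 1
    return counts


def estimate_pi0(p_values, lam=0.5):
    """Storey-style estimate of the proportion of true null hypotheses.

    Null p values are uniform on [0, 1], so the share of p values above
    `lam`, rescaled by 1 / (1 - lam), approximates pi0.
    """
    if not p_values:
        raise ValueError("need at least one p value")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    above = sum(1 for p in p_values if p > lam)
    pi0 = above / (len(p_values) * (1.0 - lam))
    return min(1.0, pi0)  # pi0 is a proportion, so cap it at 1


if __name__ == "__main__":
    # Hypothetical p values, for illustration only.
    example = [0.001, 0.004, 0.02, 0.31, 0.55, 0.62, 0.78, 0.97]
    print(p_value_histogram(example, bins=10))
    print(estimate_pi0(example))
```

A histogram that is flat apart from a peak near zero corresponds to a pi0 estimate below 1, whereas a uniform histogram gives an estimate close to 1.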
Entrez GEO Profiles
rrid.site
dknet.org
Updated Jul 11, 2025
Cite
(2025). Entrez GEO Profiles [Dataset]. http://identifiers.org/RRID:SCR_004584
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_004584
Dataset updated
Jul 11, 2025
Description
The GEO Profiles database stores gene expression profiles derived from curated GEO DataSets. Each Profile is presented as a chart that displays the expression level of one gene across all Samples within a DataSet. Experimental context is provided in the bars along the bottom of the charts making it possible to see at a glance whether a gene is differentially expressed across different experimental conditions. Profiles have various types of links including internal links that connect genes that exhibit similar behaviour, and external links to relevant records in other NCBI databases. GEO Profiles can be searched using many different attributes including keywords, gene symbols, gene names, GenBank accession numbers, or Profiles flagged as being differentially expressed.
Field-wide assessment of differential HT-seq from NCBI GEO database
zenodo.org
application/gzip
Updated Jan 13, 2023
Cite
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli (2023). Field-wide assessment of differential HT-seq from NCBI GEO database [Dataset]. http://doi.org/10.5281/zenodo.5068928
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5068928
Dataset updated
Jan 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We analyzed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository. Our work puts an upper bound of 62% to field-wide reproducibility, based on the types of files submitted to GEO.

The archived dataset contains the following files:

- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0).

- output/document_summaries.csv, document summaries of NCBI GEO series

- output/publications.csv, publication info of NCBI GEO series

- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series

- output/single-cell.csv, single cell experiments

- spots.csv, NCBI SRA sequencing run metadata

- suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions. One filename per row.

- suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO. One filename per row.
GEO gene expression dataset recompute for selected tumor samples
zenodo.org
application/gzip
Updated Mar 15, 2024
Cite
Luca Visentin (2024). GEO gene expression dataset recompute for selected tumor samples [Dataset]. http://doi.org/10.5281/zenodo.10817924
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.10817924
Dataset updated
Mar 15, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Luca Visentin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.

All uploaded files are UTF-8, `.csv`-formatted matrices. The `*_expected_count.csv.gz` files are unlogged, raw expression counts as reported by `rsem-quantify-expression` (see details below). The associated `*_metadata.csv.gz` files contain metadata pertinent to each column of the corresponding expression matrix.
Some metadata files may have more rows than the associated expression matrix has sample columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq and miRNA-Seq samples under the same GEO accession ID).

Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.

Each recompute has at least the `gene_id` column holding Ensembl Gene IDs. The remaining columns are ENA run accession IDs of the specific recomputed samples.
Each associated metadata has at least the following columns:
- `geo_accession`: The GEO sample ID of the sample.
- `sample_accession`: The ENA sample ID of the sample.
- `run_accession`: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.
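
The cross-referencing described above can be sketched as follows, using only the column names stated in this description (`gene_id`, `geo_accession`, `sample_accession`, `run_accession`). The accession values and the helper names `read_header`, `read_rows` and `link_runs_to_samples` are hypothetical and only illustrate the layout; real files would first be opened with `gzip.open(path, "rt", encoding="utf-8", newline="")`.

```python
import csv
import io

# Hypothetical miniature files that follow the documented layout.
MATRIX_CSV = "gene_id,RUN_A,RUN_B\nGENE_1,10,0\nGENE_2,3,7\n"
METADATA_CSV = (
    "geo_accession,sample_accession,run_accession\n"
    "GSM_1,SAMPLE_1,RUN_A\n"
    "GSM_2,SAMPLE_2,RUN_B\n"
    "GSM_3,SAMPLE_3,RUN_C\n"  # e.g. a non-RNA-Seq sample absent from the matrix
)


def read_header(csv_text):
    """Return the column names from the first row of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)))


def read_rows(csv_text):
    """Return all data rows of a CSV document as dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def link_runs_to_samples(matrix_header, metadata_rows):
    """Map every run column of the matrix to its GEO sample ID.

    Metadata rows without a matching matrix column are ignored, because
    the metadata may describe more samples than were recomputed.
    """
    run_columns = [name for name in matrix_header if name != "gene_id"]
    by_run = {row["run_accession"]: row["geo_accession"] for row in metadata_rows}
    missing = [run for run in run_columns if run not in by_run]
    if missing:
        raise KeyError(f"runs without metadata: {missing}")
    return {run: by_run[run] for run in run_columns}


if __name__ == "__main__":
    mapping = link_runs_to_samples(read_header(MATRIX_CSV), read_rows(METADATA_CSV))
    print(mapping)
```

Iterating over the matrix columns rather than the metadata rows keeps the result aligned with the expression matrix even when the metadata file is longer.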

## Pipeline Details

The alignment and quantification were performed with the `x.FASTQ` tool available [on GitHub](https://github.com/TCP-Lab/x.FASTQ), installed locally on an Arch Linux machine running the Linux `6.7.8-zen1-1-zen` kernel with an `11th Gen Intel i7-1185G7 (8)` CPU and an `Intel TigerLake-LP GT2 [Iris Xe Graphics]` GPU.
Immunological Genome Project data Phase 1
omicsdi.org
xml
Cite
Liang Yang, Immunological Genome Project data Phase 1 [Dataset]. https://www.omicsdi.org/dataset/geo/GSE15907
Explore at:
Available download formats: xml
Authors
Liang Yang
Variables measured
Genomics
Description
Gene-expression microarray datasets generated as part of the Immunological Genome Project (ImmGen). Primary cells from multiple immune lineages are isolated ex-vivo, primarily from young adult B6 male mice, and double-sorted to >99% purity. RNA is extracted from cells in a centralized manner, amplified and hybridized to Affymetrix 1.0 ST MuGene arrays. Protocols are rigorously standardized for all sorting and RNA preparation. Data is released monthly in batches of cell populations. Overall design: This Series record provides access to Immunological Genome Project data submitted to GEO.
Data from: Metadata record for the manuscript: FOXA1 and adaptive response...
springernature.figshare.com
xlsx
Updated Feb 14, 2024
Cite
Steven P. Angus; Timothy J. Stuhlmiller; Gaurav Mehta; Samantha M. Bevill; Daniel R. Goulet; J. Felix Olivares-Quintero; Michael P. East; Maki Tanioka; Jon S. Zawistowski; Darshan Singh; Noah Sciaky; Xin Chen; Xiaping He; Naim U. Rashid; Lynn Chollet-Hinton; Cheng Fan; Matthew G. Soloway; Patricia A. Spears; Stuart Jefferys; Joel S. Parker; Kristalyn K. Gallagher; Andres Forero-Torres; Ian E. Krop; Alastair M. Thompson; Rashmi Murthy; Michael L. Gatza; Charles M. Perou; H. Shelton Earp; Lisa A. Carey; Gary L. Johnson (2024). Metadata record for the manuscript: FOXA1 and adaptive response determinants to HER2 targeted therapy in TBCRC 036 [Dataset]. http://doi.org/10.6084/m9.figshare.14376746.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.14376746.v1
Dataset updated
Feb 14, 2024
Dataset provided by
figshare
Authors
Steven P. Angus; Timothy J. Stuhlmiller; Gaurav Mehta; Samantha M. Bevill; Daniel R. Goulet; J. Felix Olivares-Quintero; Michael P. East; Maki Tanioka; Jon S. Zawistowski; Darshan Singh; Noah Sciaky; Xin Chen; Xiaping He; Naim U. Rashid; Lynn Chollet-Hinton; Cheng Fan; Matthew G. Soloway; Patricia A. Spears; Stuart Jefferys; Joel S. Parker; Kristalyn K. Gallagher; Andres Forero-Torres; Ian E. Krop; Alastair M. Thompson; Rashmi Murthy; Michael L. Gatza; Charles M. Perou; H. Shelton Earp; Lisa A. Carey; Gary L. Johnson
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Summary

This metadata record provides details of the data supporting the claims of the related manuscript: “FOXA1 and adaptive response determinants to HER2 targeted therapy in TBCRC 036”.

The related study aimed to determine the global alterations in gene enhancers and transcriptional changes to identify factors involved in the adaptive response to HER2 inhibition. In parallel, it analysed the in vivo human adaptive molecular responses to HER2 targeting in a window-of-opportunity clinical trial using both RNAseq and a chemical proteomics method (MIB/MS) to assess the functional kinome.

Type of data: mass spectrometry proteomics data; normalised patient RNA sequencing data; cell line RNA sequencing data; cell line ChIPseq data

Subject of data: Homo sapiens; Eukaryotic cell lines

Recruitment: Eligible women included those with newly diagnosed Stage I-IV HER2+ breast cancer scheduled to undergo definitive surgery (either lumpectomy or mastectomy). Stage I-IIIc patients could not be candidates for a therapeutic neoadjuvant treatment. Study subjects provided informed written consent that included details of the nontherapeutic nature of the trial.

Trial registration number: https://clinicaltrials.gov/ct2/show/NCT01875666

Data access

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier https://identifiers.org/pride.project:PXD021865.

Normalized patient RNAseq data (https://identifiers.org/geo:GSE161743), cell line RNAseq (https://identifiers.org/geo:GSE160001 and https://identifiers.org/geo:GSE160001), and cell line ChIPseq (https://identifiers.org/geo:GSE160667) are all part of the SuperSeries https://identifiers.org/geo:GSE160670 available through the Gene Expression Omnibus.

Processed and normalized data are provided as supplemental materials associated with the article on the journal website, and also attached to this data record in the Excel spreadsheets called Supplementary Data 1-10 and the PDF called Supplementary material file.PDF. Accompanying Supplementary Information and Supplementary Data files contain relevant data used to produce the included figures and are available with this article. A detailed list of which data files underlie which figures and tables in the related article is included in the file ‘Angus_et_al_2021_underlying_data_files_list.xlsx’, which is shared with this data record.

The data supporting Figure 3c is in the GraphPad Prism file called ‘siGrowth’, which is not shared publicly as it is in a non-open format, but it can be made available upon reasonable request to the corresponding author.

Corresponding author(s) for this study

Gary L. Johnson, PhD, Department of Pharmacology, 4079 Genetic Medicine Building, University of North Carolina School of Medicine, Chapel Hill, NC 27599. Email: glj@med.unc.edu. Phone: 919-843-3106.

Study approval

Approved by the UNC Office of Human Research Ethics and conducted in accordance with the Declaration of Helsinki. IRB# 13-1826
A field-wide assessment of differential RNAseq reveals ubiquitous bias
zenodo.org
application/gzip
Updated Jan 13, 2023
Cite
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli (2023). A field-wide assessment of differential RNAseq reveals ubiquitous bias [Dataset]. http://doi.org/10.5281/zenodo.3778160
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3778160
Dataset updated
Jan 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We analyzed the field of expression profiling by high throughput sequencing, or RNA-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository. Our work puts an upper bound of 56% to field-wide reproducibility, based on the types of files submitted to GEO.

The archived dataset contains the following files:

- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0).

- output/document_summaries.csv, document summaries of GEO series

- output/publications.csv, publication info of GEO series

- output/scopus_citedbycount.csv, Scopus citation info of GEO series

- output/single-cell.csv, single cell experiments

- spots.csv, sequencing run metadata: number of spots and bases

- suppfilenames.txt, list of all supplementary file names of GEO submissions. One filename per row.

- suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO. One filename per row.
Metadata record for the manuscript: A tumor microenvironment specific gene...
springernature.figshare.com
xls
Updated Feb 27, 2024
Cite
Xiaoqiang Zhu; Xianglong Tian; Linhua Ji; Xinyu Zhang; Yingying Cao; Chaoqin Shen; Ye Hu; Jason W. H. Wong; Jing-Yuan Fang; Jie Hong; Haoyan Chen (2024). Metadata record for the manuscript: A tumor microenvironment specific gene expression signature predicts chemotherapy resistance in colorectal cancer patients [Dataset]. http://doi.org/10.6084/m9.figshare.13027715.v1
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.6084/m9.figshare.13027715.v1
Dataset updated
Feb 27, 2024
Dataset provided by
figshare
Authors
Xiaoqiang Zhu; Xianglong Tian; Linhua Ji; Xinyu Zhang; Yingying Cao; Chaoqin Shen; Ye Hu; Jason W. H. Wong; Jing-Yuan Fang; Jie Hong; Haoyan Chen
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Summary

This metadata record provides details of the data supporting the claims of the related manuscript “A tumor microenvironment specific gene expression signature predicts chemotherapy resistance in colorectal cancer patients”.

The related study aimed to determine whether a tumor microenvironment (TME) specific gene signature could be used to identify colorectal cancer (CRC) subtypes with distinctive clinical relevance.

Data access

The data analysed during the related study were downloaded from public databases including Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) and The Cancer Genome Atlas (TCGA; TCGA CRC datasets available from the Synapse repository at: https://www.synapse.org/#!Synapse:syn2623706/files/). For a list of accession IDs for the analysed data, see Supplementary Table S1 of the manuscript, also included as part of this metadata record. The Renji RNA-seq data is available from GEO: https://identifiers.org/geo:GSE158559.

The output data of the related study are included with this data record, and are as follows:

- Table S1 to S10: supplementary tables 1 to 10 for the related manuscript
- Cetuximab_GSE5851.PRJEB34338.combined.Rdata: two combined CRC Cetuximab treated gene expression matrix
- combined_five_GEObatch_GSE14333_GSE17536_GSE17537_GSE33113_GSE37892.Rdata: five combined CRC gene expression matrix
- FOLFOX_GSE19860_GSE28702_GSE69675.Rdata: three combined CRC FOLFOX treated gene expression matrix
- FOLFOX_GSE104645_GSE72970.Rdata: two combined CRC FOLFOX or FOLFIRI treated gene expression matrix
- GSE39395.expMatrix.Rdata: GSE39395 gene expression matrix
- GSE39396.expMatrix.Rdata: GSE39396 gene expression matrix
- GSE39582_after_ComBat.Rdata: GSE39582 gene expression matrix
- GSE62080_exp_pdata.Rdata: GSE62080 gene expression matrix
- GSE72056.melanoma.sfm.signature.rds: scRNA melanoma processed data
- GSE75688.BRCA.sfm.signature.rds: scRNA breast cancer processed data
- GSE81861.sfm.signature.rds: scRNA CRC processed data
- GSE103322.head-neck.sfm.signature.rds: scRNA head and neck processed data
- TCGA.CRC.expMatrix.Rdata: TCGA CRC gene expression matrix
- TCGA.CRC.microbiome.abundance.Rdata: TCGA CRC gut microbiome abundance
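
Several of the output file names listed above embed the GEO series accessions of the datasets they combine. As a small sketch, assuming that every accession of interest has the form `GSE` followed by digits, the accessions can be recovered from a file name with a regular expression. The helper name `geo_series_ids` is hypothetical.

```python
import re

# Assumption: series accessions look like "GSE" followed by one or more digits.
GSE_PATTERN = re.compile(r"GSE\d+")


def geo_series_ids(filename):
    """Return the GEO series accessions embedded in a file name, in order."""
    return GSE_PATTERN.findall(filename)


if __name__ == "__main__":
    for name in (
        "FOLFOX_GSE19860_GSE28702_GSE69675.Rdata",
        "Cetuximab_GSE5851.PRJEB34338.combined.Rdata",
        "TCGA.CRC.expMatrix.Rdata",
    ):
        print(name, geo_series_ids(name))
```

File names without an embedded accession, such as the TCGA matrices, yield an empty list.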
Land Surface Temperature Data Record - MSG
navigator.eumetsat.int
data.eumetsat.int
Updated Jan 21, 2004
Cite
LSA SAF (2004). Land Surface Temperature Data Record - MSG [Dataset]. https://navigator.eumetsat.int/product/EO:EUM:DAT:0088
Explore at:
Dataset updated
Jan 21, 2004
Dataset authored and provided by
LSA SAF
Measurement technique
Optical
Description
The full archive of MSG/SEVIRI data was reprocessed to provide the user community a consistent, homogeneous and continuous Data Record of the 15-min Land Surface Temperature (LST) for the period 2004-2015. This Data Record was obtained with the best version of its equivalent NRT product (MLST) which can also complement the time series from 2016 onwards.
Species biodiversity transnational geo-database - IMPRECO Project
gbif.org
demo.gbif.org
Updated Mar 24, 2025
Cite
Francesco Zangaro; Gabriele Marini; Valeria Specchia; Matteo De Luca; Francesca Visintin; Giovanna Bullo; Jacopo Richard; Nataša Šalaja; Bia Rakar; Bojana Lipej; Jelena Kurtović Mrčelić; Gvido Piasevoli; Ante Žuljević; Nada Zaimi; Djana Bejko; Abdulla Diku; Aliki Karousou; Eleni Hatziyanni; Massimiliano Pinat; Maurizio Pinna (2025). Species biodiversity transnational geo-database - IMPRECO Project [Dataset]. http://doi.org/10.15468/pghr6g
Explore at:
Unique identifier
https://doi.org/10.15468/pghr6g
Dataset updated
Mar 24, 2025
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
Biodiversity Data Journal
Authors
Francesco Zangaro; Gabriele Marini; Valeria Specchia; Matteo De Luca; Francesca Visintin; Giovanna Bullo; Jacopo Richard; Nataša Šalaja; Bia Rakar; Bojana Lipej; Jelena Kurtović Mrčelić; Gvido Piasevoli; Ante Žuljević; Nada Zaimi; Djana Bejko; Abdulla Diku; Aliki Karousou; Eleni Hatziyanni; Massimiliano Pinat; Maurizio Pinna
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2018 - Jan 31, 2021
Description
Transnational biodiversity geo-database of the Protected Areas in the Adriatic-Ionian Macro-Region - IMPRECO Project.
GEO-Hauptveranstaltung in "Wildtierland"
gbif.org
demo.gbif.org
Updated Mar 9, 2018
Cite
GEO-Tag der Artenvielfalt (2018). GEO-Hauptveranstaltung in "Wildtierland" [Dataset]. http://doi.org/10.15468/ebnnbs
Explore at:
Unique identifier
https://doi.org/10.15468/ebnnbs
Dataset updated
Mar 9, 2018
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
GEO-Tag der Artenvielfalt
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset holds the observations recorded during the GEO Biodiversity Day "GEO-Hauptveranstaltung in 'Wildtierland'" in Strasburg (Uckermark).
Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset
data.niaid.nih.gov
zenodo.org
Updated Nov 20, 2023
Cite
Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
Explore at:
Dataset updated
Nov 20, 2023
Dataset provided by
Stoop, Allart
Hsu, Jonathan
Description
Table of Contents

1. Main Description
2. File Descriptions
3. Linked Files
4. Installation and Instructions

1. Main Description

This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity". The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required for script execution:

- Seurat
- scRepertoire
- ggplot2
- stringr
- dplyr
- ggridges
- ggrepel
- ComplexHeatmap

File Descriptions

The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, which were also used to create the heatmap with DGE and the volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.

Linked Files

This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset comprises the raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Descriptions of the linked datasets are provided below:

Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

- Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
- Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
- Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
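
GEO accessions such as GSE223311 map directly onto the two URL forms that appear in this listing: the NCBI query page (`acc.cgi?acc=...`) and the identifiers.org resolver (`geo:...`). A minimal sketch of a helper that builds both is shown below; the function name `geo_urls` and the accession pattern used for validation are assumptions made for illustration.

```python
import re

# Assumption: GEO accessions are GSE (series), GSM (sample), GPL (platform)
# or GDS (dataset) followed by digits.
ACCESSION_PATTERN = re.compile(r"G(SE|SM|PL|DS)\d+")


def geo_urls(accession):
    """Build the NCBI and identifiers.org URLs for a GEO accession."""
    if not ACCESSION_PATTERN.fullmatch(accession):
        raise ValueError(f"not a GEO accession: {accession!r}")
    return {
        "ncbi_geo": f"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={accession}",
        "identifiers_org": f"https://identifiers.org/geo:{accession}",
    }


if __name__ == "__main__":
    for label, url in geo_urls("GSE223311").items():
        print(label, url)
```

Validating the accession first avoids building links for identifiers from other archives, such as the SRA experiment IDs.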

Sequence Read Archive (SRA) repository ID: SRX19088718 and SRX19088719

- Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
- Description: This submission contains the raw sequencing files (.fastq.gz), which are gzip-compressed FASTQ text files.
- Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

- Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
- Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is required to run the code.
- Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

Installation and Instructions

The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

Ensure you have R version 4.1.2 or higher for compatibility.

Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).

Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.

Set your working directory to where the following files are located:

- marengo_code_for_paper_jan_2023.R
- Install_Packages.R
- Marengo_newID_March242023.rds
- genes_for_heatmap_fig5F.xlsx
- all_res_deg_for_heat_updated_march2023.txt

You can use the following code to set the working directory in R:

setwd(directory)

Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies, setting up an environment in which the code in "marengo_code_for_paper_jan_2023.R" can be executed.

Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.

Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.

Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.
GeoEDdA
huggingface.co
Updated Oct 18, 2024
Cite
GEODE (2024). GeoEDdA [Dataset]. https://huggingface.co/datasets/GEODE/GeoEDdA
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 18, 2024
Dataset authored and provided by
GEODE
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
GeoEDdA: A Gold Standard Dataset for Geo-semantic Annotation of Diderot & d’Alembert’s Encyclopédie

Dataset Description

- Authors: Ludovic Moncla, Katherine McDonough and Denis Vigier in the framework of the GEODE project.
- Data source: ARTFL Encyclopédie Project, University of Chicago
- GitHub repository: https://github.com/GEODE-project/ner-spancat-edda
- Language: French
- License: cc-by-nc-4.0
- Zenodo repository: https://zenodo.org/records/10530177

Dataset Summary… See the full description on the dataset page: https://huggingface.co/datasets/GEODE/GeoEDdA.
Geoscape Geocoded National Address File (G-NAF)
data.gov.au
researchdata.edu.au
pdf, zip
Updated Jul 15, 2025
Cite
Department of Industry, Science and Resources (DISR) (2025). Geoscape Geocoded National Address File (G-NAF) [Dataset]. https://data.gov.au/data/dataset/geocoded-national-address-file-g-naf
Explore at:
Available download formats: pdf, zip(1685801192), zip(1689613051), pdf(398940)
Dataset updated
Jul 15, 2025
Dataset authored and provided by
Department of Industry, Science and Resources (DISR)
Description
Geoscape G-NAF is the geocoded address database for Australian businesses and governments. It’s the trusted source of geocoded address data for Australia with over 50 million contributed addresses distilled into 15.4 million G-NAF addresses. It is built and maintained by Geoscape Australia using independently examined and validated government data.

From 22 August 2022, Geoscape Australia is making G-NAF available in an additional simplified table format. G-NAF Core makes geocoded addresses easier to access, requiring less technical effort.

G-NAF Core will be updated on a quarterly basis along with G-NAF.

Further information about contributors to G-NAF is available here.

With more than 15 million Australian physical address records, G-NAF is one of the most ubiquitous and powerful spatial datasets. The records include geocodes, which are latitude and longitude map coordinates. G-NAF does not contain personal information or details relating to individuals.

Updated versions of G-NAF are published on a quarterly basis. Previous versions are available here.

Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.

Changes in the May 2025 release

Nationally, the May 2025 update of G-NAF shows an overall increase of 47,194 addresses (0.30%). The total number of addresses in G-NAF now stands at 15,753,927 of which 14,909,770 or 94.64% are principal.
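
The percentages quoted above can be re-derived from the stated counts. The short check below assumes that the 0.30% increase is measured relative to the previous release total (the current total minus the increase); the variable and function names are illustrative only.

```python
TOTAL_ADDRESSES = 15_753_927      # total stated for the May 2025 release
PRINCIPAL_ADDRESSES = 14_909_770  # principal addresses stated for May 2025
INCREASE = 47_194                 # stated overall increase


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Assumption: growth is expressed relative to the previous release total.
previous_total = TOTAL_ADDRESSES - INCREASE
principal_share = percent(PRINCIPAL_ADDRESSES, TOTAL_ADDRESSES)
growth = percent(INCREASE, previous_total)

if __name__ == "__main__":
    print(f"previous total: {previous_total:,}")
    print(f"principal share: {principal_share:.2f}%")
    print(f"growth: {growth:.2f}%")
```

Both rounded results agree with the figures given in the release notes.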

At some locations, there are unit-numbered addresses that appear to be duplicate addresses. Geoscape is working to identify these locations and include these addresses as separate addresses in G-NAF. To facilitate this process, some secondary addresses have had the word RETAIL added to their building names. In the first instance, this process is being progressively rolled out to identified locations, but it is expected that the requirement for this will become ongoing.

There is one new locality in G-NAF: Keswick Island, QLD.

The source data used for generating G-NAF STREET_LOCALITY_POINT data in New South Wales has an updated datum, changing from GDA94 to GDA2020. This has resulted in updates to the STREET_LOCALITY_POINT geometry for approximately 91,000 records; however, more than 95% of these have moved less than a metre.

Geoscape has moved product descriptions, guides and reports online to https://docs.geoscape.com.au.

Further information on G-NAF, including FAQs on the data, is available here or through Geoscape Australia’s network of partners. They provide a range of commercial products based on G-NAF, including software solutions, consultancy and support.

Additional information: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.

License Information

Use of the G-NAF downloaded from data.gov.au is subject to the End User Licence Agreement (EULA)

The EULA terms are based on the Creative Commons Attribution 4.0 International license (CC BY 4.0). However, an important restriction relating to the use of the open G-NAF for the sending of mail has been added.

The open G-NAF data must not be used for the generation of an address or the compilation of an address for the sending of mail unless the user has verified that each address to be used for the sending of mail is capable of receiving mail by reference to a secondary source of information. Further information on this use restriction is available here.

End users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).

Users must also note the following attribution requirements:

Preferred attribution for the Licensed Material:

G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.

Preferred attribution for Adapted Material:

Incorporates or developed using G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.

What to Expect When You Download G-NAF

G-NAF is a complex and large dataset (approximately 5GB unpacked), consisting of multiple tables that will need to be joined prior to use. The dataset is primarily designed for application developers and large-scale spatial integration. Users are advised to read the technical documentation, including product change notices and the individual product descriptions before downloading and using the product. A quick reference guide on unpacking the G-NAF is also available.
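
As a rough illustration of what "joined prior to use" means, the sketch below builds four tiny in-memory tables and joins them into one flat address row. The table and column names are simplified stand-ins chosen for this example, and the single address is made up; the real schema is larger and should be taken from the technical documentation.

```python
import sqlite3

# Simplified, hypothetical stand-ins for a few G-NAF tables. The real
# product has many more tables and columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locality (
        locality_pid TEXT PRIMARY KEY, locality_name TEXT);
    CREATE TABLE street_locality (
        street_locality_pid TEXT PRIMARY KEY, street_name TEXT,
        locality_pid TEXT);
    CREATE TABLE address_detail (
        address_detail_pid TEXT PRIMARY KEY, number_first INTEGER,
        street_locality_pid TEXT);
    CREATE TABLE address_geocode (
        address_detail_pid TEXT, latitude REAL, longitude REAL);

    -- One made-up address, for illustration only.
    INSERT INTO locality VALUES ('LOC1', 'EXAMPLEVILLE');
    INSERT INTO street_locality VALUES ('STR1', 'SAMPLE STREET', 'LOC1');
    INSERT INTO address_detail VALUES ('ADR1', 1, 'STR1');
    INSERT INTO address_geocode VALUES ('ADR1', -35.0, 149.0);
""")

# Join address -> street -> locality, then attach the geocode.
rows = conn.execute("""
    SELECT a.number_first, s.street_name, l.locality_name,
           g.latitude, g.longitude
    FROM address_detail AS a
    JOIN street_locality AS s
      ON s.street_locality_pid = a.street_locality_pid
    JOIN locality AS l
      ON l.locality_pid = s.locality_pid
    JOIN address_geocode AS g
      ON g.address_detail_pid = a.address_detail_pid
""").fetchall()

for row in rows:
    print(row)
```

The same pattern applies to the downloaded files once they are loaded into a database, with the real key columns substituted for the illustrative ones.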
Expression Data recompute of selected GEO-deposited RNA-Seq data of HMEC-1...
zenodo.org
application/gzip
Updated Feb 3, 2025
Luca Visentin; Luca Visentin (2025). Expression Data recompute of selected GEO-deposited RNA-Seq data of HMEC-1 cell lines [Dataset]. http://doi.org/10.5281/zenodo.14793942
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.14793942
Dataset updated
Feb 3, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Luca Visentin; Luca Visentin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We aligned and quantified RNA-Seq data present in GEO regarding HMEC-1 cell lines with a standardized pipeline to homogenize data preprocessing for downstream applications.

All uploaded files are UTF-8, .csv-formatted matrices. The *_expected_count.csv.gz files are unlogged, raw expression counts as reported by rsem-quantify-expression with the 'expected counts' feature. The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix.
Some metadata files may have more rows than the associated expression matrix has columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples in the same GEO accession ID).

Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.

Each recompute has at least the gene_id column holding Ensembl Gene IDs. The remaining columns are ENA run accession IDs of the specific recomputed samples.
Each associated metadata has at least the following columns:

geo_sample: The GEO sample ID of the sample.

geo_series: The GEO series ID of the sample.

ena_sample: The ENA sample ID of the sample.

ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.

The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information (https://github.com/TCP-Lab/x.FASTQ).
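
The cross-referencing described above can be sketched as follows. The two inline CSV strings are made-up stand-ins for one expression matrix and its metadata file; every accession and gene ID in them is a placeholder, not real data. Only the column names gene_id, geo_sample, geo_series, ena_sample and ena_run come from the description.

```python
import csv
import io

# Made-up stand-ins for one *_expected_count.csv and its *_metadata.csv.
counts_csv = """gene_id,ERR0000001,ERR0000002
ENSG00000000001,10.0,12.5
ENSG00000000002,0.0,3.0
"""
metadata_csv = """geo_sample,geo_series,ena_sample,ena_run
GSM0000001,GSE0000001,SAMEA0000001,ERR0000001
GSM0000002,GSE0000001,SAMEA0000002,ERR0000002
GSM0000003,GSE0000001,SAMEA0000003,ERR0000003
"""

# Every matrix column except gene_id is an ENA run accession.
counts_reader = csv.DictReader(io.StringIO(counts_csv))
run_ids = [name for name in counts_reader.fieldnames if name != "gene_id"]

# Index the metadata rows by ena_run so matrix columns can be looked up.
metadata_by_run = {
    row["ena_run"]: row for row in csv.DictReader(io.StringIO(metadata_csv))
}

# Map each matrix column to its GEO sample ID.
run_to_geo_sample = {run: metadata_by_run[run]["geo_sample"] for run in run_ids}

# Metadata rows with no matching matrix column (e.g. non-RNA-Seq samples).
unmatched = sorted(set(metadata_by_run) - set(run_ids))

print(run_to_geo_sample)
print(unmatched)
```

For the actual .csv.gz files, opening them with `gzip.open(path, "rt", encoding="utf-8")` and passing the handle to `csv.DictReader` should work the same way.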

Reference genome was downloaded from Ensembl, version hg38. STAR was used to create the index genome with overhang set to 149.

The different datasets were generated over a long period of time through a variety of different versions of x.FASTQ. However, the versions of the software that acted on the files themselves (e.g. STAR, rsem, etc.) were unchanged, and are reported below:
Geolytica POIData.xyz Points of Interest (POI) Geo Data - Austria
datarade.ai
.csv
Updated Sep 20, 2021
Geolytica (2021). Geolytica POIData.xyz Points of Interest (POI) Geo Data - Austria [Dataset]. https://datarade.ai/data-products/geolytica-poidata-xyz-points-of-interest-poi-geo-data-aus-geolytica-c5a9
Explore at:
Available download formats: .csv
Dataset updated
Sep 20, 2021
Dataset authored and provided by
Geolytica
Area covered
Austria
Description
https://store.poidata.xyz/at

A point of interest (POI) is defined as a physical entity (such as a business) at a geographic location (point) that may be of interest.

We strive to provide the most accurate, complete and up to date point of interest datasets for all countries of the world. The Austria POI Dataset is one of our worldwide POI datasets with over 98% coverage.

This is our process flow:

Our machine learning systems continuously crawl for new POI data.

Our geoparsing and geocoding calculate their geo locations.

Our categorization systems clean up and standardize the datasets.

Our data pipeline API publishes the datasets on our data store.

POI Data is in a constant flux - especially so during times of drastic change such as the Covid-19 pandemic.

Every minute worldwide on an average day over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist.

In today's interconnected world, of the approximately 200 million POIs worldwide, over 94% have a public online presence. As a new POI comes into existence its information will appear very quickly in location based social networks (LBSNs), other social media, pictures, websites, blogs, press releases. Soon after that, our state-of-the-art POI Information retrieval system will pick it up.

We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via a recurring payment plan on our data update pipeline.

The main differentiators between us vs the competition are our flexible licensing terms and our data freshness.

The core attribute coverage is as follows:

Poi Field            Data Coverage (%)
poi_name             100
brand                9
poi_tel              58
formatted_address    100
main_category        97
latitude             100
longitude            100
neighborhood         19
source_url           54
email                5
opening_hours        40
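
The listing does not say how coverage is measured. A minimal sketch, assuming coverage means the share of rows with a non-empty value in a field, is shown below; the four sample rows are made up and do not come from the dataset.

```python
import csv
import io

# Made-up rows using three of the field names listed above.
sample_csv = """poi_name,brand,email
Cafe A,,a@example.com
Shop B,BrandX,
Museum C,,
Hotel D,,
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Assumption: coverage is the percentage of rows with a non-empty value.
coverage = {
    field: round(100 * sum(1 for row in rows if row[field].strip()) / len(rows))
    for field in rows[0]
}

for field, percent in coverage.items():
    print(f"{field}: {percent}")
```

Running the same calculation on the downloadable sample file would give a quick check of whether a field such as brand or email is populated densely enough for a given use.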

The dataset may be viewed online at https://store.poidata.xyz/at and a data sample may be downloaded at https://store.poidata.xyz/datafiles/at_sample.csv
Data from: Retracted article: Functional analysis of ceRNA network of lncRNA...
tandf.figshare.com
figshare.com
docx
Updated Mar 1, 2024
Jiezhong Lin; Jianyi Zhou; Guiting Xie; Xiongwei Xie; Yanfang Luo; Jinguang Liu (2024). Retracted article: Functional analysis of ceRNA network of lncRNA TSIX/miR-34a-5p/RBP2 in acute myocardial infarction based on GEO database [Dataset]. http://doi.org/10.6084/m9.figshare.17031615.v2
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.17031615.v2
Dataset updated
Mar 1, 2024
Dataset provided by
Taylor & Francis
Authors
Jiezhong Lin; Jianyi Zhou; Guiting Xie; Xiongwei Xie; Yanfang Luo; Jinguang Liu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Jiezhong Lin, Jianyi Zhou, Guiting Xie, Xiongwei Xie, Yanfang Luo and Jinguang Liu. Functional analysis of ceRNA network of lncRNA TSIX/miR-34a-5p/RBP2 in acute myocardial infarction based on GEO database. 2021 Oct. doi: 10.1080/21655979.2021.2006865. Since publication, significant concerns have been raised about the compliance with ethical policies for human research and the integrity of the data reported in the article. When approached for an explanation, the authors provided some original data but were not able to provide all the necessary supporting information. As verifying the validity of published work is core to the scholarly record’s integrity, we are retracting the article. All authors listed in this publication have been informed. We have been informed in our decision-making by our editorial policies and the COPE guidelines. The retracted article will remain online to maintain the scholarly record, but it will be digitally watermarked on each page as ‘Retracted.’
Re-analysis of microarray data from rapamycin resistant DLBCL cell lines
data.niaid.nih.gov
datadryad.org
zip
Updated Oct 22, 2013
Megan Laurance (2013). Re-analysis of microarray data from rapamycin resistant DLBCL cell lines [Dataset]. http://doi.org/10.7272/Q6TD9V7J
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.7272/Q6TD9V7J
Dataset updated
Oct 22, 2013
Authors
Megan Laurance
License
https://spdx.org/licenses/CC0-1.0.html
Description
This dataset contains a re-analysis of the raw microarray data originally published by Petrich AM et al. in 2012 (citation details are provided through the link to the GEO record). We were interested in re-analyzing the data because the list of differentially expressed genes identified when comparing rapamycin-resistant DLBCL cell lines to rapamycin-sensitive cell lines was not included in the original article or supplemental materials. We were interested in validating and expanding upon the findings from the original article by reevaluating the raw microarray data. Our reanalysis identified over 200 genes that were significantly differentially expressed between rapamycin-resistant and rapamycin-sensitive cells. Importantly, our analysis highlighted a gene that was highly upregulated in rapamycin-resistant cells, CD247, that was not the focus of the original publication and is the target of a drug currently in clinical trials for refractory DLBCL. Our reanalysis also highlighted the role of SYK, a kinase upregulated in rapamycin-resistant cell lines, that has a direct molecular relationship with CD247 and is a potential biomarker of drug response in DLBCL.

Methods

The methods used to generate the original microarray data are described in the GEO record where the data were originally published: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE27255 We downloaded the raw Affymetrix data (CEL files) from GEO for reanalysis. Statistical and quality analysis was performed using the data analysis pipeline in iReport (http://www.ingenuity.com/products/ireport), which utilizes packages such as RMA and Limma from Bioconductor, in the R programming language. A full description of the analysis packages used by this pipeline is included in the Word document "GSE27255 stats and QC details" included in this DataShare record. Analysis of microarray data in iReport identified 229 differentially expressed genes (DEGs) with a p-value <0.05 and fold change >1.5.

These DEGs were then uploaded and analyzed in Ingenuity Pathway Analysis (www.ingenuity.com) for functional enrichment, pathway analysis, and drug target/biomarker analysis. Our novel findings with respect to the role of CD247 and SYK in rapamycin-resistant DLBCL cell lines are depicted in the pathway image file "Rap resist network with CD247", which is included in this DataShare record.
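
The DEG thresholds quoted in the methods (p-value below 0.05, fold change above 1.5) amount to a simple filter. The sketch below is not the iReport pipeline; the gene names and values are placeholders, and it assumes fold change is reported as a signed value whose magnitude is compared against the cutoff.

```python
# Placeholder results; gene names and numbers are made up for illustration.
results = [
    {"gene": "GENE_A", "p_value": 0.001, "fold_change": 2.4},
    {"gene": "GENE_B", "p_value": 0.200, "fold_change": 3.0},
    {"gene": "GENE_C", "p_value": 0.030, "fold_change": 1.2},
    {"gene": "GENE_D", "p_value": 0.010, "fold_change": -1.8},
]

P_VALUE_CUTOFF = 0.05
FOLD_CHANGE_CUTOFF = 1.5

# Assumption: the fold-change cutoff applies to the magnitude, so both
# up- and down-regulated genes can pass.
degs = [
    r for r in results
    if r["p_value"] < P_VALUE_CUTOFF and abs(r["fold_change"]) > FOLD_CHANGE_CUTOFF
]

for r in degs:
    print(r["gene"], r["p_value"], r["fold_change"])
```

If the original analysis applied the cutoff to upregulated genes only, dropping the `abs` call would reproduce that stricter reading.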


(2025). Gene Expression Omnibus (GEO) [Dataset]. http://identifiers.org/RRID:SCR_005012

Data from: Gene Expression Omnibus (GEO)

RRID:SCR_005012, nif-0000-00142, nlx_96903, OMICS_01030, SCR_007303, Gene Expression Omnibus (GEO) (RRID:SCR_005012), GEO, Gene Expression Omnibus (GEO), Entrez GEO DataSets, Gene Expression Data Sets, Gene Expression Omnibus, GEO, NCBI GEO DataSets, GEO DataSets, Gene Expression Omnibus DataSets

Explore at:

448 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://identifiers.org/RRID:SCR_005012

