Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*NCBI Gene Expression Omnibus accession number; it can be used to retrieve the microarray experiment data via http://www.ncbi.nlm.nih.gov/geo/.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
We aligned and quantified RNA-Seq data present in GEO with a standardized pipeline to homogenize data preprocessing for downstream applications.
All uploaded files are UTF-8, .csv-formatted matrices. The *_expected_count.csv.gz files are unlogged, raw expression counts as reported by rsem-quantify-expression (see details below). The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix. Some metadata files may have more rows than the associated expression matrix has columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq plus miRNA-Seq samples in the same GEO accession ID).
Metadata columns are derived from GEO series files, and follow their definitions. See each GEO entry directly to determine metadata meaning.
Each recompute has at least the gene_id column holding Ensembl Gene IDs. The remaining columns are ENA run accession IDs of the specific recomputed samples.
Each associated metadata file has at least the following columns:
geo_accession: The GEO sample ID of the sample.
ena_sample: The ENA sample ID of the sample.
ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.
The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information.
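The cross-referencing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`read_csv_gz`, `match_runs`), the in-memory demo files, and every accession shown are hypothetical placeholders, not values from the dataset. Only the column names gene_id, geo_accession, ena_sample and ena_run come from the description.

```python
import csv
import gzip
import io


def read_csv_gz(source):
    """Read a gzip-compressed UTF-8 CSV (path or binary file object) into a list of dict rows."""
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def match_runs(expression_rows, metadata_rows):
    """Pair each expression-matrix column (an ENA run ID) with its metadata row.

    Returns (matched, missing): a dict keyed by ena_run, and the list of
    matrix columns that have no metadata row.
    """
    run_columns = [col for col in expression_rows[0] if col != "gene_id"]
    by_run = {row["ena_run"]: row for row in metadata_rows}
    matched = {run: by_run[run] for run in run_columns if run in by_run}
    missing = [run for run in run_columns if run not in by_run]
    return matched, missing


def _gz(text):
    """Build an in-memory .csv.gz stand-in for the demo (hypothetical data)."""
    return io.BytesIO(gzip.compress(text.encode("utf-8")))


# Hypothetical demo content; the metadata deliberately has one extra row,
# as can happen for series that were only partially RNA-Seq based.
demo_counts = _gz("gene_id,SRR0000001,SRR0000002\nENSG00000141510,10.00,25.50\n")
demo_meta = _gz(
    "geo_accession,ena_sample,ena_run\n"
    "GSM0000001,SAMEA0000001,SRR0000001\n"
    "GSM0000002,SAMEA0000002,SRR0000002\n"
    "GSM0000003,SAMEA0000003,SRR0000003\n"
)

counts = read_csv_gz(demo_counts)
meta = read_csv_gz(demo_meta)
matched, missing = match_runs(counts, meta)
```

With real files, `read_csv_gz` would be called with the paths of a *_expected_count.csv.gz file and its *_metadata.csv.gz companion; metadata rows without a matching matrix column are simply ignored.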
Pipeline Details
The alignment and quantification were performed with the x.FASTQ tool, available on GitHub, installed locally on an Arch Linux machine at commit 3a93dd77a70df59c74f7b15216c26f12cd918e81 running the Linux 6.7.8-zen1-1-zen kernel with an 11th Gen Intel i7-1185G7 (8) CPU and an Intel TigerLake-LP GT2 [Iris Xe Graphics] GPU. Please note that no samples were filtered or omitted based on sample quality or sequencing depth. However, sensible trimming (e.g. of low-quality bases and common adapters) was performed on all samples.
The reference genome (version hg38) was downloaded from Ensembl. STAR was used to build the genome index with the overhang set to 149.
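As a rough illustration of the indexing step, the sketch below assembles a STAR genomeGenerate command line in Python without running it. It is not the x.FASTQ invocation: the file names, the thread count, and the use of a GTF annotation are assumptions made for the example. Only the overhang of 149 comes from the description (STAR's documentation recommends read length minus one, so 149 suits 150-base reads).

```python
import shlex


def star_index_command(genome_dir, fasta, gtf, overhang=149, threads=8):
    """Return the argument list for a STAR genome-index build.

    All paths and the thread count are hypothetical placeholders.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,               # assumed: annotation supplied at indexing time
        "--sjdbOverhang", str(overhang),    # 149, as stated in the pipeline details
        "--runThreadN", str(threads),
    ]


cmd = star_index_command("star_index", "genome.fa", "annotation.gtf")
printable = shlex.join(cmd)  # shell-quoted form, e.g. for logging
```

The list form can be passed to `subprocess.run(cmd, check=True)` on a machine where STAR is installed.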
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
We illustrate the use of different implementations of the hierarchical NB model using a dataset obtained from an RNA-seq experiment (Bouquet et al. [2016]). The raw data are publicly available under GEO accession number GSE63085. This is a processed dataset containing 137,078 exons with a nonzero total count across all samples, grouped into 18,765 protein-coding genes included in the analysis.
The GEO Profiles database stores gene expression profiles derived from curated GEO DataSets. Each Profile is presented as a chart that displays the expression level of one gene across all Samples within a DataSet. Experimental context is provided in the bars along the bottom of the charts making it possible to see at a glance whether a gene is differentially expressed across different experimental conditions. Profiles have various types of links including internal links that connect genes that exhibit similar behaviour, and external links to relevant records in other NCBI databases. GEO Profiles can be searched using many different attributes including keywords, gene symbols, gene names, GenBank accession numbers, or Profiles flagged as being differentially expressed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
We analysed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.
The geo-htseq.tar.gz archive contains the following files:
output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated proportion of true null hypotheses (pi0).
output/document_summaries.csv, document summaries of NCBI GEO series.
output/suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions.
output/suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO.
output/publications.csv, publication info of NCBI GEO series.
output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series.
output/spots.csv, NCBI SRA sequencing run metadata.
output/cancer.csv, cancer-related experiment accessions.
output/transcription_factor.csv, TF-related experiment accessions.
output/single-cell.csv, single-cell experiment accessions.
blacklist.txt, list of supplementary files that were either too large to import or were causing computing environment crash during import.
The workflow to produce this dataset is available on GitHub at rstats-tartu/geo-htseq.
The geo-htseq-updates.tar.gz archive contains the following files:
results/detools_from_pmc.csv, differential expression analysis programs inferred from published articles.
results/n_data.csv, manually curated sample size info for NCBI GEO HT-seq series.
results/simres_df_parsed.csv, pi0 values estimated from differential expression results obtained from simulated RNA-seq data.
results/data/parsed_suppfiles_rerun.csv, pi0 values estimated using the smoother method from anti-conservative p-value sets.
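To make the pi0 quantity in these files concrete, here is a minimal Python sketch of a fixed-threshold estimator in the style of Storey: p-values above a cutoff lambda are assumed to come almost entirely from true nulls, so their density estimates pi0. This is a simplification chosen for illustration, not the smoother method named above nor the code used to build the archive; the function names and the example p-values are hypothetical.

```python
def pvalue_histogram(p_values, bins=10):
    """Count p-values in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
        # p == 1.0 is placed in the last bin
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


def pi0_fixed_lambda(p_values, lam=0.5):
    """Estimate pi0 as #{p > lam} / (m * (1 - lam)), capped at 1."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p-value is required")
    tail = sum(1 for p in p_values if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


# Hypothetical p-value sets: one roughly uniform, one anti-conservative
# (an excess of small p-values, as expected when true effects exist).
uniform_like = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
anti_conservative = [0.001, 0.002, 0.01, 0.02, 0.03, 0.04, 0.2, 0.6, 0.7, 0.9]

pi0_uniform = pi0_fixed_lambda(uniform_like)
pi0_anti = pi0_fixed_lambda(anti_conservative)
```

A uniform histogram yields an estimate near 1 (no detectable effects), while a peak near zero pulls the estimate down.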
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Supplementary Table including information about the predictive power of genes and signatures identified from metastatic clones and patient cohorts GEO accession numbers
GEO accession number of the microarray study. This dataset is associated with the following publication: Mesnage, R., A. Phedonos, M. Biserni, M. Arno, S. Balu, C. Corton, R. Ugarte, and M. Antoniou. Evaluation of estrogen receptor alpha activation by glyphosate-based herbicide constituents. FOOD AND CHEMICAL TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 108: 30-42, (2017).
Gene Expression Omnibus (GEO) accession numbers of studies used in the analysis. This dataset is associated with the following publication: Rooney, J., K. Oshida, R. Kumar, W. Baldwin, and C. Corton. Chemical Activation of the Constitutive Androstane Receptor Leads to Activation of Oxidant-Induced Nrf2. TOXICOLOGICAL SCIENCES. Society of Toxicology, RESTON, VA, 167(1): 172-189, (2019).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Note: these are the limma results of the analysis. See https://doi.org/10.5281/zenodo.7032090 for the DESeq2/DEXSeq results.
This dataset contains results from paired differential expression and differential splicing analyses as well as gene-set over-representation analysis results for 199 baseline vs. case comparisons across 100 randomly curated datasets with accompanying metadata (preprint). All results were computed using the R package pairedGSEA, which utilized limma (Ritchie et al., 2015) and fgsea (Korotkevich et al., 2019).
Each .RDS file contains a list with four objects: a 'metadata' object with the metadata of the respective raw data, a 'genes' object with gene-level differential splicing and expression results, a 'gene_set' object with over-representation results, and an 'experiment' object with the experiment title.
The filenames follow this pattern: "[dataset ID]_[GEO accession number]_[Manually assigned comparison title].RDS".
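A small Python sketch of how such filenames could be split into their three parts is shown below. It assumes that the dataset ID contains no underscore and that the GEO accession is a series accession of the form GSE followed by digits; both are assumptions, and the example filename is invented for illustration. Reading the .RDS contents themselves requires R (for example `readRDS`).

```python
import re

# Assumed shape: "<dataset ID>_<GSE accession>_<comparison title>.RDS".
# The comparison title may itself contain underscores or spaces.
FILENAME_PATTERN = re.compile(
    r"^(?P<dataset_id>[^_]+)_(?P<geo_accession>GSE\d+)_(?P<comparison>.+)\.RDS$"
)


def parse_result_filename(filename):
    """Split a result filename into dataset ID, GEO accession and comparison title."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename format: {filename!r}")
    return match.groupdict()


# Hypothetical filename, not taken from the dataset.
parts = parse_result_filename("7_GSE12345_treated_vs_control.RDS")
```

Anchoring the middle field on the GSE prefix keeps underscores in the comparison title from being mistaken for field separators.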
All datasets were obtained from a local copy of the ARCHS4 v11 database of transcript counts (Lachmann et al., 2018).
Sequences from this study are available at the NCBI GEO under accession series GSE131846 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?&acc=GSE131846
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of four samples of GEO accession GSE119855 with the IBU RNA-seq pipeline
These data were derived from laboratory-maintained Trichodesmium erythraeum cultures and contain accessions and links to raw RNA-seq fastq files in NCBI’s Gene Expression Omnibus, accessible through GEO Series accession number GSE94951. The sample accession numbers corresponding to the low and high CO2 samples from this work are GSM2492342, GSM2492343, GSM2492344, and GSM2492345.
These data were reported in the paper by Lee et al. (2017).
In the present study, the Agilent-016251 Sparus aurata oligo microarray platform (GEO accession: GPL6467) was used to compare expression profiles of mineralization-induced VSa16 cell cultures against untreated ones. ECM mineralization was induced for 4 weeks by supplementing medium with 50 µg/ml of L-ascorbic acid, 10 mM β-glycerophosphate and 4 mM CaCl2. For each group, total RNA was extracted from three (3) independent biological replicates, each consisting of pools of cells. Data analysis demonstrated that expression profiles were strongly affected by ECM mineralization, with hundreds of genes differentially expressed with relevant fold-change.
In this study, we analyzed six (6) cell samples, three (3) collected from untreated VSa16 cell cultures and three (3) collected from mineralization-induced VSa16 cell cultures. Gene expression profiling was performed using the Agilent-016251 Sparus aurata oligo microarray platform (GPL6467) (6 arrays, no replicate) based on single-colour detection (Cyanine-3 only). Microarrays were scanned with an Agilent scanner G2565BA (barcode on the left, DNA on the back surface, scanned through the glass) at a resolution of 5 microns; all slides were scanned twice at two different sensitivity settings (XDRHi 100% and XDRLo 10%); the scanner software created a unique ID for each pair of XDR scans and saved it to both scan image files. Feature Extraction (FE) 9.5 used the XDR ID to link the pairs of scans together automatically when extracting data. The signal left after all the FE processing steps was ProcessedSignal, which contains the Multiplicatively Detrended, Background-Subtracted Signal.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
SRA and NCBI accessions and sample descriptions are available in S1 Data.
PubMed Central reuse of GEO datasets deposited in 2007
This is the raw data behind the analysis. It contains one row for every mention of a 2007 GEO dataset in PubMed Central. Each row identifies the mentioned GEO dataset, the PubMed Central article that mentions the dataset's accession number, whether the authors of the dataset and the attributing article overlap, and whether this is considered an instance of third-party data reuse.
PMC_reuse_of_2007_GEO_datasets.csv
Aggregate Table Data
Aggregate table data behind the figures and results in the README associated with the main dataset. Includes Baseline metrics used for extrapolating PubMed Central (PMC) results to PubMed, Number of mentions of a 2007 GEO dataset by authors who submitted the dataset, and Number of mentions of a dataset by authors who DID NOT submit the dataset across 2007-2010.
tables.csv
Funding agencies are reluctant to support data archiving, even though large research funders such as the National Science Foundation (NSF) and the National Institutes of Health acknowledge its importance for scientific progress. Our quantitative estimates of data reuse indicate that ongoing financial investment in data-archiving infrastructure provides a high scientific return.
The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus (GEO) and are accessible through GEO Series accession number GSE204989 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE204989).
Sequence Read Archive (SRA) data, BioSamples, and GEO holdings can be accessed from the NCBI BioProject PRJNA843039 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA843039).
https://ega-archive.org/dacs/EGAC00001001989
The data contain single-cell gene sequencing data (10x Genomics) from FACS-purified CD8 T lymphocytes from two Austrian patients. The cells were stimulated with MHC class I peptides obtained from a common (wild-type) variant and an emerging mutant variant of the SARS-CoV-2 virus. Then the samples were multiplexed using hashtag oligos. We provide the raw and aligned sequence data for:
i. The single-cell experiments
ii. The PCR-amplified samples for enrichment of the hashtag oligo multiplexing barcodes
iii. The PCR-amplified samples for enrichment of the T Cell Receptor (TCR) VDJ region for immuno-profiling.
The samples and libraries were processed and obtained in collaboration between St. Anna Children's Cancer Research Institute (CCRI), CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, and the Medical University of Vienna. The cell barcodes and processed data have been submitted to the GEO database with GEO accession GSE166651.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
This repository contains metadata and single-cell data used to generate figures in the manuscript entitled: "Post-infusion Treg-like CAR T cells identify patients resistant to CD19-CAR therapy". Included here: CSV files containing patient cohort metadata, summary statistics and quantitative PCR results; FCS files for flow and mass cytometry data; processed Seurat object for single-cell sequencing data. Raw single-cell sequencing data, cellranger alignment results, and metadata are available through the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo; GEO accession number: GSE168940). With questions, please reach out to Zinaida Good (zinaida@stanford.edu) or Crystal L. Mackall (cmackall@stanford.edu).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The data provided here are part of a Galaxy Training Network tutorial that analyzes RNA-seq data using a de novo transcriptome reconstruction strategy from a study published by Wu et al., 2014 (DOI:10.1101/gr.164830.113). The goal of this study was to investigate "the dynamics of occupancy and the role in gene regulation of the transcription factor Tal1, a critical regulator of hematopoiesis, at multiple stages of hematopoietic differentiation." To this end, RNA-seq libraries were constructed from multiple mouse cell types including G1E - a GATA-null immortalized cell line derived from targeted disruption of GATA-1 in mouse embryonic stem cells - and megakaryocytes. This RNA-seq data was used to determine differential gene expression between G1E and megakaryocytes and later correlated with Tal1 occupancy. This dataset (GEO Accession: GSE51338) consists of biological replicate, paired-end, polyA selected RNA-seq libraries. Because of the long processing time for the large original files, we have downsampled the original raw data files to include only reads that align to a subset of interesting genomic loci identified by Wu et al. This dataset represents an even smaller set of data than another training data set (DOI:10.5281/zenodo.254485).
https://spdx.org/licenses/CC0-1.0.html
Genomic imprinting results in parent-of-origin dependent gene expression biased towards either the maternally- or paternally-derived allele at the imprinted locus. The kinship theory of genomic imprinting argues that this unusual expression pattern is a manifestation of intra-genomic conflict between the maternally- and paternally-derived halves of the genome that arises because they are not equally related to the genomes of social partners. The theory thus predicts that imprinting may evolve wherever there are close interactions among asymmetrically related kin. The social Hymenoptera with permanent caste differentiation are suitable candidates for testing the kinship theory because haplodiploid sex determination creates strong relatedness asymmetries and nursing workers interact closely with kin. However, progress in the search for imprinted genes in the social Hymenoptera has been slow, in part because tests for imprinting rely on reciprocal crosses that are impossible in most species. Here, we develop a method to systematically search for imprinting in haplodiploid social insects without crosses, using instead samples of pooled individuals collected from natural colonies. We tested this protocol using data available for the leaf-cutting ant Acromyrmex echinatior, providing the first genome-wide search for imprinting in any ant. While we identified several genes as potentially imprinted, none of the four genes tested could be verified as imprinted using digital droplet PCR, highlighting the need for higher quality genomic assemblies that accurately map duplicated genes.
Methods
Detailed information regarding data collection can be found in the manuscript, and a detailed description of each data file is included in the README.txt; the files are summarised briefly here:
ASE_data_frame.csv: gives the output of the bioinformatics pipeline run by Qiye Li and Zongji Wang. In short, they used the data published in Li et al. 2014, aligned reads using a Burrows-Wheeler aligner, followed by BLAT. SNPs were then identified using RES-Scanner in both DNA and RNA, and the numbers of reads supporting each SNP allele were recorded. Fisher's exact tests were conducted for each SNP to test for differences in the ratio of alleles between DNA and RNA at the same locus. This file records those SNPs that showed significant differences between DNA and RNA, and gives the number of reads supporting each allele for each sample, as well as information for each gene. SNPs that could not be placed within an annotated gene were not included. The raw data can be downloaded with the original article at the NCBI GEO accession GSE51576. For questions regarding the bioinformatics steps preceding this table, contact QL or ZW.
Compiled_ddPCR_results.csv: gives the data output from the ddPCR experiments, including the number of droplets that were positive for both alleles, positive for only one allele, and negative. These counts were used to calculate the relative proportion of the focal allele; the Poisson-distributed confidence intervals around this estimate are also given (as templates follow a Poisson distribution among droplets).
dilution_data.csv: also gives ddPCR data. These data show the output from the tests of the relative concentration of different genes following serial dilution. The Sample column gives the relative dilution levels (1 to 0.016).
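The droplet counts described for Compiled_ddPCR_results.csv can be turned into an allele proportion with the usual Poisson correction for ddPCR: if a fraction f of droplets is negative for a target, the mean number of template copies per droplet is -ln(f). The Python sketch below illustrates that general calculation only; it is not necessarily the authors' exact procedure, it omits the confidence intervals, and the function names and droplet counts are hypothetical.

```python
import math


def copies_per_droplet(negative_for_target, total):
    """Poisson-corrected mean copies per droplet: -ln(fraction of droplets negative for the target)."""
    if total <= 0 or not 0 < negative_for_target <= total:
        raise ValueError("need 0 < negative_for_target <= total")
    return -math.log(negative_for_target / total)


def focal_allele_proportion(both, focal_only, other_only, negative):
    """Relative proportion of the focal allele from the four droplet classes."""
    total = both + focal_only + other_only + negative
    # Droplets lacking the focal allele: positive for the other allele only, or empty.
    lam_focal = copies_per_droplet(other_only + negative, total)
    # Droplets lacking the other allele: positive for the focal allele only, or empty.
    lam_other = copies_per_droplet(focal_only + negative, total)
    if lam_focal + lam_other == 0:
        raise ValueError("no positive droplets")
    return lam_focal / (lam_focal + lam_other)


# Hypothetical droplet counts, for illustration only.
proportion = focal_allele_proportion(both=500, focal_only=2000, other_only=1500, negative=6000)
```

Under unbiased biallelic expression the proportion would be expected to sit near 0.5, so a consistent departure from 0.5 is the signal of interest.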