https://creativecommons.org/publicdomain/zero/1.0/
RNA sequencing (RNA-seq) is the leading technology for genome-wide transcript quantification. However, publicly available RNA-seq data is currently provided mostly in raw form, a significant barrier for global and integrative retrospective analyses. ARCHS4 is a web resource that makes the majority of published RNA-seq data from human and mouse available at the gene and transcript levels. For developing ARCHS4, available FASTQ files from RNA-seq experiments from the Gene Expression Omnibus (GEO) were aligned using a cloud-based infrastructure. In total 187,946 samples are accessible through ARCHS4 with 103,083 mouse and 84,863 human. Additionally, the ARCHS4 web interface provides an intuitive exploration of the processed data through querying tools, interactive visualization, and gene pages that provide average expression across cell lines and tissues, top co-expressed genes for each gene, and predicted biological functions and protein–protein interactions for each gene based on prior knowledge combined with co-expression.
This is a subset of the gene expression data contained in ARCHS4. Specifically, it contains only human liver samples. The dataset contains 903 unique samples from 60 distinct experiments created by a diverse group of researchers. The data is provided as a simple tab-separated file in which the columns represent the samples and the rows represent 35,238 genes encoded as HUGO gene symbols.
This is a good example of high-dimensional data. It can be used to test visualization techniques as well as batch effect detection and removal.
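As a minimal sketch of how a file with this layout could be read, the following assumes that the first row holds sample identifiers and the first column holds gene symbols. The toy content, the sample names, and the helpers `read_expression_tsv` and `transpose` are hypothetical illustrations, not part of the dataset or of ARCHS4.

```python
import csv
import io


def read_expression_tsv(handle):
    """Parse a genes-by-samples TSV into (samples, genes, matrix).

    Assumes the first row is a header (an ID column, then sample IDs) and
    each following row is a gene symbol followed by one value per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    genes, matrix = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        genes.append(row[0])
        matrix.append([float(x) for x in row[1:]])
    return samples, genes, matrix


def transpose(matrix):
    """Turn a genes-by-samples matrix into samples-by-genes."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical toy data standing in for the real file.
toy = io.StringIO(
    "gene\tGSM_A\tGSM_B\tGSM_C\n"
    "ALB\t1200\t1500\t900\n"
    "APOA1\t300\t280\t410\n"
)
samples, genes, matrix = read_expression_tsv(toy)
by_sample = transpose(matrix)
print(len(genes), "genes x", len(samples), "samples")
```

Many visualization and batch-correction libraries expect one row per sample, which is why the sketch also transposes the matrix; whether that is needed depends on the tool being tested.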
Remark: for cell cycle analysis, see the paper "Computational challenges of cell cycle analysis using single cell transcriptomics" by Alexander Chervov and Andrei Zinovyev, https://arxiv.org/abs/2208.05229
The dataset was downloaded from https://amp.pharm.mssm.edu/archs4/download.html and the methods are described in the Nature Communications paper https://www.nature.com/articles/s41467-018-03751-6
ARCHS4 provides user-friendly access to gene expression data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). While GEO stores most data in raw formats, ARCHS4 provides prepared count-matrix expression data. And while GEO stores data separately for each research paper, ARCHS4 collects all the information in a single matrix. Consult the main site for further information.
The main data files are in the H5 (HDF5, Hierarchical Data Format) file format, see https://en.wikipedia.org/wiki/Hierarchical_Data_Format for details. They contain expression data as well as annotation data and further meta-information. There are several other auxiliary files, such as a 3D t-SNE projection (in CSV format) and gene correlation matrices for human and mouse in feather format.
The main file for human, human_matrix.h5, contains the data matrix (238,522 samples by 35,238 genes) as well as various meta-information: gene names, sample information (tissue, etc.), and references to the GEO database IDs where all the details can be found.
Similar data is available for mouse, along with CSV files with t-SNE projections and gene correlation matrices.
The ARCHS4 project is maintained by:
'Alexander Lachmann', 'alexander.lachmann@mssm.edu', update: '2020-02-06'
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Top pro-longevity and anti-longevity genes not in GenAge predicted using GO terms and ARCHS4 gene expression for worm and yeast with the pglm (GLM-Net) algorithm.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
`orthosData` is the companion database to the `orthos` software for mechanistic studies using differential gene expression experiments.
It currently encompasses data for over 100,000 differential gene expression mouse and human experiments distilled and compiled from the ARCHS4 database* as well as associated pre-trained variational models.
Together with `orthos` it was developed to provide a better understanding of the effects of experimental treatments on gene expression and to help map treatments to mechanisms of action.
Background: Chronic pressure overload triggers pathological cardiac hypertrophy that eventually leads to heart failure. Effective biomarkers and therapeutic targets for heart failure remain to be defined. The aim of this study is to identify key genes associated with pathological cardiac hypertrophy by combining bioinformatics analyses with molecular biology experiments.
Methods: Comprehensive bioinformatics tools were used to screen genes related to pressure overload-induced cardiac hypertrophy. We identified differentially expressed genes (DEGs) by overlapping three Gene Expression Omnibus (GEO) datasets (GSE5500, GSE1621, and GSE36074). Correlation analysis and the BioGPS online tool were used to detect the genes of interest. A mouse model of cardiac remodeling induced by transverse aortic constriction (TAC) was established to verify the expression of the gene of interest during cardiac remodeling by RT-PCR and western blot. Using RNA interference technology, the effect of transcription elongation factor A3 (Tcea3) silencing on PE-induced hypertrophy of neonatal rat ventricular myocytes (NRVMs) was detected. Next, gene set enrichment analysis (GSEA) and the online tool ARCHS4 were used to predict the possible signaling pathways, and the fatty acid oxidation relevant pathways were enriched and then verified in NRVMs. Furthermore, the changes of long-chain fatty acid respiration in NRVMs were detected using the Seahorse XFe24 Analyzer. Finally, MitoSOX staining was used to detect the effect of Tcea3 on mitochondrial oxidative stress, and the contents of NADP(H) and GSH/GSSG were detected by relevant kits.
Results: A total of 95 DEGs were identified and Tcea3 was negatively correlated with Nppa, Nppb and Myh7. The expression level of Tcea3 was downregulated during cardiac remodeling both in vivo and in vitro. Knockdown of Tcea3 aggravated cardiomyocyte hypertrophy induced by PE in NRVMs. GSEA and the online tool ARCHS4 predicted that Tcea3 is involved in fatty acid oxidation (FAO). Subsequently, RT-PCR results showed that knockdown of Tcea3 up-regulated Ces1d and Pla2g5 mRNA expression levels. In PE-induced cardiomyocyte hypertrophy, Tcea3 silencing resulted in decreased fatty acid utilization, decreased ATP synthesis and increased mitochondrial oxidative stress.
Conclusion: Our study identifies Tcea3 as a novel anti-cardiac remodeling target by regulating FAO and governing mitochondrial oxidative stress.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1. Table S1. Summary of differentially expressed genes across conditions in CFBE cells. Summary of up- and down-regulated differentially expressed genes in conditions vs. controls. Table S2. Differentially expressed genes in the miR-138/SIN3A condition. List of differentially expressed genes in the miR-138/SIN3A conditions vs. scrambled siRNA control. Table S3. Differentially expressed genes in the NEDD8/SYVN1 condition. List of differentially expressed genes in the NEDD8/SYVN1 conditions vs. scrambled siRNA control. Table S4. Differentially expressed genes in the temperature condition. List of differentially expressed genes in the temperature conditions vs. 37°C control. Table S5. CFTR interactome used as seed nodes. List of CFTR effectors and interactors used as seed nodes in the M-module analysis. Table S6. DsiRNA and primer sequences. List of siRNA and primer sequences used in the functional knockdown experiments to test for CFTR rescue. Table S7. Untested non-seed module genes. List of genes resulting from the M-module analysis that have not been previously tested or linked to CFTR. Table S8. Top 50 predicted gene ontology biological processes for CHURC1. List of biological processes associated with CHURC1 according to the ARChS4 software. Table S9. Top 50 predicted gene ontology biological processes for RPL15. List of biological processes associated with RPL15 according to the ARChS4 software. Table S10. Top 50 predicted gene ontology biological processes for GZF1. List of biological processes associated with GZF1 according to the ARChS4 software. Figure S1. Schematic showing intersection of differentially expressed genes across conditions. Controls and conditions are described in Table 1. Up arrows indicate up-regulated genes; down arrows indicate down-regulated genes. Significance is defined as FDR < 0.05. Figure S2. 
Representative transepithelial current tracings demonstrating the effects of individual gene knockdown on CFTR-dependent chloride current in CFBE cells. The Y-axis represents transepithelial current in µA and the X-axis represents time in seconds. The addition of the cAMP agonists forskolin and IBMX resulted in an increase in CFTR-dependent transepithelial chloride current in cells treated with DsiRNAs targeting: A) CHURC1 or B) RPL15. This increase in current was inhibited by the CFTR channel inhibitor GlyH-101. The tracing shown in C demonstrates that DsiRNA knockdown of THOC7 was ineffective in restoring CFTR-dependent chloride current.
aging-expressions: Age-Stratified Gene Expression Dataset
Dataset Description
This dataset provides age-stratified gene expression data derived from ARCHS4 and GTEx databases, specifically curated for fine-tuning BulkFormer and other bulk RNA-seq deep learning models. The dataset contains TPM-normalized expression values for protein-coding genes, enriched with comprehensive demographic metadata including precise age information. Key Features:
🎯 Optimized for BulkFormer:… See the full description on the dataset page: https://huggingface.co/datasets/longevity-genie/aging-expressions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains supplementary materials for the research paper "Association of copy number alterations with the immune transcriptomic landscape in cancer". The materials are organized in the following folders:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: these are the limma results of the analysis. See https://doi.org/10.5281/zenodo.7032090 for the DESeq2/DEXSeq results.
This dataset contains results from paired differential expression and differential splicing analyses as well as gene-set over-representation analysis results for 199 baseline vs. case comparisons across 100 randomly curated datasets with accompanying metadata (preprint). All results were computed using the R package pairedGSEA, which uses limma (Ritchie et al., 2015) and fgsea (Korotkevich et al., 2019).
Each .RDS file contains a list with four objects: A 'metadata' object with the metadata of the respective raw data, a 'genes' object with gene-level differential splicing and expression results, a 'gene_set' object with over-representation results, and 'experiment' with the experiment title.
The filenames follow this pattern: "[dataset ID]_[GEO accession number]_[Manually assigned comparison title].RDS".
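The naming pattern above can be split back into its three fields. The sketch below is one possible way to do so; it assumes that neither the dataset ID nor the GEO accession contains an underscore (the comparison title may), and both the helper `parse_result_filename` and the example filename are hypothetical.

```python
def parse_result_filename(filename):
    """Split '[dataset ID]_[GEO accession]_[comparison title].RDS' into parts.

    Assumes the dataset ID and GEO accession contain no underscores;
    the comparison title may contain them.
    """
    suffix = ".RDS"
    if not filename.endswith(suffix):
        raise ValueError(f"not an .RDS file: {filename!r}")
    stem = filename[: -len(suffix)]
    # Split on the first two underscores only, so the title stays intact.
    parts = stem.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    dataset_id, accession, title = parts
    return {"dataset_id": dataset_id, "geo_accession": accession, "title": title}


# Hypothetical filename, for illustration only.
example = "7_GSE00000_treated_vs_control.RDS"
print(parse_result_filename(example))
```

Limiting the split to two underscores keeps titles with underscores whole; if real dataset IDs turn out to contain underscores, a stricter pattern anchored on the accession prefix would be needed.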
All datasets were obtained from a local copy of the ARCHS4 v11 database of transcript counts (Lachmann et al., 2018).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Failure to adequately characterize cell lines, and understand the differences between in vitro and in vivo biology, can have serious consequences on the translatability of in vitro scientific studies to human clinical trials. This project focuses on the Michigan Cancer Foundation-7 (MCF-7) cells, a human breast adenocarcinoma cell line that is commonly used for in vitro cancer research, with over 42,000 publications in PubMed. In this study, we explore the key similarities and differences in gene expression networks of MCF-7 cell lines compared to human breast cancer tissues. We used two MCF-7 data sets, one data set collected by ARCHS4 including 1032 samples and one data set from Gene Expression Omnibus GSE50705 with 88 estradiol-treated MCF-7 samples. The human breast invasive ductal carcinoma (BRCA) data set came from The Cancer Genome Atlas, including 1212 breast tissue samples. Weighted Gene Correlation Network Analysis (WGCNA) and functional annotations of the data showed that MCF-7 cells and human breast tissues have only minimal similarity in biological processes, although some fundamental functions, such as cell cycle, are conserved. Scaled connectivity, a network topology metric, also showed drastic differences in the behavior of genes between MCF-7 and BRCA data sets. Finally, we used canSAR to compute ligand-based druggability scores of genes in the data sets, and our results suggested that using MCF-7 to study breast cancer may lead to missing important gene targets. Our comparison of the networks of MCF-7 and human breast cancer highlights the nuances of using MCF-7 to study human breast cancer and can contribute to better experimental design and result interpretation of studies involving this cell line.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 2: Table S1. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 10nM E2 for 45 minutes in GSE94023 study. Table S2. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 10nM E2 for 45 minutes in GSE99626 study. Table S3. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 10nM E2 for 45 minutes in GSE67295 study. Table S4. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 10nM E2 for 45 minutes in GSE115607 study. Table S5. Differentially bound sites (DBSs) obtained from T47D cell line treated with 10nM E2 for 45 minutes in GSE80367 study. Table S6. Differentially bound sites (DBSs) obtained from T47D cell line treated with 100nM E2 for 45 minutes in GSE23893 study. Table S7. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 100nM E2 for 45 minutes in GSE23893 study. Table S8. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 100nM E2 for 45 minutes in GSE54855 study. Table S9. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 100nM E2 for 45 minutes in GSE59530 study. Table S10. Default binding affinity matrix of 6 samples by the 63,612 sites that overlap in at least two of the samples using DiffBind in (GSE94023, GSE99626, GSE67295, & GSE115607) MCF7 cell line treated with 10nM E2 for 45 minutes. Table S11. Default binding affinity matrix of 6 samples by the 23,517 sites that overlap in at least two of the samples using DiffBind in (GSE23893, GSE54855, & GSE59530) MCF7 cell line treated with 100nM E2 for 45 minutes. Table S12. Meta-differentially bound sites (meta-DBSs) obtained from a meta-analysis on (GSE94023, GSE99626, GSE67295, & GSE115607) MCF7 cell line treated with 10nM E2 for 45 minutes. Table S13. Meta-differentially bound sites (meta-DBSs) obtained from a meta-analysis on (GSE23893, GSE54855, & GSE59530) MCF7 cell line treated with 100nM E2 for 45 minutes. 
Table S14. literature_ChIP-seq. Table S15. Enrichr. Table S16. ARCHS4—Coexpression. Table S17. ENCODE--ChIP-seq. Table S18. ReMap--ChIP-seq. Table S19. GTEx—Coexpression. Table S20. Integrated_topRank. Table S21. Integrated_meanRank. Table S22. Gene Ontology (GO) for 7,308 meta-DBSs related to 617 common genes among MCF7 & T47D cell lines using Cistrome-GO. Table S23. KEGG pathways analysis for 7,308 meta-DBSs related to 617 common genes among MCF7 & T47D cell lines using Cistrome-GO. Table S24. Differentially expressed genes (DEGs) identified from GRO-seq data in the MCF7 cell line treated with 100nM E2 for 40 minutes in the GSE27463 study.
Anatomical structure and cell type biomarker annotations from the HuBMAP ASCT+B tables, augmented with RNA-seq coexpression data from ARCHS4