12 datasets found

Human liver RNA-Seq gene expression (903 samples)
kaggle.com
zip
Updated Oct 23, 2019
Cite
Alexander Lachmann (2019). Human liver RNA-Seq gene expression (903 samples) [Dataset]. https://www.kaggle.com/dsv/758537
Available download formats: zip (20335725 bytes)
Dataset updated
Oct 23, 2019
Authors
Alexander Lachmann
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

RNA sequencing (RNA-seq) is the leading technology for genome-wide transcript quantification. However, publicly available RNA-seq data is currently provided mostly in raw form, a significant barrier for global and integrative retrospective analyses. ARCHS4 is a web resource that makes the majority of published RNA-seq data from human and mouse available at the gene and transcript levels. For developing ARCHS4, available FASTQ files from RNA-seq experiments from the Gene Expression Omnibus (GEO) were aligned using a cloud-based infrastructure. In total 187,946 samples are accessible through ARCHS4 with 103,083 mouse and 84,863 human. Additionally, the ARCHS4 web interface provides an intuitive exploration of the processed data through querying tools, interactive visualization, and gene pages that provide average expression across cell lines and tissues, top co-expressed genes for each gene, and predicted biological functions and protein–protein interactions for each gene based on prior knowledge combined with co-expression.

Content

This is a subset of the total gene expression data contained within ARCHS4. Specifically, it contains only human liver samples. The dataset contains 903 unique samples from 60 distinct experiments created by a diverse group of researchers. The data is provided as a simple tab-separated file in which the columns represent the samples and the rows represent 35238 genes encoded as HUGO gene symbols.
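As a minimal sketch of reading a file laid out this way, the snippet below parses a gene-by-sample tab-separated matrix with Python's standard `csv` module. The tiny in-memory table, the sample IDs in it, and the helper name `read_expression_tsv` are illustrative assumptions, not the real contents of the dataset; the header layout (one label cell followed by sample IDs) is assumed as well.

```python
import csv
import io

def read_expression_tsv(handle):
    """Parse a gene-by-sample TSV into (sample_ids, {gene: [values]}).

    Assumes the first row is a header whose first cell labels the gene
    column and whose remaining cells are sample IDs.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]
    expression = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        expression[gene] = [float(v) for v in values]
    return sample_ids, expression

# Hypothetical three-gene, two-sample excerpt standing in for the real file.
demo = (
    "gene\tGSM0000001\tGSM0000002\n"
    "ALB\t1200\t980\n"
    "APOA1\t450\t510\n"
    "TP53\t12\t9\n"
)
sample_ids, expression = read_expression_tsv(io.StringIO(demo))
print(sample_ids)
print(expression["ALB"])
```

For the real file, the same function could be given a handle from `open(path, newline="")`, where `path` is wherever the unzipped file was saved. A dataframe library may be more convenient for a matrix of this size.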

Inspiration

This is a good example of high-dimensional data. It can be used to test visualization techniques as well as batch effect detection and removal.
Multiple Single Cell RNA Expressions ARCHS4
kaggle.com
zip
Updated Jul 25, 2021
Cite
Alexander Chervov (2021). Multiple Single Cell RNA Expressions ARCHS4 [Dataset]. https://www.kaggle.com/alexandervc/multiple-single-cell-rna-expressions-archs4
Available download formats: zip (23319014182 bytes)
Dataset updated
Jul 25, 2021
Authors
Alexander Chervov
Description
Remark: for cell cycle analysis, see the paper "Computational challenges of cell cycle analysis using single cell transcriptomics" by Alexander Chervov and Andrei Zinovyev (https://arxiv.org/abs/2208.05229).

Context

The dataset was downloaded from the ARCHS4 download page (https://amp.pharm.mssm.edu/archs4/download.html). The methods are described in a Nature Communications paper: https://www.nature.com/articles/s41467-018-03751-6

ARCHS4 provides user-friendly access to gene expression data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). While most data in GEO is stored in raw formats, ARCHS4 provides prepared count-matrix expression data. And while GEO stores data separately for each research paper, ARCHS4 collects all the information in a single matrix. Consult the main site for further information.

The main data files are in the H5 (HDF5, Hierarchical Data Format) file format (https://en.wikipedia.org/wiki/Hierarchical_Data_Format). They contain expression data as well as annotation data and further meta-information. There are also several auxiliary files, such as a 3D t-SNE projection (in CSV format) and gene correlation matrices for human and mouse in Feather format.

Content

The main file for human, human_matrix.h5, contains the data matrix (238522 samples by 35238 genes) as well as various meta-information: gene names, sample information (tissue, etc.), and references to the GEO database IDs where all the details can be found.

Similar data is provided for mouse, along with CSV files with t-SNE images and correlation matrices for genes.
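To get a feel for the size of the human matrix, here is a rough back-of-the-envelope estimate in Python. The 4-byte cell size is an assumption made purely for illustration (the description does not state the storage type), and `dense_size_bytes` is a hypothetical helper.

```python
def dense_size_bytes(n_samples, n_genes, bytes_per_value):
    """Size of a dense n_samples x n_genes matrix held fully in memory."""
    return n_samples * n_genes * bytes_per_value

N_SAMPLES = 238522   # samples in human_matrix.h5, as stated in the description
N_GENES = 35238      # genes, as stated in the description
BYTES_PER_VALUE = 4  # assumption: 32-bit values; the real dtype may differ

n_cells = N_SAMPLES * N_GENES
total = dense_size_bytes(N_SAMPLES, N_GENES, BYTES_PER_VALUE)
print(f"{n_cells:,} cells, about {total / 1e9:.1f} GB uncompressed")
```

Under that assumption the dense matrix holds roughly 8.4 billion cells, so reading only the samples or genes of interest from the HDF5 file is likely to be more practical than loading it whole.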

Acknowledgements

The ARCHS4 project is by:

'Alexander Lachmann', 'alexander.lachmann@mssm.edu', update: '2020-02-06'
Top pro-longevity and anti-longevity genes not in GenAge predicted using GO...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
F. William Townes; Kareem Carr; Jeffrey W. Miller (2023). Top pro-longevity and anti-longevity genes not in GenAge predicted using GO terms and ARCHS4 gene expression for worm and yeast with the pglm (GLM-Net) algorithm. [Dataset]. http://doi.org/10.1371/journal.pcbi.1008429.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1008429.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
F. William Townes; Kareem Carr; Jeffrey W. Miller
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Top pro-longevity and anti-longevity genes not in GenAge predicted using GO terms and ARCHS4 gene expression for worm and yeast with the pglm (GLM-Net) algorithm.
orthosData
zenodo.org
data.niaid.nih.gov
+1 more
bin
Updated May 9, 2023
Cite
Panagiotis Papasaikas; Charlotte Soneson; Michael Stadler (2023). orthosData [Dataset]. http://doi.org/10.5281/zenodo.7908269
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.7908269
Dataset updated
May 9, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Panagiotis Papasaikas; Charlotte Soneson; Michael Stadler
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
`orthosData` is the companion database to the `orthos` software for mechanistic studies using differential gene expression experiments.

It currently encompasses data for over 100,000 differential gene expression mouse and human experiments distilled and compiled from the ARCHS4 database* as well as associated pre-trained variational models.

Together with `orthos` it was developed to provide a better understanding of the effects of experimental treatments on gene expression and to help map treatments to mechanisms of action.
Table1_Identify Tcea3 as a novel anti-cardiomyocyte hypertrophy gene...
datasetcatalog.nlm.nih.gov
Updated Jun 19, 2023
+ more versions
Cite
Chen, Ya-jie; Guo, Yingying; Zhang, Meng; Li, Dan; Xu, Man; Xia, Hao; Qiu, Hong-liang; Huang, Si-hui; Cen, Xian-feng (2023). Table1_Identify Tcea3 as a novel anti-cardiomyocyte hypertrophy gene involved in fatty acid oxidation and oxidative stress.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001111766
Dataset updated
Jun 19, 2023
Authors
Chen, Ya-jie; Guo, Yingying; Zhang, Meng; Li, Dan; Xu, Man; Xia, Hao; Qiu, Hong-liang; Huang, Si-hui; Cen, Xian-feng
Description
Background: Chronic pressure overload triggers pathological cardiac hypertrophy that eventually leads to heart failure. Effective biomarkers and therapeutic targets for heart failure remain to be defined. The aim of this study is to identify key genes associated with pathological cardiac hypertrophy by combining bioinformatics analyses with molecular biology experiments.

Methods: Comprehensive bioinformatics tools were used to screen genes related to pressure overload-induced cardiac hypertrophy. We identified differentially expressed genes (DEGs) by overlapping three Gene Expression Omnibus (GEO) datasets (GSE5500, GSE1621, and GSE36074). Correlation analysis and BioGPS online tool were used to detect the genes of interest. A mouse model of cardiac remodeling induced by transverse aortic constriction (TAC) was established to verify the expression of the interest gene during cardiac remodeling by RT-PCR and western blot. By using RNA interference technology, the effect of transcription elongation factor A3 (Tcea3) silencing on PE-induced hypertrophy of neonatal rat ventricular myocytes (NRVMs) was detected. Next, gene set enrichment analysis (GSEA) and the online tool ARCHS4 were used to predict the possible signaling pathways, and the fatty acid oxidation relevant pathways were enriched and then verified in NRVMs. Furthermore, the changes of long-chain fatty acid respiration in NRVMs were detected using the Seahorse XFe24 Analyzer. Finally, MitoSOX staining was used to detect the effect of Tcea3 on mitochondrial oxidative stress, and the contents of NADP(H) and GSH/GSSG were detected by relevant kits.

Results: A total of 95 DEGs were identified and Tcea3 was negatively correlated with Nppa, Nppb and Myh7. The expression level of Tcea3 was downregulated during cardiac remodeling both in vivo and in vitro. Knockdown of Tcea3 aggravated cardiomyocyte hypertrophy induced by PE in NRVMs. GSEA and online tool ARCHS4 predict Tcea3 involved in fatty acid oxidation (FAO). Subsequently, RT-PCR results showed that knockdown of Tcea3 up-regulated Ces1d and Pla2g5 mRNA expression levels. In PE induced cardiomyocyte hypertrophy, Tcea3 silencing results in decreased fatty acid utilization, decreased ATP synthesis and increased mitochondrial oxidative stress.

Conclusion: Our study identifies Tcea3 as a novel anti-cardiac remodeling target by regulating FAO and governing mitochondrial oxidative stress.
Additional file 1 of Analysis of multiple gene co-expression networks to...
springernature.figshare.com
xlsx
Updated Jun 7, 2023
Cite
Matthew D. Strub; Long Gao; Kai Tan; Paul B. McCray (2023). Additional file 1 of Analysis of multiple gene co-expression networks to discover interactions favoring CFTR biogenesis and ΔF508-CFTR rescue [Dataset]. http://doi.org/10.6084/m9.figshare.16909890.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.16909890.v1
Dataset updated
Jun 7, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Matthew D. Strub; Long Gao; Kai Tan; Paul B. McCray
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 1.
Table S1. Summary of differentially expressed genes across conditions in CFBE cells. Summary of up- and down-regulated differentially expressed genes in conditions vs. controls.
Table S2. Differentially expressed genes in the miR-138/SIN3A condition. List of differentially expressed genes in the miR-138/SIN3A conditions vs. scrambled siRNA control.
Table S3. Differentially expressed genes in the NEDD8/SYVN1 condition. List of differentially expressed genes in the NEDD8/SYVN1 conditions vs. scrambled siRNA control.
Table S4. Differentially expressed genes in the temperature condition. List of differentially expressed genes in the temperature conditions vs. 37°C control.
Table S5. CFTR interactome used as seed nodes. List of CFTR effectors and interactors used as seed nodes in the M-module analysis.
Table S6. DsiRNA and primer sequences. List of siRNA and primer sequences used in the functional knockdown experiments to test for CFTR rescue.
Table S7. Untested non-seed module genes. List of genes resulting from the M-module analysis that have not been previously tested or linked to CFTR.
Table S8. Top 50 predicted gene ontology biological processes for CHURC1. List of biological processes associated with CHURC1 according to the ARCHS4 software.
Table S9. Top 50 predicted gene ontology biological processes for RPL15. List of biological processes associated with RPL15 according to the ARCHS4 software.
Table S10. Top 50 predicted gene ontology biological processes for GZF1. List of biological processes associated with GZF1 according to the ARCHS4 software.
Figure S1. Schematic showing intersection of differentially expressed genes across conditions. Controls and conditions are described in Table 1. Up arrows indicate up-regulated genes; down arrows indicate down-regulated genes. Significance is defined as FDR < 0.05.
Figure S2. Representative transepithelial current tracings demonstrating the effects of individual gene knockdown on CFTR-dependent chloride current in CFBE cells. The Y-axis represents transepithelial current in µA and the X-axis represents time in seconds. The addition of the cAMP agonists forskolin and IBMX resulted in an increase in CFTR-dependent transepithelial chloride current in cells treated with DsiRNAs targeting: A) CHURC1 or B) RPL15. This increase in current was inhibited by the CFTR channel inhibitor GlyH-101. The tracing shown in C demonstrates that DsiRNA knockdown of THOC7 was ineffective in restoring CFTR-dependent chloride current.
aging-expressions
huggingface.co
Cite
Longevity Genie, aging-expressions [Dataset]. https://huggingface.co/datasets/longevity-genie/aging-expressions
Dataset authored and provided by
Longevity Genie
Description
aging-expressions: Age-Stratified Gene Expression Dataset

Dataset Description

This dataset provides age-stratified gene expression data derived from ARCHS4 and GTEx databases, specifically curated for fine-tuning BulkFormer and other bulk RNA-seq deep learning models. The dataset contains TPM-normalized expression values for protein-coding genes, enriched with comprehensive demographic metadata including precise age information. Key Features:

🎯 Optimized for BulkFormer:… See the full description on the dataset page: https://huggingface.co/datasets/longevity-genie/aging-expressions.
Data from: Association of copy number alterations with the immune...
zenodo.org
Updated Jun 1, 2025
Cite
S Loipfinger; A Bhattacharya; RSN Fehrmann (2025). Association of copy number alterations with the immune transcriptomic landscape in cancer [Dataset]. http://doi.org/10.5281/zenodo.13983463
Unique identifier
https://doi.org/10.5281/zenodo.13983463
Dataset updated
Jun 1, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
S Loipfinger; A Bhattacharya; RSN Fehrmann
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains supplementary materials for the research paper "Association of copy number alterations with the immune transcriptomic landscape in cancer". The materials are organized in the following folders:

00_code: code to reproduce the analysis

01_ica_datasets: transcriptional components, sample mixing matrix activities, and gene set enrichment analysis results of the GPL570, ARCHS4, and TCGA datasets

02_ica_cna_tc: identified CNA-TCs and their captured CNA regions, genomic plots of CNA-TCs

03_ica_immune_tc: identified immune-TCs, list of immune gene sets

04_ica_tc_dataset_overlap: reproducibility results of CNA-TCs and immune-TCs across datasets

05_immune_gene_occurences: frequency of genes with high gene weight in immune-TCs, list of potential novel immune-involved ORFs

06_projection_immune_tc_datasets: GPL570 and TCGA cancer samples corrected mixing matrix activity for CNA-TCs and immune-TCs of the other dataset

07_inferred_cna_profiles: TACNA profiles and CNA burden per cancer sample

08_cna_burden_immune_tc_association: cancer sample associations of CNA burden, individual CNAs, and single gene CNAs with immune-TCs

09_projection_single_cell: cell type activity of immune-TCs in a single-cell tumor immune atlas for precision oncology

10_projection_spatial_transcriptomics: activity of CNA-TC and immune-TC for each spot in spatial transcriptomic datasets from 10xGenomics
Paired differential gene expression and splicing analyses results of 199...
nde-dev.biothings.io
data-staging.niaid.nih.gov
Updated Jul 19, 2023
+ more versions
Cite
Kristoffer Vitting-Seerup (2023). Paired differential gene expression and splicing analyses results of 199 baseline vs. case comparisons across 100 datasets (Limma) [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_7032089
Dataset updated
Jul 19, 2023
Dataset provided by
Søren Helweg Dam
Kristoffer Vitting-Seerup
Lars Rønn Olsen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Note: these are the limma results of the analysis. See https://doi.org/10.5281/zenodo.7032090 for the DESeq2/DEXSeq results.

This dataset contains results from paired differential expression and differential splicing analyses as well as gene-set over-representation analysis results for 199 baseline vs. case comparisons across 100 randomly curated datasets with accompanying metadata (preprint). All results were computed using the R package pairedGSEA, which utilized Limma (Ritchie et al., 2015) and fgsea (Korotkevich et al., 2019).

Each .RDS file contains a list with four objects: A 'metadata' object with the metadata of the respective raw data, a 'genes' object with gene-level differential splicing and expression results, a 'gene_set' object with over-representation results, and 'experiment' with the experiment title.

The filenames follow this pattern: "[dataset ID]_[GEO accession number]_[Manually assigned comparison title].RDS".
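The naming pattern above can be split back into its parts with a short Python sketch. The example filename is hypothetical, `parse_result_filename` is an illustrative helper, and restricting the accession to GEO series IDs of the form "GSE" plus digits is an assumption rather than something the description states.

```python
import re

# Assumed layout: "<dataset ID>_<GEO accession>_<comparison title>.RDS".
# Limiting the accession to "GSE" + digits is an assumption.
FILENAME_RE = re.compile(
    r"^(?P<dataset_id>[^_]+)_(?P<accession>GSE\d+)_(?P<title>.+)\.RDS$"
)

def parse_result_filename(filename):
    """Split a result filename into dataset ID, accession and title."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return match.groupdict()

# Hypothetical filename; the title part may itself contain underscores.
parts = parse_result_filename("12_GSE12345_Treated_vs_control.RDS")
print(parts)
```

Anchoring the first two fields and letting the title take the remainder keeps underscores inside the manually assigned title intact.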

All datasets were obtained from a local copy of the ARCHS4 v11 database of transcript counts (Lachmann et al., 2018).
DataSheet1_Similarities and Differences in Gene Expression Networks Between...
frontiersin.figshare.com
zip
Updated Jun 1, 2023
Cite
Vy Tran; Robert Kim; Mikhail Maertens; Thomas Hartung; Alexandra Maertens (2023). DataSheet1_Similarities and Differences in Gene Expression Networks Between the Breast Cancer Cell Line Michigan Cancer Foundation-7 and Invasive Human Breast Cancer Tissues.zip [Dataset]. http://doi.org/10.3389/frai.2021.674370.s001
Available download formats: zip
Unique identifier
https://doi.org/10.3389/frai.2021.674370.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Vy Tran; Robert Kim; Mikhail Maertens; Thomas Hartung; Alexandra Maertens
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Failure to adequately characterize cell lines, and understand the differences between in vitro and in vivo biology, can have serious consequences on the translatability of in vitro scientific studies to human clinical trials. This project focuses on the Michigan Cancer Foundation-7 (MCF-7) cells, a human breast adenocarcinoma cell line that is commonly used for in vitro cancer research, with over 42,000 publications in PubMed. In this study, we explore the key similarities and differences in gene expression networks of MCF-7 cell lines compared to human breast cancer tissues. We used two MCF-7 data sets, one data set collected by ARCHS4 including 1032 samples and one data set from Gene Expression Omnibus GSE50705 with 88 estradiol-treated MCF-7 samples. The human breast invasive ductal carcinoma (BRCA) data set came from The Cancer Genome Atlas, including 1212 breast tissue samples. Weighted Gene Correlation Network Analysis (WGCNA) and functional annotations of the data showed that MCF-7 cells and human breast tissues have only minimal similarity in biological processes, although some fundamental functions, such as cell cycle, are conserved. Scaled connectivity—a network topology metric—also showed drastic differences in the behavior of genes between MCF-7 and BRCA data sets. Finally, we used canSAR to compute ligand-based druggability scores of genes in the data sets, and our results suggested that using MCF-7 to study breast cancer may lead to missing important gene targets. Our comparison of the networks of MCF-7 and human breast cancer highlights the nuances of using MCF-7 to study human breast cancer and can contribute to better experimental design and result interpretation of study involving this cell line.
Additional file 2 of Meta-analysis of integrated ChIP-seq and transcriptome...
springernature.figshare.com
xlsx
Updated Aug 14, 2024
Cite
Zeynab Piryaei; Zahra Salehi; Esmaeil Ebrahimie; Mansour Ebrahimi; Kaveh Kavousi (2024). Additional file 2 of Meta-analysis of integrated ChIP-seq and transcriptome data revealed genomic regions affected by estrogen receptor alpha in breast cancer [Dataset]. http://doi.org/10.6084/m9.figshare.26617588.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.26617588.v1
Dataset updated
Aug 14, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Zeynab Piryaei; Zahra Salehi; Esmaeil Ebrahimie; Mansour Ebrahimi; Kaveh Kavousi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2:
Table S1. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 10nM E2 for 45 minutes in GSE94023 study.
Table S2. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 10nM E2 for 45 minutes in GSE99626 study.
Table S3. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 10nM E2 for 45 minutes in GSE67295 study.
Table S4. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 10nM E2 for 45 minutes in GSE115607 study.
Table S5. Differentially bound sites (DBSs) obtained from T47D cell line treated with 10nM E2 for 45 minutes in GSE80367 study.
Table S6. Differentially bound sites (DBSs) obtained from T47D cell line treated with 100nM E2 for 45 minutes in GSE23893 study.
Table S7. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 100nM E2 for 45 minutes in GSE23893 study.
Table S8. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 100nM E2 for 45 minutes in GSE54855 study.
Table S9. Differentially bound sites (DBSs) obtained from MCF7 cell line treated with 100nM E2 for 45 minutes in GSE59530 study.
Table S10. Default binding affinity matrix of 6 samples by the 63,612 sites that overlap in at least two of the samples using DiffBind in (GSE94023, GSE99626, GSE67295, & GSE115607) MCF7 cell line treated with 10nM E2 for 45 minutes.
Table S11. Default binding affinity matrix of 6 samples by the 23,517 sites that overlap in at least two of the samples using DiffBind in (GSE23893, GSE54855, & GSE59530) MCF7 cell line treated with 100nM E2 for 45 minutes.
Table S12. Meta-differentially bound sites (meta-DBSs) obtained from a meta-analysis on (GSE94023, GSE99626, GSE67295, & GSE115607) MCF7 cell line treated with 10nM E2 for 45 minutes.
Table S13. Meta-differentially bound sites (meta-DBSs) obtained from a meta-analysis on (GSE23893, GSE54855, & GSE59530) MCF7 cell line treated with 100nM E2 for 45 minutes.
Table S14. literature_ChIP-seq.
Table S15. Enrichr.
Table S16. ARCHS4--Coexpression.
Table S17. ENCODE--ChIP-seq.
Table S18. ReMap--ChIP-seq.
Table S19. GTEx--Coexpression.
Table S20. Integrated_topRank.
Table S21. Integrated_meanRank.
Table S22. Gene Ontology (GO) for 7,308 meta-DBSs related to 617 common genes among MCF7 & T47D cell lines using Cistrome-GO.
Table S23. KEGG pathways analysis for 7,308 meta-DBSs related to 617 common genes among MCF7 & T47D cell lines using Cistrome-GO.
Table S24. Differentially expressed genes (DEGs) identified from GRO-seq data in the MCF7 cell line treated with 100nM E2 for 40 minutes in the GSE27463 study.
HuBMAP ASCT+B Augmented with RNA-seq Coexpression
maayanlab.cloud
gz
Updated Jan 29, 2025
Cite
Ma'ayan Laboratory of Computational Systems Biology (2025). HuBMAP ASCT+B Augmented with RNA-seq Coexpression [Dataset]. https://maayanlab.cloud/Harmonizome/dataset/HuBMAP+ASCT%24plus%24B+Augmented+with+RNA-seq+Coexpression
Available download formats: gz
Dataset updated
Jan 29, 2025
Dataset provided by
Harmonizome
Ma'ayan Laboratory of Computational Systems Biology
Authors
Ma'ayan Laboratory of Computational Systems Biology
Description
Anatomical structure and cell type biomarker annotations from the HuBMAP ASCT+B tables, augmented with RNA-seq coexpression data from ARCHS4