GWHed/geoquery dataset hosted on Hugging Face and contributed by the HF Datasets community
https://spdx.org/licenses/CC0-1.0.html
Background
RNA-seq is a widely adopted affordable method for large scale gene expression profiling. However, user-friendly and versatile tools for wet-lab biologists to analyse RNA-seq data beyond standard analyses such as differential expression, are rare. Especially, the analysis of time-series data is difficult for wet-lab biologists lacking advanced computational training. Furthermore, most meta-analysis tools are tailored for model organisms and not easily adaptable to other species.
Results
With RNfuzzyApp, we provide a user-friendly, web-based R-shiny app for differential expression analysis, as well as time-series analysis of RNA-seq data. RNfuzzyApp offers several methods for normalization and differential expression analysis of RNA-seq data, providing easy-to-use toolboxes, interactive plots and downloadable results. For time-series analysis, RNfuzzyApp presents the first web-based, automated pipeline for soft clustering with the Mfuzz R package, including methods to aid in cluster number selection, Mfuzz loop computations, cluster overlap analysis, as well as cluster enrichments.
Conclusion
RNfuzzyApp is an intuitive, easy to use and interactive R shiny app for RNA-seq differential expression and time-series analysis, offering a rich selection of interactive plots, providing a quick overview of raw data and generating rapid analysis results. Furthermore, its orthology assignment, enrichment analysis, as well as ID conversion functions are accessible to non-model organisms.
Methods
Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt: mean values calculated from raw reads of replicates, downloaded from Gene Expression Omnibus (dataset GSE143430 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE143430).
Haering_etal_extendedDatatable_1a_Tabulamurissenis_3vs12m_DEA.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1b_Tabulamurissenis_3vs27m_DEA.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1c_Tabulamurissenis_12vs27m_DEA.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1d_Tabulamurissenis_3vs12m_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1e_Tabulamurissenis_3vs27m_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1f_Tabulamurissenis_12vs27m_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2a_Tabulamurissenis_cluster1_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2b_Tabulamurissenis_cluster2_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2c_Tabulamurissenis_cluster3_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2d_Tabulamurissenis_cluster4_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2e_Tabulamurissenis_cluster5_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3-, 12- and 27-month-old males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3a_DmLeg_cluster1_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3b_DmLeg_cluster2_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3c_DmLeg_cluster3_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3d_DmLeg_cluster4_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3e_DmLeg_cluster5_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3f_DmLeg_cluster6_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3g_DmLeg_cluster7_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3h_DmLeg_cluster8_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3i_DmLeg_cluster9_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3j_DmLeg_cluster10_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3k_DmLeg_cluster11_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3l_DmLeg_cluster12_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
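The DmdevLeg mean table above is described as mean values calculated from the raw reads of replicates. A minimal Python sketch of that kind of per-condition averaging follows. The gene IDs, sample names, counts and the `sample_to_condition` mapping are hypothetical placeholders, and this is not the authors' actual script.

```python
from collections import defaultdict
from statistics import mean


def replicate_means(counts, sample_to_condition):
    """Average raw counts over the replicates of each condition.

    counts: {gene_id: {sample_name: raw_count}}
    sample_to_condition: {sample_name: condition_label}
    Returns {gene_id: {condition_label: mean_count}}.
    """
    result = {}
    for gene, per_sample in counts.items():
        grouped = defaultdict(list)
        for sample, value in per_sample.items():
            grouped[sample_to_condition[sample]].append(value)
        result[gene] = {cond: mean(vals) for cond, vals in grouped.items()}
    return result


# Hypothetical example: two time points with two replicates each.
demo_counts = {
    "gene_1": {"t1_rep1": 10, "t1_rep2": 14, "t2_rep1": 100, "t2_rep2": 120},
    "gene_2": {"t1_rep1": 0, "t1_rep2": 2, "t2_rep1": 7, "t2_rep2": 9},
}
demo_design = {"t1_rep1": "t1", "t1_rep2": "t1", "t2_rep1": "t2", "t2_rep2": "t2"}
demo_means = replicate_means(demo_counts, demo_design)
```

A table of this shape (one column per time point) is the kind of input that time-series clustering expects, which is presumably why the means rather than the individual replicates were deposited.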
Collection of gene expression and similar datasets related to brain tumors, in particular medulloblastoma. Medulloblastoma is the most common malignant brain tumor in childhood. Files are typically CSV matrices of genes x samples.
GSE124814 WOW! Integration of many (all?) medulloblastoma datasets(!): 1641 samples, of which 1350 samples represent primary medulloblastomas and 291 samples represent normal brain
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE124814 Weishaupt H, Johansson P, Sundström A, Lubovac-Pilav Z et al. Batch-normalization of cerebellar and medulloblastoma gene expression datasets utilizing empirically defined negative control genes. Bioinformatics 2019 Sep 15;35(18):3357-3364. PMID: 30715209 https://doi.org/10.1093/bioinformatics/btz066 We downloaded a total of 1796 CEL files from previously published GEO or ArrayExpress records: GSE85217(n=763), GSE25219(n=154), GSE60862(n=130), GSE12992(n=40), GSE67850(n=22), GSE10327(n=62), GSE30074(n=30), E-MTAB-292(n=19), GSE74195(n=30), GSE37418(n=76), GSE4036(n=14), GSE62803(n=52), GSE21140(n=103), GSE37382(n=50), GSE22569(n=24), GSE35974(n=50), GSE73038(n=46), GSE50161(n=24), GSE3526(n=9), GSE50765(n=12), GSE49243(n=58), GSE41842(n=19), GSE44971(n=9). After preprocessing of all CEL files, we averaged the expression profiles of samples that mapped to the same patient in a single dataset, producing a final expression array comprising 1641 samples, of which 1350 samples represent primary medulloblastomas and 291 samples represent normal brain (cerebellum/upper rhombic lip). Also discussed in paper: A transcriptome-based classifier to determine molecular subtypes in medulloblastoma https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008263
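The per-record CEL file counts quoted above can be checked against the stated totals with a few lines of Python. The numbers are copied from the description; the variable names are illustrative only.

```python
# CEL files per source record, as listed in the GSE124814 description.
cel_files_per_record = {
    "GSE85217": 763, "GSE25219": 154, "GSE60862": 130, "GSE12992": 40,
    "GSE67850": 22, "GSE10327": 62, "GSE30074": 30, "E-MTAB-292": 19,
    "GSE74195": 30, "GSE37418": 76, "GSE4036": 14, "GSE62803": 52,
    "GSE21140": 103, "GSE37382": 50, "GSE22569": 24, "GSE35974": 50,
    "GSE73038": 46, "GSE50161": 24, "GSE3526": 9, "GSE50765": 12,
    "GSE49243": 58, "GSE41842": 19, "GSE44971": 9,
}

# Final expression array after per-patient averaging, as stated in the text.
final_samples = {"primary medulloblastoma": 1350, "normal brain": 291}

total_cel_files = sum(cel_files_per_record.values())   # 1796
total_final = sum(final_samples.values())              # 1641
fewer_after_averaging = total_cel_files - total_final  # 155

print(len(cel_files_per_record), total_cel_files, total_final, fewer_after_averaging)
```

The 23 records add up to the 1796 CEL files mentioned, and the final array holds 155 fewer profiles than there were input files.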
GSE85217 (Cavalli ... Taylor) 763 samples, 2016 (Affymetrix Human Gene 1.1 ST Array) Cavalli FMG, Remke M, Rampasek L, Peacock J et al. Intertumoral Heterogeneity within Medulloblastoma Subgroups. Cancer Cell 2017 Jun 12;31(6):737-754.e6. PMID: 28609654 Ramaswamy V, Taylor MD. Bioinformatic Strategies for the Genomic and Epigenomic Characterization of Brain Tumors. Methods Mol Biol 2019;1869:37-56. PMID: 30324512 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85217
GSE202043 (Pomeroy) 214 samples, 2011 (Expression profiling by array) Cho YJ, Tsherniak A, Tamayo P, Santagata S et al. Integrative genomic analysis of medulloblastoma identifies a molecular subgroup that drives poor clinical outcome. J Clin Oncol 2011 Apr 10;29(11):1424-30. PMID: 21098324 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE202043
GSE12992 (Fattet ... Delattre) 72 samples, 2009 (Expression profiling by array) Fattet S, Haberler C, Legoix P, Varlet P et al. Beta-catenin status in paediatric medulloblastomas: correlation of immunohistochemical expression with mutational status, genetic profiles, and clinical characteristics. J Pathol 2009 May;218(1):86-94. PMID: 19197950 A series of 72 pediatric medulloblastoma tumors has been studied at the genomic level (array-CGH), screened for CTNNB1 mutations and beta-catenin expression (immunohistochemistry). A subset of 40 tumor samples has been analyzed at the RNA expression level (Affymetrix HG U133 Plus 2.0). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12992
GSE37382 (Northcott ... Taylor) 2012 (Expression profiling by array, Affymetrix Human Gene 1.1 ST Array profiling of 285 primary medulloblastoma samples.) Northcott PA, Shih DJ, Peacock J, Garzia L et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 2012 Aug 2;488(7409):49-56. PMID: 22832581 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37382
GSE10327 (M. Kool) 62 samples, 2008 (Expression profiling by array) (beware: it is sometimes cited as GSE10237 in the original paper and several references; that accession is an error). Kool M, Koster J, Bunt J, Hasselt NE et al. Integrated genomics identifies five medulloblastoma subtypes with distinct genetic profiles, pathway signatures and clinicopathological features. PLoS One 2008 Aug 28;3(8):e3088. PMID: 18769486 Rack PG, Ni J, Payumo AY, Nguyen V et al. Arhgap36-dependent activation of Gli transcription factors. Proc Natl Acad Sci U S A 2014 Jul 29;111(30):11061-6. PMID: 25024229 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10327
Other datasets (not yet loaded):
GSE37385 (47.1 Gb, 2012) (Expression profiling by array, Genome variation profiling by SNP array, SNP genotyping by SNP array) Northcott PA, Shih DJ, Peacock J, Garzia L et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 2012 Aug 2;488(7409):49-56. PMID: 22832581 Here we report somatic copy number aberrations (SCNAs) in 1087 unique medulloblastomas. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37385
Although localized to the mineralized matrix of bone, osteocytes are able to respond to systemic factors such as the calciotropic hormones 1,25(OH)2D3 and PTH. In the present studies, we examine the transcriptomic response to PTH in an osteocyte cell model and found that this hormone regulated an extensive panel of genes. Surprisingly, PTH uniquely modulated two cohorts of genes, one that was expressed and associated with the osteoblast to osteocyte transition and the other a cohort that was expressed only in the mature osteocyte. Interestingly, PTH's effects were largely to oppose the expression of differentiation-related genes in the former cohort, while potentiating the expression of osteocyte-specific genes in the latter cohort. A comparison of the transcriptional effects of PTH with those obtained previously with 1,25(OH)2D3 revealed a subset of genes that was strongly overlapping. While 1,25(OH)2D3 potentiated the expression of osteocyte-specific genes similar to that seen with PTH, the overlap between the two hormones was more limited. Additional experiments identified the PKA-activated phospho-CREB (pCREB) cistrome, revealing that while many of the differentiation-related PTH regulated genes were apparent targets of a PKA-mediated signaling pathway, a reduction in pCREB binding at sites associated with osteocyte-specific PTH targets appeared to involve alternative PTH activation pathways. That pCREB binding activities positioned near important hormone-regulated gene cohorts were localized to control regions of genes was reinforced by the presence of epigenetic enhancer signatures exemplified by unique modifications at histones H3 and H4. These studies suggest that both PTH and 1,25(OH)2D3 may play important and perhaps cooperative roles in limiting osteocyte differentiation from its precursors while simultaneously exerting distinct roles in regulating mature osteocyte function.
Our results provide new insight into transcription factor-associated mechanisms through which PTH and 1,25(OH)2D3 regulate a plethora of genes important to the osteoblast/osteocyte lineage. Fully differentiated IDG-SW3 cells were treated in biological triplicate with 100nM PTH for 24 hours prior to mRNA isolation and sequencing. Vehicle treated samples were previously published in GSE54783: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1323967 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1323968 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1323969
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Integrated and annotated Seurat object created with the following script: https://doi.org/10.5281/zenodo.8413883
using the following studies: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162577
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE142016
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135779
scRNA-seq data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223922 (Sur et al. 2023); see a detailed description of the study here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055256/ Data were downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223922 to create an R Seurat object, which was converted into an AnnData (h5ad) file so that it can be analysed with, e.g., the Python scanpy package. If you use this data, please cite Sur et al. 2023.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease in the Western world, and encompasses a spectrum from simple steatosis to steatohepatitis (NASH). There is currently no approved pharmacologic therapy against NASH, partly due to an incomplete understanding of its molecular basis. The goal of this study was to determine the key differentially expressed genes (DEGs), as well as those genes and pathways central to its pathogenesis. We performed an integrative computational analysis of publicly available gene expression data in NASH from GEO (GSE17470, GSE24807, GSE37031, GSE89632). The DEGs were identified using GEOquery, and only the genes present in at least three of the studies, to a total of 190 DEGs, were considered for further analyses. The pathways, networks, molecular interactions, functional analyses were generated through the use of Ingenuity Pathway Analysis (IPA). For selected networks, we computed the centrality using igraph package in R. Among the statistically significant predicted networks (p-val < 0.05), three were of most biological interest: the first is involved in antimicrobial response, inflammatory response and immunological disease, the second in cancer, organismal injury and development and the third in metabolic diseases. We discovered that HNF4A is the central gene in the network of NASH connected to metabolic diseases and that it regulates HNF1A, an additional transcription regulator also involved in lipid metabolism. Therefore, we show, for the first time to our knowledge, that HNF4A is central to the pathogenesis of NASH. This adds to previous literature demonstrating that HNF4A regulates the transcription of genes involved in the progression of NAFLD, and that HNF4A genetic variants play a potential role in NASH progression.
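The selection rule described above (keep only genes that are differentially expressed in at least three of the four studies) can be sketched in Python as follows. The per-study gene sets are hypothetical placeholders, not the published DEG lists, and the function name is illustrative.

```python
from collections import Counter


def recurrent_genes(deg_sets, min_studies=3):
    """Return genes reported as DEGs in at least `min_studies` studies.

    deg_sets: {study_accession: iterable of gene symbols}
    """
    # set() guards against a gene being counted twice within one study.
    counts = Counter(gene for genes in deg_sets.values() for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_studies)


# Placeholder gene labels; the accessions are the four named in the text.
demo_degs = {
    "GSE17470": ["GENE_A", "GENE_B", "GENE_C"],
    "GSE24807": ["GENE_A", "GENE_B"],
    "GSE37031": ["GENE_A", "GENE_C", "GENE_D"],
    "GSE89632": ["GENE_A", "GENE_B", "GENE_D"],
}
demo_recurrent = recurrent_genes(demo_degs)
```

In this toy input GENE_A appears in four studies and GENE_B in three, so both pass the threshold, while GENE_C and GENE_D (two studies each) are dropped.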
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for our paper titled "Leveraging Big Data of Immune Checkpoint Blockade Response Identifies Novel Potential Targets".
Bareche et al., Annals of Oncology (2022); https://doi.org/10.1016/j.annonc.2022.08.084
----------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------
Background: The development of immune checkpoint blockade (ICB) has changed the way we treat various cancers. While ICB produces durable survival benefits in a number of malignancies, a large proportion of treated patients do not derive clinical benefit. Recent clinical profiling studies have shed light on molecular features and mechanisms that modulate response to ICB. Nevertheless, none of these identified molecular features were investigated in large enough cohorts to be of clinical value.
Materials and methods: A literature review was performed to identify relevant studies including clinical datasets of patients treated with ICB (anti-PD1/L1, anti-CTLA4 or the combination) and available sequencing data. Tumor mutational burden (TMB) and 37 previously reported gene expression (GE) signatures were computed as described in the original publications. Biomarker association with ICB response (IR) and survival (PFS/OS) was investigated separately within each study and combined together for meta-analysis.
Results: We performed a comparative meta-analysis of genomic and transcriptomic biomarkers of immune-checkpoint blockade (ICB) responses in over 3,600 patients across 12 tumor types and implemented an open-source web-application (predictIO.ca) for exploration. Tumor mutation burden (TMB) and 21/37 gene signatures were predictive of ICB responses across tumor types. We next developed a de novo gene expression signature (PredictIO) from our pan-cancer analysis and demonstrated its superior predictive value over other biomarkers. To identify novel targets, we computed the T-cell dysfunction score for each gene within PredictIO and their ability to predict dual PD-1/CTLA-4 blockade in mice. Two genes, F2RL1 (encoding protease-activated receptor-2) and RBFOX2 (encoding RNA-binding motif protein 9), were concurrently associated with worse ICB clinical outcomes, T cell dysfunction in ICB-naive patients and resistance to dual PD-1/CTLA-4 blockade in preclinical models.
Conclusions: Our study highlights the potential of large-scale meta-analyses in identifying novel biomarkers and potential therapeutic targets for cancer immunotherapy.
----------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------
Data description
mouseModel:
Discovery_cohort:
Expression and SNV data of the discovery cohort
Validation_cohort:
Expression and SNV data of the validation cohort
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
12 fastq files with 1000 reads each, 4 index files for chr 1 for mm10, targets files with sample information.
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset was collected from the following NCBI GEO datasets:
- GSE144113 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE144113)
- GSE76200 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76200)
- GSE12791 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12791)
These datasets include four paclitaxel-resistant cell lines: BAS, HS578T, MCF7 and MDA-MB-231.
Gene expression analysis comparing control cells with drug-resistant cells was performed in R for each dataset. The results were converted into gene symbols using different bioinformatics databases. Genes with a p-value of less than 0.05 were also removed.
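The GEO records referenced above all follow the same accession URL pattern, so the links can be generated rather than typed by hand. A small Python sketch, in which the helper name `geo_url` and the accession check are illustrative assumptions and only the URL pattern is taken from the links in the text:

```python
import re
from urllib.parse import urlencode

# Endpoint as it appears in the GEO links quoted in this listing.
GEO_ACC_ENDPOINT = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_url(accession):
    """Build the GEO record URL for a series (GSE) or sample (GSM) accession."""
    if not re.fullmatch(r"GS[EM]\d+", accession):
        raise ValueError(f"not a GSE/GSM accession: {accession!r}")
    return f"{GEO_ACC_ENDPOINT}?{urlencode({'acc': accession})}"


paclitaxel_series = ["GSE144113", "GSE76200", "GSE12791"]
paclitaxel_urls = [geo_url(acc) for acc in paclitaxel_series]
```

The check rejects typos such as a missing prefix before any request is made; it does not confirm that the record exists.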
Background
Nonalcoholic steatohepatitis (NASH) is rapidly becoming a major chronic liver disease worldwide. However, little is known concerning the pathogenesis and progression mechanism of NASH. Our aim here is to identify key genes and elucidate their biological function in the progression from hepatic steatosis to NASH.
Methods
Gene expression datasets containing NASH patients, hepatic steatosis patients, and healthy subjects were downloaded from the Gene Expression Omnibus database, using the R packages Biobase and GEOquery. Differentially expressed genes (DEGs) were identified using the R limma package. Functional annotation and enrichment analysis of DEGs were undertaken using the R package clusterProfiler. Protein-protein interaction (PPI) networks were constructed using the STRING database.
Results
Three microarray datasets GSE48452, GSE63067 and GSE89632 were selected. They included 45 NASH patients, 31 hepatic steatosis patients, and 43 healthy subjects. Two up-regulated and 24 down-regulated DEGs were found in both NASH patients vs. healthy controls and in steatosis subjects vs. healthy controls. The most significantly differentially expressed genes were FOSB (P = 3.43×10^-15), followed by CYP7A1 (P = 2.87×10^-11), and FOS (P = 6.26×10^-11). Proximal promoter DNA-binding transcription activator activity, RNA polymerase II-specific (P = 1.30×10^-5) was the most significantly enriched functional term in the gene ontology analysis. KEGG pathway enrichment analysis indicated that the MAPK signaling pathway (P = 3.11×10^-4) was significantly enriched.
Conclusion
This study characterized hub genes of the liver transcriptome, which may contribute functionally to NASH progression from hepatic steatosis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Yeast single-cell gene expression data, database-derived prior knowledge network, and hand-curated gold standard network. Used to benchmark the Inferelator 3.0, SCENIC, and CellOracle.
Expression data (GSE144820_GSE125162.tsv.gz) is an integer count matrix [44343 rows x 6763 columns] with an index column (0) assembled from GSE144820 and GSE125162. Included is a paired metadata file (GSE144820_GSE125162_META_DATA.tsv.gz).
A database-derived prior knowledge network (YEASTRACT_20190713_BOTH.tsv) is a boolean connectivity matrix [6885 rows x 220 columns] with an index column (0) obtained from the YEASTRACT database on 2019-07-13. It consists of edges which have both DNA localization evidence and evidence of changes to gene expression after TF perturbation.
A curated gold standard network (Tchourine_2018_yeast_gold_standard.tsv) is a signed connectivity matrix [993 rows x 98 columns] with an index column (0). Details of its construction have been published.
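All three files above are tab-separated matrices whose first column (0) is the index. A minimal standard-library Python sketch for loading such a file, plain or gzip-compressed, is given below. It assumes the header row carries a label for the index column; this is an assumption, so the first line of each file should be checked. The function names and the demo content are hypothetical.

```python
import csv
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed text file for csv reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def read_indexed_tsv(handle):
    """Parse a TSV whose column 0 is the row index.

    Returns (column_names, {row_label: [values as strings]}).
    Assumes the header row includes a label for the index column.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]
    rows = {record[0]: record[1:] for record in reader if record}
    return columns, rows


# Hypothetical miniature count matrix (rows: cells, columns: genes).
demo_handle = io.StringIO("cell\tGENE_1\tGENE_2\ncell_a\t0\t3\ncell_b\t5\t1\n")
demo_columns, demo_rows = read_indexed_tsv(demo_handle)
```

For the real file the call would look like `with open_text("GSE144820_GSE125162.tsv.gz") as fh: cols, rows = read_indexed_tsv(fh)`. With a matrix of 44343 x 6763 values, a dataframe library is likely the more practical choice; the sketch only illustrates the layout.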
https://choosealicense.com/licenses/other/
Dataset Card for Geo-Benchmark
Dataset Summary
Geo-Benchmark aims to assess Large Language Models' (LLM) geographical abilities across a multitude of tasks. It is built from 12 datasets split across 8 different tasks:
Knowledge/Coordinates Prediction: GeoQuestions1089
Knowledge/Yes|No questions: GeoQuestions1089
Knowledge/Regression questions: GeoQuestions1089, GeoQuery
Knowledge/Place Prediction: GeoQuestions1089, GeoQuery, Ms Marco
Reasoning/Scenario Complex QA:…
See the full description on the dataset page: https://huggingface.co/datasets/rfr2003/Geo_Benchmark.
This experiment contains mouse organism part samples and strand-specific RNA-seq data from experiment E-GEOD-41637 (https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-41637/), which aimed at assessing tissue-specific transcriptome variation across mammals, with chicken used as an outgroup in evolutionary analyses. Each organism part (with the exception of heart) was sourced from animals from three different strains: C57BL/6, DBA/2J and CD1. (There is no data for heart from the C57BL/6 strain.) This data set was originally submitted to NCBI Gene Expression Omnibus under accession number GSE41637 (http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE41637) and later imported to ArrayExpress as E-GEOD-41637.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used to generate figures in the manuscript, excluding RNA-seq and ChIP-seq data, which can instead be downloaded from Gene Expression Omnibus: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE122456.
The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus (GEO) and are accessible through GEO Series accession number GSE204989 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE204989).
Sequence Read Archive (SRA) data, BioSamples, and GEO holdings can be accessed from the NCBI BioProject PRJNA843039 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA843039).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
[NOTICE: This data set has been deprecated. Please see our new version of the data (and additional data sets) here: https://osf.io/mhk93 ]
"Idiopathic pulmonary fibrosis (IPF) is a specific form of chronic, progressive fibrosing interstitial disease of unknown cause. It remains impractical to conduct early diagnosis and predict IPF progression just based on gene expression information. Moreover, the relationship between gene expression and quantitative phenotypic value in IPF keeps controversial. To identify biomarkers to predict survival in IPF, we profiled protein-coding gene expression in peripheral blood mononuclear cells (PBMCs). We linked the gene expression level with the quantitative phenotypic variation in IPF, including diffusing capacity of the lung for carbon monoxide (DLCO) and forced vital capacity (FVC) percent predicted. In silico analyses on the expression profiles and quantitative phenotypic data allowed for the generation of a set of IPF molecular signature that predicted survival of IPF effectively."
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38958
We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. When gene-expression data were processed in multiple batches, we have provided batch information. Each data set is organized into a file set, where each contains all pertinent files for an individual dataset. The gene expression files have been normalized using both the SCAN and UPC methods using the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp). We used Ensembl identifiers. The class, clinical, and batch data were hand curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted to be imported easily into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
Table of Contents
Main Description
File Descriptions
Linked Files
Installation and Instructions
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.". The code included in the file titled marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap
The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, also used to create the heatmap with DGE and volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.
This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains raw FASTQ files, as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:
Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This is a necessary file to use the code. Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
Ensure you have R version 4.1.2 or higher for compatibility.
Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.
marengo_code_for_paper_jan_2023.R Install_Packages.R Marengo_newID_March242023.rds genes_for_heatmap_fig5F.xlsx all_res_deg_for_heat_updated_march2023.txt
You can use the following code to set the working directory in R:
setwd(directory)
NCBI Gene Expression Omnibus accession number GSE49047 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49047).