https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Background
RNA-seq is a widely adopted affordable method for large scale gene expression profiling. However, user-friendly and versatile tools for wet-lab biologists to analyse RNA-seq data beyond standard analyses such as differential expression, are rare. Especially, the analysis of time-series data is difficult for wet-lab biologists lacking advanced computational training. Furthermore, most meta-analysis tools are tailored for model organisms and not easily adaptable to other species.
Results
With RNfuzzyApp, we provide a user-friendly, web-based R-shiny app for differential expression analysis, as well as time-series analysis of RNA-seq data. RNfuzzyApp offers several methods for normalization and differential expression analysis of RNA-seq data, providing easy-to-use toolboxes, interactive plots and downloadable results. For time-series analysis, RNfuzzyApp presents the first web-based, automated pipeline for soft clustering with the Mfuzz R package, including methods to aid in cluster number selection, Mfuzz loop computations, cluster overlap analysis, as well as cluster enrichments.
Conclusion
RNfuzzyApp is an intuitive, easy to use and interactive R shiny app for RNA-seq differential expression and time-series analysis, offering a rich selection of interactive plots, providing a quick overview of raw data and generating rapid analysis results. Furthermore, its orthology assignment, enrichment analysis, as well as ID conversion functions are accessible to non-model organisms.
Methods Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt: mean values calculated from raw reads of replicates, downloaded from gene expression omnibus (dataset GSE143430 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE143430).
Haering_etal_extendedDatatable_1a_Tabulamurissenis_3vs12m_DEA.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1b_Tabulamurissenis_3vs27m_DEA.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1c_Tabulamurissenis_12vs27m_DEA.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1d_Tabulamurissenis_3vs12m_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1e_Tabulamurissenis_3vs27m_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1f_Tabulamurissenis_12vs27m_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2a_Tabulamurissenis_cluster1_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2b_Tabulamurissenis_cluster2_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2c_Tabulamurissenis_cluster3_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2d_Tabulamurissenis_cluster4_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2e_Tabulamurissenis_cluster5_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3a_DmLeg_cluster1_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3b_DmLeg_cluster2_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3c_DmLeg_cluster3_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3d_DmLeg_cluster4_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3e_DmLeg_cluster5_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3f_DmLeg_cluster6_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3g_DmLeg_cluster7_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3h_DmLeg_cluster8_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3i_DmLeg_cluster9_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3j_DmLeg_cluster10_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3k_DmLeg_cluster11_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3l_DmLeg_cluster12_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Uploaded files are raw (*_raw.h5mu), filtered (*_filtered.h5mu), and trained (*_trained.h5mu) h5mu objects, as processed by mTopic.
Single-cell data processed here are publicly available from the Gene Expression Omnibus (GEO) and 10x Genomics:
P22 mouse brain datasets (P22_Mouse_Brain_H3K4me3_RNA, P22_Mouse_Brain_H3K27me3_RNA, P22_Mouse_Brain_H3K27ac_RNA) [1]
RNA data: GSE218593
– GSM6753043 for ATAC-RNA
– GSM6753046 for H3K4me3-RNA
– GSM6753044 for H3K27me3-RNA
– GSM6753045 for H3K27ac-RNA
ATAC/histone modification data: GSE205055
– GSM6758285 for ATAC-RNA
– GSM6704980 for H3K4me3-RNA
– GSM6704978 for H3K27me3-RNA
– GSM6704979 for H3K27ac-RNA
Human PBMC dataset (Human_PBMC_ATAC_RNA_Protein) [2]
GSE166188
– GSM5065524 for ATAC
– GSM5065525 for RNA
– GSM5065526 for protein
Human tonsil dataset (Human_Tonsil_RNA_Protein)
Available from 10x Genomics here.
References
[1] Zhang D, Deng Y, Kukanja P, Agirre E et al. Spatial epigenome-transcriptome co-profiling of mammalian tissues. Nature 2023 Apr;616(7955):113-122. PMID: 36922587
[2] Mimitou EP, Lareau CA, Chen KY, Zorzetto-Fernandes AL et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat Biotechnol 2021 Oct;39(10):1246-1258. PMID: 34083792
Journal article published in PLOS One, Vol 20, Issue 5, e0320862, 2025; DOI: https://doi.org/10.1371/journal.pone.0320862; PMC12064016. The datasets generated and analyzed during the current study are provided in Supplemental S1 File. The RNA-seq data is Protein Atlas Version 23 from the Human Protein Atlas website (https://www.proteinatlas.org/about/download, “RNA HPA cell line gene data” released 2023.06.19). All FASTQ files and aligned counts for the U.S. EPA TempO-seq data have been deposited into NCBI Gene Expression Omnibus under the accession number GSE288929 and are publicly available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE288929. The R code is available through FigShare at: https://doi.org/10.23645/epacomptox.27341970.v1. This dataset is associated with the following publication: Word, L., C. Willis, R. Judson, L. Everett, S. Davidson-Fritz, D. Haggard, B. Chambers, J. Rogers, J. Bundy, I. Shah, N. Sipes, and J. Harrill. TempO-seq and RNA-seq Gene Expression Levels are Highly Correlated for Most Genes: A Comparison Using 39 Human Cell Lines. PLOS ONE. Public Library of Science, San Francisco, CA, USA, 20(5): e0320862, (2025).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Harmony integrated single-cell Dataset containing all cells from 4 different murine fibrotic disease models. The dataset is largely public available but also contains unpublished (cardiac) samples https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE137720https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE104154https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138826https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-9816https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-7895
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Uploaded files are raw (*_raw.rds) and filtered (*_filtered.rds) RDS objects used for R tutorials of mTopic.
Single-cell data processed here are publicly available from the Gene Expression Omnibus (GEO) and 10x Genomics:
P22 mouse brain dataset (P22_Mouse_Brain_ATAC_RNA) [1]
– GSE218593 (GSM6753043) for RNA
– GSE205055 (GSM6758285) for ATAC
Human tonsil dataset (Human_Tonsil_RNA_Protein)
Available from 10x Genomics here.
Human PBMC dataset (Human_PBMC_ATAC_RNA_Protein) [2]
GSE166188
– GSM5065524 for ATAC
– GSM5065525 for RNA
– GSM5065526 for protein
References
[1] Zhang D, Deng Y, Kukanja P, Agirre E et al. Spatial epigenome-transcriptome co-profiling of mammalian tissues. Nature 2023 Apr;616(7955):113-122. PMID: 36922587
[2] Mimitou EP, Lareau CA, Chen KY, Zorzetto-Fernandes AL et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat Biotechnol 2021 Oct;39(10):1246-1258. PMID: 34083792
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Full code and data to analyze bone marrow single-cell data from 4 healthy donor, dataset GSE155698 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4710729)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains pre-processed A. thaliana root , the H. sapiens aortic valve datasets, PBMC Covid atlas and public 10x datasetse used in the paper, SCEMENT: Scalable and Memory Efficient Integration of Large-scale Single Cell RNA-sequencing Data. The raw datasets provided in the links below are pre-processed for quality control with respect to both cells and genes.
A. thaliana datasets are sourced from the following locations at Single-cell Gene expression Atlas and Gene Expression Omnibus (GEO):
H. sapiens datasets are obtained from the NCBI database : https://www.ncbi.nlm.nih.gov/bioproject/PRJNA562645/
All COVID atlas datasets are from: http://covid19.cancer-pku.cn . covid_atlas_data1.zip contains the h5ad files and covid_atlas_data2.zip contains the Seurat rds files.
PBMC datasets are from the following public sources:
References for the Datasets :
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
scRNA data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223922 (Sur et al. 2023), see a detailed description of the study here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055256/
Data were downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223922 to create a R Seurat object and converted into AnnData (h5ad) file to be able to analyse with e.g. python scanpy package.
If you use this data, please cite Sur et al. 2023.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
[NOTICE: This data set has been deprecated. Please see our new version of the data (and additional data sets) here: https://osf.io/mhk93 ]"RNA was isolated from bronchial brushings obtained from current and former smokers with and without COPD. mRNA expression was profiled using Affymetrix Human Gene 1.0 ST Arrays."http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37147We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. When gene-expression data were processed in multiple batches, we have provided batch information. Each data set is organized into a file set, where each contains all pertinent files for an individual dataset. The gene expression files have been normalized using both the SCAN and UPC methods using the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp). We used Ensembl identifiers. The class, clinical, and batch data were hand curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted to be imported easily into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
This experiment is contains chicken organism part samples and strand-specific RNA-seq data from experiment E-GEOD-41637 (https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-41637/), which aimed at assessing tissue-specific transcriptome variation across mammals, with chicken used as an outgroup in evolutionary analyses. Each organism part was sourced from three different animals as biological replicates. This data set was originally submitted to NCBI Gene Expression Omnibus under accession number GSE41637 (http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE41637) and later imported to ArrayExpress as E-GEOD-41637.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for our paper titled "Leveraging Big Data of Immune Checkpoint Blockade Response Identifies Novel Potential Targets".
Bareche et al., Annals of Oncology (2022); https://doi.org/10.1016/j.annonc.2022.08.084
----------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------
Background: The development of immune checkpoint blockade (ICB) has changed the way we treat various cancers. While ICB produces durable survival benefits in a number of malignancies, a large proportion of treated patients do not derive clinical benefit. Recent clinical profiling studies have shed light on molecular features and mechanisms that modulate response to ICB. Nevertheless, none of these identified molecular features were investigated in large enough cohorts to be of clinical value.
Materials and methods: Literature review was performed to identify relevant studies including clinical dataset of patient treated with ICB (anti-PD1/L1, anti-CTLA4 or the combo) and available sequencing data. Tumor mutational burden (TMB) and 37 previously reported gene expression (GE) signature were computed with respect to the original publication. Biomarker association with ICB response (IR) and survival (PFS/OS) was investigated separately within each study and combined together for meta-analysis.
Results: We performed a comparative meta-analysis of genomic and transcriptomic biomarkers of immune-checkpoint blockade (ICB) responses in over 3,600 patients across 12 tumor types and implemented an open-source web-application (predictIO.ca) for exploration. Tumor mutation burden (TMB) and 21/37 gene signatures were predictive of ICB responses across tumor types. We next developed a de novo gene expression signature (PredictIO) from our pan-cancer analysis and demonstrated its superior predictive value over other biomarkers. To identify novel targets, we computed the T-cell dysfunction score for each gene within PredictIO and their ability to predict dual PD-1/CTLA-4 blockade in mice. Two genes, F2RL1 (encoding protease-activated receptor-2) and RBFOX2 (encoding RNA-binding motif protein 9), were concurrently associated with worse ICB clinical outcomes, T cell dysfunction in ICB-naive patients and resistance to dual PD-1/CTLA-4 blockade in preclinical models.
Conclusions: Our study highlights the potential of large-scale meta-analyses in identifying novel biomarkers and potential therapeutic targets for cancer immunotherapy.
----------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------
Data description
mouseModel:
Discovery_cohort:
Expression and SNV data of the discovery cohort
Validation_cohort:
Expression and SNV data of the validation cohort
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains the Spider-Realistic dataset used for evaluation in the paper "Structure-Grounded Pretraining for Text-to-SQL". The dataset is created based on the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). We manually modified the original questions to remove the explicit mention of column names while keeping the SQL queries unchanged to better evaluate the model's capability in aligning the NL utterance and the DB schema. For more details, please check our paper at https://arxiv.org/abs/2010.12773.
It contains the following files:
- spider-realistic.json
# The spider-realistic evaluation set
# Examples: 508
# Databases: 19
- dev.json
# The original dev split of Spider
# Examples: 1034
# Databases: 20
- tables.json
# The original DB schemas from Spider
# Databases: 166
- README.txt
- license
The Spider-Realistic dataset is created based on the dev split of the Spider dataset realsed by Yu, Tao, et al. "Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task." It is a subset of the original dataset with explicit mention of the column names removed. The sql queries and databases are kept unchanged.
For the format of each json file, please refer to the github page of Spider https://github.com/taoyds/spider.
For the database files please refer to the official Spider release https://yale-lily.github.io/spider.
This dataset is distributed under the CC BY-SA 4.0 license.
If you use the dataset, please cite the following papers including the original Spider datasets, Finegan-Dollak et al., 2018 and the original datasets for Restaurants, GeoQuery, Scholar, Academic, IMDB, and Yelp.
@article{deng2020structure,
title={Structure-Grounded Pretraining for Text-to-SQL},
author={Deng, Xiang and Awadallah, Ahmed Hassan and Meek, Christopher and Polozov, Oleksandr and Sun, Huan and Richardson, Matthew},
journal={arXiv preprint arXiv:2010.12773},
year={2020}
}
@inproceedings{Yu&al.18c,
year = 2018,
title = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
booktitle = {EMNLP},
author = {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev }
}
@InProceedings{P18-1033,
author = "Finegan-Dollak, Catherine
and Kummerfeld, Jonathan K.
and Zhang, Li
and Ramanathan, Karthik
and Sadasivam, Sesh
and Zhang, Rui
and Radev, Dragomir",
title = "Improving Text-to-SQL Evaluation Methodology",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "351--360",
location = "Melbourne, Australia",
url = "http://aclweb.org/anthology/P18-1033"
}
@InProceedings{data-sql-imdb-yelp,
dataset = {IMDB and Yelp},
author = {Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig},
title = {SQLizer: Query Synthesis from Natural Language},
booktitle = {International Conference on Object-Oriented Programming, Systems, Languages, and Applications, ACM},
month = {October},
year = {2017},
pages = {63:1--63:26},
url = {http://doi.org/10.1145/3133887},
}
@article{data-academic,
dataset = {Academic},
author = {Fei Li and H. V. Jagadish},
title = {Constructing an Interactive Natural Language Interface for Relational Databases},
journal = {Proceedings of the VLDB Endowment},
volume = {8},
number = {1},
month = {September},
year = {2014},
pages = {73--84},
url = {http://dx.doi.org/10.14778/2735461.2735468},
}
@InProceedings{data-atis-geography-scholar,
dataset = {Scholar, and Updated ATIS and Geography},
author = {Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer},
title = {Learning a Neural Semantic Parser from User Feedback},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
year = {2017},
pages = {963--973},
location = {Vancouver, Canada},
url = {http://www.aclweb.org/anthology/P17-1089},
}
@inproceedings{data-geography-original
dataset = {Geography, original},
author = {John M. Zelle and Raymond J. Mooney},
title = {Learning to Parse Database Queries Using Inductive Logic Programming},
booktitle = {Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2},
year = {1996},
pages = {1050--1055},
location = {Portland, Oregon},
url = {http://dl.acm.org/citation.cfm?id=1864519.1864543},
}
@inproceedings{data-restaurants-logic,
author = {Lappoon R. Tang and Raymond J. Mooney},
title = {Automated Construction of Database Interfaces: Intergrating Statistical and Relational Learning for Semantic Parsing},
booktitle = {2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora},
year = {2000},
pages = {133--141},
location = {Hong Kong, China},
url = {http://www.aclweb.org/anthology/W00-1317},
}
@inproceedings{data-restaurants-original,
author = {Ana-Maria Popescu, Oren Etzioni, and Henry Kautz},
title = {Towards a Theory of Natural Language Interfaces to Databases},
booktitle = {Proceedings of the 8th International Conference on Intelligent User Interfaces},
year = {2003},
location = {Miami, Florida, USA},
pages = {149--157},
url = {http://doi.acm.org/10.1145/604045.604070},
}
@inproceedings{data-restaurants,
author = {Alessandra Giordani and Alessandro Moschitti},
title = {Automatic Generation and Reranking of SQL-derived Answers to NL Questions},
booktitle = {Proceedings of the Second International Conference on Trustworthy Eternal Systems via Evolving Software, Data and Knowledge},
year = {2012},
location = {Montpellier, France},
pages = {59--76},
url = {https://doi.org/10.1007/978-3-642-45260-4_5},
}
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
[NOTICE: This data set has been deprecated. Please see our new version of the data (and additional data sets) here: https://osf.io/mhk93 ]"A causal role of mutations in genes encoding for multiple general transcription factors in neurodevelopmental disorders including autism suggested that alterations at the global level of gene expression regulation might also relate to disease risk in sporadic cases of autism. This premise can be tested by evaluating for global changes in the overall distribution of gene expression levels. For instance, in mice, we recently showed that variability in hippocampal-dependent behaviors was associated with variability in the pattern of the overall distribution of gene expression levels, as assessed by variance in the distribution of gene expression levels in the hippocampus. We hypothesized that a similar change in the variance in gene expression levels might be found in children with autism. Gene expression microarrays covering greater than 47,000 unique RNA transcripts were done on purified RNA from peripheral blood lymphocytes of children with autism (n=82) and controls (n=64). The variance in the distribution of gene expression levels from each microarray was compared between groups of children. Also tested was whether a risk factor for autism, increased paternal age, was associated with variance in the overall distribution of gene expression levels. A decrease in the variance in the distribution of gene expression levels in peripheral blood lymphocytes (PBL) was associated with the diagnosis of autism and a risk factor for autism, increased paternal age. Traditional approaches to microarray analysis of gene expression suggested a possible mechanism for decreased variance in gene expression. Gene expression pathways involved in transcriptional regulation were down-regulated in the blood of children with autism and children of older fathers. Thus, results from global and gene specific approaches to studying microarray data were complimentary and supported the hypothesis that alterations at the global level of gene expression regulation are related to autism and increased paternal age. Regulation of transcription, thus, represents a possible point of convergence for multiple etiologies of autism and other neurodevelopmental disorders."http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE25507We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. When gene-expression data were processed in multiple batches, we have provided batch information. Each data set is organized into a file set, where each contains all pertinent files for an individual dataset. The gene expression files have been normalized using both the SCAN and UPC methods using the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp). We used Ensembl identifiers. The class, clinical, and batch data were hand curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted to be imported easily into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
scRNA data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE199308 (Huang et al. 2023), see a detailed description of the study here: https://atlas.gs.washington.edu/mmca_v2/public/about.html Data were downloaded from https://atlas.gs.washington.edu/mmca_v2/public/download.html to create an AnnData (h5ad) file with meta data to be able to analyse with e.g. python scanpy package. If you use this data, please cite Huang et al. 2023.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The expression matrix and gene list for github demo code at https://github.com/thamnguy/l-PGC. The dataset contains peripheral blood single cell from healthy donors in dataset GSM4710729 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4710729)
GEO accession information for omics RNA-seq data. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE116393. Format: GEO accession information for omics RNA-seq data.
This dataset is associated with the following publication: Huang, W., D. Bencic, R. Flick, D. Nacci, B. Clark, L. Burkhard, T. Lahren, and A. Biales. Characterization of the Fundulus heteroclitus embryo transcriptional response and development of a gene expression-based fingerprint of exposure for the alternative flame retardant, TBPH (bis (2-ethylhexyl)-tetrabromophthalate). ENVIRONMENTAL POLLUTION. Elsevier Science Ltd, New York, NY, USA, 247: 696-705, (2019).
https://rightsstatements.org/vocab/UND/1.0/https://rightsstatements.org/vocab/UND/1.0/
Purpose: The powdery mildew fungus, Blumeria graminis, is an obligate biotrophic pathogen of cereals and has significant impact on food security (Dean et al., 2012. Molecular Plant Pathology 13 (4): 414-430. DOI: 10.1111/j.1364-3703.2011.00783.x). Blumeria graminis f. sp. hordei (Bgh) is the causal agent of powdery mildew on barley (Hordeum vulgare L.). We sought to identify small RNAs (sRNAs) from both barley and Bgh that regulate gene expression both within species and cross-kingdom. Overall design: 90 samples analyzed = 5 genotypes * 6 time points * 3 replications Note: This experiment used the identical split-plot design, tissue, and source RNA as GEO submission # 101304 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE101304).
Sequences from this study are available at the NCBI GEO under accession series GSE131846 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?&acc=GSE131846
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28146
"Microarray analyses of laser-captured hippocampus reveal distinct gray and white matter signatures associated with incipient Alzheimer’s disease" Blalock et al.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4117206%2F295240dbe240e64fb667b364fbee3bc7%2FGSE28146valuedistribution.png?generation=1576089220300565&alt=media" alt="">
Disease state: control, incipient, moderate, and severe patients around 80-90 years old male and female
8 controls 7 incipient 8 moderate 7 severe
"Alzheimer's disease (AD) is a devastating neurodegenerative disorder that threatens to reach epidemic proportions as our population ages. Although much research has examined molecular pathways associated with AD, relatively few studies have focused on critical early stages. Our prior microarray study correlated gene expression in human hippocampus with AD markers. Results suggested a new model of early-stage AD in which pathology spreads along myelinated axons, orchestrated by upregulated transcription and epigenetic factors related to growth and tumor suppression (Blalock et al., 2004). However, the microarray analyses were performed on RNA from fresh frozen hippocampal tissue blocks containing both gray and white matter, potentially obscuring region-specific changes. In the present study, we used laser capture microdissection to exclude major white matter tracts and selectively collect CA1 hippocampal gray matter from formalin-fixed, paraffin-embedded (FFPE) hippoc ampal sections of the same subjects assessed in our prior study. Microarray analyses of this gray matter-enriched tissue revealed many correlations similar to those seen in our prior study, particularly for neuron-related genes. Nonetheless, in the laser-captured tissue, we found a striking paucity of the AD-associated epigenetic and transcription factor genes that had been strongly overrepresented in the prior tissue block study. In addition, we identified novel pathway alterations that may have considerable mechanistic implications, including downregulation of genes stabilizing ryanodine receptor Ca2+ release and upregulation of vascular development genes. We conclude that FFPE tissue can be a reliable resource for microarray studies, that upregulation of growth-related epigenetic/ transcription factors with incipient AD is predominantly localized to white matter, further supporting our prior findings and model, and that alterations in vascular and ryanodine receptor-relat ed pathways in gray matter are closely associated with incipient AD."
Enjoy!
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Manuscript link: pending
This repository contains, or provides links to, the code and associated data used to generate figures 3 and 4 of the manuscript. Due to limited storage space here, the experimental documentation, raw FASTQs, Space Ranger Inputs, and Space Ranger (v3.1.3) Outputs for the H&E dataset are available through Gene Expression Omnibus series accession number GSE296623 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE296623. The Loupe files, .csv annotation exports from Loupe, and code are provided here. Please note the cell type annotations are incomplete; PrintPattern "Random_TME_ALL" was annotated for the manuscript.
Space Ranger was run again using an IF image, and the data is deposited here. *NOTE: The IF image is a live cell image (cells are transduced) taken right before fixation. The cells have migrated slightly between the image and when they were fixed (long scan/image acquisition time) so please proceed with extreme caution if using that version of the data since the image is slightly different than what was actually transferred to the capture area. This data was used only for illustrative purposes. I did not analyze any of the IF outs.
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Background
RNA-seq is a widely adopted affordable method for large scale gene expression profiling. However, user-friendly and versatile tools for wet-lab biologists to analyse RNA-seq data beyond standard analyses such as differential expression, are rare. Especially, the analysis of time-series data is difficult for wet-lab biologists lacking advanced computational training. Furthermore, most meta-analysis tools are tailored for model organisms and not easily adaptable to other species.
Results
With RNfuzzyApp, we provide a user-friendly, web-based R-shiny app for differential expression analysis, as well as time-series analysis of RNA-seq data. RNfuzzyApp offers several methods for normalization and differential expression analysis of RNA-seq data, providing easy-to-use toolboxes, interactive plots and downloadable results. For time-series analysis, RNfuzzyApp presents the first web-based, automated pipeline for soft clustering with the Mfuzz R package, including methods to aid in cluster number selection, Mfuzz loop computations, cluster overlap analysis, as well as cluster enrichments.
Conclusion
RNfuzzyApp is an intuitive, easy to use and interactive R shiny app for RNA-seq differential expression and time-series analysis, offering a rich selection of interactive plots, providing a quick overview of raw data and generating rapid analysis results. Furthermore, its orthology assignment, enrichment analysis, as well as ID conversion functions are accessible to non-model organisms.
Methods Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt: mean values calculated from raw reads of replicates, downloaded from gene expression omnibus (dataset GSE143430 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE143430).
Haering_etal_extendedDatatable_1a_Tabulamurissenis_3vs12m_DEA.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1b_Tabulamurissenis_3vs27m_DEA.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1c_Tabulamurissenis_12vs27m_DEA.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1d_Tabulamurissenis_3vs12m_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1e_Tabulamurissenis_3vs27m_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_1f_Tabulamurissenis_12vs27m_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2a_Tabulamurissenis_cluster1_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2b_Tabulamurissenis_cluster2_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2c_Tabulamurissenis_cluster3_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2d_Tabulamurissenis_cluster4_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_2e_Tabulamurissenis_cluster5_gpofiler.txt: Tabula muris senis limb muscle data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) from 3, 12 and 27month males, processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3a_DmLeg_cluster1_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3b_DmLeg_cluster2_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3c_DmLeg_cluster3_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3d_DmLeg_cluster4_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3e_DmLeg_cluster5_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3f_DmLeg_cluster6_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3g_DmLeg_cluster7_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3h_DmLeg_cluster8_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3i_DmLeg_cluster9_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3j_DmLeg_cluster10_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3k_DmLeg_cluster11_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)
Haering_etal_extendedDatatable_3l_DmLeg_cluster12_gpofiler.txt: Haering_etal_extendedData_DmdevLeg_GSE143430_mean.txt processed with RNfuzzyApp (https://gitlab.com/habermann_lab/rna-seq-analysis-app)