The DBD (transcription factor database) provides genome-wide transcription factor predictions for organisms across the tree of life. The prediction method identifies sequence-specific DNA-binding transcription factors through homology using profile hidden Markov models (HMMs) of domains from Pfam and SUPERFAMILY. It does not include basal transcription factors or chromatin-associated proteins.
Database about gene regulation and gene expression in prokaryotes. It includes a unique, manually curated collection of transcription factor binding sites, along with a variety of bioinformatics tools for the prediction, analysis and visualization of regulons and gene regulatory networks. The integrated approach provides information about molecular networks in prokaryotes, with a focus on pathogenic organisms. In detail, this concerns:
* transcriptional regulation (transcription factors and their DNA binding sites)
* signal transduction (two-component systems, phosphorylation cascades)
* protein interactions (complex formation, oligomerization)
* biochemical pathways (chemical reactions)
* other regulation events (e.g. codon usage)
It aims to be a resource to model protein-host interactions and a suitable platform to analyze high-throughput data from proteomics and transcriptomics experiments (systems biology). Currently it mainly contains detailed information about operon and promoter structures, including large collections of transcription factor binding sites. If a sufficient number of regulatory binding sites is available, a position weight matrix (PWM) and a sequence logo are provided, which can be used to predict new binding sites. These data are collected manually by screening the original scientific literature. PRODORIC also handles protein-protein interactions and signal-transduction cascades, which in prokaryotes commonly occur in the form of two-component systems. Furthermore, it contains metabolic network data imported from the KEGG database.
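To illustrate how a position weight matrix built from curated binding sites can be used to predict new sites, here is a minimal Python sketch. It is not PRODORIC's implementation: the function names, the pseudocount, the uniform background frequency of 0.25 and the toy sites are all assumptions made for the example.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length, aligned, upper-case DNA sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        # One dict per position: base -> log2(observed frequency / background)
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Slide the PWM along the sequence and return (best score, start index)."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Hypothetical aligned binding sites, for illustration only
toy_sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
toy_pwm = build_pwm(toy_sites)
```

Calling `best_hit(toy_pwm, "GGGTTGACAGGG")` scans every 6-bp window of the sequence; in practice a score threshold would be chosen to decide which windows count as predicted sites.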
Clostridium thermocellum is a thermophilic bacterium recognized for its natural ability to effectively deconstruct cellulosic biomass. While there is a large body of studies on the genetic engineering and physiology of this bacterium to date, knowledge of transcriptional regulation in this organism, and in thermophilic bacteria in general, is limited. The study herein is the first report of a high-throughput application of DNA-affinity purification sequencing (DAP-seq) to transcription factors (TFs) from a thermophile. We applied DAP-seq to >90 TFs in C. thermocellum and detected genome-wide binding sites for 11 of them. We then compiled and aligned DNA binding sequences from these TFs to deduce the primary DNA-binding sequence motif for each TF. These binding motifs were further validated with electrophoretic mobility shift assays (EMSA) and used to identify individual TFs’ regulatory targets in C. thermocellum. Our results led to the discovery of novel, uncharacterized TFs as well as homologues of previously studied TFs, including RexA-, LexA- and LacI-type TFs. We then used these data to reconstruct gene regulatory networks for the 11 TFs individually, which resulted in a global network encompassing the TFs with some interconnections. As gene regulation governs and constrains how bacteria behave, our findings shed light on the roles of TFs delineated by their regulons, and potentially provide a means to enable rational, advanced genetic engineering of C. thermocellum and other organisms towards a desired phenotype.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the past decades, studies have reported that the combinatorial regulation of transcription factors (TFs) and microRNAs (miRNAs) is essential for the appropriate execution of biological events and developmental processes. Dysregulation of these regulators often causes disease. However, no resources are available on the regulatory cascades of TFs and miRNAs in the context of human diseases. To fill this gap, we established the TMREC database in this study. First, we integrated curated transcriptional and post-transcriptional regulations to construct the TF and miRNA regulatory network. Next, we identified all linear paths using the breadth-first search traversal method. Finally, we used known disease-related genes and miRNAs to measure the strength of association between cascades and diseases. Currently, TMREC consists of 74,248 cascades and 25,194 cascade clusters, involving 412 TFs, 266 miRNAs and 545 diseases. We will update the database regularly as experimentally supported regulation data expand. TMREC aims to help experimental biologists to comprehensively analyse gene expression regulation, to understand disease aetiology and to predict novel therapeutic targets. TMREC is freely available at http://bioinfo.hrbmu.edu.cn/TMREC/.
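The path-finding step can be sketched as a breadth-first enumeration of simple (non-repeating) paths in a directed regulatory network. This is only an illustration of the general technique, not TMREC's code: the node names, the edge list and the `min_nodes`/`max_nodes` bounds are hypothetical, since the description above does not state any length limits.

```python
from collections import deque

def linear_paths(edges, min_nodes=3, max_nodes=5):
    """Enumerate simple directed paths breadth-first.

    edges: iterable of (regulator, target) pairs.
    min_nodes / max_nodes: assumed bounds on cascade length (not from the source).
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    paths = []
    # Start one partial path at every node that regulates something
    queue = deque([node] for node in graph)
    while queue:
        path = queue.popleft()
        if len(path) >= min_nodes:
            paths.append(path)
        if len(path) == max_nodes:
            continue  # do not extend beyond the assumed maximum length
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # keep the path linear: no node visited twice
                queue.append(path + [nxt])
    return paths

# Hypothetical toy network mixing TFs, a miRNA and a target gene
toy_edges = [
    ("TF_A", "miR_1"),
    ("miR_1", "TF_B"),
    ("TF_B", "gene_X"),
    ("TF_A", "TF_B"),
]
toy_cascades = linear_paths(toy_edges)
```

Because the queue is processed first-in, first-out, shorter cascades are reported before longer ones.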
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
ReMap is a large scale integrative analysis of DNA-binding experiments for Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana transcriptional regulators. The catalogues are the results of the manual curation of ChIP-seq, ChIP-exo, DAP-seq from public sources (GEO, ENCODE, ENA).
ReMap (https://remap.univ-amu.fr) aims to provide manually curated, high-quality catalogs of regulatory regions resulting from a large-scale integrative analysis of DNA-binding experiments in Human, Mouse, Fly and Arabidopsis thaliana for hundreds of transcription factors and regulators. In this 2022 update, we have uniformly processed >11 000 DNA-binding sequencing datasets from public sources across four species. The updated Human regulatory atlas includes 8103 datasets covering a total of 1210 transcriptional regulators (TRs) with a catalog of 182 million (M) peaks, while the updated Arabidopsis atlas reaches 4.8M peaks and 423 TRs across 694 datasets. Also, this ReMap release is enriched by two new regulatory catalogs for Mus musculus and Drosophila melanogaster. First, the Mouse regulatory catalog consists of 123M peaks across 648 TRs as a result of the integration and validation of 5503 ChIP-seq datasets. Second, the Drosophila melanogaster catalog contains 16.6M peaks across 550 TRs from the integration of 1205 datasets. The four regulatory catalogs are browsable through track hubs at the UCSC, Ensembl and NCBI genome browsers. Finally, ReMap 2022 comes with a new Cis Regulatory Module identification method, improved quality controls, faster search results, and a better user experience with an interactive tour and video tutorials on browsing and filtering ReMap catalogs.
We thank our users for past and future feedback to make ReMap useful for the community. The ReMap team welcomes your feedback on the catalogs, use of the website and use of the downloadable files. Please contact benoit.ballester@inserm.fr for development requests.
Reference:
ReMap 2022: a database of Human, Mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Fayrouz Hammal, Pierre de Langen, Aurélie Bergon, Fabrice Lopez, Benoit Ballester. Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D316–D325, https://doi.org/10.1093/nar/gkab996
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used to benchmark human TF-target datasets via TF activities in three benchmark datasets. Described in Garcia-Alonso et al. 2019.
Check https://github.com/saezlab/TFbenchmark to access the corresponding code.
Study abstract
Prediction of transcription factor (TF) activities from the gene expression of their targets (i.e. TF regulon) is becoming a widely-used approach to characterize the functional status of transcriptional regulatory circuits. Several strategies and datasets have been proposed to link the target genes likely regulated by a TF, each one providing a different level of evidence. The most established ones are: (i) manually curated repositories, (ii) interactions derived from ChIP-seq binding data, (iii) in silico prediction of TF binding on gene promoters, and (iv) reverse-engineered regulons from large gene expression datasets. However, it is not known how these different sources of regulons affect the TF activity estimations, and thereby downstream analysis and interpretation. Here we compared the accuracy and biases of these strategies to define human TF regulons by means of their ability to predict changes in TF activities in three reference benchmark datasets. We assembled a collection of TF-target interactions among 1,541 TFs and evaluated how the different molecular and regulatory properties of the TFs, such as the DNA-binding domain, specificities or mode of interaction with the chromatin, affect the predictions of TF activity changes. We assessed their coverage and found little overlap on the regulons derived from each strategy and better performance by literature-curated information followed by ChIP-seq data. We provide an integrated resource of all TF-target interactions derived through these strategies with a confidence score, as a resource for enhanced prediction of TF activities.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete list of TF pairs with significant target function overlaps but lower than expected target gene overlaps. Negative association (i.e., lower than expected target gene overlaps) of two TFs is defined as a negative Phi coefficient of the target gene overlaps of two TFs – TF1 and TF2. (XLSX 46 kb)
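For readers unfamiliar with the Phi coefficient, the sketch below computes it from the standard 2x2 contingency table of two TFs' target sets. It is an illustration only: the function name, the choice of gene universe and the toy gene identifiers are assumptions, and the original study may have defined the background differently.

```python
import math

def phi_coefficient(targets_tf1, targets_tf2, universe):
    """Phi coefficient of the overlap between two TFs' target gene sets.

    universe: all genes considered as possible targets (an assumption here).
    A negative value means fewer shared targets than expected by chance.
    """
    all_genes = set(universe)
    t1 = set(targets_tf1) & all_genes
    t2 = set(targets_tf2) & all_genes

    n11 = len(t1 & t2)                       # targets of both TFs
    n10 = len(t1 - t2)                       # targets of TF1 only
    n01 = len(t2 - t1)                       # targets of TF2 only
    n00 = len(all_genes) - n11 - n10 - n01   # targets of neither

    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        raise ValueError("Phi is undefined when a row or column total is zero")
    return (n11 * n00 - n10 * n01) / denominator

# Hypothetical toy universe of ten genes
toy_universe = [f"g{i}" for i in range(1, 11)]
```

With this definition, two TFs whose target sets are disjoint and together cover the whole universe reach the minimum value of -1, while identical target sets give +1.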
Attribution-NoDerivs 4.0 (CC BY-ND 4.0) https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
We reconstructed gene regulatory networks for 38 tissues from the Genotype-Tissue Expression project (GTEx), and used these networks to investigate gene expression and regulation across these tissues. In the RData file, we share the following objects:
- edges: a 19,476,492 by 3 data.frame with three columns: TF (the transcription factor's gene symbol), Gene (Ensembl ID), Prior (whether an edge is canonical (1) or non-canonical (0)).
- exp: a 30,243 by 9,435 matrix including normalized expression data for each sample.
- expTS: a 30,243 by 38 matrix including, for each gene and each tissue, information on whether the gene is expressed in a tissue-specific manner in that tissue (1) or not (0).
- genes: a 30,243 by 4 data.frame that includes annotation information (Symbol) for Ensembl gene IDs (Name). This data.frame also includes information on whether genes are also transcription factors (AlsoTF), with options: no, yes/motif (TF with a known DNA-binding motif) and yes/nomotif (TF without a known DNA-binding motif). In addition, the multiplicity of the gene (Multiplicity) is given.
- net: a 19,476,492 by 38 matrix that includes edge weights for each tissue. Edge order corresponds to the edge order in the object "edges".
- netTS: a 19,476,492 by 38 matrix that includes information on whether edges are specific to a tissue (1) or not (0).
- samples: a 9,435 by 2 data.frame that includes sample identifiers (matching the identifiers in "exp") and the tissue to which these samples belong.
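The objects above are linked by row order: row i of "net" and "netTS" describes the edge in row i of "edges". The toy Python sketch below mimics that layout with three made-up edges and two made-up tissues to show how tissue-specific edges could be pulled out. All values, gene IDs and tissue names are hypothetical, and the real objects live in an RData file that would first have to be loaded in R or converted.

```python
# Toy stand-ins for the R objects; real data has 19,476,492 edges and 38 tissues
edges = [
    {"TF": "TF1", "Gene": "ENSG_A", "Prior": 1},
    {"TF": "TF2", "Gene": "ENSG_B", "Prior": 0},
    {"TF": "TF1", "Gene": "ENSG_C", "Prior": 1},
]
tissues = ["tissue_1", "tissue_2"]  # column order shared by net and netTS
net = [       # edge weights, one row per edge, one column per tissue
    [2.1, -0.3],
    [0.4, 1.8],
    [-1.0, 0.2],
]
netTS = [     # 1 if the edge is specific to that tissue, else 0
    [1, 0],
    [0, 1],
    [0, 0],
]

def tissue_specific_edges(edges, net, netTS, tissues, tissue):
    """Return (TF, Gene, weight) for every edge flagged as specific to `tissue`."""
    col = tissues.index(tissue)
    return [
        (edge["TF"], edge["Gene"], weights[col])
        for edge, weights, flags in zip(edges, net, netTS)  # rows align by index
        if flags[col] == 1
    ]
```

The same positional alignment applies between the rows of "exp", "expTS" and "genes", and between the columns of "exp" and the rows of "samples".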
Database of experimentally validated gene regulatory relations and the corresponding transcription factor binding sites upstream of Bacillus subtilis genes. The database allows the comparison of systematic experiments with individual experimental results in order to facilitate the elucidation of the complete B. subtilis gene regulatory network. The current version is constructed by surveying 947 references and contains information on 120 binding factors and 1475 gene regulatory relations. For each promoter, all of its known cis-elements are listed according to their positions, and these cis-elements are aligned to illustrate the consensus sequence for each transcription factor. All probable transcription factors encoded in the genome were classified using Pfam motifs. The DBTBS database was reorganized to show operons instead of individual genes as the building blocks of gene regulatory networks. It now contains 463 experimentally known operons, as well as their terminator sequences if identifiable. In addition, 517 transcriptional terminators were identified computationally (De Hoon, M.J.L. et al., PLoS Comput. Biol. 1, e25 (2005)). A new section was added under "Motif conservation", which presents hexameric motifs found to be conserved to different extents between upstream intergenic regions of genus-specific subgroups of homologous proteins.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regulation of gene expression is essential to determining the functional complexity and morphological diversity seen among different cells. Transcriptional regulation is a crucial step in gene expression regulation because the genetic information is directly read from DNA by sequence-specific transcription factors (TFs). Although several mouse TF databases created from genome sequences and transcriptomes are available, a cell type-specific TF database from any normal cell populations is still lacking. We identify cell type-specific TF genes expressed in cochlear inner hair cells (IHCs) and outer hair cells (OHCs) using hair cell-specific transcriptomes from adult mice. IHCs and OHCs are the two types of sensory receptor cells in the mammalian cochlea. We show that 1,563 and 1,616 TF genes are respectively expressed in IHCs and OHCs among 2,230 putative mouse TF genes. While 1,536 are commonly expressed in both populations, 73 genes are differentially expressed (with at least a twofold difference) in IHCs and 13 are differentially expressed in OHCs. Our datasets represent the first cell type-specific TF databases for two populations of sensory receptor cells and are key informational resources for understanding the molecular mechanism underlying the biological properties and phenotypical differences of these cells.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Because chromatin accessibility provides rich information on the transcription factor binding process, for a given TF-based raw regulon we first test whether the TF motif is enriched in that regulon. To do this efficiently, we built our own database in BED format, which contains all available TF motifs and their occurrences across the potential binding regions (TSS $\pm 10$ kb) of all mouse genes.
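The description does not say which statistical test is used for motif enrichment. One common choice is a one-sided hypergeometric test, sketched below purely as an assumption; the function and parameter names are hypothetical and the counts would come from intersecting the regulon's genes with the motif occurrences in the BED database.

```python
from math import comb

def motif_enrichment_p(n_genes, n_genes_with_motif, n_regulon, n_regulon_with_motif):
    """One-sided hypergeometric p-value P(X >= n_regulon_with_motif).

    n_genes:              all genes in the background
    n_genes_with_motif:   background genes with a motif hit near the TSS
    n_regulon:            genes in the raw regulon
    n_regulon_with_motif: regulon genes with a motif hit
    The hypergeometric model is an assumed choice, not taken from the source.
    """
    total = comb(n_genes, n_regulon)
    upper = min(n_genes_with_motif, n_regulon)
    # math.comb returns 0 when asked to choose more items than are available,
    # so impossible terms drop out of the sum automatically
    tail = sum(
        comb(n_genes_with_motif, i) * comb(n_genes - n_genes_with_motif, n_regulon - i)
        for i in range(n_regulon_with_motif, upper + 1)
    )
    return tail / total
```

A small p-value would suggest that the motif occurs in the regulon more often than expected from the background; any significance threshold and multiple-testing correction are left open here.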
Public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from the literature. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. For each gene, TFBSs conserved in orthologous sequences from at least two different species must be available. Promoter sequences as well as the original GenBank or RefSeq entries are additionally supplied in case of future identification conflicts. The final TSS annotation has been refined using the database dbTSS. Up to this release, the 500 bp upstream of the annotated transcription start site (TSS) according to RefSeq annotations have always been extracted to form the collection of promoter sequences from human, mouse, rat and chicken. For each regulatory site, the position, the motif and the sequence in which the site is present are available in a simple format. Cross-references to EntrezGene, PubMed and RefSeq are also provided for each annotation. Apart from the experimental promoter annotations, predictions by popular collections of weight matrices are also provided for each promoter sequence. In addition, global and local alignments and graphical dotplots are available.
Collects mammalian cis- and trans-regulatory elements together with experimental evidence. Regulatory elements were mapped onto assembled genomes. Resource for gene regulation and function studies. Users can retrieve primers, search TF target genes, retrieve TF motifs, search gene regulatory networks and orthologs, and make use of sequence analysis tools. Uses databases such as GenBank, EPD and DBTSS, and employs the promoter-finding program FirstEF combined with mRNA/EST information and cross-species comparisons. Manually curated.
Curated collection of known Drosophila transcriptional cis-regulatory modules (CRMs) and transcription factor binding sites (TFBSs). Includes experimentally verified fly regulatory elements along with their DNA sequence, associated genes, and the expression patterns they direct. Submissions of experimentally verified cis-regulatory elements that are not yet included in the REDfly database are welcome.
Target genes of transcription factors from published ChIP-chip, ChIP-seq, and other transcription factor binding site profiling studies
The Plant Transcription Factor Database (PlantTFDB) provides a comprehensive, high-quality resource of plant transcription factors (TFs), regulatory elements and the interactions between them. In the latest version, it contains 320 370 TFs, classified into 58 families, from 165 species. Abundant functional and evolutionary annotations (e.g., GO, functional description, binding motifs, cis-elements, regulation, references, orthologous groups and phylogenetic trees) are provided for identified TFs. In addition, multiple online tools are available for TF identification, regulation prediction and functional enrichment analyses.
The robustness and sensitivity of gene networks to environmental changes is critical for cell survival. How gene networks produce specific, chronologically ordered responses to genome-wide perturbations, while robustly maintaining homeostasis, remains an open question. We analysed whether short- and mid-term genome-wide responses to shifts in RNA polymerase (RNAP) concentration are influenced by the known topology and logic of the transcription factor network (TFN) of Escherichia coli. We found that, at the gene cohort level, the magnitude of the single-gene, mid-term transcriptional responses to changes in RNAP concentration can be explained by the absolute difference between the gene’s numbers of activating and repressing input transcription factors (TFs). Interestingly, this difference is strongly positively correlated with the number of input TFs of the gene. Meanwhile, short-term responses showed only weak influence from the TFN. Our results suggest that the global topological traits of the TFN of E. coli shape which gene cohorts respond to genome-wide stresses. Data collection and generation, data processing methods and the software packages used are described in the Methods section of the main manuscript.
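The quantity highlighted in this abstract, the absolute difference between a gene's numbers of activating and repressing input TFs, can be computed from a signed interaction list as in the sketch below. The tuple layout, the "+"/"-" effect labels and the toy interactions are assumptions for illustration and do not reflect the format of the deposited data.

```python
def input_tf_summary(interactions):
    """Summarise each gene's input TFs from (tf, gene, effect) tuples.

    effect is "+" for activation or "-" for repression (assumed encoding).
    Returns gene -> {"n_inputs": ..., "abs_difference": ...}.
    """
    counts = {}  # gene -> [number of activators, number of repressors]
    for tf, gene, effect in interactions:
        pair = counts.setdefault(gene, [0, 0])
        if effect == "+":
            pair[0] += 1
        elif effect == "-":
            pair[1] += 1
        else:
            raise ValueError(f"unknown effect {effect!r} for {tf} -> {gene}")
    return {
        gene: {"n_inputs": act + rep, "abs_difference": abs(act - rep)}
        for gene, (act, rep) in counts.items()
    }

# Hypothetical toy interactions
toy_interactions = [
    ("tfA", "gene1", "+"),
    ("tfB", "gene1", "+"),
    ("tfC", "gene1", "-"),
    ("tfA", "gene2", "-"),
]
toy_summary = input_tf_summary(toy_interactions)
```

How TFs with dual or unknown effects should be counted is not specified here, so this sketch simply rejects them.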
A curated repository of more than 206000 regulatory associations between transcription factors (TFs) and target genes in Saccharomyces cerevisiae, based on more than 1300 bibliographic references. It also includes the description of 326 specific DNA binding sites shared among 113 characterized TFs. Further information about each yeast gene has been extracted from the Saccharomyces Genome Database (SGD). For each gene, the associated Gene Ontology (GO) terms and their hierarchy in GO were obtained from the GO consortium. Currently, YEASTRACT maintains a total of 7130 terms from GO. The nucleotide sequences of the promoter and coding regions for yeast genes were obtained from Regulatory Sequence Analysis Tools (RSAT). All the information in YEASTRACT is updated regularly to match the latest data from SGD, the GO consortium, RSAT and recent literature on yeast regulatory networks. YEASTRACT includes DISCOVERER, a set of tools that can be used to identify complex motifs found to be over-represented in the promoter regions of co-regulated genes. DISCOVERER is based on the MUSA algorithm. Its tools take as input a list of genes and identify over-represented motifs, which can then be compared with transcription factor binding sites described in the YEASTRACT database.
Cis-regulatory sequences are not always conserved across species. Divergence within cis-regulatory sequences may result from the evolution of species-specific patterns of gene expression or the flexible nature of the cis-regulatory code. The identification of functional divergence in cis-regulatory sequences is therefore important for both understanding the role of gene regulation in evolution and annotating regulatory elements. We have developed an evolutionary model to detect the loss of constraint on individual transcription factor binding sites (TFBSs). We find that a significant fraction of functionally constrained binding sites have been lost in a lineage-specific manner among three closely related yeast species. Binding site loss has previously been explained by turnover, where the concurrent gain and loss of a binding site maintains gene regulation. We estimate that nearly half of all loss events cannot be explained by binding site turnover. Recreating the mutations that led to binding site loss confirms that these sequence changes affect gene expression in some cases. We also estimate that there is a high rate of binding site gain, as more than half of experimentally identified S. cerevisiae binding sites are not conserved across species. The frequent gain and loss of TFBSs implies that cis-regulatory sequences are labile and, in the absence of turnover, may contribute to species-specific patterns of gene expression.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ratio of the number of predicted cooperative TF pairs (PCTFPs) to the number of TF pairs under study for each of the 17 existing algorithms in the literature.