The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis.
Collection of eukaryotic promoters derived from published articles. Annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis.
A promoter database of Saccharomyces cerevisiae. Users can explore the promoter regions of ~6000 genes and ORFs in the yeast genome, annotate putative regulatory sites of all genes and ORFs, locate intergenic regions, and retrieve the sequence of the promoter region. With regard to regulatory elements and transcription factors, users can provide information on transcriptionally related genes, browse matrix and consensus sequences, view the correlation between elements, observe binding affinity and expression, and look at genome-wide distribution. SCPD also provides some simple but useful tools for promoter sequence analysis. Gene, consensus and matrix records may be submitted.
DBTGR provides information on tunicate gene regulation, such as the location of expression, or the identified regulatory elements present in promoter sequences. The database also contains the promoters of homologous genes in multiple species to allow identification of conserved cis elements.
Public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from the literature. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. For each gene, TFBSs conserved in orthologous sequences from at least two different species must be available. Promoter sequences as well as the original GenBank or RefSeq entries are additionally supplied in case of future identification conflicts. The final TSS annotation has been refined using the database dbTSS. Up to this release, the 500 bp upstream of the annotated transcription start site (TSS), according to RefSeq annotations, have always been extracted to form the collection of promoter sequences from human, mouse, rat and chicken. For each regulatory site, the position, the motif and the sequence in which the site is present are available in a simple format. Cross-references to EntrezGene, PubMed and RefSeq are also provided for each annotation. Apart from the experimental promoter annotations, predictions by popular collections of weight matrices are also provided for each promoter sequence. In addition, global and local alignments and graphical dotplots are available.
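The extraction of a fixed window upstream of an annotated TSS, as described in the entry above, can be sketched as follows. This is a minimal illustration and not the database's own pipeline: the function names, the 0-based coordinate convention, and the strand handling are assumptions made for the example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def upstream_region(chrom_seq: str, tss: int, strand: str, length: int = 500) -> str:
    """Return up to `length` bases immediately upstream of a TSS.

    Assumptions (hypothetical conventions for this sketch):
    - `tss` is the 0-based index of the first transcribed base on the
      forward-strand sequence `chrom_seq`;
    - the result is reported 5'->3' on the strand of the gene;
    - the window is truncated if it runs off the end of the sequence.
    """
    if strand == "+":
        start = max(0, tss - length)
        return chrom_seq[start:tss]
    if strand == "-":
        # Upstream of a minus-strand gene lies to the right on the forward strand.
        end = min(len(chrom_seq), tss + 1 + length)
        return reverse_complement(chrom_seq[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")
```

For example, `upstream_region(seq, tss, "+")` would return the 500 bases preceding the TSS on a forward-strand gene, or fewer if the TSS lies within 500 bases of the start of the sequence.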
DNA sequence and relationships for promoter (promoter)
Annotated, non-redundant database of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s) (TSS) from various plant species. It contains 578 unrelated entries including 151, 396 and 31 promoters with experimentally verified TSS from monocot, dicot and other plants, respectively (April 2014). This DB presents the published promoter sequences with TSS(s) determined by direct experimental approaches and therefore serves as the most accurate source for development of computational promoter prediction tools.
Engineering microorganisms into biological factories that convert renewable feedstocks into valuable materials is a major goal of synthetic biology; however, for many nonmodel organisms, we do not yet have the genetic tools, such as suites of strong promoters, necessary to effectively engineer them. In this work, we developed a computational framework that can leverage standard RNA-seq data sets to identify sets of constitutive, strongly expressed genes and predict strong promoter signals within their upstream regions. The framework was applied to a diverse collection of RNA-seq data measured for the methanotroph Methylotuvimicrobium buryatense 5GB1 and identified 25 genes that were constitutively, strongly expressed across 12 experimental conditions. For each gene, the framework predicted short (27–30 nucleotide) sequences as candidate promoters and derived −35 and −10 consensus promoter motifs (TTGACA and TATAAT, respectively) for strong expression in M. buryatense. This consensus closely matches the canonical E. coli sigma-70 motif and was found to be enriched in promoter regions of the genome. A subset of promoter predictions was experimentally validated in a XylE reporter assay, including the consensus promoter, which showed high expression. The pmoC, pqqA, and ssrA promoter predictions were additionally screened in an experiment that scrambled the −35 and −10 signal sequences, confirming that transcription initiation was disrupted when these specific regions of the predicted sequence were altered. These results indicate that the computational framework can make biologically meaningful promoter predictions and identify key pieces of regulatory systems that can serve as foundational tools for engineering diverse microorganisms for biomolecule production.
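To make the −35/−10 consensus concrete, the sketch below scans a sequence for a TTGACA-like hexamer followed, after a spacer, by a TATAAT-like hexamer. It is an illustration only and not the framework described in the abstract: the function names, the 15–19 nt spacer range, and the match-count threshold are assumptions chosen for the example.

```python
MINUS_35 = "TTGACA"  # -35 consensus quoted in the abstract
MINUS_10 = "TATAAT"  # -10 consensus quoted in the abstract


def count_matches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def scan_promoters(seq: str, min_spacer: int = 15, max_spacer: int = 19,
                   min_matches: int = 10) -> list[tuple[int, int, int]]:
    """Return (start, spacer, score) for each candidate -35/-10 pair.

    `start` is the 0-based position of the -35 hexamer, `spacer` the gap
    to the -10 hexamer, and `score` the combined number of consensus
    matches (maximum 12). Spacer range and threshold are assumptions.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        m35 = count_matches(seq[i:i + 6], MINUS_35)
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                break
            score = m35 + count_matches(seq[j:j + 6], MINUS_10)
            if score >= min_matches:
                hits.append((i, spacer, score))
    return hits
```

Counting exact matches is the simplest possible scoring scheme; a position weight matrix would weight each position by how conserved it is, at the cost of needing training data.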
500 nt promoter sequences for the DoOP database, chordate section, v1.4.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1000 nt promoter sequences for the DoOP database, chordate section, v1.4.
As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more pressing. Promoters are the key regulatory elements, which recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). The identification of promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are difficult to determine experimentally. Machine learning represents a promising and cost-effective approach for computational identification of prokaryotic promoter regions. However, the quality of the predictors depends on several factors, including: i) training data; ii) data representation; iii) classification algorithms; iv) evaluation procedures. In this work, we create several variants of E. coli promoter data sets and utilize them to experimentally examine the effect of these factors on the predictive performance of E. coli σ70 promoter models. Our results suggest that under some combinations of the first three criteria, a prediction model might perform very well in cross-validation experiments while its performance on independent test data is drastically poorer. This emphasizes the importance of evaluating promoter region predictors using independent test data, which corrects for the over-optimistic performance that might be estimated using the cross-validation procedure. Our analysis of the tested models shows that good prediction models often perform well regardless of how the non-promoter data was obtained. On the other hand, poor prediction models seem to be more sensitive to the choice of non-promoter sequences. Interestingly, the best performing sequence-based classifiers outperform the best performing structure-based classifiers in both cross-validation and independent test performance evaluation experiments.
Finally, we propose a meta-predictor method combining two top performing sequence-based and structure-based classifiers and compare its performance with some of the state-of-the-art E. coli σ70 promoter prediction methods.
DNA sequence and relationships for Gene 3 Promoter (promoter)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
3000 nt promoter sequences for the DoOP database, chordate section, v1.4.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A total of 1,075 RpoD holoenzyme-binding sites were identified within spacers on the entire E. coli K-12 W3110 genome. The constitutive promoters were predicted within type-A and type-B intergenic spacers (see Fig. 1A for classification). A total of 178 RNA polymerase RpoD holoenzyme-binding sites were identified within type-A spacers, which direct bidirectional transcription. Based on the gene orientation around these promoters, the genes and operons under the control of these promoters were estimated; they are located on either the left side (left gene column) or the right side (right gene column) of the respective spacers. Genes encoding transcription factors are indicated by star symbols (*) and the operons are shown in the operon columns [note that only the first and the last genes are shown for polycistronic operons]. The directions of transcription for these flanking genes are shown by arrows in column D. The map positions of left-side and right-side genes are shown in the map columns. The essential genes listed in the PEC database are underlined. The promoter sequences were predicted according to the analysis procedure described in Materials and Methods. For some spacers, multiple promoters were identified, of which the best-match promoters with the highest scores are described. A promoter sequence that completely matches the canonical promoter (see Fig. 4) is shown in bold italic, while one with a 5-out-of-6 match is shown in bold. Spacers that include H-NS-binding sites are marked HNS in the spacer column. The numbers of hitherto identified promoters are 121 and 133 for left-ward and right-ward transcription, respectively, which correspond to 68 and 75%. The total number of genes under the control of the 178 promoters was 300 for left-ward transcription and 291 for right-ward transcription. The average numbers of genes under one promoter are 1.68 and 1.63 for left-ward and right-ward transcription, respectively.
Among the total of 178 RpoD holoenzyme-binding sites, 64 (36%) overlap with the H-NS-binding sites.
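The percentages and averages quoted above follow directly from the reported counts, and the short check below recomputes them. The variable and function names are illustrative and not taken from the source. Note that 300/178 is about 1.685, so the reported 1.68 appears to be truncated rather than rounded.

```python
TOTAL_SITES = 178  # RpoD holoenzyme-binding sites within type-A spacers

known_promoters = {"left": 121, "right": 133}   # hitherto identified promoters
controlled_genes = {"left": 300, "right": 291}  # genes under promoter control
hns_overlap = 64                                # sites overlapping H-NS-binding sites


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


def summarize() -> dict[str, dict[str, float]]:
    """Recompute the summary statistics from the reported counts."""
    summary = {}
    for direction in ("left", "right"):
        summary[direction] = {
            "known_pct": percent(known_promoters[direction], TOTAL_SITES),
            "genes_per_promoter": controlled_genes[direction] / TOTAL_SITES,
        }
    summary["hns"] = {"overlap_pct": percent(hns_overlap, TOTAL_SITES)}
    return summary
```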
DNA sequence and relationships for P-45 Promoter (promoter)
This dataset tracks the updates made on the dataset "PRESTA: associating promoter sequences with information on gene expression" as a repository for previous versions of the data and metadata.
DNA sequence and relationships for pR (promoter)
Due to the high capacity of their secretion machinery, Gram-positive bacteria from the genus Bacillus are important expression hosts for the high-yield production of enzymes in industrial biotechnology. However, to date, strains from only a few Bacillus species are used for enzyme production at the industrial scale. In this work, we introduce Paenibacillus polymyxa DSM 292, a member of a different genus, as a novel host for secretory protein production. The model gene cel8A from Clostridium thermocellum was chosen as an easily detectable reporter gene with industrial relevance to demonstrate efficient heterologous expression and secretion in P. polymyxa. The yield of the secreted cellulase Cel8A was increased by optimizing the expression medium and testing various promoter sequences on the expression plasmid pBACOV. To identify promising new promoter sequences from the genome of P. polymyxa itself, quantitative mass spectrometry was used to analyze the secretome. The most strongly secreted host proteins were identified and the promoters regulating the expression of the corresponding genes were selected. Eleven promoter sequences were cloned and tested, including well-characterized promoters from B. subtilis and B. megaterium. The best result was achieved with the promoter of the hypothetical protein PPOLYM_03468 from P. polymyxa, which in combination with the improved expression medium enabled the production of 5,475 U/l Cel8A, which represents a 6.2-fold increase compared to the reference promoter PaprE. The set of promoters described in this work covers a broad range of promoter strengths useful for heterologous expression in the new host P. polymyxa.
Introduction: The APETALA2/ethylene response factor (AP2/ERF) superfamily plays a significant role in regulating plant gene expression in response to growth and development. To date, there have been no studies into whether the ramie AP2/ERF genes are involved in the regulation of flower development.
Methods: Here, 84 BnAP2/ERF members were identified from the ramie genome database, and various bioinformatics data on the AP2/ERF gene family, structure, replication, promoters and regulatory networks were analysed. BnAP2-12 was transferred into Arabidopsis through the flower-dipping method.
Results: Phylogenetic analysis classified the 84 BnAP2/ERF members into four subfamilies: AP2 (18), RAV (3), ERF (42), and DREB (21). The functional domain analysis of genes revealed 10 conserved motifs. Genetic mapping localised the 84 members on 14 chromosomes, among which chromosomes 1, 3, 5, and 8 had more members. Collinearity analysis revealed that 43.37% possibly resulted from replication events during the evolution of the ramie genome. Promoter sequence analysis identified and classified cis-acting elements associated with plant growth and development, and responses to stress, hormones, and light. Transcriptomic comparison identified 3,635 differentially expressed genes (DEGs) between male and female flowers (1,803 and 1,832 upregulated and downregulated genes, respectively). Kyoto Encyclopaedia of Genes and Genomes pathway analysis categorised DEGs involved in metabolic pathways and biosynthesis of secondary metabolites. Gene Ontology enrichment analysis further identified enriched genes associated with pollen and female gamete formation. Of the 84 BnAP2/ERF genes, 22 and 8 upregulated and downregulated genes, respectively, were present in female flowers. Co-expression network analysis identified AP2/ERF members associated with flower development, including BnAP2-12. Subcellular localisation analysis showed that the BnAP2-12 protein is localised in the nucleus and cell membrane. Overexpression of BnAP2-12 delayed the flowering time of Arabidopsis thaliana.
Conclusion: These findings provide insights into the mechanism of ramie flower development.
Escherichia coli uses σ factors to quickly control large gene cohorts during stress conditions. While most of its genes respond to a single σ factor, approximately 5% of them have dual σ factor preference. The most common are those responsive to both σ70, which controls housekeeping genes, and σ38, which activates genes during stationary growth and stresses. Using RNA-seq and flow-cytometry measurements, we show that ‘σ70+38 genes’ are nearly as upregulated in stationary growth as ‘σ38 genes’. Moreover, we find a clear quantitative relationship between their promoter sequence and their response strength to changes in σ38 levels. We then propose and validate a sequence-dependent model of σ70+38 genes, with dual sensitivity to σ38 and σ70, that is applicable in the exponential and stationary growth phases, as well as in the transient period in between. We further propose a general model, applicable to other stresses and σ factor combinations. Given this, promoters controlling σ70+38 genes (and variants) could become important building blocks of synthetic circuits with predictable, sequence-dependent sensitivity to transitions between the exponential and stationary growth phases.