Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BLASTP vs TrEMBL
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Predicted isoelectric points for all UniProtKB/TrEMBL proteins (April 2016), computed using 18 different algorithms and covering over 63 million protein sequences. Compressed using 7-Zip. Primary reference: Kozlowski, LP (2016) Proteome-pI: proteome isoelectric point database. Nucleic Acids Research. doi: 10.1093/nar/gkw978. Website: http://isoelectricpointdb.org
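The database reports pI values from 18 algorithms. Many pKa-based predictors share one core idea: find the pH at which the Henderson-Hasselbalch net charge of the sequence is zero. The sketch below shows that idea with a single set of illustrative pKa values (an assumption; each algorithm uses its own set, and this is not the Proteome-pI code).

```python
# Illustrative pKa values (assumed); each pI algorithm uses its own pKa set.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a sequence at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1  # one free amino terminus
    counts["Cterm"] = 1  # one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection search on pH 0-14 for the pH where the net charge is zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection is safe here because the net charge decreases monotonically with pH; swapping in a different pKa table is enough to mimic another pKa-based method.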
Results of the BLASTX search of Adiantum capillus-veneris EST (AcEST) sequences against the UniProtKB/TrEMBL (release 39.9) database. Alignment information for each hit in the BLASTX hit list is given on a single line. CSV-format text file.
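A minimal sketch of reading a one-hit-per-line file like this and keeping the best TrEMBL hit for each EST. The column order is an assumption borrowed from BLAST's 12 standard tabular fields; check the actual CSV header before relying on it.

```python
import csv
import io

# Assumed column layout (BLAST's 12 standard tabular fields); the real
# AcEST CSV may use a different order or extra columns.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(text: str) -> dict:
    """Return the lowest-E-value hit per query from one-hit-per-line CSV text."""
    best = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        query = hit["qseqid"]
        if query not in best or hit["evalue"] < best[query]["evalue"]:
            best[query] = hit
    return best
```

For a real file, pass `open(path).read()` or adapt the loop to iterate over the file object directly.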
Central repository of functional information on proteins, with accurate and consistent annotation. In addition to the core data mandatory for each UniProtKB entry (mainly the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, as well as experimental and computational data. The UniProt Knowledgebase consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. UniProtKB/Swiss-Prot (reviewed) is a high-quality, manually annotated, non-redundant protein sequence database that brings together experimental results, computed features and scientific conclusions. UniProtKB/TrEMBL (unreviewed) contains protein sequences with computationally generated annotation and large-scale functional characterization that await full manual annotation. Users may browse by taxonomy, keyword, Gene Ontology, enzyme class or pathway.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Annotation: DIAMOND database, TrEMBL, Aedes zammitii
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The viral subset of the TrEMBL database clustered at 95% identity at the amino acid level to remove redundancy.
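Redundancy reduction at a fixed identity threshold is commonly done by greedy incremental clustering (the approach popularized by CD-HIT); the tool actually used for this subset is not stated. The sketch below illustrates the idea under simplifying assumptions: identity is approximated as the longest common subsequence divided by the shorter sequence length instead of a proper alignment, and all function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length (a crude stand-in for an alignment)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Identical residues divided by the length of the shorter sequence."""
    return lcs_length(a, b) / min(len(a), len(b))


def greedy_cluster(seqs: dict, threshold: float = 0.95) -> dict:
    """Longest sequences become representatives first; others join the first match."""
    clusters = {}  # representative id -> list of member ids
    for sid in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]  # no representative close enough: start a cluster
    return clusters
```

Only the representatives would be kept in a non-redundant subset; the quadratic LCS step is fine for illustration but far too slow for a full TrEMBL-scale dataset.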
SYSTERS is a database of protein sequences grouped into homologous families and superfamilies. The SYSTERS project aims to provide a meaningful partitioning of the whole protein sequence space by a fully automatic procedure. A refined two-step algorithm assigns each protein to a family and a superfamily. The sequence data underlying SYSTERS release 4 now comprise several protein sequence databases derived from completely sequenced genomes (ENSEMBL, TAIR, SGD and GeneDB), in addition to the comprehensive Swiss-Prot/TrEMBL databases. To augment the automatically derived results, information from external databases such as Pfam and Gene Ontology is added to the web server. Users can also retrieve pre-processed analyses of families, such as multiple alignments and phylogenetic trees. New query options include a batch retrieval tool for functional inference about families based on automatic keyword extraction from sequence annotations. A new access point, PhyloMatrix, allows retrieval of phylogenetic profiles of SYSTERS families across organisms with completely sequenced genomes. Keywords: Gene, Human, Vertebrate, Genome, Human ORFs
The Peptide Sequence Database contains putative peptide sequences from human, mouse, rat and zebrafish. Compressed to eliminate redundancy, these databases are about 40-fold smaller than a brute-force enumeration. Current and old releases are available for download. Each species' peptide sequence database comprises peptide sequence data from the relevant species-specific UniGene and IPI clusters, plus all sequences from their constituent EST, mRNA and protein sequence databases, namely RefSeq proteins and mRNAs, UniProt's Swiss-Prot and TrEMBL, GenBank mRNAs, ESTs and high-throughput cDNAs, HInv-DB, VEGA, EMBL and IPI protein sequences, plus the enumeration of all combinations of UniProt sequence variants, Met-loss PTMs and signal peptide cleavages. The README file contains information about the non-amino-acid symbols O (digest site corresponding to a protein N- or C-terminus) and J (no-digest sequence join) used in these peptide sequence databases, and about how to configure various search engines to use them. Some search engines handle (very) long sequences badly and in some cases must be patched to use these databases. All search engines supported by the PepArML meta-search engine can (or can be patched to) successfully search these peptide sequence databases.
E-value distribution of the BLASTX hits against the Nr and TrEMBL databases for each unigene.
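One common way to present such a distribution is to count hits per order of magnitude of the E-value. This sketch assumes the E-values have already been parsed into floats; E-values reported as 0 are clamped to an arbitrary floor (an assumption) so that the logarithm is defined.

```python
import math
from collections import Counter


def evalue_histogram(evalues, floor: float = 1e-180) -> dict:
    """Count hits per decade, keyed by floor(log10(E-value))."""
    bins = Counter()
    for e in evalues:
        bins[math.floor(math.log10(max(e, floor)))] += 1  # clamp zeros to the floor
    return dict(sorted(bins.items()))
```

Running it separately on the Nr and TrEMBL hit lists gives two histograms that can be plotted side by side.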
A database of homologous invertebrate genes, structured under the ACNUC sequence database management system. It allows users to select sets of homologous genes among invertebrate species and to visualize multiple alignments and phylogenetic trees. The database contains all invertebrate protein sequences from UniProt (SWISS-PROT + TrEMBL), with some data corrected, clarified or completed (notably to address redundancy and orthology/paralogy) and with some annotation modifications. It also contains all the corresponding nucleotide sequences in EMBL. Homologous proteins are classified into families, and multiple alignments and phylogenetic trees are computed for each family. Sequences and related information are structured in an ACNUC database. HOINVGEN is therefore particularly useful for comparative sequence analysis, phylogeny and molecular evolution studies. More generally, HOINVGEN gives an overall view of what is known about a particular gene family.
Mammalian protein-protein interaction database focusing on synaptic proteins. The Protein-Protein Interaction Database (PPID) was originally one person's attempt to integrate a range of biological, bibliographical and molecular data and to build a framework that might help explain how cells orchestrate their protein content to become what they are: machines with a purpose. It rests on the simple paradigm that functional units such as signal cascades are held together in close proximity, allowing specific events to occur without relying on passive diffusion and random events. PPID arose from the need to interpret proteomic datasets generated by analysing the NMDA receptor complex (see H. Husi, M. A. Ward, J. S. Choudhary, W. P. Blackstock and S. G. Grant (2000). Proteomic analysis of NMDA receptor-adhesion protein signaling complexes. Nat Neurosci 3, 661-669.). Studying these protein clusters unavoidably requires handling large datasets, which is what PPID is designed for. The database unifies molecular entries across three species (human, rat and mouse) and is based on sequence databases such as SwissProt, EMBL, TrEMBL (translated EMBL sequences) and UniGene, and on the literature database PubMed. A typical PPID entry holds up to three general entries for the three species, all protein and gene accession numbers associated with them (assembled from Blast2 searches of the databases) and the OMIM entry as maintained by Johns Hopkins University. Protein sequence information is also included, together with known and novel splice variants of each molecule as found by ClustalW sequence alignments. Entries also include protein-binding information together with the literature reference. The whole database is curated manually to ensure accuracy and quality.
Querying the database will be possible by online browsing and by batch submission of large datasets holding accession number information, such as those generated by software like Mascot for mass spectrometry. Cluster analysis of submitted datasets with graphical output will be developed, as well as an easy-to-use web interface. An interface is currently being built in collaboration with the Department of Informatics (T. Theodosiou and D. Armstrong) and will be deployed soon. The current team collating and deploying the database is H. Husi (database mining and information gathering) and T. Theodosiou (web interface and deployment). Please note that this database receives no financial funding and cannot survive without sponsorship.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This is a database of ESM2 feature representations, which includes Swiss data, Swiss normalized data, original TrEMBL data, original TrEMBL normalized data, non-homologous TrEMBL data and Table S10. Non-homologous TrEMBL normalized data can be created by extracting the Entry IDs from the non-homologous TrEMBL data and then extracting the corresponding feature representations from the original TrEMBL normalized data. Figure S4 (eos) and Figure S5 (eos) supplement the histogram and scatter plots of the feature eos in the corresponding Figures S4 and S5. Figures S6 and S8 show the results of GO annotation enrichment; the GO gene set is a grouped protein dataset used for this enrichment. Figure S7 is a silhouette score plot. For specific usage of the dataset, please refer to GitHub. The RF_model files are pickle files for different RF models, which can be used for dataset inference and interpretability analysis. Among these models, the AA_count model and the feature_all model have more complex feature inputs, so the Swiss training dataset is provided as a reference for feature arrangement. The feature order for the other models is simply 0 to 1279.
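The extraction step described above can be sketched as follows. The column name `Entry`, the CSV layout and the string column labels `0` to `1279` are assumptions for illustration; the GitHub repository documents the actual file formats.

```python
import csv
import io


def subset_by_entry(id_source: str, feature_table: str, id_column: str = "Entry") -> list:
    """Keep rows of feature_table whose Entry ID appears in id_source (both CSV text)."""
    wanted = {row[id_column] for row in csv.DictReader(io.StringIO(id_source))}
    return [row for row in csv.DictReader(io.StringIO(feature_table))
            if row[id_column] in wanted]


def to_vector(row: dict, n_features: int = 1280) -> list:
    """Feature values in column order 0..n_features-1 (assumed column labels)."""
    return [float(row[str(i)]) for i in range(n_features)]
```

Here `id_source` would hold the non-homologous TrEMBL data and `feature_table` the original TrEMBL normalized data; the resulting vectors follow the 0 to 1279 order described for the simpler models.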
IPI provides a top-level guide to the main databases (UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, RefSeq, Ensembl, TAIR, H-InvDB, Vega) that describe the proteomes of higher eukaryotic organisms. IPI (1) effectively maintains a database of cross-references between the primary data sources; (2) provides minimally redundant yet maximally complete sets of proteins for featured species (one sequence per transcript); and (3) maintains stable identifiers (with incremental versioning) so that sequences can be tracked between IPI releases. IPI is updated monthly in accordance with the latest data released by the primary data sources. As previously announced, the closure of IPI has been proposed for some time. Replacement data sets are now available through UniProt for human and mouse; sets for the other species in IPI are expected to be included in UniProt release 2011_07. To give users time to transition to the new UniProt data sets, IPI releases will continue to be produced throughout the summer. The final release will be made in September 2011. Thereafter, the IPI website will no longer be maintained, although previous releases of the dataset will remain available from the FTP site. We would like to thank our users for their support and interest in this service.
Summary statistics for the number of protein-coding gene predictions for ‘Anitra’, ‘Autumn Bliss’ and ‘Malling Jewel’ that returned ≥1 positive hit in BLASTP analysis with the nr, Araport11, RefSeq, SwissProt and TrEMBL databases as subjects, along with the number of protein-coding gene regions assigned InterPro, GO, KEGG orthology and KEGG pathway terms.
The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE. It hosts SWISS-PROT, a curated protein sequence database that strives to provide a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Recent developments of the database include format and content enhancements, cross-references to additional databases, new documentation files and improvements to TrEMBL, a computer-annotated supplement to SWISS-PROT.
1. The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/SwissProt database for bacteria.
2. The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/TrEMBL database for bacteria.
3. The percentage of entries for which GO annotations for cellular components were missing or homologs were not retrieved by BLAST searching of the UniProtKB/TrEMBL databases, but for which CELLO accurately predicted the subcellular localization(s).
4. The Gram-negative bacterial benchmark dataset found in PSORTb 3.0 [23], denoted PS30GN, includes 8029 protein sequences in five subcellular categories: extracellular, outer membrane, periplasmic, inner membrane and cytoplasmic.
LC-MS/MS results were used to search the Swiss-Prot/TrEMBL database (MSdb) and the Biomphalaria glabrata EST database (Bg-dbEST). A protein was considered correctly identified if at least two peptides were confidently matched with a score greater than 100. ID: identified; C: compatible combination; IC: incompatible combination.
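The identification rule above can be expressed as a simple filter. This sketch reads the score threshold as applying to each peptide match and counts distinct peptide sequences; both readings are assumptions, and `PeptideMatch` is a hypothetical record type, not the output format of any particular search engine.

```python
from dataclasses import dataclass


@dataclass
class PeptideMatch:
    """Hypothetical record: one peptide-to-protein match with its score."""
    protein: str
    peptide: str
    score: float


def identified_proteins(matches, min_peptides: int = 2, min_score: float = 100.0) -> set:
    """Proteins with at least `min_peptides` distinct peptides scoring above `min_score`."""
    confident = {}
    for m in matches:
        if m.score > min_score:  # strictly greater than the threshold
            confident.setdefault(m.protein, set()).add(m.peptide)
    return {p for p, peps in confident.items() if len(peps) >= min_peptides}
```

If the threshold is instead a protein-level score, the filter would aggregate scores per protein before comparing them to 100.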
THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 16, 2013. A relational database containing all the eukaryotic protein-encoding DNA sequences in GenBank. It provides detailed and comprehensive features of both intron-containing and intron-less genes. In addition to the information found in GenBank records, which includes properties such as sequence, position, length and description of introns, exons and protein coding regions, Xpro provides annotations of splice site motifs and intron phases. Furthermore, Xpro validates intron positions using alignments between each record's sequence and EST sequences in dbEST. Entries in Xpro are also cross-referenced to the SWISS-PROT/TrEMBL and Pfam databases. Xpro was developed in response to the unprecedented growth of data in GenBank, the primary repository of nucleotide sequences, driven by the ever-increasing number of genome and EST sequencing projects, and to the poor annotation in the primary nucleotide database of the exon/intron details required for molecular evolution studies. It is a specialized database that contains details about genomic features specific to eukaryotic genes and provides various web tools for analyzing and visualizing these features. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
1. The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/SwissProt database for archaea.
2. The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/TrEMBL database for archaea.
3. The percentage of entries for which GO annotations for cellular components were missing or homologs were not retrieved by BLAST searching of the UniProtKB/TrEMBL databases, but for which CELLO accurately predicted the subcellular localization(s).
4. The archaeal benchmark dataset found in PSORTb 3.0 [23], denoted PS30Arch, includes 805 protein sequences in four subcellular categories: extracellular, cell wall, membrane and cytoplasmic.
1. The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/SwissProt database for bacteria.
2. The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/TrEMBL database for bacteria.
3. The percentage of entries for which GO annotations for cellular components were missing or homologs were not retrieved by BLAST searching of the UniProtKB/TrEMBL databases, but for which CELLO accurately predicted the subcellular localization(s).
4. The proteomic sequence data are those of the newly documented Pseudomonas aeruginosa PAO1 dataset [31], which contains hypothetical and uncharacterized proteins.