AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. AAindex currently consists of three sections: AAindex1 for amino acid indices of 20 numerical values, AAindex2 for amino acid mutation matrices, and AAindex3 for statistical protein contact potentials. All data are derived from published literature. An amino acid index is a set of 20 numerical values representing any of the different physicochemical and biological properties of amino acids. The AAindex1 section of the Amino Acid Index Database is a collection of published indices together with the result of cluster analysis using the correlation coefficient as the distance between two indices. This section currently contains 544 indices. Another important feature of amino acids that can be represented numerically is the similarity between amino acids. A similarity matrix, also called a mutation matrix, is a set of 210 numerical values (20 diagonal and 20x19/2 = 190 off-diagonal elements) used for sequence alignments and similarity searches. The AAindex2 section of the Amino Acid Index Database is a collection of published amino acid mutation matrices together with the result of cluster analysis. This section currently contains 94 matrices. In release 9.0, a collection of published protein pairwise contact potentials was added to AAindex as AAindex3. This section currently contains 47 contact potential matrices. Sponsors: This work was supported by grants and resources from the Ministry of Education, Culture, Sports, Science and Technology; the Japan Science and Technology Agency; the Bioinformatics Center, Institute for Chemical Research, Kyoto University; and the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo.
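The element count of a mutation matrix and the idea of a correlation-based distance between two indices can be illustrated with a minimal Python sketch. The function names are hypothetical, and the distance form 1 - |r| is an assumption made for illustration; the description above does not state the exact transformation AAindex applies to the correlation coefficient.

```python
from itertools import combinations_with_replacement
from math import sqrt

# The 20 standard amino acids as one-letter codes.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def unique_matrix_entries(alphabet=AMINO_ACIDS):
    """List the unordered residue pairs of a symmetric matrix, diagonal included."""
    return list(combinations_with_replacement(alphabet, 2))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_distance(x, y):
    """Assumed distance form: 1 - |r| (illustrative, not the AAindex definition)."""
    return 1.0 - abs(pearson(x, y))


pairs = unique_matrix_entries()
n_diagonal = sum(1 for a, b in pairs if a == b)   # one entry per residue
n_off_diagonal = len(pairs) - n_diagonal          # 20 * 19 / 2 entries
print(len(pairs), n_diagonal, n_off_diagonal)
```

Under this assumed distance, two perfectly correlated or anti-correlated indices sit at distance 0 and would fall into the same cluster.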
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
A database of peptides based on sequence text mining and public peptide data sources. Only peptides that are 20 amino acids or shorter are stored. Only peptides with available sequences are stored. After submitting a query you can further refine the results using the new heat map retrieval tool to quickly find the entries that are most relevant to you. Text classification helps you find candidate peptides that are related to cancer, cardiovascular diseases, diabetes, apoptosis, angiogenesis and molecular imaging or peptides for which binding data exist.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.
The Peptide Sequence Database contains putative peptide sequences from human, mouse, rat, and zebrafish. Compressed to eliminate redundancy, these are about 40-fold smaller than a brute-force enumeration. Current and old releases are available for download. Each species' peptide sequence database comprises peptide sequence data from relevant species-specific UniGene and IPI clusters, plus all sequences from their constituent EST, mRNA and protein sequence databases, namely RefSeq proteins and mRNAs, UniProt's SwissProt and TrEMBL, GenBank mRNA, ESTs, and high-throughput cDNAs, HInv-DB, VEGA, EMBL, IPI protein sequences, plus the enumeration of all combinations of UniProt sequence variants, Met loss PTM, and signal peptide cleavages. The README file contains some information about the non-amino-acid symbols O (digest site corresponding to a protein N- or C-terminus) and J (no digest sequence join) used in these peptide sequence databases, and about how to configure various search engines to use them. Some search engines handle (very) long sequences badly and in some cases must be patched to use these peptide sequence databases. All search engines supported by the PepArML meta-search engine can (or can be patched to) successfully search these peptide sequence databases.
The AARSs database is the collection of amino acid sequences of all published AARSs. Currently it contains 1047 primary structures of cytoplasmic and organellar AARSs from various organisms. The entries are grouped according to AARS amino acid specificity. They are based on EMBL/SWISS-PROT format. Each includes the AARS amino acid sequence, its SWISS-PROT name and the accession number, a short description of the sequence, its source (organism name with taxonomic classification) and bibliographic information. For the enzymes whose sequences were determined at the nucleotide level, the appropriate EMBL/GenBank or TIGR entries are included, and for those with already known 3D structure, the cross-references to the Brookhaven Protein Data Base are indicated. The partial sequences of AARSs are also included in the database. According to the original SWISS-PROT description, some of the entries have been marked as putative or probable.
Databases of protein sequences and 3D structures of proteins. Collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
Conformation Angles DataBase is a comprehensive, authoritative and timely knowledge base developed to facilitate retrieval of information related to the conformational angles (main-chain and side-chain) of the amino acid residues present in the non-redundant (both 25% and 90%) data set. The database includes the options of determining the dependency of the conformation angles of a particular residue upon the flanking residues in main-chain, doublet analysis, triplet analysis and analysis of a particular protein structure. It is worth mentioning that for all the options, a user-friendly and convenient Java Graphical User Interface (GUI) has been provided to display the output on the client machine.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).
Database of annotations of functional units in proteins including multiple sequence alignment models for ancient domains and full-length proteins. This collection of models includes 3D structures that display the sequence/structure/function relationships in proteins. It also includes alignments of the domains to known three-dimensional protein structures in the MMDB database. The source databases are Pfam, Smart, and COG. Users can identify amino acids in protein sequences with the resources available as well as view single sequences embedded within multiple sequence alignments.
PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This work presents the ribosomal protein database of Listeria floridensis. Original data for the work came from the annotated proteome data of the bacterium downloaded from UniProt. Using an in-house MATLAB ribosomal protein database analysis software, the original proteome data file was parsed to extract protein name and amino acid sequence of all ribosomal proteins in the species. The database also includes calculated variables such as number of residues, molecular weight, and nucleotide sequence. Overall, the presented database could serve as a ribosomal protein mass fingerprint for use in microbial identification, or it could be used in fundamental studies seeking to uncover new insights into ribosomal protein biology.
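The calculated variables mentioned above (number of residues and molecular weight) can be derived from a sequence alone. The following Python sketch is an illustration only and is not the authors' in-house MATLAB software; the function names are hypothetical, and the average residue masses are commonly tabulated values that should be checked against an authoritative table before use.

```python
# Commonly tabulated average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N- and C-termini


def residue_count(sequence):
    """Number of residues in a one-letter amino acid sequence."""
    return len(sequence.strip())


def molecular_weight(sequence):
    """Average molecular weight in daltons of an unmodified linear chain."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        # Ambiguity codes such as X or B have no single mass, so refuse to guess.
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


example = "GA"  # toy dipeptide, not a real ribosomal protein
print(residue_count(example), round(molecular_weight(example), 2))
```

The sketch ignores post-translational modifications and N-terminal methionine removal, both of which shift observed masses in mass fingerprinting.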
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 4, 2023. The Human Gene and Protein Database (HGPD) presents SDS-PAGE patterns and other information on human genes and proteins. The HGPD was constructed from full-length cDNAs. For conversion to Gateway entry clones, we first determined an open reading frame (ORF) region in each cDNA meeting the criteria. Those ORF regions were PCR-amplified utilizing selected resource cDNAs as templates. All the details of the construction and utilization of entry clones will be published elsewhere. Amino acid and nucleotide sequences of an ORF for each cDNA and sequence differences of Gateway entry clones from source cDNAs are presented in the GW: Gateway Summary window. Utilizing those clones with a very efficient cell-free protein synthesis system featuring wheat germ, we have produced a large number of human proteins in vitro. Expressed proteins were detected in almost all cases. Proteins in both total and supernatant fractions are shown in the PE: Protein Expression window. In addition, we have also successfully expressed proteins in HeLa cells and determined subcellular localizations of human proteins. These biological data are presented on the frame of cDNA clusters in the Human Gene and Protein Database. To build the basic frame of HGPD, sequences of FLJ full-length cDNAs and others deposited in public databases (Human ESTs, RefSeq, Ensembl, MGC, etc.) are assembled onto the genome sequences (NCBI Build 35 (UCSC hg17)). The majority of analysis data for cDNA sequences in HGPD are shared with the FLJ Human cDNA Database (http://flj.hinv.jp/), constructed as a human cDNA sequence analysis database focusing on mRNA varieties caused by variations in transcription start site (TSS) and splicing.
This work presents the ribosomal protein database of Acinetobacter baumannii. Original data for the work came from the annotated proteome data of the bacterium downloaded from UniProt. Using an in-house MATLAB ribosomal protein database analysis software, the original proteome data file was parsed to extract protein name and amino acid sequence of all ribosomal proteins in the species. The database also includes calculated variables such as number of residues, molecular weight, and nucleotide sequence. Overall, the presented database could serve as a ribosomal protein mass fingerprint for use in microbial identification, or it could be used in fundamental studies seeking to uncover new insights into ribosomal protein biology.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.
The Prokaryotic Operon DataBase (ProOpDB) is one of the most precise and complete repositories of operon predictions currently available. Using our novel and highly accurate operon algorithm, we have predicted the operon structures of more than 1,200 prokaryotic genomes. ProOpDB offers several ways to retrieve a set of operon predictions, including: i) organism name, ii) metabolic pathways, as defined by the KEGG database, iii) gene orthology, as defined by the COG database, iv) conserved protein motifs, as defined by the Pfam database, v) reference gene, vi) reference operon, among others. To limit the operon output to non-redundant organisms, ProOpDB offers an efficient protocol to select the most representative organisms based on a precompiled phylogenetic distance matrix. In addition, the ProOpDB operon predictions are used directly as the input data of our Gene Context Tool (GeConT) to visualize their genomic context and retrieve the sequence of their corresponding 5' regulatory regions, as well as the nucleotide or amino acid sequences of their genes.
The prediction algorithm
The algorithm is a multilayer perceptron neural network (MLP) classifier that uses as input the intergenic distances of contiguous genes and the functional relationship scores of the STRING database between the different groups of orthologous proteins, as defined in the COG database. Nevertheless, the operon prediction of our method is not restricted to genes with a COG assignment, since we successfully defined new groups of orthologous genes and obtained, by extrapolation, a set of equivalent STRING-like scores based on conserved gene pairs in different genomes. Since the STRING functional relationship scores are determined in an unbiased manner and efficiently integrate a large amount of information from different sources and kinds of evidence, the predictions made by our MLP are considerably less influenced by the bias imposed by training on one specific organism.
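One of the two classifier inputs described above, the intergenic distance between contiguous genes, can be sketched in Python as follows. This is a minimal illustration and not the ProOpDB implementation: the `Gene` record, the 1-based inclusive coordinates, and the restriction to adjacent gene pairs on the same strand are all assumptions made here.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record with 1-based inclusive coordinates."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def intergenic_distances(genes):
    """Return (left, right, distance) for adjacent genes on the same strand.

    The distance is the number of bases between the two genes; it is
    negative when the genes overlap.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    result = []
    for left, right in zip(ordered, ordered[1:]):
        if left.strand != right.strand:
            continue  # assumption: only co-directional pairs are candidates
        result.append((left.name, right.name, right.start - left.end - 1))
    return result


genes = [
    Gene("a", 1, 100, "+"),
    Gene("b", 121, 300, "+"),
    Gene("c", 290, 400, "+"),
    Gene("d", 500, 600, "-"),
]
print(intergenic_distances(genes))
```

In a full pipeline each distance would be paired with a functional relationship score for the same gene pair before being passed to the classifier.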
Analysis of codon usage at integrase amino acid position 97 among patient-derived integrase sequences queried from the Los Alamos HIV database.
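A codon tally of this kind can be sketched in a few lines of Python. The sketch assumes that each nucleotide sequence is ungapped, in frame, and begins at the first codon of integrase, which real database exports may not satisfy; the function names and the toy sequences are hypothetical and are not data from the analysis.

```python
from collections import Counter


def codon_at(nt_sequence, aa_position):
    """Return the codon encoding a 1-based amino acid position, or None if absent."""
    start = (aa_position - 1) * 3
    codon = nt_sequence[start:start + 3].upper()
    return codon if len(codon) == 3 else None


def codon_usage(sequences, aa_position=97):
    """Count the codons observed at one amino acid position across sequences."""
    counts = Counter()
    for seq in sequences:
        codon = codon_at(seq, aa_position)
        if codon is not None:  # skip sequences too short to cover the position
            counts[codon] += 1
    return counts


# Toy sequences built only to exercise the functions.
toy = ["AAA" * 96 + "ACA" + "GGG", "aaa" * 96 + "gca", "AAA" * 10]
print(codon_usage(toy))
```

Sequences with alignment gaps or frame shifts would need to be aligned to a reference before indexing by position.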
The PMD provides information on natural and artificial mutants, including random and site-directed ones, for all proteins except members of the globin and immunoglobulin families. The PMD is based on literature, and each entry in the database corresponds to one article, which may describe one or many protein mutants. Each database entry is identified by a serial number and is defined as either natural or artificial, depending on the type of the mutation. For each entry the following are recorded: JOURNAL, TITLE, CROSS-REFERENCE, PROTEIN, N-TERMINAL, CHANGE, FUNCTION, STRUCTURE, STABILITY, etc. CROSS-REFERENCE indicates the code names of the protein given in other databases such as Protein Identification Resources (2). N-TERMINAL shows the N-terminal sequence of five amino acids, which may help to show the unambiguous numbering of the sequence. CHANGE indicates the position and kind of mutations, such as amino acid substitution, insertion and deletion, denoted with a specific notation. Any functional or structural features (FUNCTION, STRUCTURE, STABILITY, etc.) observed in the mutant are described immediately after 'CHANGE'. Relative differences in activity and/or stability, in comparison with the wild-type protein, are indicated with the symbols (- -), (-), (=), (+) or (+ +). Complete loss of activity is denoted as (0).
Data Submission
A data submission system has been newly prepared in the PMD. We welcome the authors of articles published in academic journals to submit their own mutant data to the PMD. After checking the contents, we will register the data with a unique accession number.
A database for protein-nucleic acid interaction that provides various features of protein-nucleic acid interfaces. There are 2333 protein-nucleic acid PDB complexes, 9547 SCOP domains, and 9633 domain-nucleic acid interfaces in BIPA. BIPA also provides a multiple structural alignment of representative structures at the SCOP family level using the program SALIGN, and the structural alignments were further annotated using the program JOY to detect local environments of amino acids.
https://www.aatbio.com/tou.html
Amino acids are the building blocks that make up all proteins, polypeptides and peptides. Each amino acid consists of a central carbon, known as the α-carbon, to which an amino group (-NH2), an acidic carboxyl group (-COOH) and an organic side chain