The Peptide Sequence Database contains putative peptide sequences from human, mouse, rat, and zebrafish. Compressed to eliminate redundancy, these are about 40-fold smaller than a brute-force enumeration. Current and old releases are available for download. Each species' peptide sequence database comprises peptide sequence data from relevant species-specific UniGene and IPI clusters, plus all sequences from their constituent EST, mRNA and protein sequence databases, namely RefSeq proteins and mRNAs, UniProt's SwissProt and TrEMBL, GenBank mRNA, ESTs, and high-throughput cDNAs, HInv-DB, VEGA, EMBL, IPI protein sequences, plus the enumeration of all combinations of UniProt sequence variants, Met loss PTM, and signal peptide cleavages. The README file contains some information about the non-amino-acid symbols O (digest site corresponding to a protein N- or C-terminus) and J (no digest sequence join) used in these peptide sequence databases, and about how to configure various search engines to use them. Some search engines handle (very) long sequences badly and in some cases must be patched to use these peptide sequence databases. All search engines supported by the PepArML meta-search engine can search these peptide sequence databases successfully, in some cases after patching.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Endogenous peptides are an abundant and versatile class of biomolecules with vital roles pertinent to the functionality of the nervous, endocrine, and immune systems and others. Mass spectrometry stands as a premier technique for identifying endogenous peptides, yet the field still faces challenges due to the lack of optimized computational resources for reliable raw mass spectra analysis and interpretation. Current database searching programs can exhibit discrepancies due to the unique properties of endogenous peptides, which typically require specialized search considerations. Herein, we present a high throughput, novel scoring algorithm for the extraction and ranking of conserved amino acid sequence motifs within any endogenous peptide database. Motifs are conserved patterns across organisms, representing sequence moieties crucial for biological functions, including maintenance of homeostasis. MotifQuest, our novel motif database generation algorithm, is designed to work in partnership with EndoGenius, a program optimized for database searching of endogenous peptides and that is powered by a motif database to capitalize on biological context to produce identifications. MotifQuest aims to quickly develop motif databases without any prior knowledge, a laborious task not possible with traditional sequence alignment resources. In this work we illustrate the utility of MotifQuest to expand EndoGenius’ identification utility to other endogenous peptides by showcasing its ability to identify antimicrobial peptides. Additionally, we discuss the potential utility of MotifQuest to parse out motifs from a FASTA database file that can be further validated as new peptide drug candidates.
THIS RESOURCE IS NO LONGER IN SERVICE, documented on June 04, 2014. Curated database of proteins and peptides selected from randomized pools, designed to accumulate experimental data on protein functionality obtained by in vitro directed evolution methods (phage display, ribosome display, SIP, etc.). ASPD is integrated by means of hyperlinks with other databases (SWISS-PROT, PDB, PROSITE, etc.). The database also contains modules for pairwise correlation analysis and BLAST search.
NIST peptide libraries are comprehensive, annotated mass spectral reference collections from various organisms and proteins useful for the rapid matching and identification of acquired MS/MS spectra. Spectra were produced by tandem mass spectrometers using liquid chromatographic separations followed by electrospray ionization. Unlike the NIST small molecule electron ionization library, which contains one spectrum per molecular structure, there are several different modes of fragmentation (ion trap and "beam-type" collision cells are currently the most commonly used fragmentation devices) that result in spectra with different, energy-dependent patterns. These result in multiple spectral libraries, distinguished by ionization mode, each of which may contain several spectra per peptide. Different libraries have also been assembled for iTRAQ-4 derivatized peptides and for phosphorylated peptides. Separating libraries by animal species reduces search time, although investigators may elect to include several species in their searches.
Norine is a database dedicated to nonribosomal peptides (NRPs). In bacteria and fungi, in addition to traditional ribosomal protein biosynthesis, an alternative ribosome-independent pathway called NRP synthesis allows peptide production. The molecules synthesized by NRPS contain a high proportion of nonproteinogenic amino acids, and their primary structure is not always linear: it is often more complex, containing cycles and branchings.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Plants produce a wide range of bioactive peptides as part of their innate defense mechanisms. With the explosive growth in the number of plant-derived peptides, verifying their therapeutic function using traditional experimental methods is resource- and time-consuming. It is therefore necessary to predict the therapeutic function of plant-derived peptides more effectively and accurately, with less waste of resources, and thus expedite the development of plant peptides. We herein developed a repository of plant peptides predicted to have multiple therapeutic functions, named MFPPDB (multi-functional plant peptide database). MFPPDB includes 1,482,409 single- or multi-functional therapeutic peptides of plant origin, derived from 121 fundamental plant species. The functional categories of these therapeutic peptides include 41 different features such as anti-bacterial, anti-fungal, anti-HIV, anti-viral, and anti-cancer. The detailed physicochemical information of these peptides is presented in the functional search and physicochemical property search modules, which help users access the peptide information by plant species, ID, and function, or by peptide ID, isoelectric point, peptide sequence, and molecular weight through a user-friendly web interface. We further matched the predicted peptides to nine state-of-the-art curated functional peptide databases and found that at least 293,408 of the peptides possess functional potential. Overall, MFPPDB integrates a massive number of plant peptides that have single or multiple therapeutic functions, which will facilitate comprehensive research in plant peptidomics. MFPPDB can be freely accessed through http://124.223.195.214:9188/mfppdb/index.
A database of peptides based on sequence text mining and public peptide data sources. Only peptides that are 20 amino acids or shorter are stored. Only peptides with available sequences are stored. After submitting a query you can further refine the results using the new heat map retrieval tool to quickly find the entries that are most relevant to you. Text classification helps you find candidate peptides that are related to cancer, cardiovascular diseases, diabetes, apoptosis, angiogenesis and molecular imaging or peptides for which binding data exist.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OASis human 9-mer peptide database, generated from 118 million human antibody sequences from the Observed Antibody Space database.
Attached is a gzipped SQLite database containing two tables: "peptides" and "subjects".
Links:
BioPhi codebase and documentation: https://github.com/Merck/BioPhi
Public BioPhi server: https://biophi.dichlab.org
OAS Database: http://opig.stats.ox.ac.uk/webapps/oas/
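A minimal sketch of how the gzipped SQLite file described above might be inspected with the Python standard library. The file paths are placeholders, and the column layout of the "peptides" and "subjects" tables is not stated here, so the sketch reads the schema from the database itself rather than assuming column names.

```python
import gzip
import shutil
import sqlite3


def decompress_gzip(gz_path, out_path):
    """Decompress a .gz file to out_path so that sqlite3 can open it."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def describe_tables(db_path):
    """Return {table_name: [column names]} as reported by the database."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            # PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        conn.close()
```

Calling `describe_tables(decompress_gzip("downloaded.db.gz", "downloaded.db"))` (placeholder file names) should list the two tables and their columns, which can then be queried with ordinary SQL.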
The MHC-Peptide Interaction Database version T (MPID-T) is a new generation database for sequence-structure-function information on T cell receptor/peptide/MHC interactions. It contains all structures of TcR/pMHC and pMHC complexes, with emphasis on the structural characterization of these complexes. MPID-T will facilitate the development of algorithms to predict whether a peptide sequence will bind to a specific MHC allele. The database has been populated with data from the Protein Data Bank (PDB). The data from the PDB is manually verified and classified, after which each structure is analysed for atomic interactions relevant to the MHC-peptide complex.
The antimicrobial peptide database (APD) provides information on anticancer, antiviral, antifungal and antibacterial peptides.
SYFPEITHI is a database comprising more than 7000 peptide sequences known to bind class I and class II MHC molecules. The entries are compiled from published reports only. It contains a collection of MHC class I and class II ligands and peptide motifs of humans and other species, such as apes, cattle, chicken, and mouse, and is continuously updated. Searches for MHC alleles, MHC motifs, natural ligands, T-cell epitopes, source proteins/organisms and references are possible. Hyperlinks to the EMBL and PubMed databases are included. In addition, ligand predictions are available for a number of MHC allelic products. The database is based on previous publications on T-cell epitopes and MHC ligands. It contains information on:
- peptide sequences
- anchor positions
- MHC specificity
- source proteins, source organisms
- publication references
Since the number of motifs continuously increases, it was necessary to set up a database which facilitates the search for peptides and allows the prediction of T-cell epitopes. The prediction is based on published motifs (pool sequencing, natural ligands) and takes into consideration the amino acids in the anchor and auxiliary anchor positions, as well as other frequent amino acids. The score is calculated according to the following rules: the amino acids of a certain peptide are given a specific value depending on whether they are anchor, auxiliary anchor or preferred residues. Ideal anchors are given 10 points, unusual anchors 6-8 points, auxiliary anchors 4-6 points and preferred residues 1-4 points. Amino acids that are regarded as having a negative effect on the binding ability are given values between -1 and -3.
Sponsors: SYFPEITHI is supported by DFG-Sonderforschungsbereich 685 and the European Union: EU BIOMED CT95-1627, BIOTECH CT95-0263, and EU QLQ-CT-1999-00713.
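The SYFPEITHI scoring rules described above amount to summing position-specific values over a peptide. The sketch below illustrates that idea only: `TOY_MATRIX` is an invented 9-mer motif whose values merely fall inside the stated ranges (10 for an ideal anchor, 6-8 for an unusual anchor, 4-6 for an auxiliary anchor, 1-4 for a preferred residue, -1 to -3 for unfavourable residues). It is not a real SYFPEITHI matrix, and the assumption that unlisted residues contribute zero is ours.

```python
# Hypothetical motif: position (1-based) -> {amino acid: points}.
# Values are illustrative and NOT taken from SYFPEITHI.
TOY_MATRIX = {
    1: {"G": 2},            # preferred residue (1-4 points)
    2: {"L": 10, "M": 8},   # anchor: ideal = 10, unusual = 6-8
    3: {"P": -2},           # residue assumed to hinder binding (-1 to -3)
    6: {"V": 5},            # auxiliary anchor (4-6 points)
    9: {"V": 10, "L": 8},   # C-terminal anchor
}


def score_peptide(peptide, matrix):
    """Sum the per-position values; residues absent from the matrix score 0."""
    return sum(
        matrix.get(position, {}).get(residue, 0)
        for position, residue in enumerate(peptide, start=1)
    )


def rank_peptides(peptides, matrix):
    """Return (peptide, score) pairs sorted from highest to lowest score."""
    scored = [(p, score_peptide(p, matrix)) for p in peptides]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With this toy motif, `score_peptide("GLAAAVAAV", TOY_MATRIX)` gives 2 + 10 + 5 + 10 = 27, while `"ALPAAAAAL"` gives 10 - 2 + 8 = 16.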
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To enable the identification of mutated peptide sequences in complex biological samples, in this work, a cancer protein database with mutation information collected from several public resources such as COSMIC, IARC P53, OMIM and UniProtKB, was developed. In-house developed Perl-scripts were used to search and process the data, and to translate each gene-level mutation into a mutated peptide sequence. The cancer mutation database comprises a total of 872,125 peptide entries from 25,642 protein IDs. A description line for each entry provides the parent protein ID and name, the cDNA- and protein-level mutation site and type, the originating database, and the cancer tissue type and corresponding hits. The database is FASTA formatted to enable data retrieval by commonly used tandem MS search engines.
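As a rough illustration of the kind of processing described above, the sketch below applies a protein-level point substitution to a sequence and formats the result as a FASTA entry. It is written in Python rather than Perl and is not the authors' script. The mutation notation (`T3S`: reference residue, 1-based position, new residue), the header layout and all names are assumptions made for the example, and only single-residue substitutions are handled.

```python
import re


def apply_substitution(protein_seq, mutation):
    """Apply a substitution such as 'T3S' (ref, 1-based position, alt)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unsupported mutation format: {mutation!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    # Guard against annotations that do not agree with the sequence.
    if pos < 1 or pos > len(protein_seq) or protein_seq[pos - 1] != ref:
        raise ValueError(f"{mutation} does not match the reference sequence")
    return protein_seq[:pos - 1] + alt + protein_seq[pos:]


def fasta_entry(protein_id, mutation, sequence, source_db):
    """Format one entry; this header layout is hypothetical."""
    return f">{protein_id}|{mutation}|{source_db}\n{sequence}\n"
```

For example, `apply_substitution("MKTAYR", "T3S")` returns `"MKSAYR"`, which `fasta_entry` can then write out with the parent protein ID and originating database in the description line.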
https://spdx.org/licenses/etalab-2.0.html
Leguminous crops are vital to sustainable agriculture due to their ability to fix atmospheric nitrogen, improving soil fertility and reducing the need for synthetic fertilizers. Additionally, they are an excellent source of protein for both human consumption and animal feed. AntiMicrobial Peptides (AMPs), found in various leguminous seeds, exhibit broad-spectrum antimicrobial activity through diverse mechanisms, including interaction with microbial cell membranes and interference with cellular processes, making them valuable for enhancing crop resilience and food safety. In the field of plant sciences, computational biology methods have been instrumental in the discovery and optimization of AMPs. These methods enable rapid exploration of sequence space and the prediction of AMPs using deep learning technologies. Optimizing AMP annotations through computational design offers a strategic approach to enhance efficacy and minimize potential side effects, providing a viable alternative to conventional antimicrobial agents. However, the presence of overlapping sequences across multiple databases poses a challenge for creating a reliable dataset for AMP prediction. To address this, we conducted a comprehensive analysis of sequence redundancy across various AMP databases. These databases encompass a wide range of AMPs from different sources and with specific functions, including both naturally occurring and artificially synthesized AMPs. Our analysis revealed significant overlap, underscoring the need for a non-redundant AMP sequence database. We present the development of a new database that consolidates unique AMP sequences derived from leguminous seeds, aiming to create a more refined dataset for the binary classification and prediction of plant-derived AMPs. This database will support the advancement of sustainable agricultural practices by enhancing the use of plant-based AMPs in agroecology, contributing to improved crop protection and food security.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bottom-up proteomics approaches rely on database searches that compare experimental values of peptides to theoretical values derived from protein sequences in a database. While the human body can produce millions of distinct antibodies, current databases for human antibodies such as UniProtKB are limited to only 1095 sequences (as of January 2024). This limitation may hinder the identification of new antibodies using bottom-up proteomics. Therefore, extending the databases is an important task for discovering new antibodies.
Herein, we adopted an extensive collection of antibody sequences from the Observed Antibody Space to conduct efficient database searches in publicly available proteomics data, with a focus on SARS-CoV-2. Thirty million heavy-chain antibody sequences from 146 SARS-CoV-2 patients in the Observed Antibody Space were digested in silico to obtain 18 million unique peptides. These peptides were then used to create six databases (DB1-DB6) for bottom-up proteomics. We used these databases to search for antibody peptides in publicly available SARS-CoV-2 human plasma samples in the Proteomics Identification Database (PRIDE), and we consistently found new antibody peptides in those samples. The database searches were performed with the FragPipe software.
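The in silico digestion step can be sketched as follows. The protease is not named above, so this example assumes trypsin with the common rule of cleaving after K or R unless the next residue is P, and it allows no missed cleavages. The minimum peptide length of 7 is likewise an assumption, not a parameter from the study.

```python
def tryptic_digest(sequence, min_len=7):
    """Cleave after K/R unless followed by P; keep peptides >= min_len."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return [p for p in peptides if len(p) >= min_len]


def unique_peptides(sequences, min_len=7):
    """Digest every sequence and pool the results into a set of peptides."""
    pooled = set()
    for seq in sequences:
        pooled.update(tryptic_digest(seq, min_len=min_len))
    return pooled
```

Pooling into a set is what collapses the many antibody sequences sharing framework regions into a much smaller number of unique peptides.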
Table 1. Database information. In addition to human SARS-CoV-2 antibody peptides, every database also contains human protein sequences from the UniProt database and contaminants from the cRAP database.
File | Database | Number of human SARS-CoV-2 antibody peptides | Number of covered antibodies |
DB1.fasta | DB1 | 100 | 1.28E7 |
DB2.fasta | DB2 | 1E3 | 1.93E7 |
DB3.fasta | DB3 | 1E4 | 2.40E7 |
DB4.fasta | DB4 | 1E5 | 2.66E7 |
DB5.fasta | DB5 | 1E6 | 2.83E7 |
DB6.fasta | DB6 | 1E7 | 3.01E7 |
A platform that includes a database of nonribosomal peptides together with tools for their analysis. Norine currently contains more than 1000 peptides. The name Norine stands for Nonribosomal peptides, with "ine" as a typical ending of peptide names. For each peptide, the database stores its structure as well as various annotations such as the biological activity, producing organisms, bibliographical references and others. The database can be queried in order to search for peptides through their annotations as well as through their monomeric structure. In the latter case, the user can specify either the whole structure or a structural pattern (possibly including "undefined monomers") of the searched peptide.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metaproteomics has increasingly been applied to study functional changes in the human gut microbiome, and peptide identification is an important step in metaproteomics research. However, the large search space in metaproteomics studies poses significant challenges for peptide identification. Here, we constructed MetaPep, a core peptide database (including collections of both peptide sequences and tandem MS spectra) that greatly accelerates peptide identification. Raw files from fifteen metaproteomics projects were re-analyzed, and the identified peptide-spectrum matches (PSMs) were used to construct the MetaPep database. The constructed MetaPep database achieved rapid and accurate identification of peptides for human gut metaproteomics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
UDAMP is a comprehensive database of human antimicrobial and immunomodulatory peptides, containing all such human peptides known so far.
Database of ~ 865 peptides where each record provides information on the food source, preparation, purification, reference(s) and any other additional information. The database provides a search and browsing option for a more personalized research experience.
The standard proteomics database search strategy involves searching spectra against a peptide database and estimating the false discovery rate (FDR) of the resulting set of peptide-spectrum matches. One assumption of this protocol is that all the peptides in the database are relevant [...] FDR control strategies are needed. Recently, two methods were proposed to address this problem: subset-search and all-sub. We show that both methods fail to control the FDR. For subset-search, this failure is due to the presence of “neighbor” peptides, which are defined as irrelevant peptides with a precursor mass and fragmentation spectrum similar to those of a relevant peptide. Not considering neighbors compromises the FDR estimate because a spectrum generated by an irrelevant peptide can incorrectly match well to a relevant peptide. Therefore, we have developed a new method, “filter then subset-neighbor search” (FSNS), that accounts for neighbor peptides. We show evidence that FSNS properly controls the FDR when neighbors are present and that FSNS outperforms group-FDR, the only other method able to control the FDR relative to a subset of relevant peptides.
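For orientation, the FDR estimate in the standard strategy is often obtained by target-decoy competition. The abstract above does not say which estimator is used, so the sketch below is an assumption chosen for illustration, and it is not an implementation of subset-search, all-sub, or FSNS. Each peptide-spectrum match is represented as a hypothetical `(score, is_decoy)` pair.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR among target PSMs scoring at or above threshold.

    psms: iterable of (score, is_decoy) pairs.
    Uses the conservative (decoys + 1) / targets estimate, capped at 1.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries are reported
    return min(1.0, (decoys + 1) / targets)
```

The neighbor problem described above is exactly what such a plain estimate does not see: when the database is restricted to relevant peptides, spectra from irrelevant neighbors can match relevant targets well without a corresponding rise in decoy matches.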
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FASST Database files for use in the peptide design pipeline.
A complete list of the structures in the databases is given in the supplementary information of the Protein Science article.
Singlechain structures: pro4322-sup-0004-TableS6.txt (format: PDBID_CHAINID)
Multichain structures: pro4322-sup-0005-TableS7.txt (format: PDBID)