To accelerate tumor antigen discovery, we generated a publicly available Human Potential Tumor Associated Antigen database (HPtaa) with pTAAs identified by in silico computing. The database includes 3518 potential targets and is freely available to academic users. It successfully screened out 41 of 82 known Cancer-Testis antigens, 6 of 18 differentiation antigens, 2 of 2 oncofetal antigens, and 7 of 12 FDA-approved cancer markers that have a Gene ID, and therefore provides a good platform for identifying cancer target genes. The database draws on expression data from several platforms, including carefully chosen publicly available microarray expression data, GEO SAGE data, and UniGene expression data. Other relevant databases required for TAA discovery, such as CGAP, CCDS, and the Gene Ontology database, were also incorporated. Various strategies and algorithms were developed to integrate the different expression platforms. Known tumor antigens were gathered from the literature and serve as training sets. For each gene, a total tumor specificity penalty was computed from the positive clue penalty for differential expression in human cancers, the corresponding differential ratio, and the normal tissue restriction penalty. We hope this database will help with cancer immunome identification and thereby improve the diagnosis and treatment of human carcinomas.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Serum antibodies are a valuable source of information on the health state of an organism. Profiles of serum antibody reactivity can be generated by high-throughput sequencing of peptide-coding DNA from combinatorial random peptide phage display libraries selected for binding to serum antibodies. Here we demonstrate that the targets of the immune response, which are recognized by serum antibodies directed against sequential epitopes, can be identified using the serum antibody repertoire profiles generated by high-throughput sequencing. We developed an algorithm that filters the results of a protein database BLAST search for the selected peptides to distinguish real antigens recognized by serum antibodies from irrelevant proteins retrieved at random. When we used this algorithm to analyze serum antibodies from mice immunized with a human protein, we were able to identify the protein used for immunization among the top candidate antigens. When we analyzed a serum sample from a patient with metastatic melanoma, the recombinant protein corresponding to the top candidate on the list generated by the algorithm was recognized by antibodies from the metastatic melanoma serum on a western blot, confirming that the method can identify autoantigens recognized by serum antibodies. We also demonstrated that our unbiased method of examining the repertoire of serum antibodies reveals quantitative information on the epitope composition of the targets of the immune response. A method for deciphering the information contained in serum antibody repertoire profiles may help to identify autoantibodies that can be used for diagnosing and monitoring autoimmune diseases or malignancies.
THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 15, 2013. The SV40 T antigen database lists viruses and plasmids expressing mutant forms of large T antigen. Each entry contains information regarding the mutant designation, mutant type, virus strain, nucleotide change, amino acid change and pertinent references. Category: Human Genes and Diseases. Subcategory: Cancer gene databases.
Identifiers represent antibody-antigen complexes in the Antigen-Antibody Complex Database (AACDB), which provides comprehensive structural and functional annotations including paratope and epitope information, antibody developability data, and antigen-drug target relationships to support immunoinformatics research and therapeutic antibody development.
This repository contains antibody/B cell and T cell epitope information and epitope prediction and analysis tools for use by the research community worldwide. Immune epitopes are defined as molecular structures recognized by specific antigen receptors of the immune system, namely antibodies, B cell receptors, and T cell receptors. Immune epitopes from infectious diseases, excluding HIV, and immune-mediated diseases and the accompanying biological information are included.
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on August 23, 2019. BGMUT was a database that provided a publicly accessible platform for DNA sequences and a curated set of blood mutation information. A data archive is available at ftp://ftp.ncbi.nlm.nih.gov/pub/mhc/rbc/Final Archive.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cancer vaccines have gradually attracted attention for their tremendous preclinical and clinical performance. With the development of next-generation sequencing technologies and related algorithms, pipelines based on sequencing and machine learning methods have become mainstream in cancer antigen prediction. Of particular focus are neoantigens, mutated peptides that exist only in tumor cells, lack central tolerance, and have fewer side effects. The rapid prediction and filtering of neoantigen peptides are crucial to the development of neoantigen-based cancer vaccines. However, due to the lack of verified neoantigen datasets and insufficient research on the properties of neoantigens, neoantigen prediction algorithms still need to be improved. Here, we collected verified cancer antigen peptides together with as much relevant peptide information as possible. We then discuss the role of each dataset in improving algorithms for cancer antigen research, especially neoantigen prediction. A platform, the Cancer Antigens Database (CAD, http://cad.bio-it.cn/), was designed to let users explore cancer antigens comprehensively online.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example proteins and validated epitopes present in the IEDB 3.0 database.
Epitome is a database of structurally inferred antigenic epitopes in proteins. It includes all known antigenic residues and the antibodies that interact with them, including a detailed description of the residues involved in the interaction and their sequence/structure environments. Additionally, interactions can be visualized using an interface to Jmol. The website also contains specialized software, NLProt, which enables users to extract protein names and sequences from natural language text, and links to several other databases involved in antibody/antigen interactions. Keywords: antibody/antigen interactions, antigen epitope
The Kabat Database determines the combining site of antibodies based on the available amino acid sequences. The precise delineation of complementarity determining regions (CDR) of both light and heavy chains provides the first example of how properly aligned sequences can be used to derive structural and functional information of biological macromolecules. The Kabat database now includes nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest. The Kabat Database searching and analysis tools package is an ASP.NET web-based portal containing lookup tools, sequence matching tools, alignment tools, length distribution tools, positional correlation tools and much more. The searching and analysis tools are custom made for the aligned data sets contained in both the SQL Server and ASCII text flat file formats. The searching and analysis tools may be run on a single PC workstation or in a distributed environment. The analysis tools are written in ASP.NET and C# and are available in Visual Studio .NET 2003/2005/2008 formats. The Kabat Database was started in 1970 to determine the combining site of antibodies based on the amino acid sequences available at that time. Bence Jones proteins, mostly from humans, were aligned using the now well-known Kabat numbering system, and a quantitative measure, variability, was calculated for every position. Three peaks, at positions 24-34, 50-56 and 89-97, were identified and proposed to form the complementarity determining regions (CDR) of light chains. Subsequently, antibody heavy chain amino acid sequences were also aligned using a different numbering system, since the locations of their CDRs (31-35B, 50-65 and 95-102) are different from those of the light chains.
CDRL1 starts right after the first invariant Cys 23 of light chains, while CDRH1 is eight amino acid residues away from the first invariant Cys 22 of heavy chains. During the past 30 years, the Kabat database has grown to include nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules and other proteins of immunological interest. It has been used extensively by immunologists to derive useful structural and functional information from the primary sequences of these proteins.
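The per-position variability mentioned above can be illustrated with a short sketch. It assumes the commonly cited Wu-Kabat definition (number of distinct residues at a position divided by the frequency of the most common residue at that position); the function name and the toy alignment are hypothetical and are not taken from the Kabat Database tools.

```python
from collections import Counter


def kabat_variability(aligned_seqs, gap="-"):
    """Return Wu-Kabat variability for each column of an alignment.

    Assumed definition: (number of distinct residues in the column)
    divided by (frequency of the most common residue in the column).
    Gap characters are ignored; an all-gap column scores 0.0.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")

    scores = []
    for col in range(length):
        residues = [seq[col] for seq in aligned_seqs if seq[col] != gap]
        if not residues:
            scores.append(0.0)
            continue
        counts = Counter(residues)
        most_common_freq = counts.most_common(1)[0][1] / len(residues)
        scores.append(len(counts) / most_common_freq)
    return scores


# Hypothetical four-sequence alignment: two invariant columns
# followed by two increasingly variable ones.
toy_alignment = ["ACDE", "ACDF", "ACGH", "ACKL"]
variability = kabat_variability(toy_alignment)
print(variability)
```

Under this definition an invariant column scores 1, and the score rises as more residue types share a position, which is how peaks in a variability plot can point to candidate CDR positions.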
Repository of sequenced antibodies, integrating curated information about each antibody and its antigen with cross-links to standardized databases of chemical and protein entities. Manually curated repository of sequenced antibodies, developed by the Geneva Antibody Facility at the University of Geneva, in collaboration with the CALIPHO and Swiss-Prot groups at the SIB Swiss Institute of Bioinformatics. The database provides a list of sequenced antibodies with their known targets. Each antibody is assigned a unique ID number that can be used in academic publications to increase the reproducibility of experiments.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Title: Antibody and Nanobody Design Dataset (ANDD): A Comprehensive Resource with Sequence, Structure, and Binding Affinity Data
DOI: 10.5281/zenodo.16894086
Resource Type: Dataset
Publisher: Zenodo
Publication Year: 2025
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Overview (Abstract):
The Antibody and Nanobody Design Dataset (ANDD) is a unified, large-scale dataset created to overcome the limitations of data fragmentation and incompleteness in antibody and nanobody research. It integrates sequence, structure, antigen information, and binding affinity data from 15 diverse sources, including OAS, PDB, SabDab, and others. ANDD comprises 48,800 antibody/nanobody sequences, structural data for 25,158 entries, antigen sequences for 12,617 entries, and a total of 9,569 binding affinity values for antibody/nanobody-antigen pairs. A key innovation is the augmentation of experimental affinity data with 5,218 high-quality predictions generated by the ANTIPASTI model. This makes ANDD the largest available dataset of its kind, providing a robust foundation for training and validating deep learning models in therapeutic antibody and nanobody design.
Keywords: Dataset, Antibody Design, Nanobody Design, VHH, Deep Learning, Protein Engineering, Binding Affinity, Therapeutic Antibodies, Computational Biology
Methods (Data Curation and Processing):
The ANDD was constructed through a rigorous multi-step process:
Data Specifications and Format:
The dataset is distributed in two parts:
ANDD.csv: A comprehensive spreadsheet containing all annotated metadata for each entry.
All_structures/ folder: A directory containing the corresponding PDB structure files for entries with structural data.
The ANDD.csv file includes the following key fields (a full description is available in the Data Record section of the paper):
Affinity_Kd(M), ∆Gbinding(kJ), and the Affinity_Method.
Ab/Nano_mutation).
Technical Validation:
The quality of ANDD has been ensured through extensive validation:
Potential Uses:
ANDD is designed to accelerate research in computational biology and drug discovery, including:
Access and License:
The ANDD dataset is publicly available for download under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users are free to share and adapt the material for any purpose, even commercially, provided appropriate credit is given to the original authors and this data descriptor is cited.
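The Affinity_Kd(M) and ∆Gbinding(kJ) fields named in the data specification are thermodynamically related, and the sketch below shows the textbook conversion ΔG = RT ln(Kd) with a 1 M standard state. The temperature of 298.15 K, the function names, and the kJ/mol unit are assumptions made for illustration; the listing does not state how ANDD derived its ∆G values.

```python
import math

# Molar gas constant in kJ/(mol*K).
R_KJ_PER_MOL_K = 8.314462618e-3


def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Convert a dissociation constant (M) to binding free energy (kJ/mol).

    Uses dG = R * T * ln(Kd / 1 M); tighter binding (smaller Kd)
    gives a more negative dG. The default temperature is an assumption.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)


def delta_g_to_kd(delta_g_kj, temperature_k=298.15):
    """Inverse conversion: binding free energy (kJ/mol) to Kd (M)."""
    return math.exp(delta_g_kj / (R_KJ_PER_MOL_K * temperature_k))


# Hypothetical example: a 1 nM binder.
dg = kd_to_delta_g(1e-9)
print(f"Kd = 1e-9 M  ->  dG = {dg:.2f} kJ/mol")
```

A conversion like this can serve as a consistency check when both columns are populated for an entry, provided the temperature used in the original measurement is known.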
A database of information about each Cancer-Testis (CT) gene, its gene products and the immune response induced in cancer patients by these proteins. CT antigens are proteins normally expressed only in the human germ line but that are also present in a significant subset of malignant tumors. The practical importance of these proteins is that, due to their restricted expression pattern, they are frequently recognized by the immune system of cancer patients. Moreover, this antigenicity has raised the possibility of using them as vaccines to actively stimulate immune responses in order to combat tumor growth. As a result, worldwide research into many aspects of CT antigens is growing rapidly, prompting the construction of this database as a resource for investigators working in this area.
Bcipep is a collection of peptides that play a role in humoral immunity. The peptides in the database have varying measures of immunogenicity. This database can assist in the development of methods for predicting B cell epitopes, designing synthetic vaccines, and in disease diagnosis. These peptides lead to the generation of antibodies, which combine with antigens and are responsible for host defense, and can be very useful for subunit vaccine design. The database has 3031 peptide entries. For each peptide, the user can find a wide range of information, including entry number, peptide sequence, pathogen group, protein source, antigen structure, antibody, etc.
The ABCD (AntiBodies Chemically Defined) database is a manually curated depository of sequenced antibodies.
A database of carbohydrate antigens and matching antibodies. Epitopes and antibodies are listed within the database. Users may also search for epitopes and antibodies by keyword, epitope ID, tissue, receptor, enzyme, and other fields.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Epitopes are specific regions on an antigen’s surface that the immune system recognizes. Epitopes are usually protein regions on foreign immune-stimulating entities such as viruses and bacteria, and in some cases, endogenous proteins may act as antigens. Identifying epitopes is crucial for accelerating the development of vaccines and immunotherapies. However, mapping epitopes in pathogen proteomes is challenging using conventional methods. Screening artificial neoepitope libraries against antibodies can overcome this issue. Here, we applied conventional sequence analysis and methods inspired by natural language processing to reveal specific sequence patterns in the linear epitopes deposited in the Immune Epitope Database (www.iedb.org) that can serve as building blocks for the design of universal epitope libraries. Our results reveal that amino acid frequency in annotated linear epitopes differs from that in the human proteome. Aromatic residues are overrepresented, while cysteines are practically absent from epitopes. Byte pair encoding tokenization shows high frequencies of tryptophan in tokens of 5, 6, and 7 amino acids, corroborating the findings of the conventional sequence analysis. These results can be applied to reduce the diversity of linear epitope libraries by orders of magnitude.
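The kind of frequency comparison described above can be sketched as follows. This is a minimal illustration only: the peptide strings are made up, the pseudocount and function names are assumptions, and it is not the authors' pipeline or actual IEDB data.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequences):
    """Relative frequency of each standard amino acid across sequences."""
    counts = Counter(
        ch for seq in sequences for ch in seq if ch in AMINO_ACIDS
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def log2_enrichment(foreground, background, pseudocount=1e-6):
    """log2 ratio of foreground to background frequency per residue.

    Positive values mean the residue is overrepresented in the
    foreground set; the pseudocount avoids division by zero.
    """
    return {
        aa: math.log2(
            (foreground[aa] + pseudocount) / (background[aa] + pseudocount)
        )
        for aa in AMINO_ACIDS
    }


# Hypothetical toy inputs standing in for epitopes and a proteome.
toy_epitopes = ["WYFAW", "KWYDE"]
toy_background = ["ACDEFGHIKLMNPQRSTVWY"]

epitope_freq = aa_frequencies(toy_epitopes)
background_freq = aa_frequencies(toy_background)
enrichment = log2_enrichment(epitope_freq, background_freq)

for aa in sorted(enrichment, key=enrichment.get, reverse=True)[:3]:
    print(aa, round(enrichment[aa], 2))
```

With real data, the foreground would be the set of annotated linear epitopes and the background a reference proteome, so residues with strongly negative scores would be candidates for exclusion from a reduced library.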
Database of comparative gene mapping between species to assist the mapping of genes related to phenotypic traits in livestock. The linkage maps, cytogenetic maps, and polymerase chain reaction primers of pig, cattle, mouse and human, and their references have been included in the database, and the correspondence among species has been stipulated in the database. AGP is an animal genome database developed on a Unix workstation and maintained by a relational database management system. It is a joint project of the National Institute of Agrobiological Sciences (NIAS) and the Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries (STAFF-Institute), in cooperation with other related research institutes. AGP also contains the Pig Expression Data Explorer (PEDE), a database of porcine EST collections derived from full-length cDNA libraries and full-length sequences of the cDNA clones picked from the EST collection. The EST sequences have been clustered and assembled, and their similarity to sequences in RefSeq and UniGene determined. The PEDE database system was constructed to store sequences and similarity data of swine full-length cDNA libraries and to make them available to users. It provides interfaces for keyword and ID searches of BLAST results and enables users to obtain sequence data and names of clones of interest. Putative SNPs in EST assemblies have been classified according to breed specificity and their effect on coding amino acids, and the assemblies are equipped with an SNP search interface. The database contains porcine nucleotide sequences and cDNA clones that are ready for analyses such as expression in mammalian cells, because of their high likelihood of containing full-length CDS. PEDE will be useful for researchers who want to explore genes that may be responsible for traits such as disease susceptibility.
The database also offers information regarding major and minor porcine-specific antigens, which might be investigated in regard to the use of pigs as models in various medical research applications.
From its origins in the 16th century, microscopy has allowed the cell, as the basic unit of eukaryotic life and disease, to be identified and analyzed. Today, quantitative cytometric technologies, either microscope based or flow cytometric, are the most powerful tools to analyze the proliferation, physiology and differentiation of cells generally, and are particularly useful in immunopathology. In combination with monoclonal antibodies (which recognize specific gene products) conjugated to sensitive fluorescent dyes, cell types can be identified according to the genes they express. They can also be isolated using either fluorescence-activated cell sorting (FACS) or magnetic cell sorting (MACS). In the past 20 years, immunofluorescence-based cytometry and cell sorting have become 'state of the art' technologies, mostly serving to identify subsets of lymphocytes and systemic changes in the immune system. Although it is certainly of value for diagnosis and analysis of immunopathology, cytometry had one major limitation: except in a few experimental situations, it was not possible to focus analysis on those lymphocytes that specifically recognize the relevant antigens in a normal or pathological immune reaction. This drawback has recently been overcome for both B and T lymphocytes, using antigen to identify the cells. Today, a number of exciting new technologies make it possible to analyze and isolate specifically those lymphocytes that are directly involved in the immune reaction to given antigens. These advances will spur research in arthritis considerably.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: T-cell receptor (TCR) recognition of foreign peptides presented by the major histocompatibility complex (MHC) initiates the adaptive immune response against pathogens. While a large number of TCR sequences specific to different antigenic peptides are known to date, the structural data describing the conformation and contacting residues for TCR-peptide-MHC complexes is relatively limited. In the present study we aim to extend and analyze the set of available structures by performing highly accurate template-based modeling of these complexes using TCR sequences with known specificity.
Methods: Identification of CDR3 sequences and their further clustering, based on available spatial structures, V- and J-genes of corresponding T-cell receptors, and epitopes, was performed using the VDJdb database. Modeling of the selected CDR3 loops was conducted using a stepwise introduction of single amino acid substitutions to the template PDB structures, followed by optimization of the TCR-peptide-MHC contacting interface using the Rosetta package applications. Statistical analysis and recursive feature elimination procedures were carried out on computed energy values and properties of contacting amino acid residues between CDR3 loops and peptides, using R.
Results: Using the set of 29 complex templates (including a template with SARS-CoV-2 antigen) and 732 specificity records, we built a database of 1585 model structures carrying substitutions in either TCRα or TCRβ chains, with some models representing the result of different mutation pathways for the same final structure. This database allowed us to analyze features of amino acid contacts in TCR-peptide interfaces that govern antigen recognition preferences and interpret these interactions in terms of physicochemical properties of interacting residues.
Conclusion: Our results provide a methodology for creating high-quality TCR-peptide-MHC models for antigens of interest that can be utilized to predict TCR specificity.