100+ datasets found
  1. Human Potential Tumor Associated Antigen database

    • neuinfo.org
    • rrid.site
    Updated Aug 22, 2024
    Cite
    (2024). Human Potential Tumor Associated Antigen database [Dataset]. http://identifiers.org/RRID:SCR_002938
    Description

    To accelerate tumor antigen discovery, we generated a publicly available Human Potential Tumor Associated Antigen database (HPtaa) containing potential tumor-associated antigens (pTAAs) identified by in silico computation. The database includes 3,518 potential targets and is freely available to academic users. The screen successfully recovered 41 of 82 known cancer-testis antigens, 6 of 18 differentiation antigens, 2 of 2 oncofetal antigens, and 7 of 12 FDA-approved cancer markers that have a Gene ID, so it should provide a good platform for identifying cancer target genes. The database uses expression data from several platforms, including carefully chosen publicly available microarray expression data, GEO SAGE data, and UniGene expression data. Other databases relevant to TAA discovery, such as CGAP, CCDS, and the Gene Ontology database, were also incorporated. Various strategies and algorithms were developed to integrate the different expression platforms. Known tumor antigens were gathered from the literature and serve as training sets. For each gene, a total tumor specificity penalty was computed from a positive clue penalty for differential expression in human cancers, the corresponding differential ratio, and a normal tissue restriction penalty. We hope this database will help with cancer immunome identification and thereby improve the diagnosis and treatment of human carcinomas.
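
    The known-antigen recovery figures quoted above translate directly into per-class and pooled recall. The sketch below only does that arithmetic; the dictionary layout and the name `recall_by_class` are illustrative and not part of HPtaa, and pooling assumes the four reference lists do not overlap.

```python
from fractions import Fraction

# Counts as reported in the HPtaa description:
# (antigens recovered by the screen, known antigens with a Gene ID)
KNOWN_ANTIGENS = {
    "cancer-testis": (41, 82),
    "differentiation": (6, 18),
    "oncofetal": (2, 2),
    "FDA-approved marker": (7, 12),
}

def recall_by_class(counts):
    """Return per-class recall and pooled recall as exact fractions.

    Pooling simply sums hits and totals, which assumes no antigen
    appears in more than one reference list.
    """
    per_class = {name: Fraction(hit, total) for name, (hit, total) in counts.items()}
    hits = sum(hit for hit, _ in counts.values())
    total = sum(total for _, total in counts.values())
    return per_class, Fraction(hits, total)

if __name__ == "__main__":
    per_class, pooled = recall_by_class(KNOWN_ANTIGENS)
    for name, r in per_class.items():
        print(f"{name}: {float(r):.1%}")
    print(f"pooled: {float(pooled):.1%}")  # 56 of 114 known antigens, about 49.1%
```

    Exact fractions avoid rounding drift; convert to float only for display.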

  2. Serum Antibody Repertoire Profiling Using In Silico Antigen Screen

    • plos.figshare.com
    doc
    Updated Jun 1, 2023
    Cite
    Xinyue Liu; Qiang Hu; Song Liu; Luke J. Tallo; Lisa Sadzewicz; Cassandra A. Schettine; Mikhail Nikiforov; Elena N. Klyushnenkova; Yurij Ionov (2023). Serum Antibody Repertoire Profiling Using In Silico Antigen Screen [Dataset]. http://doi.org/10.1371/journal.pone.0067181
    Available download formats: doc
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xinyue Liu; Qiang Hu; Song Liu; Luke J. Tallo; Lisa Sadzewicz; Cassandra A. Schettine; Mikhail Nikiforov; Elena N. Klyushnenkova; Yurij Ionov
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Serum antibodies are a valuable source of information on the health state of an organism. Profiles of serum antibody reactivity can be generated by high-throughput sequencing of peptide-coding DNA from combinatorial random peptide phage display libraries selected for binding to serum antibodies. Here we demonstrate that the targets of an immune response recognized by serum antibodies directed against sequential epitopes can be identified from serum antibody repertoire profiles generated by high-throughput sequencing. We developed an algorithm that filters the results of a protein database BLAST search for the selected peptides, distinguishing real antigens recognized by serum antibodies from irrelevant proteins retrieved at random. When we used this algorithm to analyze serum antibodies from mice immunized with a human protein, the protein used for immunization appeared among the top candidate antigens. When we analyzed a serum sample from a patient with metastatic melanoma, the recombinant protein corresponding to the top candidate on the list generated by the algorithm was recognized by antibodies in that serum on a western blot, confirming that the method can identify autoantigens recognized by serum antibodies. We also demonstrated that this unbiased view of the serum antibody repertoire yields quantitative information on the epitope composition of the targets of the immune response. A method for deciphering the information contained in serum antibody repertoire profiles may help to identify autoantibodies useful for diagnosing and monitoring autoimmune diseases or malignancies.

  3. SV40 Large T-Antigen Mutant Database

    • neuinfo.org
    • rrid.site
    Updated Jan 29, 2022
    Cite
    (2022). SV40 Large T-Antigen Mutant Database [Dataset]. http://identifiers.org/RRID:SCR_005313
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE (documented on July 15, 2013). The SV40 T antigen database lists viruses and plasmids expressing mutant forms of large T antigen. Each entry contains the mutant designation, mutant type, virus strain, nucleotide change, amino acid change, and pertinent references. Category: Human Genes and Diseases; Subcategory: Cancer gene databases.

  4. Antibody-Antigen Complex Database

    • bioregistry.io
    Updated Oct 20, 2025
    Cite
    (2025). Antibody-Antigen Complex Database [Dataset]. https://bioregistry.io/registry/aacdb
    Description

    Identifiers represent antibody-antigen complexes in the Antigen-Antibody Complex Database (AACDB), which provides comprehensive structural and functional annotations including paratope and epitope information, antibody developability data, and antigen-drug target relationships to support immunoinformatics research and therapeutic antibody development.

  5. Immune Epitope Database and Analysis Resource (IEDB)

    • catalog.data.gov
    • healthdata.gov
    Updated Jul 26, 2023
    Cite
    National Institutes of Health (NIH) (2023). Immune Epitope Database and Analysis Resource (IEDB) [Dataset]. https://catalog.data.gov/dataset/immune-epitope-database-and-analysis-resource-iedb
    Dataset provided by
    National Institutes of Health (NIH)
    Description

    This repository contains antibody/B cell and T cell epitope information, together with epitope prediction and analysis tools, for use by the research community worldwide. Immune epitopes are defined as molecular structures recognized by specific antigen receptors of the immune system, namely antibodies, B cell receptors, and T cell receptors. The database covers immune epitopes from infectious diseases (excluding HIV) and from immune-mediated diseases, along with the accompanying biological information.

  6. Blood Group Antigen Gene Mutation Database

    • dknet.org
    • rrid.site
    Updated Jan 29, 2022
    Cite
    (2022). Blood Group Antigen Gene Mutation Database [Dataset]. http://identifiers.org/RRID:SCR_002297
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on August 23, 2019. BGMUT was a database that provided a publicly accessible platform for DNA sequences and a curated set of blood group antigen gene mutation information. The data archive is available at ftp://ftp.ncbi.nlm.nih.gov/pub/mhc/rbc/Final Archive.

  7. Table3_CAD v1.0: Cancer Antigens Database Platform for Cancer Antigen...

    • frontiersin.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Jijun Yu; Luoxuan Wang; Xiangya Kong; Yang Cao; Mengmeng Zhang; Zhaolin Sun; Yang Liu; Jing Wang; Beifen Shen; Xiaochen Bo; Jiannan Feng (2023). Table3_CAD v1.0: Cancer Antigens Database Platform for Cancer Antigen Algorithm Development and Information Exploration.docx [Dataset]. http://doi.org/10.3389/fbioe.2022.819583.s007
    Available download formats: docx
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Jijun Yu; Luoxuan Wang; Xiangya Kong; Yang Cao; Mengmeng Zhang; Zhaolin Sun; Yang Liu; Jing Wang; Beifen Shen; Xiaochen Bo; Jiannan Feng
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cancer vaccines have attracted growing attention for their strong preclinical and clinical performance. With the development of next-generation sequencing technologies and related algorithms, pipelines based on sequencing and machine learning have become mainstream in cancer antigen prediction. Of particular interest are neoantigens: mutant peptides that exist only in tumor cells, are not subject to central tolerance, and cause fewer side effects. Rapid prediction and filtering of neoantigen peptides are crucial to the development of neoantigen-based cancer vaccines. However, because verified neoantigen datasets are scarce and the properties of neoantigens are insufficiently studied, neoantigen prediction algorithms still need improvement. Here, we collected verified cancer antigen peptides together with as much related peptide information as possible, and discussed how each dataset can contribute to algorithm improvement in cancer antigen research, especially neoantigen prediction. We designed a platform, the Cancer Antigens Database (CAD, http://cad.bio-it.cn/), to help users explore cancer antigens comprehensively online.

  8. Example proteins and validated epitopes present in the IEDB 3.0 database.

    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller (2023). Example proteins and validated epitopes present in the IEDB 3.0 database. [Dataset]. http://doi.org/10.1371/journal.pone.0273494.t001
    Available download formats: xls
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example proteins and validated epitopes present in the IEDB 3.0 database.

  9. Epitome

    • scicrunch.org
    • neuinfo.org
    Cite
    Epitome [Dataset]. http://identifiers.org/RRID:SCR_007641
    Description

    Epitome is a database of structurally inferred antigenic epitopes in proteins. It includes all known antigenic residues and the antibodies that interact with them, with a detailed description of the residues involved in each interaction and their sequence/structure environments. Interactions can be visualized through a Jmol interface. The website also provides specialized software, NLProt, that extracts protein names and sequences from natural language text, and links to several other databases on antibody/antigen interactions. Keywords: antibody/antigen interactions, antigen epitope.

  10. Data from: Kabat Database of Sequences of Proteins of Immunological Interest...

    • neuinfo.org
    • dknet.org
    Updated Jun 27, 2024
    Cite
    (2024). Kabat Database of Sequences of Proteins of Immunological Interest [Dataset]. http://identifiers.org/RRID:SCR_006465
    Description

    The Kabat Database determines the combining site of antibodies from the available amino acid sequences. The precise delineation of the complementarity determining regions (CDRs) of both light and heavy chains provided the first example of how properly aligned sequences can be used to derive structural and functional information about biological macromolecules.

    The Kabat Database searching and analysis tools package is an ASP.NET web-based portal containing lookup tools, sequence matching tools, alignment tools, length distribution tools, positional correlation tools, and more. The tools are custom made for the aligned data sets, which are held in both SQL Server and ASCII text flat file formats, and can be run on a single PC workstation or in a distributed environment. They are written in ASP.NET and C# and are available in Visual Studio .NET 2003/2005/2008 formats.

    The Kabat Database was started in 1970 to determine the combining site of antibodies from the amino acid sequences available at that time. Bence Jones proteins, mostly human, were aligned using what is now known as the Kabat numbering system, and a quantitative measure, variability, was calculated for every position. Three variability peaks, at positions 24-34, 50-56 and 89-97, were identified and proposed to form the CDRs of light chains. Antibody heavy chain amino acid sequences were subsequently aligned using a different numbering system, because their CDRs (31-35B, 50-65 and 95-102) are located differently from those of the light chains. CDRL1 starts right after the first invariant Cys 23 of light chains, while CDRH1 is eight amino acid residues away from the first invariant Cys 22 of heavy chains. Over the past 30 years, the Kabat database has grown to include nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest. Immunologists have used it extensively to derive structural and functional information from the primary sequences of these proteins.
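
    The CDR ranges listed above (light chain 24-34, 50-56, 89-97; heavy chain 31-35B, 50-65, 95-102) can be applied to a sequence that has already been Kabat-numbered. The sketch below assumes the numbering was produced elsewhere and arrives as hypothetical (position_label, residue) pairs with insertion codes such as "35A"; the function names are illustrative and are not part of the Kabat Database tools.

```python
import re

# Kabat CDR boundaries quoted in the description, as
# (residue number, insertion code) pairs; "" means no insertion code.
LIGHT_CDRS = {
    "L1": ((24, ""), (34, "")),
    "L2": ((50, ""), (56, "")),
    "L3": ((89, ""), (97, "")),
}
HEAVY_CDRS = {
    "H1": ((31, ""), (35, "B")),
    "H2": ((50, ""), (65, "")),
    "H3": ((95, ""), (102, "")),
}

_LABEL = re.compile(r"^(\d+)([A-Z]?)$")

def parse_label(label):
    """Split a Kabat position label such as '35B' into (35, 'B')."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a Kabat position label: {label!r}")
    return int(match.group(1)), match.group(2)

def extract_cdrs(numbered, cdr_table):
    """Collect the residues whose Kabat position falls inside each CDR range.

    `numbered` is a list of (position_label, residue) pairs in sequence order.
    Tuple comparison orders 35 < 35A < 35B < 36, which matches how Kabat
    insertion codes follow their parent position.
    """
    cdrs = {name: [] for name in cdr_table}
    for label, residue in numbered:
        pos = parse_label(label)
        for name, (start, end) in cdr_table.items():
            if start <= pos <= end:
                cdrs[name].append(residue)
    return {name: "".join(residues) for name, residues in cdrs.items()}

if __name__ == "__main__":
    # Hypothetical heavy chain numbered 1-113 with insertions 35A and 35B;
    # every residue is a placeholder "G".
    labels = [str(i) for i in range(1, 36)] + ["35A", "35B"] + [str(i) for i in range(36, 114)]
    heavy = [(label, "G") for label in labels]
    print({name: len(seq) for name, seq in extract_cdrs(heavy, HEAVY_CDRS).items()})
    # prints {'H1': 7, 'H2': 16, 'H3': 8}
```

    Representing positions as (number, insertion code) tuples lets ordinary comparison handle insertions without a lookup table.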

  11. ExPASy ABCD database

    • neuinfo.org
    • dknet.org
    Updated Jan 29, 2022
    Cite
    (2022). ExPASy ABCD database [Dataset]. http://identifiers.org/RRID:SCR_017401
    Description

    A manually curated repository of sequenced antibodies that integrates curated information about each antibody and its antigen with cross-links to standardized databases of chemical and protein entities. It was developed by the Geneva Antibody Facility at the University of Geneva, in collaboration with the CALIPHO and Swiss-Prot groups at the SIB Swiss Institute of Bioinformatics. The database provides a list of sequenced antibodies with their known targets. Each antibody is assigned a unique ID number that can be cited in academic publications to improve the reproducibility of experiments.

  12. Antibody and Nanobody Design Dataset (ANDD)

    • zenodo.org
    zip
    Updated Sep 26, 2025
    Cite
    Yikai Wu (2025). Antibody and Nanobody Design Dataset (ANDD) [Dataset]. http://doi.org/10.5281/zenodo.16894086
    Available download formats: zip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yikai Wu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Title: Antibody and Nanobody Design Dataset (ANDD): A Comprehensive Resource with Sequence, Structure, and Binding Affinity Data

    DOI: 10.5281/zenodo.16894086

    Resource Type: Dataset

    Publisher: Zenodo

    Publication Year: 2025

    License: Creative Commons Attribution 4.0 International (CC BY 4.0)

    Overview (Abstract):

    The Antibody and Nanobody Design Dataset (ANDD) is a unified, large-scale dataset created to overcome the limitations of data fragmentation and incompleteness in antibody and nanobody research. It integrates sequence, structure, antigen information, and binding affinity data from 15 diverse sources, including OAS, PDB, SabDab, and others. ANDD comprises 48,800 antibody/nanobody sequences, structural data for 25,158 entries, antigen sequences for 12,617 entries, and a total of 9,569 binding affinity values for antibody/nanobody-antigen pairs. A key innovation is the augmentation of experimental affinity data with 5,218 high-quality predictions generated by the ANTIPASTI model. This makes ANDD the largest available dataset of its kind, providing a robust foundation for training and validating deep learning models in therapeutic antibody and nanobody design.

    Keywords: Dataset, Antibody Design, Nanobody Design, VHH, Deep Learning, Protein Engineering, Binding Affinity, Therapeutic Antibodies, Computational Biology

    Methods (Data Curation and Processing):

    The ANDD was constructed through a rigorous multi-step process:

    1. Data Collection: Data was aggregated from 15 primary sources, including both antibody/nanobody-specific databases (e.g., OAS, SAbDab, INDI, sdAb-DB) and general protein databases (e.g., PDB, UNIPROT, PDBbind).
    2. Integration and Standardization: Data from disparate sources was consolidated into a consistent format, addressing challenges of format inconsistency. Entries were manually validated to exclude non-relevant data (e.g., T-cell receptors).
    3. Affinity Data Augmentation: The ANTIPASTI deep learning model was used to predict and add binding affinity values for entries that had structural data but lacked experimental affinity measurements.
    4. Manual Curation: Web-based data and information from publicly available patents targeting key antigens (HER2, IL-6, CD45, SARS-CoV-2 RBD) were manually extracted to enhance completeness.
    5. Hierarchical Organization: Data is organized in a hierarchical structure, offering four progressively detailed levels: Sequence-only, Sequence+Structure, Sequence+Structure+Antigen, and Sequence+Structure+Antigen+Affinity.

    Data Specifications and Format:

    The dataset is distributed in two parts:

    1. ANDD.csv: A comprehensive spreadsheet containing all annotated metadata for each entry.
    2. All_structures/ folder: A directory containing the corresponding PDB structure files for entries with structural data.

    The ANDD.csv file includes the following key fields (a full description is available in the Data Record section of the paper):

    • General Info: Source, Update_Date, PDB_ID, Experimental_Method, Ab_or_Nano, Source_Organism.
    • Chain Details: Entity IDs, Asym IDs, Database Accession Codes, and Macromolecule Names for Heavy (H) and Light (L) chains.
    • Antigen Details: Ag_Name, Ag_Seq, Ag_Source Organism, and relevant database identifiers.
    • Sequence Data: Full amino acid sequences for H/L chains and individual CDR regions (H1-H3, L1-L3).
    • Affinity Data: Experimentally measured or predicted Affinity_Kd(M), ∆Gbinding(kJ), and the Affinity_Method.
    • Mutation Data: Annotation of any amino acid mutations (Ab/Nano_mutation).
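
    The four hierarchical levels described under Methods can be recovered from the CSV by checking which field groups are filled in. This is a minimal sketch using only the standard library; it assumes the header spellings PDB_ID, Ag_Seq and Affinity_Kd(M) from the field list above match the released file exactly (worth checking), assumes the levels are nested as described, and the helper names are hypothetical.

```python
import csv
import io

# Column names taken from the field list; exact spellings in the released
# ANDD.csv are an assumption.
STRUCTURE_COL = "PDB_ID"
ANTIGEN_COL = "Ag_Seq"
AFFINITY_COL = "Affinity_Kd(M)"

LEVELS = (
    "Sequence-only",
    "Sequence+Structure",
    "Sequence+Structure+Antigen",
    "Sequence+Structure+Antigen+Affinity",
)

def annotation_level(row):
    """Return the deepest nested level whose fields are all non-empty."""
    level = 0
    for column in (STRUCTURE_COL, ANTIGEN_COL, AFFINITY_COL):
        if (row.get(column) or "").strip():
            level += 1
        else:
            break  # levels are nested, so stop at the first missing group
    return LEVELS[level]

def count_levels(csv_file):
    """Tally entries per level from a file-like object holding ANDD-style CSV."""
    counts = {name: 0 for name in LEVELS}
    for row in csv.DictReader(csv_file):
        counts[annotation_level(row)] += 1
    return counts

if __name__ == "__main__":
    # Tiny in-memory stand-in for ANDD.csv; "1ABC" and "MKT" are placeholders.
    sample = io.StringIO(
        "PDB_ID,Ag_Seq,Affinity_Kd(M)\n"
        ",,\n"
        "1ABC,,\n"
        "1ABC,MKT,\n"
        "1ABC,MKT,1e-9\n"
    )
    print(count_levels(sample))  # each of the four levels is counted once
```

    For the real file, pass `open("ANDD.csv", newline="")` instead of the in-memory sample.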

    Technical Validation:

    The quality of ANDD has been ensured through extensive validation:

    1. Manual Curation: A rigorous manual review process was conducted to check for accuracy and consistency between sequence, structure, and affinity data across randomly selected entries.
    2. Affinity Validation with AlphaBind: The experimental Kd values were validated by comparing them against enrichment ratios predicted by the AlphaBind model, showing a significant correlation (Pearson’s r = 0.750).
    3. Cross-Mapping Validation: The internal consistency between Kd and ∆Gbinding values within the dataset was confirmed, showing a perfect correlation (Pearson’s r = 1.000) as per thermodynamic principles.
    4. Proof-of-Concept Application: The dataset's utility was demonstrated by fine-tuning the Diffab generative model on a subset of ANDD. The fine-tuned model showed significant improvements in generating nanobodies with better predicted binding affinity, structural diversity, and developability metrics.
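
    The perfect Kd/ΔG correlation reported in point 3 is expected from the standard relation ΔG = RT ln Kd (Kd in M against a 1 M standard state), which is linear in ln Kd. The sketch below assumes T = 298.15 K, since the temperature convention behind ANDD's ∆Gbinding column is not stated here, and it computes the correlation against log-scale Kd, which the description does not specify. The Kd values and function names are hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)

def kd_to_dg(kd_molar, temperature_k=298.15):
    """Binding free energy in kJ/mol from a dissociation constant in M."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)

def dg_to_kd(dg_kj_per_mol, temperature_k=298.15):
    """Inverse of kd_to_dg: dissociation constant in M from ΔG in kJ/mol."""
    return math.exp(dg_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    kds = [1e-6, 3e-8, 1e-9, 2e-11]  # hypothetical Kd values in M
    dgs = [kd_to_dg(kd) for kd in kds]
    print([round(dg, 1) for dg in dgs])
    # ΔG is an exact linear function of log10(Kd), so r is 1 up to rounding.
    print(round(pearson([math.log10(kd) for kd in kds], dgs), 3))  # -> 1.0
```

    Because Pearson's r is unchanged by a positive linear rescaling, any fixed temperature gives the same r = 1 on the log scale.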

    Potential Uses:

    ANDD is designed to accelerate research in computational biology and drug discovery, including:

    • Training and benchmarking deep learning models for de novo antibody/nanobody sequence and structure generation.
    • Developing and validating predictive models for antibody-antigen binding affinity.
    • Studying structure-function relationships in antibody-antigen interactions.
    • Facilitating the design of optimized therapeutic antibodies and nanobodies with improved specificity and efficacy.

    Access and License:

    The ANDD dataset is publicly available for download under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users are free to share and adapt the material for any purpose, even commercially, provided appropriate credit is given to the original authors and this data descriptor is cited.

  13. CTDatabase

    • dknet.org
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). CTDatabase [Dataset]. http://identifiers.org/RRID:SCR_007614
    Description

    A database of information about each cancer-testis (CT) gene, its gene products, and the immune response these proteins induce in cancer patients. CT antigens are proteins normally expressed only in the human germ line but also present in a significant subset of malignant tumors. Their practical importance is that, because of this restricted expression pattern, they are frequently recognized by the immune system of cancer patients. This antigenicity has also raised the possibility of using them as vaccines that actively stimulate immune responses against tumor growth. As a result, worldwide research into many aspects of CT antigens is growing rapidly, which prompted the construction of this database as a resource for investigators in this area.

  14. BciPep

    • neuinfo.org
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). BciPep [Dataset]. http://identifiers.org/RRID:SCR_007559
    Description

    BciPep is a collection of peptides that play a role in humoral immunity. The peptides in the database show varying degrees of immunogenicity. The database can assist in developing methods for predicting B cell epitopes, designing synthetic vaccines, and diagnosing disease. These peptides lead to the generation of antibodies that bind antigens and contribute to host defense, which makes them useful for subunit vaccine design. The database has 3,031 peptide entries. For each peptide, users can find extensive information, including entry number, peptide sequence, pathogen group, protein source, antigen structure, antibody, etc.

  15. AntiBodies Chemically Defined database

    • bioregistry.io
    Updated Aug 12, 2021
    Cite
    (2021). AntiBodies Chemically Defined database [Dataset]. https://bioregistry.io/registry/abcd
    Description

    The ABCD (AntiBodies Chemically Defined) database is a manually curated repository of sequenced antibodies.

  16. GlycoEpitope

    • dknet.org
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). GlycoEpitope [Dataset]. http://identifiers.org/RRID:SCR_014404
    Description

    A database of carbohydrate antigens and matching antibodies. Epitopes and antibodies are listed within the database. Users may also search for epitopes and antibodies by keyword, epitope ID, tissue, receptor, enzyme, and other fields.

  17. Data from: Statistical Analysis and Tokenization of Epitopes to Construct...

    • acs.figshare.com
    bin
    Updated Sep 20, 2023
    Cite
    Elena Lopez-Martinez; Aitor Manteca; Noelia Ferruz; Aitziber L. Cortajarena (2023). Statistical Analysis and Tokenization of Epitopes to Construct Artificial Neoepitope Libraries [Dataset]. http://doi.org/10.1021/acssynbio.3c00201.s004
    Available download formats: bin
    Dataset provided by
    ACS Publications
    Authors
    Elena Lopez-Martinez; Aitor Manteca; Noelia Ferruz; Aitziber L. Cortajarena
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Epitopes are specific regions on an antigen’s surface that the immune system recognizes. They are usually protein regions on foreign immune-stimulating entities such as viruses and bacteria, although in some cases endogenous proteins may act as antigens. Identifying epitopes is crucial for accelerating the development of vaccines and immunotherapies, but mapping epitopes in pathogen proteomes with conventional methods is challenging. Screening artificial neoepitope libraries against antibodies can overcome this issue. Here, we applied conventional sequence analysis and methods inspired by natural language processing to reveal specific sequence patterns in the linear epitopes deposited in the Immune Epitope Database (www.iedb.org) that can serve as building blocks for designing universal epitope libraries. Our results show that amino acid frequencies in annotated linear epitopes differ from those in the human proteome: aromatic residues are overrepresented, while cysteines are practically absent from epitopes. Byte pair encoding tokenization shows high frequencies of tryptophan in tokens of 5, 6, and 7 amino acids, corroborating the conventional sequence analysis. These results can be applied to reduce the diversity of linear epitope libraries by orders of magnitude.
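
    The two analyses named above, residue frequencies and byte pair encoding (BPE), can be illustrated on plain amino-acid strings. This is a from-scratch toy sketch, not the authors' tokenizer, vocabulary size, or IEDB preprocessing; the function names and example peptides are hypothetical.

```python
from collections import Counter

def aa_frequencies(sequences):
    """Relative frequency of each residue letter pooled over all sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}

def bpe_merges(sequences, num_merges):
    """Learn byte-pair-encoding merges over amino-acid strings.

    Each sequence starts as a list of single residues; at every step the most
    frequent adjacent token pair (ties broken by first occurrence) is merged
    into a single token everywhere it occurs.
    """
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break  # every sequence is already a single token
        (left, right), _count = pairs.most_common(1)[0]
        merges.append(left + right)
        for i, tokens in enumerate(corpus):
            merged, j = [], 0
            while j < len(tokens):
                if j + 1 < len(tokens) and tokens[j] == left and tokens[j + 1] == right:
                    merged.append(left + right)
                    j += 2
                else:
                    merged.append(tokens[j])
                    j += 1
            corpus[i] = merged
    return merges, corpus

if __name__ == "__main__":
    peptides = ["WYWY", "WYG"]  # hypothetical epitope fragments
    print(aa_frequencies(peptides))
    merges, tokens = bpe_merges(peptides, 1)
    print(merges, tokens)  # ['WY'] [['WY', 'WY'], ['WY', 'G']]
```

    Repeated merges build progressively longer tokens, which is how multi-residue tokens such as the 5- to 7-residue ones reported above can emerge from a large epitope corpus.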

  18. Animal Genome Database

    • neuinfo.org
    • rrid.site
    Updated Jan 29, 2022
    Cite
    (2022). Animal Genome Database [Dataset]. http://identifiers.org/RRID:SCR_008165
    Description

    A database of comparative gene mapping between species, built to assist the mapping of genes related to phenotypic traits in livestock. It includes linkage maps, cytogenetic maps, and polymerase chain reaction primers for pig, cattle, mouse and human, with their references, and records the correspondences among species. AGP is an animal genome database developed on a Unix workstation and maintained with a relational database management system. It is a joint project of the National Institute of Agrobiological Sciences (NIAS) and the Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries (STAFF-Institute), in cooperation with other related research institutes. AGP also contains the Pig Expression Data Explorer (PEDE), a database of porcine EST collections derived from full-length cDNA libraries and of full-length sequences of cDNA clones picked from the EST collection. The EST sequences have been clustered and assembled, and their similarity to sequences in RefSeq and UniGene determined. The PEDE database system was built to store sequences and similarity data from swine full-length cDNA libraries and make them available to users. It provides interfaces for keyword and ID searches of BLAST results and lets users obtain sequence data and the names of clones of interest. Putative SNPs in EST assemblies have been classified by breed specificity and by their effect on coding amino acids, and the assemblies are equipped with an SNP search interface. Because the porcine nucleotide sequences and cDNA clones in the database are highly likely to contain full-length CDS, they are ready for analyses such as expression in mammalian cells. PEDE will be useful for researchers exploring genes that may be responsible for traits such as disease susceptibility. The database also offers information on major and minor porcine-specific antigens, which might be investigated in connection with the use of pigs as models in various medical research applications.
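
    Classifying a SNP by its effect on the coding amino acid, as described for the PEDE assemblies, comes down to translating the reference and alternative codon. The sketch below assumes an in-frame coding sequence, ACGT bases only, and the standard genetic code; it is not PEDE's implementation, it does not cover breed specificity, and `classify_snp` is a hypothetical name.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snp(cds, position, alt_base):
    """Classify a substitution at 0-based `position` of an in-frame CDS.

    Returns (kind, ref_aa, alt_aa), where kind is 'synonymous',
    'nonsynonymous' or 'nonsense' (the new codon is a stop).
    """
    cds = cds.upper()
    if not 0 <= position < len(cds) - len(cds) % 3:
        raise ValueError("position outside complete codons")
    start = position - position % 3
    offset = position % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "nonsynonymous"
    return kind, ref_aa, alt_aa

if __name__ == "__main__":
    cds = "ATGGAATGG"  # hypothetical CDS encoding M-E-W
    print(classify_snp(cds, 5, "G"))  # ('synonymous', 'E', 'E')
    print(classify_snp(cds, 3, "C"))  # ('nonsynonymous', 'E', 'Q')
    print(classify_snp(cds, 7, "A"))  # ('nonsense', 'W', '*')
```

    A stop codon mutating to a sense codon is reported here as nonsynonymous; a finer scheme might label it stop-loss.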

  19. Data from: Antigen-specific cytometry

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 30, 2025
    Cite
    National Institutes of Health (2025). Antigen-specific cytometry [Dataset]. https://catalog.data.gov/dataset/antigen-specific-cytometry
    Dataset provided by
    National Institutes of Health
    Description

    Since its origins in the 16th century, microscopy has allowed the cell, as the basic unit of eukaryotic life and disease, to be identified and analyzed. Today, quantitative cytometric technologies, either microscope-based or flow cytometric, are the most powerful tools for analyzing the proliferation, physiology and differentiation of cells generally, and are particularly useful in immunopathology. Combined with monoclonal antibodies (which recognize specific gene products) conjugated to sensitive fluorescent dyes, they identify cell types according to the genes they express. Such cells can also be isolated using either fluorescence-activated cell sorting (FACS) or magnetic cell sorting (MACS). In the past 20 years, immunofluorescence-based cytometry and cell sorting have become 'state of the art' technologies, mostly serving to identify subsets of lymphocytes and systemic changes in the immune system. Although valuable for diagnosis and analysis of immunopathology, cytometry had one major limitation: except in a few experimental situations, it was not possible to focus analysis on the lymphocytes that specifically recognize the relevant antigens in a normal or pathological immune reaction. This drawback has recently been overcome for both B and T lymphocytes by using antigen to identify the cells. Today, a number of new technologies make it possible to analyze and isolate specifically those lymphocytes that are directly involved in the immune reaction to given antigens. These advances will considerably spur research in arthritis.

  20.

    DataSheet_2_Large-scale template-based structural modeling of T-cell...

    • frontiersin.figshare.com
    pdf
    Updated Aug 15, 2023
    Dmitrii S. Shcherbinin; Vadim K. Karnaukhov; Ivan V. Zvyagin; Dmitriy M. Chudakov; Mikhail Shugay (2023). DataSheet_2_Large-scale template-based structural modeling of T-cell receptors with known antigen specificity reveals complementarity features.pdf [Dataset]. http://doi.org/10.3389/fimmu.2023.1224969.s002
    Available download formats: pdf
    Dataset updated
    Aug 15, 2023
    Dataset provided by
    Frontiers
    Authors
    Dmitrii S. Shcherbinin; Vadim K. Karnaukhov; Ivan V. Zvyagin; Dmitriy M. Chudakov; Mikhail Shugay
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: T-cell receptor (TCR) recognition of foreign peptides presented by the major histocompatibility complex (MHC) initiates the adaptive immune response against pathogens. Although a large number of TCR sequences specific to different antigenic peptides are known to date, structural data describing the conformation and contacting residues of TCR-peptide-MHC complexes remain relatively limited. In the present study, we aim to extend and analyze the set of available structures by performing highly accurate template-based modeling of these complexes using TCR sequences with known specificity.

    Methods: CDR3 sequences were identified and then clustered using the VDJdb database, based on available spatial structures, the V- and J-genes of the corresponding T-cell receptors, and epitopes. The selected CDR3 loops were modeled by introducing single amino acid substitutions into the template PDB structures step by step, followed by optimization of the TCR-peptide-MHC contact interface using applications from the Rosetta package. Statistical analysis and recursive feature elimination were carried out in R on the computed energy values and on the properties of the contacting amino acid residues between CDR3 loops and peptides.

    Results: Using a set of 29 complex templates (including one with a SARS-CoV-2 antigen) and 732 specificity records, we built a database of 1,585 model structures carrying substitutions in either the TCRα or the TCRβ chain, with some models representing different mutation pathways to the same final structure. This database allowed us to analyze the features of amino acid contacts at TCR-peptide interfaces that govern antigen recognition preferences, and to interpret these interactions in terms of the physicochemical properties of the interacting residues.

    Conclusion: Our results provide a methodology for creating high-quality TCR-peptide-MHC models for antigens of interest that can be used to predict TCR specificity.
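    The stepwise substitution idea, and why several "mutation pathways" can end at the same final structure, can be illustrated with a minimal Python sketch. It assumes equal-length template and target CDR3 sequences and only enumerates the intermediate sequences; the Rosetta structural optimization applied at each step in the study is not reproduced. The function names and the example CDR3 sequences are hypothetical, chosen for illustration only.

```python
from itertools import permutations
from typing import Optional, Sequence


def differing_positions(template: str, target: str) -> list[int]:
    """Indices at which two equal-length CDR3 sequences differ (assumption: no indels)."""
    if len(template) != len(target):
        raise ValueError("this sketch only handles equal-length CDR3 sequences")
    return [i for i, (a, b) in enumerate(zip(template, target)) if a != b]


def substitution_path(template: str, target: str,
                      order: Optional[Sequence[int]] = None) -> list[str]:
    """Sequences visited when turning template into target one residue at a time."""
    diffs = differing_positions(template, target)
    if order is None:
        order = diffs  # default: substitute positions left to right
    elif sorted(order) != diffs:
        raise ValueError("order must be a permutation of the differing positions")
    current = list(template)
    path = [template]
    for pos in order:
        current[pos] = target[pos]  # one single amino acid substitution per step
        path.append("".join(current))
    return path


def all_pathways(template: str, target: str) -> list[list[str]]:
    """Every ordering of the substitutions; all pathways end at the same target."""
    diffs = differing_positions(template, target)
    return [substitution_path(template, target, p) for p in permutations(diffs)]


if __name__ == "__main__":
    template = "CASSLGQYF"  # hypothetical template CDR3
    target = "CASSPGEYF"    # hypothetical target CDR3 (differs at two positions)
    for path in all_pathways(template, target):
        print(" -> ".join(path))
    # CASSLGQYF -> CASSPGQYF -> CASSPGEYF
    # CASSLGQYF -> CASSLGEYF -> CASSPGEYF
```

    With k differing positions there are k! orderings, so a real pipeline would likely sample or restrict pathways rather than enumerate all of them.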

(2024). Human Potential Tumor Associated Antigen database [Dataset]. http://identifiers.org/RRID:SCR_002938

Human Potential Tumor Associated Antigen database

RRID:SCR_002938, nif-0000-02987, Human Potential Tumor Associated Antigen database (RRID:SCR_002938), HPtaa Database

6 scholarly articles cite this dataset.
Dataset updated
Aug 22, 2024
Description

To accelerate tumor antigen discovery, we generated a publicly available Human Potential Tumor Associated Antigen database (HPtaa) containing potential tumor-associated antigens (pTAAs) identified by in silico computation. The database, which is freely available to academic users, includes 3,518 potential targets. It successfully recovered 41 of 82 known cancer-testis antigens, 6 of 18 differentiation antigens, 2 of 2 oncofetal antigens, and 7 of 12 FDA-approved cancer markers that have a Gene ID, and should therefore provide a good platform for identifying cancer target genes. The database uses expression data from several platforms, including carefully selected publicly available microarray data, GEO SAGE data, and UniGene expression data. Other databases relevant to TAA discovery, such as CGAP, CCDS, and the Gene Ontology database, were also incorporated. Various strategies and algorithms were developed to integrate the different expression platforms. Known tumor antigens were gathered from the literature and served as training sets. For each gene, a total tumor specificity penalty was computed from a positive clue penalty for differential expression in human cancers, the corresponding differential ratio, and a normal tissue restriction penalty. We hope this database will aid cancer immunome identification and thereby help improve the diagnosis and treatment of human carcinomas.
