10 datasets found
  1. Structural Antibody Database

    • neuinfo.org
    • dknet.org
    • +2 more
    Updated Feb 1, 2001
    (2001). Structural Antibody Database [Dataset]. http://identifiers.org/RRID:SCR_022096
    Description

    Database containing all antibody structures available in the PDB, annotated and presented in a consistent fashion. Each structure is annotated with a number of properties, including experimental details, antibody nomenclature (e.g. heavy-light pairings), curated affinity data and sequence annotations. You can use the database to inspect individual structures, create and download datasets for analysis, search for structures with sequences similar to your query, and monitor the known structural repertoire of antibodies.

  2. Therapeutic Structural Antibody Database

    • neuinfo.org
    • scicrunch.org
    • +2 more
    Updated Oct 23, 2024
    (2024). Therapeutic Structural Antibody Database [Dataset]. http://identifiers.org/RRID:SCR_022093/resolver/mentions
    Description

    Tracks all antibody- and nanobody-related therapeutics recognized by the World Health Organisation (WHO) and identifies any corresponding structures in the Structural Antibody Database (SAbDab) with exact or near-exact variable domain sequence matches. Synchronized with SAbDab to update weekly, reflecting new Protein Data Bank entries and new sequence data published by the WHO.

  3. sabdab-proteins

    • kaggle.com
    zip
    Updated May 7, 2023
    Денис Ириняков (2023). sabdab-proteins [Dataset]. https://www.kaggle.com/datasets/akscent/sabdab-proteins
    Available download formats: zip (946959339 bytes)
    Authors
    Денис Ириняков
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Денис Ириняков

    Released under Database: Open Database, Contents: Database Contents

    Contents

  4. AlphaPanda_training_dataset

    • huggingface.co
    Updated Oct 19, 2024
    Yue Hu (2024). AlphaPanda_training_dataset [Dataset]. https://huggingface.co/datasets/YueHuLab/AlphaPanda_training_dataset
    Authors
    Yue Hu
    License

    https://choosealicense.com/licenses/unknown/

    Description

    In our training process, we utilized the SAbDab (Structural Antibody Database) to gather high-quality structural data for antibodies and antibody-antigen complexes. The SAbDab database provides a comprehensive and curated collection of antibody structures, which are crucial for developing and validating our computational models. The structural data and a summary of all available entries can be downloaded from the SAbDab website: The complete set of antibody structures is available at:… See the full description on the dataset page: https://huggingface.co/datasets/YueHuLab/AlphaPanda_training_dataset.

  5. Monoclonal Antibodies

    • kaggle.com
    zip
    Updated Aug 22, 2020
    Ryan Vandersmith (2020). Monoclonal Antibodies [Dataset]. https://www.kaggle.com/rvanasa/monoclonal-antibodies
    Available download formats: zip (51198335 bytes)
    Authors
    Ryan Vandersmith
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    This dataset combines monoclonal antibody (mAb) information from a variety of sources into a more concise and convenient format.

    Here is a quick introduction to monoclonal antibodies in context with the COVID-19 pandemic.

    Sources

    • RCSB PDB - 3D protein models
    • SAbDab - pairs of antigens and antibodies from RCSB
    • Thera-SAbDab - therapeutic monoclonal antibodies
    • CoV-AbDab - COVID-19 related antibodies
    • ANARCI - CDR predictions
    • DSSP - secondary structure and solubility predictions

  6. Antibody and Nanobody Design Dataset (ANDD)

    • zenodo.org
    zip
    Updated Sep 26, 2025
    Yikai Wu (2025). Antibody and Nanobody Design Dataset (ANDD) [Dataset]. http://doi.org/10.5281/zenodo.16894086
    Available download formats: zip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yikai Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Title: Antibody and Nanobody Design Dataset (ANDD): A Comprehensive Resource with Sequence, Structure, and Binding Affinity Data

    DOI: 10.5281/zenodo.16894086

    Resource Type: Dataset

    Publisher: Zenodo

    Publication Year: 2025

    License: Creative Commons Attribution 4.0 International (CC BY 4.0)

    Overview (Abstract):

    The Antibody and Nanobody Design Dataset (ANDD) is a unified, large-scale dataset created to overcome the limitations of data fragmentation and incompleteness in antibody and nanobody research. It integrates sequence, structure, antigen information, and binding affinity data from 15 diverse sources, including OAS, PDB, SAbDab, and others. ANDD comprises 48,800 antibody/nanobody sequences, structural data for 25,158 entries, antigen sequences for 12,617 entries, and a total of 9,569 binding affinity values for antibody/nanobody-antigen pairs. A key innovation is the augmentation of experimental affinity data with 5,218 high-quality predictions generated by the ANTIPASTI model. This makes ANDD the largest available dataset of its kind, providing a robust foundation for training and validating deep learning models in therapeutic antibody and nanobody design.

    Keywords: Dataset, Antibody Design, Nanobody Design, VHH, Deep Learning, Protein Engineering, Binding Affinity, Therapeutic Antibodies, Computational Biology

    Methods (Data Curation and Processing):

    The ANDD was constructed through a rigorous multi-step process:

    1. Data Collection: Data was aggregated from 15 primary sources, including both antibody/nanobody-specific databases (e.g., OAS, SAbDab, INDI, sdAb-DB) and general protein databases (e.g., PDB, UniProt, PDBbind).
    2. Integration and Standardization: Data from disparate sources was consolidated into a consistent format, addressing challenges of format inconsistency. Entries were manually validated to exclude non-relevant data (e.g., T-cell receptors).
    3. Affinity Data Augmentation: The ANTIPASTI deep learning model was used to predict and add binding affinity values for entries that had structural data but lacked experimental affinity measurements.
    4. Manual Curation: Web-based data and information from publicly available patents targeting key antigens (HER2, IL-6, CD45, SARS-CoV-2 RBD) were manually extracted to enhance completeness.
    5. Hierarchical Organization: Data is organized in a hierarchical structure, offering four progressively detailed levels: Sequence-only, Sequence+Structure, Sequence+Structure+Antigen, and Sequence+Structure+Antigen+Affinity.

    Data Specifications and Format:

    The dataset is distributed in two parts:

    1. ANDD.csv: A comprehensive spreadsheet containing all annotated metadata for each entry.
    2. All_structures/ folder: A directory containing the corresponding PDB structure files for entries with structural data.

    The ANDD.csv file includes the following key fields (a full description is available in the Data Record section of the paper):

    • General Info: Source, Update_Date, PDB_ID, Experimental_Method, Ab_or_Nano, Source_Organism.
    • Chain Details: Entity IDs, Asym IDs, Database Accession Codes, and Macromolecule Names for Heavy (H) and Light (L) chains.
    • Antigen Details: Ag_Name, Ag_Seq, Ag_Source Organism, and relevant database identifiers.
    • Sequence Data: Full amino acid sequences for H/L chains and individual CDR regions (H1-H3, L1-L3).
    • Affinity Data: Experimentally measured or predicted Affinity_Kd(M), ∆Gbinding(kJ), and the Affinity_Method.
    • Mutation Data: Annotation of any amino acid mutations (Ab/Nano_mutation).
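
    The four hierarchical levels described in the methods can be derived from which fields are filled in for each row. The following is a minimal sketch, not the authors' code. It assumes that a non-empty PDB_ID signals available structure data, it uses a hypothetical column name H_Seq for the heavy-chain sequence (the listing does not give the exact sequence column names), and it reads a tiny in-memory sample with placeholder values instead of the real ANDD.csv.

```python
import csv
import io

# Tiny in-memory stand-in for ANDD.csv. PDB_ID, Ag_Seq and Affinity_Kd(M)
# are field names from the listing; H_Seq is a hypothetical column name,
# and every value below is a placeholder, not real data.
SAMPLE = """PDB_ID,H_Seq,Ag_Seq,Affinity_Kd(M)
,EVQLVESGG,,
1abc,QVQLQESGG,,
2xyz,EVQLLESGG,MKTAYIAKQR,
3def,QVQLVQSGA,MSDNGPQNQR,1.0e-9
"""

# Fields checked in order: sequence, structure, antigen, affinity.
TIER_FIELDS = ["H_Seq", "PDB_ID", "Ag_Seq", "Affinity_Kd(M)"]

def level(row):
    """Return the most detailed tier (1-4) a row satisfies, or 0 if none."""
    tier = 0
    for name in TIER_FIELDS:
        if not (row.get(name) or "").strip():
            break  # tiers are cumulative, so stop at the first gap
        tier += 1
    return tier

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
levels = [level(r) for r in rows]
print(levels)
```

    Treating the tiers as cumulative mirrors the "progressively detailed" wording; a row with affinity data but no antigen sequence would stay at the lower tier under this assumption.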

    Technical Validation:

    The quality of ANDD has been ensured through extensive validation:

    1. Manual Curation: A rigorous manual review process was conducted to check for accuracy and consistency between sequence, structure, and affinity data across randomly selected entries.
    2. Affinity Validation with AlphaBind: The experimental Kd values were validated by comparing them against enrichment ratios predicted by the AlphaBind model, showing a significant correlation (Pearson’s r = 0.750).
    3. Cross-Mapping Validation: The internal consistency between Kd and ∆Gbinding values within the dataset was confirmed, showing a perfect correlation (Pearson’s r = 1.000) as per thermodynamic principles.
    4. Proof-of-Concept Application: The dataset's utility was demonstrated by fine-tuning the DiffAb generative model on a subset of ANDD. The fine-tuned model showed significant improvements in generating nanobodies with better predicted binding affinity, structural diversity, and developability metrics.
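
    The cross-mapping check between Kd and ∆Gbinding follows from the standard thermodynamic relation ∆G = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The sketch below illustrates that relation only; the listing does not state the temperature or the exact formula used for ANDD, so the default of 298.15 K and the function names are assumptions. Note that ∆G is linear in ln(Kd), not in Kd itself, so an exact linear correlation is expected against log-transformed Kd values.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def kd_to_delta_g(kd_molar, temp_k=298.15):
    """Binding free energy in kJ/mol from Kd in mol/L (1 M standard state)."""
    return R_KJ * temp_k * math.log(kd_molar)

def delta_g_to_kd(delta_g_kj, temp_k=298.15):
    """Inverse conversion: Kd in mol/L from a binding free energy in kJ/mol."""
    return math.exp(delta_g_kj / (R_KJ * temp_k))

dg = kd_to_delta_g(1e-9)  # a 1 nM binder, assumed temperature 298.15 K
print(round(dg, 2))       # about -51.37 kJ/mol
```

    Tighter binders (smaller Kd) give more negative ∆G; each tenfold change in Kd shifts ∆G by about 5.7 kJ/mol at this temperature.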

    Potential Uses:

    ANDD is designed to accelerate research in computational biology and drug discovery, including:

    • Training and benchmarking deep learning models for de novo antibody/nanobody sequence and structure generation.
    • Developing and validating predictive models for antibody-antigen binding affinity.
    • Studying structure-function relationships in antibody-antigen interactions.
    • Facilitating the design of optimized therapeutic antibodies and nanobodies with improved specificity and efficacy.

    Access and License:

    The ANDD dataset is publicly available for download under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users are free to share and adapt the material for any purpose, even commercially, provided appropriate credit is given to the original authors and this data descriptor is cited.

  7. Raw data from external antibody databases and scripts to homogenize and...

    • entrepot.recherche.data.gouv.fr
    application/x-gzip +1
    Updated Feb 4, 2025
    Nicolas MAILLET; Simon MALESYS (2025). Raw data from external antibody databases and scripts to homogenize and standardize them used to build AntiBody Sequence Database (for reproducibility) [Dataset]. http://doi.org/10.57745/DDLHWU
    Available download formats: application/x-gzip (620431), application/x-gzip (163643), application/x-gzip (6833391387), text/markdown (12475), application/x-gzip (80726198), application/x-gzip (65497009)
    Dataset provided by
    Recherche Data Gouv
    Authors
    Nicolas MAILLET; Simon MALESYS
    License

    https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/3.1/customlicense?persistentId=doi:10.57745/DDLHWU

    Description

    Reproducibility data for the AntiBody Sequence Database (ABSD) article. This dataset contains the raw data (antibody sequences) extracted on June 20, 2024, from various databases, as well as the scripts needed to reproduce our results. External databases used: ABDB, AbPDB, CoV-AbDab, Genbank, IMGT, PDB, SACS, SAbDab, TheraSAbDab, UniProt, KABAT. Scripts usage: each external database has a corresponding script that formats all antibody sequences extracted from it. A final script merges all extracted antibody sequences while removing redundancy and standardizing and cleaning the data.
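
    The merge step can be pictured with a short sketch. This is not the ABSD script: the description does not say how redundancy is detected, so the code assumes exact-match deduplication after stripping whitespace and upper-casing, and the function name and sample sequences are placeholders.

```python
def merge_sequences(per_source):
    """Merge {source: [sequence, ...]} into {sequence: sorted list of sources}."""
    merged = {}
    for source, sequences in per_source.items():
        for seq in sequences:
            key = "".join(seq.split()).upper()  # drop whitespace, unify case
            if key:
                merged.setdefault(key, set()).add(source)
    return {seq: sorted(srcs) for seq, srcs in merged.items()}

# Placeholder input: two sources sharing one sequence with different formatting.
merged = merge_sequences({
    "SAbDab": ["EVQLVESGG", "qvqlqesgg"],
    "IMGT": ["EVQLVESGG "],
})
print(merged)
```

    Keeping the set of contributing sources per sequence preserves provenance after redundant entries are collapsed.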

  8. Filtering criteria applied to the SAbDab dataset for sequence design...

    • plos.figshare.com
    xls
    Updated Jun 4, 2025
    Yifan Li; Yuxiang Lang; Chenrui Xu; Yi Zhou; Ziwei Pang; Per Jr. Greisen (2025). Filtering criteria applied to the SAbDab dataset for sequence design evaluation. [Dataset]. http://doi.org/10.1371/journal.pone.0324566.t002
    Available download formats: xls
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yifan Li; Yuxiang Lang; Chenrui Xu; Yi Zhou; Ziwei Pang; Per Jr. Greisen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Filtering criteria applied to the SAbDab dataset for sequence design evaluation.

  9. peleke_antibody-antigen_sabdab

    • huggingface.co
    Updated Oct 20, 2025
    Silico Biosciences (2025). peleke_antibody-antigen_sabdab [Dataset]. https://huggingface.co/datasets/silicobio/peleke_antibody-antigen_sabdab
    Dataset authored and provided by
    Silico Biosciences
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🦋 peleke-1 Training Data | SAbDab Antibody-Antigen Complex Sequences

    A curated subset of SAbDab from June 2025 that includes antibody-antigen pairs of sequences.

    Columns:

    Column Name   Description                                               Example
    pdb_id        The PDB ID on Protein Data Bank                           8xa4
    h_chain_id    The chain ID of the antibody's heavy chain                C
    l_chain_id    The chain ID of the antibody's light chain                D
    antigen_ids   A |-delimited list of chain IDs of the antigen chain(s)   A|B
    h_chain_seq   The heavy…

    See the full description on the dataset page: https://huggingface.co/datasets/silicobio/peleke_antibody-antigen_sabdab.
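
    Because antigen_ids packs several chain IDs into one |-delimited string, a consumer has to split it before use. The helper below is a minimal sketch; the function name is hypothetical, and the example row reuses only the example values shown in the column listing.

```python
def split_antigen_ids(antigen_ids):
    """Split a |-delimited antigen_ids value into a list of chain IDs."""
    return [c.strip() for c in antigen_ids.split("|") if c.strip()]

# Example values taken from the column listing for this dataset.
row = {"pdb_id": "8xa4", "h_chain_id": "C", "l_chain_id": "D", "antigen_ids": "A|B"}
chains = split_antigen_ids(row["antigen_ids"])
print(chains)
```

    Empty fragments are dropped, so an entry with a single antigen chain yields a one-element list and an empty string yields an empty list.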

  10. Data from: Inverse folding for antibody sequence design using deep learning

    • zenodo.org
    bin, csv
    Updated Oct 31, 2023
    Frédéric A. Dreyer; Daniel Cutting; Constantin Schneider; Henry Kenlay; Charlotte M. Deane (2023). Inverse folding for antibody sequence design using deep learning [Dataset]. http://doi.org/10.5281/zenodo.8164693
    Available download formats: bin, csv
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Frédéric A. Dreyer; Daniel Cutting; Constantin Schneider; Henry Kenlay; Charlotte M. Deane
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Model weights of the AbMPNN model (arXiv:2310.19513) presented at the 2023 ICML Workshop on Computational Biology, and csv files with the split between train, test and validation across the SAbDab and ImmuneBuilder datasets.

    This model is based on ProteinMPNN and can be run using the corresponding code: https://github.com/dauparas/ProteinMPNN.

RRID:SCR_022096, Structural Antibody Database (RRID:SCR_022096), SAbDab