Database containing all antibody structures available in the PDB, annotated and presented in a consistent fashion. Each structure is annotated with a number of properties, including experimental details, antibody nomenclature (e.g. heavy-light pairings), curated affinity data, and sequence annotations. You can use the database to inspect individual structures, create and download datasets for analysis, search for structures with sequences similar to your query, and monitor the known structural repertoire of antibodies.
Tracks all antibody- and nanobody-related therapeutics recognized by the World Health Organisation (WHO) and identifies any corresponding structures in the Structural Antibody Database (SAbDab) with exact or near-exact variable domain sequence matches. Synchronized with SAbDab and updated weekly to reflect new Protein Data Bank entries and new sequence data published by the WHO.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Денис Ириняков
Released under Database: Open Database, Contents: Database Contents
https://choosealicense.com/licenses/unknown/
In our training process, we utilized the SAbDab (Structural Antibody Database) to gather high-quality structural data for antibodies and antibody-antigen complexes. The SAbDab database provides a comprehensive and curated collection of antibody structures, which are crucial for developing and validating our computational models. The structural data and a summary of all available entries can be downloaded from the SAbDab website: The complete set of antibody structures is available at:… See the full description on the dataset page: https://huggingface.co/datasets/YueHuLab/AlphaPanda_training_dataset.
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
This dataset combines monoclonal antibody (mAb) information from a variety of sources into a more concise and convenient format.
Here is a quick introduction to monoclonal antibodies in the context of the COVID-19 pandemic.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Title: Antibody and Nanobody Design Dataset (ANDD): A Comprehensive Resource with Sequence, Structure, and Binding Affinity Data
DOI: 10.5281/zenodo.16894086
Resource Type: Dataset
Publisher: Zenodo
Publication Year: 2025
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Overview (Abstract):
The Antibody and Nanobody Design Dataset (ANDD) is a unified, large-scale dataset created to overcome the limitations of data fragmentation and incompleteness in antibody and nanobody research. It integrates sequence, structure, antigen information, and binding affinity data from 15 diverse sources, including OAS, PDB, SAbDab, and others. ANDD comprises 48,800 antibody/nanobody sequences, structural data for 25,158 entries, antigen sequences for 12,617 entries, and a total of 9,569 binding affinity values for antibody/nanobody-antigen pairs. A key innovation is the augmentation of experimental affinity data with 5,218 high-quality predictions generated by the ANTIPASTI model. This makes ANDD the largest available dataset of its kind, providing a robust foundation for training and validating deep learning models in therapeutic antibody and nanobody design.
Keywords: Dataset, Antibody Design, Nanobody Design, VHH, Deep Learning, Protein Engineering, Binding Affinity, Therapeutic Antibodies, Computational Biology
Methods (Data Curation and Processing):
The ANDD was constructed through a rigorous multi-step process:
Data Specifications and Format:
The dataset is distributed in two parts:
ANDD.csv: A comprehensive spreadsheet containing all annotated metadata for each entry.
All_structures/ folder: A directory containing the corresponding PDB structure files for entries with structural data.
The ANDD.csv file includes the following key fields (a full description is available in the Data Record section of the paper):
Affinity_Kd(M), ∆Gbinding(kJ), and the Affinity_Method.
Ab/Nano_mutation.
Technical Validation:
The quality of ANDD has been ensured through extensive validation:
Potential Uses:
ANDD is designed to accelerate research in computational biology and drug discovery, including:
Access and License:
The ANDD dataset is publicly available for download under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users are free to share and adapt the material for any purpose, even commercially, provided appropriate credit is given to the original authors and this data descriptor is cited.
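The affinity fields listed above, Affinity_Kd(M) and ∆Gbinding(kJ), are related by standard thermodynamics: ΔG = RT ln(Kd / 1 M). The following is a minimal sketch of that conversion, not the authors' procedure. The temperature of 298.15 K, the example record, and the function name `delta_g_from_kd` are assumptions made for illustration; the data descriptor should be consulted for how ANDD actually derives its values.

```python
import math

# Gas constant in kJ/(mol*K).
R_KJ_PER_MOL_K = 8.314462618e-3


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Return the binding free energy in kJ/mol for a Kd given in mol/L.

    Uses dG = R * T * ln(Kd / 1 M). The default temperature is an
    assumption, not a value taken from the ANDD documentation.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)


# Hypothetical record using the column name from the field list.
row = {"Affinity_Kd(M)": "1e-9"}
dg = delta_g_from_kd(float(row["Affinity_Kd(M)"]))
print(f"{dg:.2f} kJ/mol")
```

A tighter binder (smaller Kd) gives a more negative ΔG, so sorting on either column ranks entries in the same order.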
https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/3.1/customlicense?persistentId=doi:10.57745/DDLHWU
Reproducibility data for the AntiBody Sequence Database (ABSD) article. This dataset contains the raw data (antibody sequences) extracted on June 20, 2024, from various databases, together with the scripts needed to reproduce our results.
External databases used: ABDB, AbPDB, CoV-AbDab, Genbank, IMGT, PDB, SACS, SAbDab, TheraSAbDab, UniProt, KABAT.
Scripts usage: each external database has a corresponding script that formats all antibody sequences extracted from it. A final script merges all extracted antibody sequences while removing redundancy and standardizing and cleaning the data.
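The merge step described above (combine per-database extractions, standardize, remove redundancy) could look roughly like the sketch below. This is not the ABSD code: the normalization rule (uppercase, letters only), the function names, and the toy sequences are all assumptions made for illustration.

```python
def normalize(seq: str) -> str:
    """Uppercase a sequence and drop anything that is not a letter (gaps, spaces)."""
    return "".join(ch for ch in seq.upper() if ch.isalpha())


def merge_sources(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each unique normalized sequence to the sorted names of the sources that contain it."""
    merged: dict[str, set[str]] = {}
    for source_name, sequences in sources.items():
        for raw in sequences:
            seq = normalize(raw)
            if seq:  # skip entries that are empty after cleaning
                merged.setdefault(seq, set()).add(source_name)
    return {seq: sorted(names) for seq, names in merged.items()}


# Toy input: the first sequence appears in both sources, once with a gap character.
example = {
    "SAbDab": ["EVQLVESGG", "qvqlqesgp"],
    "IMGT": ["EVQLV-ESGG", "DIQMTQSPS"],
}
merged = merge_sources(example)
for seq, names in merged.items():
    print(seq, names)
```

Keeping the list of contributing sources per sequence preserves provenance after redundant entries are collapsed.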
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Filtering criteria applied to the SAbDab dataset for sequence design evaluation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🦋 peleke-1 Training Data | SAbDab Antibody-Antigen Complex Sequences
A curated subset of SAbDab from June 2025 that includes antibody-antigen pairs of sequences.
Columns:
Column Name    Description                                               Example
pdb_id         The PDB ID on Protein Data Bank                           8xa4
h_chain_id     The chain ID of the antibody's heavy chain                C
l_chain_id     The chain ID of the antibody's light chain                D
antigen_ids    A |-delimited list of chain IDs of the antigen chain(s)   A|B
h_chain_seq The heavy… See the full description on the dataset page: https://huggingface.co/datasets/silicobio/peleke_antibody-antigen_sabdab.
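As a rough illustration of how the columns above might be consumed, the sketch below reads one record and splits the |-delimited antigen_ids field into a list. The comma-separated layout and the in-memory sample are assumptions; the sample values come from the Example column, and the sequence columns are omitted because their description is truncated here.

```python
import csv
import io

# In-memory stand-in for the dataset file; the CSV layout is an assumption.
sample = io.StringIO(
    "pdb_id,h_chain_id,l_chain_id,antigen_ids\n"
    "8xa4,C,D,A|B\n"
)

records = []
for row in csv.DictReader(sample):
    # antigen_ids holds one or more chain IDs separated by '|'.
    row["antigen_ids"] = row["antigen_ids"].split("|")
    records.append(row)

for rec in records:
    print(rec["pdb_id"], rec["h_chain_id"], rec["l_chain_id"], rec["antigen_ids"])
```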
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Model weights of the AbMPNN model (arXiv:2310.19513), presented at the 2023 ICML Workshop on Computational Biology, and CSV files defining the train, test, and validation splits across the SAbDab and ImmuneBuilder datasets.
This model is based on ProteinMPNN and can be run using the corresponding code: https://github.com/dauparas/ProteinMPNN.