License: http://www.apache.org/licenses/LICENSE-2.0
Prediction of protein-protein binding (PPB) affinity plays an important role in large-molecule drug discovery. Deep learning (DL) has been adopted to predict the change in PPB affinity upon mutation, but few studies have predicted the PPB affinity itself. The major reason is the paucity of open-source datasets concerning PPB affinity. The current study therefore introduces and releases a PPB affinity dataset (PPB-Affinity), which is intended to support the development of DL models that predict PPB affinity. The PPB-Affinity dataset contains key information such as crystal structures of protein-protein complexes (with or without protein mutation patterns), PPB affinity, receptor protein chain, ligand protein chain, etc. To the best of our knowledge, this is the largest publicly available PPB affinity dataset, and it may help industry improve the screening efficiency of large-molecule drug discovery.
Code for PPB-Affinity database preparation is available at https://github.com/Huatsing-Lau/PPB-Affinity-DataPrepWorkflow.
Code for the benchmark algorithm is available at https://github.com/ChenPy00/PPB-Affinity.
Files are organized as follows:
- PDB/
  - Affinity Benchmark v5.5/
    - file1.pdb
    - file2.pdb
    - ...
    - filek.pdb
  - ATLAS/
  - PDBbind v2020/
  - SAbDab/
  - SKEMPIv2.0/
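As a rough illustration of how this layout could be inspected after download, the sketch below counts the .pdb files in each source folder. It assumes that the source folders (Affinity Benchmark v5.5, ATLAS, and so on) sit directly under PDB/ and that structure files use a lowercase .pdb extension. The function name count_pdb_files is hypothetical and is not part of the PPB-Affinity code.

```python
from pathlib import Path


def count_pdb_files(root):
    """Count .pdb files in each source subfolder under <root>/PDB/.

    Assumption: every source (e.g. "ATLAS", "SAbDab") is a direct
    subfolder of PDB/ and holds its structure files at the top level.
    """
    pdb_dir = Path(root) / "PDB"
    if not pdb_dir.is_dir():
        # Nothing to count if the expected PDB/ folder is absent.
        return {}
    counts = {}
    for source_dir in sorted(p for p in pdb_dir.iterdir() if p.is_dir()):
        counts[source_dir.name] = sum(1 for _ in source_dir.glob("*.pdb"))
    return counts
```

Calling count_pdb_files on the folder that contains PDB/ returns a dictionary mapping each source name to its file count, which can serve as a quick completeness check of a download.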
Virtual screening of protein–protein and protein–peptide interactions is a challenging task that directly impacts hit identification and hit-to-lead optimization in drug design projects involving peptide-based pharmaceuticals. Although several screening tools designed to predict the binding affinity of protein–protein complexes have been proposed, methods specifically developed to predict protein–peptide binding affinity are comparatively scarce. Frequently, predictors trained to score the affinity of small molecules are applied to peptides indiscriminately, despite the greater complexity and heterogeneity of the interactions formed by peptide binders. To address this issue, we introduce PPI-Affinity, a tool that leverages support vector machine (SVM) predictors of binding affinity to screen datasets of protein–protein and protein–peptide complexes, as well as to generate and rank mutants of a given structure. The performance of the SVM models was assessed on four benchmark datasets, which include protein–protein and protein–peptide binding affinity data. In addition, we evaluated our model on a set of mutants of EPI-X4, an endogenous peptide inhibitor of the chemokine receptor CXCR4, and on complexes of the serine proteases HTRA1 and HTRA3 with peptides. PPI-Affinity is freely accessible at https://protdcal.zmb.uni-due.de/PPIAffinity.
License: https://github.com/DISIC/politique-de-contribution-open-source/blob/master/LICENSE.pdf
This Zenodo repository provides comprehensive resources for the paper titled "Spatio-temporal learning from molecular dynamics simulations for protein-ligand binding affinity prediction", published in Bioinformatics. We created a dataset of 63,000 molecular dynamics simulations by performing 10 simulations of 10 ns on each of 6,300 complexes. Neural networks were developed to learn from this data in order to predict the binding affinities of protein-ligand complexes. The implementation of these neural networks is available on GitHub. Our collection includes training/benchmark datasets, trained statistical models, and results on test sets (CSV & PDF files).
Training/benchmark datasets:
Training, validation and test sets are provided to train and evaluate the following neural networks:
For each training methodology (MD data augmentation and spatiotemporal learning), we provide the data for the whole complex, only the ligand or only the protein. Additionally for spatiotemporal learning, we provide the data with only the ligand using the tracking mode.
Statistical models:
We provide the models trained with Pafnucy, Proli, Densenucy, Timenucy and Videonucy. Each model was trained in 10 replicates.
For Pafnucy, Proli, Densenucy, we provide the models trained with random and systematic rotations, as well as with or without MD data augmentation.
For Proli, Densenucy, Timenucy and Videonucy, we provide the models trained on the whole complex, only the ligand or only the protein.
For Pafnucy we also provide the models trained on the reduced set (5932 complexes).
Results on test sets (CSV & PDF files):
We provide the predictions on the PDBbind v.2016 core set.
Results on the FEP dataset are also provided for Pafnucy, Proli and Densenucy.
The raw MD data (~4.5 TB) are stored on the MDDB, where they can be visualized and downloaded.
This work was performed using HPC resources from GENCI-IDRIS (Grant 2021-A0100712496 & 2022-AD011013521) and CRIANN (Grant 2021002).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Structures of the PDBBind dataset (general protein-ligand) prepared with CCDC protein preparation software. After preparation, 18310 structures out of the total 19443 remained (1133 failed).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary data file S4 from the manuscript 'The application of the Open Pharmacological Concepts Triple Store (Open PHACTS) to support Drug Discovery Research' to be published in PLOS ONE
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Protein–protein interactions (PPIs) are crucial for understanding biological processes and disease mechanisms, contributing significantly to advances in protein engineering and drug discovery. The accurate determination of binding affinities, essential for decoding PPIs, faces challenges due to the substantial time and financial costs involved in experimental and theoretical methods. This situation underscores the urgent need for more effective and precise methodologies for predicting binding affinity. Despite the abundance of research on PPI modeling, the field of quantitative binding affinity prediction remains underexplored, mainly due to a lack of comprehensive data. This study seeks to address these needs by manually curating pairwise interaction labels on available 3D structures of protein complexes, with experimentally determined binding affinities, creating the largest data set for structure-based pairwise protein interaction with binding affinity to date. Subsequently, we introduce ProAffinity-GNN, a novel deep learning framework using protein language model and graph neural network (GNN) to improve the accuracy of prediction of structure-based protein–protein binding affinities. The evaluation results across several benchmark test sets and an additional case study demonstrate that ProAffinity-GNN not only outperforms existing models in terms of accuracy but also shows strong generalization capabilities.
The dataset used in the paper for predicting the effects of mutations on protein-protein binding.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison of BiComp encoding, against LZMA and SW encodings, for drug-target binding affinity prediction, for Davis and Kiba datasets, using feature ablation experiments.
This dataset contains the predicted prices of the asset Affinity over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
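A projection of this kind can be sketched in a few lines. The example below assumes that the growth rate compounds once per year, which the description does not state explicitly. The function name project_prices and the starting price are hypothetical.

```python
def project_prices(start_price, annual_rate=0.05, years=16):
    """Project a price forward under a constant annual growth rate.

    Assumption: growth compounds once per year. The rate is limited to
    the adjustable range given in the description (-100% to +100%).
    """
    if not -1.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be between -1.0 and 1.0")
    # Year n price = start_price * (1 + rate) ** n, for n = 1..years.
    return [start_price * (1.0 + annual_rate) ** year
            for year in range(1, years + 1)]
```

With the default arguments the function returns 16 yearly values at 5 percent growth; a rate of -1.0 drives every projected value to zero, which matches the lower bound of the slider.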
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Title: Antibody and Nanobody Design Dataset (ANDD): A Comprehensive Resource with Sequence, Structure, and Binding Affinity Data
DOI: 10.5281/zenodo.16894086
Resource Type: Dataset
Publisher: Zenodo
Publication Year: 2025
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Overview (Abstract):
The Antibody and Nanobody Design Dataset (ANDD) is a unified, large-scale dataset created to overcome the limitations of data fragmentation and incompleteness in antibody and nanobody research. It integrates sequence, structure, antigen information, and binding affinity data from 15 diverse sources, including OAS, PDB, SabDab, and others. ANDD comprises 48,800 antibody/nanobody sequences, structural data for 25,158 entries, antigen sequences for 12,617 entries, and a total of 9,569 binding affinity values for antibody/nanobody-antigen pairs. A key innovation is the augmentation of experimental affinity data with 5,218 high-quality predictions generated by the ANTIPASTI model. This makes ANDD the largest available dataset of its kind, providing a robust foundation for training and validating deep learning models in therapeutic antibody and nanobody design.
Keywords: Dataset, Antibody Design, Nanobody Design, VHH, Deep Learning, Protein Engineering, Binding Affinity, Therapeutic Antibodies, Computational Biology
Methods (Data Curation and Processing):
The ANDD was constructed through a rigorous multi-step process:
Data Specifications and Format:
The dataset is distributed in two parts:
ANDD.csv: A comprehensive spreadsheet containing all annotated metadata for each entry.
All_structures/ folder: A directory containing the corresponding PDB structure files for entries with structural data.
The ANDD.csv file includes the following key fields (a full description is available in the Data Record section of the paper):
Affinity_Kd(M), ∆Gbinding(kJ), and the Affinity_Method.
Ab/Nano_mutation).
Technical Validation:
The quality of ANDD has been ensured through extensive validation:
Potential Uses:
ANDD is designed to accelerate research in computational biology and drug discovery, including:
Access and License:
The ANDD dataset is publicly available for download under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users are free to share and adapt the material for any purpose, even commercially, provided appropriate credit is given to the original authors and this data descriptor is cited.
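The Affinity_Kd(M) and ∆Gbinding(kJ) fields listed above are related by the standard thermodynamic identity ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The sketch below shows that conversion. It assumes a temperature of 298.15 K and a result in kJ/mol; the record does not state the temperature or the exact procedure used to fill the ANDD column, so this is an illustration and not the dataset's own method. The function name kd_to_delta_g is hypothetical.

```python
import math

# Gas constant in kJ/(mol*K).
R_KJ_PER_MOL_K = 8.314462618e-3


def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Convert a dissociation constant (M) to a binding free energy (kJ/mol).

    Uses delta_G = R * T * ln(Kd), with Kd relative to a 1 M standard state.
    The default temperature of 298.15 K is an assumption.
    """
    if kd_molar <= 0:
        raise ValueError("kd_molar must be positive")
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)
```

Under these assumptions a 1 nM binder (Kd = 1e-9 M) corresponds to roughly -51.4 kJ/mol, and tighter binders give more negative values.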
Database of affinity data for protein-ligand complexes of the Protein Data Bank (PDB) providing direct and free access to the experimental affinity of a given complex structure. Affinity data are exclusively obtained from the scientific literature. As of Thursday, May 1, 2014, AffinDB contains 748 affinity values covering 474 different PDB complexes. More than one affinity value may be associated with a single PDB complex, most frequently because multiple references report affinity data for the same complex. AffinDB provides access to data in three different forms:
- Summary information for PDB entry
- Affinity information window
- Tabular reports
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset is curated from two widely used benchmarks—Davis and KIBA—for drug-target interaction (DTI) prediction tasks. It includes compound SMILES strings, target protein sequences, and corresponding binding affinity values.
It is ideal for developing and benchmarking deep learning models that combine molecular graph representations (from SMILES) and sequence-based encodings (from protein sequences).
davis_all.csv – pKd binding values between kinase inhibitors and protein targets.
kiba_all.csv – KIBA scores representing combined bioactivity data (Ki, Kd, IC50).
Each file contains the following columns:
Column Name: Description
- canonical_smiles: Isomeric SMILES string representing the compound structure
- target_sequence: Amino acid sequence of the protein target
- affinity: Binding affinity value (e.g., pKd for Davis, KIBA score for KIBA)
Source: Davis et al., 2011
Affinity values are provided as pKd = −log10(Kd).
Focuses on kinase inhibitors and human kinase proteins.
Source: Tang et al., 2014
Combines multiple bioactivity types into a unified KIBA score.
Broader coverage of compounds and targets.
Deep learning-based DTI prediction (e.g., GraphDTA, DeepDTA, MolTrans)
Molecular representation learning (via GCN, SMILES encoding)
Protein sequence embedding and joint modeling
Drug discovery, repurposing, and virtual screening tasks
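As a minimal sketch of how these files might be consumed, the code below reads rows using the three documented column names and converts a Davis pKd value back to a dissociation constant by inverting pKd = −log10(Kd). It assumes the files are comma-separated with a header row and that Kd is expressed in molar units. The function names read_affinity_rows and pkd_to_kd_molar are hypothetical.

```python
import csv


def read_affinity_rows(handle):
    """Yield (smiles, sequence, affinity) tuples from an open CSV file.

    Assumption: the file is comma-separated with a header row containing
    canonical_smiles, target_sequence and affinity.
    """
    for row in csv.DictReader(handle):
        yield (row["canonical_smiles"],
               row["target_sequence"],
               float(row["affinity"]))


def pkd_to_kd_molar(pkd):
    """Invert pKd = -log10(Kd); Kd is assumed to be in molar units."""
    return 10.0 ** (-pkd)
```

A typical use would be to open davis_all.csv with newline="" and iterate over read_affinity_rows, applying pkd_to_kd_molar only to the Davis file, since KIBA scores are not pKd values.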
https://dataintelo.com/privacy-and-policy
According to our latest research, the global affinity analysis platform market size reached USD 1.87 billion in 2024, demonstrating robust momentum across sectors. With a projected CAGR of 13.2% during the forecast period, the market is anticipated to attain a value of USD 5.58 billion by 2033. This impressive growth is primarily attributed to increasing demand for advanced data analytics solutions, rising adoption of AI-driven customer insights, and the ongoing digital transformation across industries. As organizations strive to gain a competitive edge through data-driven decision-making, affinity analysis platforms are rapidly becoming indispensable tools for uncovering actionable patterns and optimizing business strategies.
A major growth factor propelling the affinity analysis platform market is the exponential increase in data generation from digital channels, IoT devices, and customer interactions. Organizations across retail, BFSI, healthcare, and e-commerce are leveraging affinity analysis to mine relationships and associations within large datasets, enabling them to understand customer behavior, preferences, and trends with unprecedented accuracy. This demand is further amplified by the proliferation of omnichannel strategies, where businesses seek to create seamless and personalized experiences for their customers. As a result, the need for sophisticated analytics tools capable of real-time processing and actionable insights has never been higher, driving continuous innovation and investment in affinity analysis technologies.
Another significant driver is the integration of artificial intelligence and machine learning algorithms within affinity analysis platforms. These technologies empower organizations to automate complex analytical processes, enhance the accuracy of predictions, and uncover hidden correlations that traditional methods might overlook. The ability to deliver highly targeted marketing campaigns, optimize product recommendations, and detect fraudulent activities in real time has become a key differentiator for businesses. Furthermore, advancements in cloud computing have democratized access to these platforms, allowing even small and medium enterprises to benefit from enterprise-grade analytics without heavy upfront investments in infrastructure.
The increasing regulatory focus on data privacy and security is also shaping the affinity analysis platform market. As data-driven strategies become central to business operations, organizations are under pressure to comply with stringent regulations such as GDPR, CCPA, and HIPAA. This has led to a surge in demand for platforms that offer robust security features, data governance capabilities, and compliance tools. Vendors are responding by enhancing their offerings with advanced encryption, access controls, and audit trails, thereby building trust and ensuring the responsible use of customer data. This regulatory landscape, while challenging, is also fostering innovation and driving adoption among risk-averse industries like healthcare and finance.
From a regional perspective, North America continues to dominate the affinity analysis platform market, accounting for the largest share owing to the early adoption of advanced analytics, presence of key technology providers, and high digital maturity of enterprises. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid digitalization, booming e-commerce, and increasing investments in AI and big data. Europe remains a significant market, driven by stringent data protection regulations and a strong focus on customer-centric business models. Meanwhile, Latin America and the Middle East & Africa are witnessing steady growth, supported by expanding digital infrastructure and rising awareness of the benefits of affinity analysis.
The affinity analysis platform market by component is segmented into software and services, each playing a crucial role in delivering value to end-users. The software segment, which includes analytics engines, visualization tools, and data integration modules, holds the lion’s share of the market. This dominance is attributed to the continuous advancements in analytics algorithms, user-friendly interfaces, and integration capabilities with existing enterprise systems. Organizations are increasingly seeking scalable and customizable software solutions that can handle large vol
Background
Proteins HMG1 and HMG2 are two of the most abundant non-histone proteins in the nucleus of mammalian cells, and contain a domain of homology with many proteins implicated in the control of development, such as the sex-determination factor Sry and the Sox family of proteins. In vitro studies of interactions of HMG1/2 with DNA have shown that these proteins can bind to many unusual DNA structures, in particular to four-way junctions, with binding affinities of 10⁷ to 10⁹ M⁻¹.
Results
Here we show that HMG1 and HMG2 bind with a much higher affinity, at least 4 orders of magnitude higher, to a new structure, Form X, which consists of a DNA loop closed at its base by a semicatenated DNA junction, forming a DNA hemicatenane. The binding constant of HMG1 to Form X is higher than 5 × 10¹² M⁻¹, and the half-life of the complex is longer than one hour in vitro.
Conclusions
Of all DNA structures described so far with which HMG1 and HMG2 interact, we have found that Form X, a DNA loop with a semicatenated DNA junction at its base, is the structure with the highest affinity by more than 4 orders of magnitude. This suggests that, if similar structures exist in the cell nucleus, one of the functions of these proteins might be linked to the remarkable property of DNA hemicatenanes to associate two distant regions of the genome in a stable but reversible manner.
Ligand Binding Affinity (LBA) dataset from the ATOM3D project. This upload includes five zipped data directories:
Affinity capture (AC) combined with mass spectrometry (MS)-based proteomics is widely used throughout the drug discovery pipeline to determine small-molecule target selectivity and engagement. However, the tedious sample preparation steps and time-consuming MS acquisition process have limited its use in a high-throughput format. Here, we report an automated workflow employing biotinylated probes and streptavidin magnetic beads for small-molecule target enrichment in the 96-well plate format, ending with direct sampling from EvoSep Solid Phase Extraction tips for liquid chromatography (LC)-tandem mass spectrometry (MS/MS) analysis. The streamlined process significantly reduced both the overall and hands-on time needed for sample preparation. Additionally, we developed a data-independent acquisition-mass spectrometry (DIA-MS) method to establish an efficient label-free quantitative chemical proteomic kinome profiling workflow. DIA-MS yielded a coverage of ∼380 kinases, a >60% increase compared to a data-dependent acquisition (DDA)-MS method, and provided reproducible target profiling of the kinase inhibitor dasatinib. We further showcased the applicability of this AC-MS workflow for assessing the selectivity of two clinical-stage CDK9 inhibitors against ∼250 probe-enriched kinases. Our study provides a roadmap for efficient target engagement and selectivity profiling in native cell or tissue lysates using AC-MS.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 65 verified Affinity Group locations in United States with complete contact information, ratings, reviews, and location data.
A linear scale to estimate the DNA-binding free energy of amino acid residues is reported. Scales derived exclusively for irregular and helical positions give 76% and 68% accuracy, respectively, in classifying protein-DNA interactions as stabilizing or destabilizing. The mean absolute error (MAE) in ddG values is 0.786 and 0.883 kcal/mol, respectively. Without using structure information of residues to derive the affinity scales, 67.0% of mutations could be correctly classified as stabilizing or destabilizing for binding. The MAE and correlation of ddG predictions are 0.953 kcal/mol and 0.385, respectively.
PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia)
Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For fast reproduction of our results, we provide PyTorch datasets of precomputed interaction graphs for the entire PDBbind database on Zenodo. To enable quick establishment of leakage-free evaluation setups with PDBbind, we also provide pairwise similarity matrices for the entire PDBbind dataset on Zenodo.