100+ datasets found

Data from: PPB-Affinity: Protein-Protein Binding Affinity dataset for...
zenodo.org
bin, zip
Updated Jul 27, 2024
Cite
Huaqing Liu (2024). PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery [Dataset]. http://doi.org/10.5281/zenodo.13054646
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.13054646
Dataset updated
Jul 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Huaqing Liu
License
http://www.apache.org/licenses/LICENSE-2.0
Description
Prediction of protein-protein binding (PPB) affinity plays an important role in large-molecule drug discovery. Deep learning (DL) has been adopted to predict the change in PPB affinity upon mutation, but few studies have predicted the PPB affinity itself. The major reason is the paucity of open-source datasets concerning PPB affinity. The current study therefore introduces and releases a PPB affinity dataset (PPB-Affinity), intended to support the development of DL models that predict PPB affinity. The PPB-Affinity dataset contains key information such as crystal structures of protein-protein complexes (with or without protein mutation patterns), PPB affinity, receptor protein chain, ligand protein chain, etc. To the best of our knowledge, this is the largest publicly available PPB affinity dataset, and it may help the industry improve the screening efficiency of discovering new large-molecule drugs.

Code for PPB-Affinity database preparation is available at https://github.com/Huatsing-Lau/PPB-Affinity-DataPrepWorkflow.
Code for the benchmark algorithm is available at https://github.com/ChenPy00/PPB-Affinity.

Files are organized as follows:

- PPB-Affinity.xlsx
- samples_deleted.zip
- PDB/
  - Affinity Benchmark v5.5/
    - file1.pdb
    - file2.pdb
    - ...
    - filek.pdb
  - ATLAS/
  - PDBbind v2020/
  - SAbDab/
  - SKEMPIv2.0/
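
As a hedged illustration of working with the file listing above, the sketch below counts `.pdb` files per source folder. It assumes the archive has been extracted locally and that the source folders (Affinity Benchmark v5.5, ATLAS, and so on) sit directly under `PDB/`, which is an inference from the listing. The function name `count_pdb_files` is hypothetical and not part of the dataset's own tooling.

```python
from collections import Counter
from pathlib import Path


def count_pdb_files(pdb_root):
    """Count .pdb files in each immediate subfolder of pdb_root."""
    counts = Counter()
    # Assumed layout: <pdb_root>/<source folder>/<structure>.pdb
    for pdb_path in Path(pdb_root).glob("*/*.pdb"):
        counts[pdb_path.parent.name] += 1
    return dict(counts)
```

Calling `count_pdb_files("PDB")` from the extraction directory would return a dictionary keyed by source folder name, which is a quick way to check that a download is complete before parsing structures.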
Data from: PPI-Affinity: A Web Tool for the Prediction and Optimization of...
datasetcatalog.nlm.nih.gov
Updated Jun 2, 2022
Cite
Münch, Jan; Mieres-Perez, Joel; Romero-Molina, Sandra; Sanchez-Garcia, Elsa; Ruiz-Blanco, Yasser B.; Ehrmann, Michael; Harms, Mirja (2022). PPI-Affinity: A Web Tool for the Prediction and Optimization of Protein–Peptide and Protein–Protein Binding Affinity [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000438753
Dataset updated
Jun 2, 2022
Authors
Münch, Jan; Mieres-Perez, Joel; Romero-Molina, Sandra; Sanchez-Garcia, Elsa; Ruiz-Blanco, Yasser B.; Ehrmann, Michael; Harms, Mirja
Description
Virtual screening of protein–protein and protein–peptide interactions is a challenging task that directly impacts the processes of hit identification and hit-to-lead optimization in drug design projects involving peptide-based pharmaceuticals. Although several screening tools designed to predict the binding affinity of protein–protein complexes have been proposed, methods specifically developed to predict protein–peptide binding affinity are comparatively scarce. Frequently, predictors trained to score the affinity of small molecules are applied to peptides indiscriminately, despite the greater complexity and heterogeneity of interactions rendered by peptide binders. To address this issue, we introduce PPI-Affinity, a tool that leverages support vector machine (SVM) predictors of binding affinity to screen datasets of protein–protein and protein–peptide complexes, as well as to generate and rank mutants of a given structure. The performance of the SVM models was assessed on four benchmark datasets, which include protein–protein and protein–peptide binding affinity data. In addition, we evaluated our model on a set of mutants of EPI-X4, an endogenous peptide inhibitor of the chemokine receptor CXCR4, and on complexes of the serine proteases HTRA1 and HTRA3 with peptides. PPI-Affinity is freely accessible at https://protdcal.zmb.uni-due.de/PPIAffinity.
Spatio-temporal learning from molecular dynamics simulations for...
zenodo.org
zip
Updated Aug 22, 2025
Cite
Pierre-Yves Libouban; Camille Parisel; Maxime Song; Samia Aci-Sèche; Jose-Carlos Gómez-Tamayo; Gary Tresadern; Pascal Bonnet (2025). Spatio-temporal learning from molecular dynamics simulations for protein-ligand binding affinity prediction [Dataset]. http://doi.org/10.5281/zenodo.10390550
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10390550
Dataset updated
Aug 22, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Pierre-Yves Libouban; Camille Parisel; Maxime Song; Samia Aci-Sèche; Jose-Carlos Gómez-Tamayo; Gary Tresadern; Pascal Bonnet
License
https://github.com/DISIC/politique-de-contribution-open-source/blob/master/LICENSE.pdf
Time period covered
Jun 6, 2024
Description
This Zenodo repository provides comprehensive resources for the paper titled "Spatio-temporal learning from molecular dynamics simulations for protein-ligand binding affinity prediction", published in Bioinformatics. We created a dataset of 63,000 molecular dynamics simulations by performing 10 simulations of 10 ns on each of 6,300 complexes. Neural networks were developed to learn from this data in order to predict the binding affinities of protein-ligand complexes. The implementation of these neural networks is available on GitHub. Our collection includes training/benchmark datasets, trained statistical models, and results on test sets (CSV & PDF files).

Training/benchmark datasets:

Training, validation and test sets are provided to train and evaluate the following neural networks:

Pafnucy, Proli and Densenucy without MD data augmentation (dataset file names contain "initial")

Pafnucy, Proli and Densenucy with MD data augmentation (dataset file names contain "MDDA")

Pafnucy with/without MD data augmentation and Proli and Densenucy with MD data augmentation were also evaluated on the FEP test set (test set file names contain "fep")

Timenucy and Videonucy using spatiotemporal learning methods (dataset file names contain "4D")

Pafnucy without MD data augmentation and on a reduced training set (dataset file names contain "reduced")

For each training methodology (MD data augmentation and spatiotemporal learning), we provide the data for the whole complex, only the ligand or only the protein. Additionally for spatiotemporal learning, we provide the data with only the ligand using the tracking mode.

Statistical models:

We provide the models trained with Pafnucy, Proli, Densenucy, Timenucy and Videonucy. Each model was trained in 10 replicates.

For Pafnucy, Proli, Densenucy, we provide the models trained with random and systematic rotations, as well as with or without MD data augmentation.

For Proli, Densenucy, Timenucy and Videonucy, we provide the models trained on the whole complex, only the ligand or only the protein.

For Pafnucy we also provide the models trained on the reduced set (5932 complexes).

Results on test sets (CSV & PDF files):

We provide the predictions on the PDBbind v.2016 core set.

For spatiotemporal learning methods (Timenucy and Videonucy), there are predictions for only 83 complexes, as we did not perform simulations on the whole test set.

For models trained with MD DA, predictions were carried out on the crystallographic structures as well as on the frames extracted from the simulations performed on the test set (augmented test).

Results on the FEP dataset are also provided for Pafnucy, Proli and Densenucy.

The raw MD data (~4.5 TB) are stored on the MDDB, where they can be visualized and downloaded.

This work was performed using HPC resources from GENCI-IDRIS (Grant 2021-A0100712496 & 2022-AD011013521) and CRIANN (Grant 2021002).
Data from: Improving generalisability of 3D binding affinity models in low...
zenodo.org
bin, csv, txt, zip
Updated Nov 8, 2024
Cite
Ward Haddadin; Julia Buhmann; Alan Bilsland; Lukáš Pravda; Hagen Triendl (2024). Improving generalisability of 3D binding affinity models in low data regimes [Dataset]. http://doi.org/10.5281/zenodo.14054484
Available download formats: zip, csv, bin, txt
Unique identifier
https://doi.org/10.5281/zenodo.14054484
Dataset updated
Nov 8, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ward Haddadin; Julia Buhmann; Alan Bilsland; Lukáš Pravda; Hagen Triendl
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Structures of the PDBBind dataset (general protein-ligand) prepared with CCDC protein preparation software. After preparation, 18310 structures out of the total 19443 remained (1133 failed).
Compounds with binding affinity data for human DBP
figshare.com
xls
Updated Jan 19, 2016
Cite
Christine Chichester (2016). Compounds with binding affinity data for human DBP [Dataset]. http://doi.org/10.6084/m9.figshare.1235442.v1
Available download formats: xls
Unique identifier
https://doi.org/10.6084/m9.figshare.1235442.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Christine Chichester
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplementary data file S4 from the manuscript 'The application of the Open Pharmacological Concepts Triple Store (Open PHACTS) to support Drug Discovery Research' to be published in PLOS ONE
Data from: ProAffinity-GNN: A Novel Approach to Structure-Based...
acs.figshare.com
xlsx
Updated Nov 19, 2024
Cite
Zhiyuan Zhou; Yueming Yin; Hao Han; Yiping Jia; Jun Hong Koh; Adams Wai-Kin Kong; Yuguang Mu (2024). ProAffinity-GNN: A Novel Approach to Structure-Based Protein–Protein Binding Affinity Prediction via a Curated Data Set and Graph Neural Networks [Dataset]. http://doi.org/10.1021/acs.jcim.4c01850.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jcim.4c01850.s002
Dataset updated
Nov 19, 2024
Dataset provided by
ACS Publications
Authors
Zhiyuan Zhou; Yueming Yin; Hao Han; Yiping Jia; Jun Hong Koh; Adams Wai-Kin Kong; Yuguang Mu
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Protein–protein interactions (PPIs) are crucial for understanding biological processes and disease mechanisms, contributing significantly to advances in protein engineering and drug discovery. The accurate determination of binding affinities, essential for decoding PPIs, faces challenges due to the substantial time and financial costs involved in experimental and theoretical methods. This situation underscores the urgent need for more effective and precise methodologies for predicting binding affinity. Despite the abundance of research on PPI modeling, the field of quantitative binding affinity prediction remains underexplored, mainly due to a lack of comprehensive data. This study seeks to address these needs by manually curating pairwise interaction labels on available 3D structures of protein complexes, with experimentally determined binding affinities, creating the largest data set for structure-based pairwise protein interaction with binding affinity to date. Subsequently, we introduce ProAffinity-GNN, a novel deep learning framework using protein language model and graph neural network (GNN) to improve the accuracy of prediction of structure-based protein–protein binding affinities. The evaluation results across several benchmark test sets and an additional case study demonstrate that ProAffinity-GNN not only outperforms existing models in terms of accuracy but also shows strong generalization capabilities.
SARS-CoV-2 RBD binding affinity dataset
resodate.org
Updated Jan 7, 2026
Cite
Shiwei Liu; Tian Zhu; Milong Ren; Chungong Yu; Dongbo Bu; Haicang Zhang (2026). SARS-CoV-2 RBD binding affinity dataset [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvc2Fycy1jb3YtMi1yYmQtYmluZGluZy1hZmZpbml0eS1kYXRhc2V0
Dataset updated
Jan 7, 2026
Dataset provided by
Leibniz Data Manager
Authors
Shiwei Liu; Tian Zhu; Milong Ren; Chungong Yu; Dongbo Bu; Haicang Zhang
Description
The dataset used in the paper for predicting the effects of mutations on protein-protein binding.
Performance comparison of BiComp encoding, against LZMA and SW encodings,...
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Mahmood Kalemati; Mojtaba Zamani Emani; Somayyeh Koohi (2023). Performance comparison of BiComp encoding, against LZMA and SW encodings, for drug-target binding affinity prediction, for Davis and Kiba datasets, using feature ablation experiments. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011036.t009
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1011036.t009
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Mahmood Kalemati; Mojtaba Zamani Emani; Somayyeh Koohi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance comparison of BiComp encoding, against LZMA and SW encodings, for drug-target binding affinity prediction, for Davis and Kiba datasets, using feature ablation experiments.
Affinity Price Prediction Data
coinbase.com
Updated Dec 26, 2025
Cite
(2025). Affinity Price Prediction Data [Dataset]. https://www.coinbase.com/en-ca/price-prediction/safeaffinity
Dataset updated
Dec 26, 2025
Variables measured
Growth Rate, Predicted Price
Measurement technique
User-defined projections based on compound growth. This is not a formal financial forecast.
Description
This dataset contains the predicted prices of the asset Affinity over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
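
The projection described above can be sketched as follows. This is a minimal illustration, not the page's actual calculation: the formula price × (1 + rate)^year is an assumption based on the phrase "compound growth", the function name `project_prices` is hypothetical, and the starting price is a placeholder supplied by the caller. The 5 percent default, the 16-year horizon, and the -100 to 100 percent bounds come from the description.

```python
def project_prices(current_price, annual_growth_rate=0.05, years=16):
    """Return projected prices for year 1..years under compound growth."""
    # The description limits the adjustable rate to [-100%, +100%].
    if not -1.0 <= annual_growth_rate <= 1.0:
        raise ValueError("growth rate must be between -1.0 and 1.0")
    return [
        current_price * (1.0 + annual_growth_rate) ** year
        for year in range(1, years + 1)
    ]
```

With a placeholder starting price of 100 and the default rate, the first projected value is about 105 and the list has 16 entries, one per year.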
Antibody and Nanobody Design Dataset (ANDD)
zenodo.org
zip
Updated Sep 26, 2025
Cite
Yikai Wu (2025). Antibody and Nanobody Design Dataset (ANDD) [Dataset]. http://doi.org/10.5281/zenodo.16894086
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.16894086
Dataset updated
Sep 26, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Yikai Wu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Title: Antibody and Nanobody Design Dataset (ANDD): A Comprehensive Resource with Sequence, Structure, and Binding Affinity Data

DOI: 10.5281/zenodo.16894086

Resource Type: Dataset

Publisher: Zenodo

Publication Year: 2025

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Overview (Abstract):

The Antibody and Nanobody Design Dataset (ANDD) is a unified, large-scale dataset created to overcome the limitations of data fragmentation and incompleteness in antibody and nanobody research. It integrates sequence, structure, antigen information, and binding affinity data from 15 diverse sources, including OAS, PDB, SAbDab, and others. ANDD comprises 48,800 antibody/nanobody sequences, structural data for 25,158 entries, antigen sequences for 12,617 entries, and a total of 9,569 binding affinity values for antibody/nanobody-antigen pairs. A key innovation is the augmentation of experimental affinity data with 5,218 high-quality predictions generated by the ANTIPASTI model. This makes ANDD the largest available dataset of its kind, providing a robust foundation for training and validating deep learning models in therapeutic antibody and nanobody design.

Keywords: Dataset, Antibody Design, Nanobody Design, VHH, Deep Learning, Protein Engineering, Binding Affinity, Therapeutic Antibodies, Computational Biology

Methods (Data Curation and Processing):

The ANDD was constructed through a rigorous multi-step process:

Data Collection: Data was aggregated from 15 primary sources, including both antibody/nanobody-specific databases (e.g., OAS, SAbDab, INDI, sdAb-DB) and general protein databases (e.g., PDB, UNIPROT, PDBbind).

Integration and Standardization: Data from disparate sources was consolidated into a consistent format, addressing challenges of format inconsistency. Entries were manually validated to exclude non-relevant data (e.g., T-cell receptors).

Affinity Data Augmentation: The ANTIPASTI deep learning model was used to predict and add binding affinity values for entries that had structural data but lacked experimental affinity measurements.

Manual Curation: Web-based data and information from publicly available patents targeting key antigens (HER2, IL-6, CD45, SARS-CoV-2 RBD) were manually extracted to enhance completeness.

Hierarchical Organization: Data is organized in a hierarchical structure, offering four progressively detailed levels: Sequence-only, Sequence+Structure, Sequence+Structure+Antigen, and Sequence+Structure+Antigen+Affinity.

Data Specifications and Format:

The dataset is distributed in two parts:

ANDD.csv: A comprehensive spreadsheet containing all annotated metadata for each entry.

All_structures/ folder: A directory containing the corresponding PDB structure files for entries with structural data.

The ANDD.csv file includes the following key fields (a full description is available in the Data Record section of the paper):

General Info: Source, Update_Date, PDB_ID, Experimental_Method, Ab_or_Nano, Source_Organism.

Chain Details: Entity IDs, Asym IDs, Database Accession Codes, and Macromolecule Names for Heavy (H) and Light (L) chains.

Antigen Details: Ag_Name, Ag_Seq, Ag_Source Organism, and relevant database identifiers.

Sequence Data: Full amino acid sequences for H/L chains and individual CDR regions (H1-H3, L1-L3).

Affinity Data: Experimentally measured or predicted Affinity_Kd(M), ∆Gbinding(kJ), and the Affinity_Method.

Mutation Data: Annotation of any amino acid mutations (Ab/Nano_mutation).

Technical Validation:

The quality of ANDD has been ensured through extensive validation:

Manual Curation: A rigorous manual review process was conducted to check for accuracy and consistency between sequence, structure, and affinity data across randomly selected entries.

Affinity Validation with AlphaBind: The experimental Kd values were validated by comparing them against enrichment ratios predicted by the AlphaBind model, showing a significant correlation (Pearson’s r = 0.750).

Cross-Mapping Validation: The internal consistency between Kd and ∆Gbinding values within the dataset was confirmed, showing a perfect correlation (Pearson’s r = 1.000) as per thermodynamic principles.

Proof-of-Concept Application: The dataset's utility was demonstrated by fine-tuning the Diffab generative model on a subset of ANDD. The fine-tuned model showed significant improvements in generating nanobodies with better predicted binding affinity, structural diversity, and developability metrics.
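
The cross-mapping check between Kd and ∆Gbinding can be illustrated with a short sketch. It assumes the standard relation ΔG = RT ln(Kd) with a 1 M standard state; the temperature of 298.15 K, the kJ/mol units, and the example Kd values are assumptions for illustration, not values taken from ANDD. Under this relation ΔG is exactly linear in log Kd, so the correlation between the two is 1 by construction.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy in kJ/mol from Kd in mol/L (assumed 1 M standard state)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Illustrative Kd values in mol/L (made up for this sketch).
kd_values = [1e-6, 1e-8, 1e-9, 5e-11]
dg_values = [delta_g_from_kd(kd) for kd in kd_values]
log_kd_values = [math.log10(kd) for kd in kd_values]
r = pearson(log_kd_values, dg_values)
```

Tighter binders (smaller Kd) give more negative ΔG values. A correlation that deviates from 1 in a real table would point to unit mismatches or transcription errors rather than to anything physical.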

Potential Uses:

ANDD is designed to accelerate research in computational biology and drug discovery, including:

Training and benchmarking deep learning models for de novo antibody/nanobody sequence and structure generation.

Developing and validating predictive models for antibody-antigen binding affinity.

Studying structure-function relationships in antibody-antigen interactions.

Facilitating the design of optimized therapeutic antibodies and nanobodies with improved specificity and efficacy.

Access and License:

The ANDD dataset is publicly available for download under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users are free to share and adapt the material for any purpose, even commercially, provided appropriate credit is given to the original authors and this data descriptor is cited.
AffinDB
rrid.site
Updated Jan 21, 2026
Cite
(2026). AffinDB [Dataset]. http://identifiers.org/RRID:SCR_001690
Unique identifier
https://identifiers.org/RRID:SCR_001690
Dataset updated
Jan 21, 2026
Description
Database of affinity data for protein-ligand complexes of the Protein Data Bank (PDB) providing direct and free access to the experimental affinity of a given complex structure. Affinity data are exclusively obtained from the scientific literature. As of Thursday, May 01st, 2014, AffinDB contains 748 affinity values covering 474 different PDB complexes. More than one affinity value may be associated with a single PDB complex, which is most frequently due to multiple references reporting affinity data for the same complex. AffinDB provides access to data in three different forms: summary information for a PDB entry, an affinity information window, and tabular reports.
Davis and KIBA Datasets
kaggle.com
zip
Updated Jul 6, 2025
Cite
Raj Aryan (2025). Davis and KIBA Datasets [Dataset]. https://www.kaggle.com/datasets/rajaryan2315/davis-and-kiba-datasets
Available download formats: zip (53365041 bytes)
Dataset updated
Jul 6, 2025
Authors
Raj Aryan
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset is curated from two widely used benchmarks—Davis and KIBA—for drug-target interaction (DTI) prediction tasks. It includes compound SMILES strings, target protein sequences, and corresponding binding affinity values.

It is ideal for developing and benchmarking deep learning models that combine molecular graph representations (from SMILES) and sequence-based encodings (from protein sequences).

Files Included

davis_all.csv – pKd binding values between kinase inhibitors and protein targets.

kiba_all.csv – KIBA scores representing combined bioactivity data (Ki, Kd, IC50).

Data Columns

Each file contains the following columns:

Column Name: Description
- canonical_smiles: Isomeric SMILES string representing the compound structure
- target_sequence: Amino acid sequence of the protein target
- affinity: Binding affinity value (e.g., pKd for Davis, KIBA score for KIBA)
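
A minimal sketch of reading these three columns with the Python standard library follows. It assumes the files are comma-delimited with a header row, which the description does not state explicitly; the function name `read_affinity_rows` is hypothetical and the sample row is made up.

```python
import csv
import io


def read_affinity_rows(handle):
    """Parse rows using the three column names given in the dataset description."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append(
            (
                record["canonical_smiles"],
                record["target_sequence"],
                float(record["affinity"]),
            )
        )
    return rows


# Tiny in-memory stand-in for davis_all.csv; the values are made up.
sample = io.StringIO(
    "canonical_smiles,target_sequence,affinity\n"
    "CCO,MKT,5.0\n"
)
rows = read_affinity_rows(sample)
```

For the real files, the same function could be called on `open("davis_all.csv", newline="")`, provided the delimiter assumption holds.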

Dataset Summary

Davis Dataset

Source: Davis et al., 2011

Affinity values are provided as pKd = −log10(Kd).

Focuses on kinase inhibitors and human kinase proteins.
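
The pKd transform quoted above can be written out as a short sketch. It assumes Kd is expressed in mol/L before taking the logarithm; if a source table reports Kd in nM, the values would need to be multiplied by 1e-9 first. The function names are hypothetical.

```python
import math


def kd_to_pkd(kd_molar):
    """pKd = -log10(Kd), with Kd in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


def pkd_to_kd(pkd):
    """Inverse transform: Kd in mol/L from pKd."""
    return 10.0 ** (-pkd)
```

Under this convention a 10 nM binder (Kd = 1e-8 M) has a pKd of about 8, and larger pKd values mean tighter binding.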

KIBA Dataset

Source: Tang et al., 2014

Combines multiple bioactivity types into a unified KIBA score.

Broader coverage of compounds and targets.

Applications

Deep learning-based DTI prediction (e.g., GraphDTA, DeepDTA, MolTrans)

Molecular representation learning (via GCN, SMILES encoding)

Protein sequence embedding and joint modeling

Drug discovery, repurposing, and virtual screening tasks
Affinity Analysis Platform Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Oct 1, 2025
Cite
Dataintelo (2025). Affinity Analysis Platform Market Research Report 2033 [Dataset]. https://dataintelo.com/report/affinity-analysis-platform-market
Available download formats: pptx, pdf, csv
Dataset updated
Oct 1, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2025 - 2034
Area covered
Global
Description
Affinity Analysis Platform Market Outlook

According to our latest research, the global affinity analysis platform market size reached USD 1.87 billion in 2024, demonstrating robust momentum across sectors. With a projected CAGR of 13.2% during the forecast period, the market is anticipated to attain a value of USD 5.58 billion by 2033. This impressive growth is primarily attributed to increasing demand for advanced data analytics solutions, rising adoption of AI-driven customer insights, and the ongoing digital transformation across industries. As organizations strive to gain a competitive edge through data-driven decision-making, affinity analysis platforms are rapidly becoming indispensable tools for uncovering actionable patterns and optimizing business strategies.

A major growth factor propelling the affinity analysis platform market is the exponential increase in data generation from digital channels, IoT devices, and customer interactions. Organizations across retail, BFSI, healthcare, and e-commerce are leveraging affinity analysis to mine relationships and associations within large datasets, enabling them to understand customer behavior, preferences, and trends with unprecedented accuracy. This demand is further amplified by the proliferation of omnichannel strategies, where businesses seek to create seamless and personalized experiences for their customers. As a result, the need for sophisticated analytics tools capable of real-time processing and actionable insights has never been higher, driving continuous innovation and investment in affinity analysis technologies.

Another significant driver is the integration of artificial intelligence and machine learning algorithms within affinity analysis platforms. These technologies empower organizations to automate complex analytical processes, enhance the accuracy of predictions, and uncover hidden correlations that traditional methods might overlook. The ability to deliver highly targeted marketing campaigns, optimize product recommendations, and detect fraudulent activities in real time has become a key differentiator for businesses. Furthermore, advancements in cloud computing have democratized access to these platforms, allowing even small and medium enterprises to benefit from enterprise-grade analytics without heavy upfront investments in infrastructure.

The increasing regulatory focus on data privacy and security is also shaping the affinity analysis platform market. As data-driven strategies become central to business operations, organizations are under pressure to comply with stringent regulations such as GDPR, CCPA, and HIPAA. This has led to a surge in demand for platforms that offer robust security features, data governance capabilities, and compliance tools. Vendors are responding by enhancing their offerings with advanced encryption, access controls, and audit trails, thereby building trust and ensuring the responsible use of customer data. This regulatory landscape, while challenging, is also fostering innovation and driving adoption among risk-averse industries like healthcare and finance.

From a regional perspective, North America continues to dominate the affinity analysis platform market, accounting for the largest share owing to the early adoption of advanced analytics, presence of key technology providers, and high digital maturity of enterprises. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid digitalization, booming e-commerce, and increasing investments in AI and big data. Europe remains a significant market, driven by stringent data protection regulations and a strong focus on customer-centric business models. Meanwhile, Latin America and the Middle East & Africa are witnessing steady growth, supported by expanding digital infrastructure and rising awareness of the benefits of affinity analysis.

Component Analysis

The affinity analysis platform market by component is segmented into software and services, each playing a crucial role in delivering value to end-users. The software segment, which includes analytics engines, visualization tools, and data integration modules, holds the lion’s share of the market. This dominance is attributed to the continuous advancements in analytics algorithms, user-friendly interfaces, and integration capabilities with existing enterprise systems. Organizations are increasingly seeking scalable and customizable software solutions that can handle large vol
Data from: High affinity binding of proteins HMG1 and HMG2 to semicatenated...
data.virginia.gov
html
Updated Sep 6, 2025
Cite
National Institutes of Health (2025). High affinity binding of proteins HMG1 and HMG2 to semicatenated DNA loops [Dataset]. https://data.virginia.gov/dataset/high-affinity-binding-of-proteins-hmg1-and-hmg2-to-semicatenated-dna-loops
Available download formats: html
Dataset updated
Sep 6, 2025
Dataset provided by
National Institutes of Health
Description
Background: Proteins HMG1 and HMG2 are two of the most abundant non-histone proteins in the nucleus of mammalian cells, and contain a domain of homology with many proteins implicated in the control of development, such as the sex-determination factor Sry and the Sox family of proteins. In vitro studies of interactions of HMG1/2 with DNA have shown that these proteins can bind to many unusual DNA structures, in particular to four-way junctions, with binding affinities of 10⁷ to 10⁹ M⁻¹.

Results: Here we show that HMG1 and HMG2 bind with a much higher affinity, at least 4 orders of magnitude higher, to a new structure, Form X, which consists of a DNA loop closed at its base by a semicatenated DNA junction, forming a DNA hemicatenane. The binding constant of HMG1 to Form X is higher than 5 × 10¹² M⁻¹, and the half-life of the complex is longer than one hour in vitro.

Conclusions: Of all DNA structures described so far with which HMG1 and HMG2 interact, we have found that Form X, a DNA loop with a semicatenated DNA junction at its base, is the structure with the highest affinity by more than 4 orders of magnitude. This suggests that, if similar structures exist in the cell nucleus, one of the functions of these proteins might be linked to the remarkable property of DNA hemicatenanes to associate two distant regions of the genome in a stable but reversible manner.
ATOM3D: Ligand Binding Affinity (LBA) Dataset
zenodo.org
application/gzip, bin
Updated Jun 16, 2021
Cite
Raphael J.L. Townshend; Martin Vögele; Patricia Suriana; Alexander Derry; Alexander Powers; Yianni Laloudakis; Sidhika Balachandar; Brandon Anderson; Stephan Eismann; Risi Kondor; Russ B. Altman; Ron O. Dror (2021). ATOM3D: Ligand Binding Affinity (LBA) Dataset [Dataset]. http://doi.org/10.5281/zenodo.4914718
Explore at:
application/gzip, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4914718
Dataset updated
Jun 16, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Raphael J.L. Townshend; Raphael J.L. Townshend; Martin Vögele; Martin Vögele; Patricia Suriana; Patricia Suriana; Alexander Derry; Alexander Derry; Alexander Powers; Yianni Laloudakis; Sidhika Balachandar; Brandon Anderson; Stephan Eismann; Risi Kondor; Russ B. Altman; Ron O. Dror; Alexander Powers; Yianni Laloudakis; Sidhika Balachandar; Brandon Anderson; Stephan Eismann; Risi Kondor; Russ B. Altman; Ron O. Dror
Description
Ligand Binding Affinity (LBA) dataset from the ATOM3D project. This upload includes five zipped data directories:

- Full, unsplit dataset in LMDB format
- Split datasets at 60% sequence identity, with each in LMDB format
- Split datasets at 30% sequence identity, with each in LMDB format
- Text files containing train, validation, and test indices used to split raw dataset (for both 30% and 60%)
- README containing dataset details
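As a rough illustration of how index files like those listed above could be applied to an unsplit dataset, the sketch below partitions a sequence of records by lists of integer indices. The one-index-per-line layout, the split names, and the helper names are assumptions made for this example; the README shipped with the upload is the authority on the real file format.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def parse_indices(text: str) -> List[int]:
    """Parse one integer index per non-empty line (assumed file layout)."""
    return [int(line) for line in text.splitlines() if line.strip()]


def split_by_indices(items: Sequence[T], index_texts: Dict[str, str]) -> Dict[str, List[T]]:
    """Partition items into named splits and reject indices shared between splits."""
    splits: Dict[str, List[T]] = {}
    seen: set = set()
    for name, text in index_texts.items():
        indices = parse_indices(text)
        overlap = seen.intersection(indices)
        if overlap:
            raise ValueError(f"indices {sorted(overlap)} appear in more than one split")
        seen.update(indices)
        splits[name] = [items[i] for i in indices]
    return splits


if __name__ == "__main__":
    # Hypothetical records standing in for entries of the unsplit dataset.
    records = ["complex_a", "complex_b", "complex_c", "complex_d", "complex_e"]
    index_texts = {"train": "0\n3\n4\n", "val": "1\n", "test": "2\n"}
    print(split_by_indices(records, index_texts))
```

In practice the strings would come from reading each index file, for example with `pathlib.Path.read_text()`. The overlap check is a cheap safeguard against a record landing in both the training and the test split.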
AffinDB
neuinfo.org
Updated Jan 29, 2022
(2022). AffinDB [Dataset]. http://identifiers.org/RRID:SCR_001690
Unique identifier
https://identifiers.org/RRID:SCR_001690 https://identifiers.org/RRID:SCR_001690/resolver?q=&i=rrid
Dataset updated
Jan 29, 2022
Description
Database of affinity data for protein-ligand complexes of the Protein Data Bank (PDB) providing direct and free access to the experimental affinity of a given complex structure. Affinity data are exclusively obtained from the scientific literature. As of Thursday, May 1, 2014, AffinDB contains 748 affinity values covering 474 different PDB complexes. More than one affinity value may be associated with a single PDB complex, which is most frequently due to multiple references reporting affinity data for the same complex. AffinDB provides access to data in three different forms:
- Summary information for PDB entry
- Affinity information window
- Tabular reports
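Because one PDB complex can carry several affinity values (748 values over 474 complexes is about 1.6 per complex on average), a one-to-many mapping is a natural way to hold such data. The sketch below groups records by PDB identifier. The record layout, the identifiers, the values, and the reference labels are all made up for illustration and do not reflect AffinDB's actual schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (pdb_id, affinity_value, reference) is a hypothetical record layout.
Record = Tuple[str, float, str]


def group_by_complex(records: Iterable[Record]) -> Dict[str, List[Tuple[float, str]]]:
    """Collect every (value, reference) pair reported for each PDB complex."""
    grouped: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for pdb_id, value, reference in records:
        # Normalise case so "1abc" and "1ABC" refer to the same complex.
        grouped[pdb_id.upper()].append((value, reference))
    return dict(grouped)


def mean_values_per_complex(n_values: int, n_complexes: int) -> float:
    """Average number of affinity values attached to one complex."""
    return n_values / n_complexes


if __name__ == "__main__":
    # Invented example records, not real AffinDB entries.
    records = [
        ("1abc", 2.5e-9, "reference A"),
        ("1ABC", 3.1e-9, "reference B"),
        ("2xyz", 7.0e-6, "reference C"),
    ]
    print(group_by_complex(records))
    print(round(mean_values_per_complex(748, 474), 2))
```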
Data from: Automated High-Throughput Affinity Capture-Mass Spectrometry...
datasetcatalog.nlm.nih.gov
Updated Jan 27, 2025
Williams, Jon D.; Kath, James E.; Marin, Violeta L.; Ma, Renze; Banlasan, Adam; Tang, Hua; Torrent, Maricel; Jing, Hui; Senaweera, Sameera; Potts, Gregory K.; Richardson, Paul L.; Patel, Shitalben; McClure, Ryan A. (2025). Automated High-Throughput Affinity Capture-Mass Spectrometry Platform with Data-Independent Acquisition [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001448713
Dataset updated
Jan 27, 2025
Authors
Williams, Jon D.; Kath, James E.; Marin, Violeta L.; Ma, Renze; Banlasan, Adam; Tang, Hua; Torrent, Maricel; Jing, Hui; Senaweera, Sameera; Potts, Gregory K.; Richardson, Paul L.; Patel, Shitalben; McClure, Ryan A.
Description
Affinity capture (AC) combined with mass spectrometry (MS)-based proteomics is highly utilized throughout the drug discovery pipeline to determine small-molecule target selectivity and engagement. However, the tedious sample preparation steps and time-consuming MS acquisition process have limited its use in a high-throughput format. Here, we report an automated workflow employing biotinylated probes and streptavidin magnetic beads for small-molecule target enrichment in the 96-well plate format, ending with direct sampling from EvoSep Solid Phase Extraction tips for liquid chromatography (LC)-tandem mass spectrometry (MS/MS) analysis. The streamlined process significantly reduced both the overall and hands-on time needed for sample preparation. Additionally, we developed a data-independent acquisition-mass spectrometry (DIA-MS) method to establish an efficient label-free quantitative chemical proteomic kinome profiling workflow. DIA-MS yielded a coverage of ∼380 kinases, a > 60% increase compared to using a data-dependent acquisition (DDA)-MS method, and provided reproducible target profiling of the kinase inhibitor dasatinib. We further showcased the applicability of this AC-MS workflow for assessing the selectivity of two clinical-stage CDK9 inhibitors against ∼250 probe-enriched kinases. Our study here provides a roadmap for efficient target engagement and selectivity profiling in native cell or tissue lysates using AC-MS.
Affinity Group Locations Data for United States
poidata.io
csv, json
Updated Feb 12, 2026
Business Data Provider (2026). Affinity Group Locations Data for United States [Dataset]. https://poidata.io/brand-report/affinity-group/united-states
Available download formats
csv, json
Dataset updated
Feb 12, 2026
Dataset authored and provided by
Business Data Provider
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2026
Area covered
United States
Variables measured
Website URL, Phone Number, Review Count, Business Name, Email Address, Business Hours, Customer Rating, Business Address, Brand Affiliation, Geographic Coordinates
Description
Comprehensive dataset containing 65 verified Affinity Group locations in the United States with complete contact information, ratings, reviews, and location data.
Data from: A single-residue affinity scale for DNA-binding using linear...
datasetcatalog.nlm.nih.gov
Updated Nov 21, 2017
Ahmad, Shandar; Andrabi, Munazah (2017). A single-residue affinity scale for DNA-binding using linear perceptron [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001745785
Dataset updated
Nov 21, 2017
Authors
Ahmad, Shandar; Andrabi, Munazah
Description
A linear scale to estimate the DNA-binding free energy of amino acid residues is reported. Scales derived exclusively for irregular and helical positions give 76% and 68% classification accuracy between stabilizing and destabilizing protein-DNA interactions. The mean absolute error (MAE) in ddG values is 0.786 and 0.883 kcal/mol, respectively. Without using structure information of residues to derive affinity scales, 67.0% of mutations could be correctly classified as stabilizing or destabilizing for binding. The MAE and correlation of ddG predictions are 0.953 kcal/mol and 0.385, respectively.
PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia)
Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
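The three kinds of figures quoted in the description above (classification accuracy between stabilizing and destabilizing mutations, mean absolute error, and correlation of ddG predictions) can be computed along the following lines. This is a generic sketch, not the authors' evaluation code: the classification is taken here to mean agreement in the sign of predicted and observed ddG, a ddG of exactly zero is grouped with the negative class, and the numbers in the example are invented.

```python
import math
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute difference between predicted and observed ddG (kcal/mol)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def sign_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Fraction of mutations whose predicted ddG has the same sign as the observed one."""
    hits = sum((p > 0) == (o > 0) for p, o in zip(predicted, observed))
    return hits / len(observed)


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    # Invented ddG values in kcal/mol, for illustration only.
    predicted = [1.0, -0.5, 0.2, -1.0]
    observed = [0.5, -1.0, -0.2, -0.5]
    print(mean_absolute_error(predicted, observed))
    print(sign_accuracy(predicted, observed))
    print(pearson_correlation(predicted, observed))
```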
GEMS: GNN Framework For Efficient Protein-Ligand Binding Affinity Prediction...
zenodo.org
application/gzip, bin +1
Updated Dec 8, 2024
Peter Stockinger (2024). GEMS: GNN Framework For Efficient Protein-Ligand Binding Affinity Prediction Through Robust Data Filtering and Language Model Integration [Dataset]. http://doi.org/10.5281/zenodo.14260171
Available download formats
json, application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.14260171
Dataset updated
Dec 8, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Peter Stockinger
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
For fast reproduction of our results, we provide PyTorch datasets of precomputed interaction graphs for the entire PDBbind database on Zenodo. To enable quick establishment of leakage-free evaluation setups with PDBbind, we also provide pairwise similarity matrices for the entire PDBbind dataset on Zenodo.
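One way a pairwise similarity matrix could support a leakage-free evaluation setup is to drop every training complex that is too similar to any test complex. The sketch below shows that idea in plain Python. It is not the GEMS filtering procedure: the matrix layout (a square nested list indexed by complex), the threshold of 0.5, and the function name are assumptions chosen for the example.

```python
from typing import List, Sequence


def leakage_free_train_indices(
    similarity: Sequence[Sequence[float]],
    train_indices: Sequence[int],
    test_indices: Sequence[int],
    threshold: float,
) -> List[int]:
    """Keep training complexes whose similarity to every test complex is below threshold."""
    return [
        i
        for i in train_indices
        if all(similarity[i][j] < threshold for j in test_indices)
    ]


if __name__ == "__main__":
    # Invented 4 x 4 symmetric similarity matrix; entry [i][j] compares complexes i and j.
    similarity = [
        [1.0, 0.2, 0.9, 0.1],
        [0.2, 1.0, 0.3, 0.4],
        [0.9, 0.3, 1.0, 0.2],
        [0.1, 0.4, 0.2, 1.0],
    ]
    kept = leakage_free_train_indices(similarity, [0, 1], [2, 3], threshold=0.5)
    print(kept)  # complex 0 is dropped because it is 0.9 similar to test complex 2
```

Filtering the training side rather than the test side keeps the benchmark itself unchanged, so results stay comparable across models trained with different thresholds.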

Huaqing Liu (2024). PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery [Dataset]. http://doi.org/10.5281/zenodo.13054646

Data from: PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery

Available download formats
zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.13054646
Dataset updated
Jul 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Huaqing Liu
License
Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

Description

Prediction of protein-protein binding (PPB) affinity plays an important role in large-molecule drug discovery. Deep learning (DL) has been adopted to predict the change in PPB affinity upon mutation, but few studies have predicted the PPB affinity itself. The major reason is the paucity of open-source datasets concerning PPB affinity. The current study therefore introduces and releases a PPB affinity dataset (PPB-Affinity) intended to support the development of DL models that predict PPB affinity. The PPB-Affinity dataset contains key information such as crystal structures of protein-protein complexes (with or without protein mutation patterns), PPB affinity, receptor protein chain, ligand protein chain, etc. To the best of our knowledge, this is the largest publicly available PPB affinity dataset, which may help the industry improve the screening efficiency of discovering new large-molecule drugs.

Code for PPB-Affinity database preparation is available at https://github.com/Huatsing-Lau/PPB-Affinity-DataPrepWorkflow.
Code for the benchmark algorithm is available at https://github.com/ChenPy00/PPB-Affinity.

Files are organized as follows:

- PPB-Affinity.xlsx
- samples_deleted.zip
- PDB/
  - Affinity Benchmark v5.5/
    - file1.pdb
    - file2.pdb
    - ...
    - filek.pdb
  - ATLAS/
  - PDBbind v2020/
  - SAbDab/
  - SKEMPIv2.0/
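After unpacking, a quick sanity check is to count the structure files per source folder. The sketch below assumes that each source folder sits directly under PDB/ and holds its .pdb files at the top level, as the listing above suggests. The function name and the local path in the usage line are hypothetical.

```python
from pathlib import Path
from typing import Dict


def count_pdb_files(pdb_root: Path) -> Dict[str, int]:
    """Map each source folder under pdb_root to the number of .pdb files it contains."""
    counts: Dict[str, int] = {}
    for source_dir in sorted(p for p in pdb_root.iterdir() if p.is_dir()):
        # Only files directly inside the source folder are counted (no recursion).
        counts[source_dir.name] = sum(1 for _ in source_dir.glob("*.pdb"))
    return counts


if __name__ == "__main__":
    root = Path("PDB")  # hypothetical location of the unpacked PDB/ directory
    if root.is_dir():
        for source, n_files in count_pdb_files(root).items():
            print(f"{source}: {n_files} structure files")
```

Comparing these counts with the rows per source in PPB-Affinity.xlsx would be a natural next check, though the spreadsheet's column names are not given here.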
