Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Scoring functions for protein-ligand docking have received much attention in the past two decades. In many cases, remarkable success has been demonstrated in predicting the correct geometry of interaction. On independent test sets, however, the predicted binding energies or scores correlate only weakly with the observed free energies of binding.

In this study, we analyze how well free energies of binding can be predicted on the basis of crystal structures using traditional QSAR techniques in a proteochemometric approach. We introduce a new set of protein-ligand interaction descriptors based on distance-binned Crippen-like atom type pairs. A subset of the publicly available PDBbind09-CN refined set (MW < 900 g/mol, #P < 2, ndon + nacc < 20; N = 1387) is used as the data set. We demonstrate how simple, yet surprisingly good, scoring functions can be generated for the whole diverse database (out-of-bag R² = 0.48, Rp = 0.69, RMSE = 1.44, MUE = 1.14) and for individual protein family subsets. This performance is significantly better than that of almost all other published scoring functions that have been validated on a test set as large and diverse as the PDBbind refined set.

We also find that, for some protein families, surprisingly good scoring functions can be obtained using simple ligand-only descriptors such as logS, logP, and molecular weight. The ligand-descriptor-based scoring function equals or even outperforms commonly used scoring functions, highlighting the need for better scoring functions. We demonstrate how the observed performance depends on the validation strategy, and we outline a general validation protocol for future free energy scoring functions.
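To make the descriptor idea in the abstract concrete, the following is a minimal sketch of counting distance-binned atom-type pairs between a ligand and a protein, together with the three error measures quoted above (Rp, RMSE, MUE). It is an illustration under stated assumptions, not the authors' implementation: the bin edges, the atom type labels, and the function names `pair_descriptors`, `rmse`, `mue`, and `pearson_r` are all hypothetical, and the real descriptor set uses Crippen-like atom types that are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical distance bin edges in angstroms (assumption; the abstract
# does not state the bins actually used).
BIN_EDGES = [0.0, 3.0, 4.5, 6.0]


def pair_descriptors(ligand_atoms, protein_atoms, bin_edges=BIN_EDGES):
    """Count (ligand type, protein type, bin index) occurrences.

    Each atom is a tuple (atom_type, (x, y, z)). Pairs farther apart than
    the last bin edge are ignored.
    """
    counts = Counter()
    for l_type, l_xyz in ligand_atoms:
        for p_type, p_xyz in protein_atoms:
            d = math.dist(l_xyz, p_xyz)
            for b in range(len(bin_edges) - 1):
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    counts[(l_type, p_type, b)] += 1
                    break
    return counts


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mue(y_true, y_pred):
    """Mean unsigned (absolute) error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (Rp)."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy complex with made-up atom type labels (not real Crippen types).
ligand = [("C.ar", (0.0, 0.0, 0.0)), ("O.h", (1.0, 0.0, 0.0))]
protein = [("N.am", (2.5, 0.0, 0.0)), ("C.al", (5.0, 0.0, 0.0))]
descriptors = pair_descriptors(ligand, protein)
```

The resulting counts would serve as one row of a feature matrix for a regression model; the abstract's out-of-bag R² suggests a bagged learner, but the choice of model is outside this sketch.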
Reusable core functions with general utility for generic TK models. (Note: this table lists only a subset of the core functions available in the httk R package and is limited to functions explicitly mentioned in this paper. We refer readers to the S2 File and help files for further details on these and other available functions.)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 6. R-input, Fig3c subset adjacency list. A subset of the MiRWalk_Trimmed.csv adjacency list, used to derive the graph plot displayed in Fig. 3c.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R code for functions and Subset-BRD data analysis. (R)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data subset showing the loadings of linear discriminant 1 for only those experimental flakes made on silcrete.