Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Scoring functions for protein-ligand docking have received much attention in the past two decades. In many cases, remarkable success has been demonstrated in predicting the correct geometry of interaction. On independent test sets, however, the predicted binding energies or scores correlate only weakly with the observed free energies of binding. In this study, we analyze how well free energies of binding can be predicted on the basis of crystal structures using traditional QSAR techniques in a proteochemometric approach. We introduce a new set of protein-ligand interaction descriptors based on distance-binned Crippen-like atom type pairs. A subset of the publicly available PDBbind09-CN refined set (MW < 900 g/mol, #P < 2, ndon + nacc < 20; N = 1387) is used as the data set. We demonstrate how simple, yet surprisingly good, scoring functions can be generated for the whole diverse database (out-of-bag R² = 0.48, Rp = 0.69, RMSE = 1.44, MUE = 1.14) and for individual protein family subsets. This performance is significantly better than that of almost all other published scoring functions that have been validated on a test set as large and diverse as the PDBbind refined set. We also find that for some protein families surprisingly good scoring functions can be obtained using simple ligand-only descriptors such as logS, logP, and molecular weight. The ligand-descriptor-based scoring function equals or even outperforms commonly used scoring functions, highlighting the need for better scoring functions. We demonstrate how the observed performance depends on the validation strategy, and we outline a general validation protocol for future free energy scoring functions.
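The performance figures quoted above (Rp, RMSE, MUE) are standard agreement measures between observed and predicted binding affinities. A minimal sketch of how they are commonly computed follows, assuming Rp is the Pearson correlation coefficient and MUE the mean unsigned error. The function name `scoring_metrics` and the example values are hypothetical and are not taken from the study; the out-of-bag R² is not reproduced here because it depends on the bagged model used.

```python
import math


def scoring_metrics(observed, predicted):
    """Return Pearson Rp, RMSE and MUE for paired observed/predicted values.

    Hypothetical helper for illustration; not the authors' code.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Pearson correlation: covariance divided by the product of the spreads.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    rp = cov / math.sqrt(ss_obs * ss_pred)

    # Root-mean-square error and mean unsigned (absolute) error.
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mue = sum(abs(o - p) for o, p in zip(observed, predicted)) / n

    return {"Rp": rp, "RMSE": rmse, "MUE": mue}


# Illustrative values only (not data from the study).
example = scoring_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

A constant offset between prediction and observation, as in the example values, leaves Rp at 1 while RMSE and MUE are nonzero, which is one reason the correlation and the error measures are reported together.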
The merra2ools dataset has been assembled through the following steps:
The MERRA-2 collections tavg1_2d_flx_Nx (Surface Flux Diagnostics), tavg1_2d_rad_Nx (Radiation Diagnostics), and tavg1_2d_slv_Nx (Single-level atmospheric state variables) were downloaded from the NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) (https://disc.gsfc.nasa.gov/datasets?project=MERRA-2) using the GNU Wget network utility (https://disc.gsfc.nasa.gov/data-access). Each of the three collections consists of daily netCDF-4 files with 3-dimensional variables (lon x lat x hour).
The following variables were obtained from the netCDF-4 files and merged into long-term time series:
Northward (V) and Eastward (U) wind at 10 and 50 meters (V10M, V50M, U10M, U50M, respectively), and 10-meter air temperature (T10M) from the tavg1_2d_slv_Nx collection;
Incident shortwave land (SWGDN) and Surface albedo (ALBEDO) fro...
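The merging of daily (lon x lat x hour) variables into long time series described above can be sketched as follows. This is a minimal illustration that assumes each daily variable has already been read into a nested list indexed as values[hour][lat][lon]. The function names, the index order, and the tuple layout are hypothetical and are not the dataset authors' code.

```python
from datetime import datetime, timedelta


def daily_grid_to_long(day, lons, lats, values):
    """Flatten one day's values[hour][lat][lon] grid into long-format rows.

    Each row is (timestamp, lon, lat, value). The index order is an assumption.
    """
    rows = []
    for hour, lat_slab in enumerate(values):
        timestamp = day + timedelta(hours=hour)
        for lat, lon_row in zip(lats, lat_slab):
            for lon, value in zip(lons, lon_row):
                rows.append((timestamp, lon, lat, value))
    return rows


def merge_days(daily_rows):
    """Concatenate per-day rows and order them as one series per grid cell."""
    merged = [row for rows in daily_rows for row in rows]
    merged.sort(key=lambda r: (r[1], r[2], r[0]))  # by lon, lat, then time
    return merged


# Toy example: 2 longitudes, 1 latitude, 2 hours per day, 2 days.
lons, lats = [-180.0, -179.375], [-90.0]
day1 = daily_grid_to_long(datetime(2020, 1, 1), lons, lats, [[[1.0, 2.0]], [[3.0, 4.0]]])
day2 = daily_grid_to_long(datetime(2020, 1, 2), lons, lats, [[[5.0, 6.0]], [[7.0, 8.0]]])
series = merge_days([day1, day2])
```

Sorting by grid cell first and by time second keeps each location's time series contiguous, which is convenient when the merged table is later queried one location at a time.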
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 6. R-input, Fig3c subset adjacency list. A subset of the MiRWalk_Trimmed.csv adjacency list, used to derive the graph plot displayed in Fig. 3c.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data subset showing the loadings of linear discriminant 1 for only those experimental flakes made on silcrete.