Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recently, much effort has been invested in using convolutional neural network (CNN) models trained on 3D structural images of protein-ligand complexes to distinguish binding from non-binding ligands for virtual screening. However, the dearth of reliable protein-ligand X-ray structures and binding affinity data has required the use of constructed datasets for the training and evaluation of CNN molecular recognition models. Here, we outline various sources of bias in one such widely used dataset, the Directory of Useful Decoys: Enhanced (DUD-E). We have constructed and performed tests to investigate whether CNN models developed using DUD-E are properly learning the underlying physics of molecular recognition, as intended, or are instead learning biases inherent in the dataset itself. We find that superior enrichment efficiency in CNN models can be attributed to the analogue and decoy bias hidden in the DUD-E dataset rather than successful generalization of the pattern of protein-ligand interactions. Comparing additional deep learning models trained on PDBbind datasets, we found that their enrichment performance on DUD-E is not superior to that of the docking program AutoDock Vina. Together, these results suggest that biases that could be present in constructed datasets should be thoroughly evaluated before applying them to machine-learning-based methodology development.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of Vina, Gnina and Pafnucy performance on DUD-E targets.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Enrichment of ligands versus property-matched decoys is widely used to test and optimize docking library screens. However, the unconstrained optimization of enrichment alone can mislead, leading to false confidence in prospective performance. This can arise by over-optimizing for enrichment against property-matched decoys, without considering the full spectrum of molecules to be found in a true large library screen. Adding decoys representing charge extrema helps mitigate over-optimizing for electrostatic interactions. Adding decoys that represent the overall characteristics of the library to be docked allows one to sample molecules not represented by ligands and property-matched decoys but that one will encounter in a prospective screen. An optimized version of the DUD-E set (DUDE-Z), as well as Extrema and sets representing broad features of the library (Goldilocks), is developed here. We also explore the variability that one can encounter in enrichment calculations and how that can temper one’s confidence in small enrichment differences. The new tools and new decoy sets are freely available at http://tldr.docking.org and http://dudez.docking.org.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training and test datasets of the paper "Improving the versatility of deep learning-based protein-ligand interaction prediction for accurate binding affinity scoring and virtual screening".
Comparative evaluation of virtual screening methods requires a rigorous benchmarking procedure on diverse, realistic, and unbiased data sets. Recent investigations from numerous research groups unambiguously demonstrate that artificially constructed ligand sets classically used by the community (e.g., DUD, DUD-E, MUV) are unfortunately biased by both obvious and hidden chemical biases, therefore overestimating the true accuracy of virtual screening methods. We herewith present a novel data set (LIT-PCBA) specifically designed for virtual screening and machine learning. LIT-PCBA relies on 149 dose–response PubChem bioassays that were additionally processed to remove false positives and assay artifacts and keep active and inactive compounds within similar molecular property ranges. To ascertain that the data set is suited to both ligand-based and structure-based virtual screening, target sets were restricted to single protein targets for which at least one X-ray structure is available in complex with ligands of the same phenotype (e.g., inhibitor, inverse agonist) as that of the PubChem active compounds. Preliminary virtual screening on the 21 remaining target sets with state-of-the-art orthogonal methods (2D fingerprint similarity, 3D shape similarity, molecular docking) enabled us to select 15 target sets for which at least one of the three screening methods is able to enrich the top 1%-ranked compounds in true actives by at least a factor of 2. The corresponding ligand sets (training, validation) were finally unbiased by the recently described asymmetric validation embedding (AVE) procedure to afford the LIT-PCBA data set, consisting of 15 targets and 7844 confirmed active and 407,381 confirmed inactive compounds. The data set mimics experimental screening decks in terms of hit rate (ratio of active to inactive compounds) and potency distribution. 
It is available online at http://drugdesign.unistra.fr/LIT-PCBA for download and for benchmarking novel virtual screening methods, notably those relying on machine learning.
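As a rough illustration of the two quantities mentioned above, the following Python sketch computes the hit rate from the counts quoted in the description and an enrichment factor for a top-ranked fraction of a screened list. The function name `enrichment_factor`, the toy scores, and the convention that a higher score means a better rank are assumptions made for this example; they are not taken from the LIT-PCBA distribution.

```python
def enrichment_factor(scores, labels, top_fraction=0.01):
    """Enrichment factor in the top-ranked fraction of a screened list.

    EF = (actives in top x% / compounds in top x%)
         / (actives overall / compounds overall)

    Assumption: a higher score means a better-ranked compound. Negate the
    scores first for energy-like docking scores where lower is better.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")
    n_top = max(1, int(round(n_total * top_fraction)))
    # Sort by score only, best first, and keep the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Counts quoted in the description above.
N_ACTIVES = 7844
N_INACTIVES = 407_381
# Hit rate as defined in the text: ratio of active to inactive compounds.
hit_rate = N_ACTIVES / N_INACTIVES

# Hypothetical toy screen: 1000 compounds, 20 actives, 4 of them in the top 10.
toy_scores = [1000 - i for i in range(1000)]
toy_labels = [0] * 1000
for index in [0, 1, 2, 3] + list(range(500, 516)):
    toy_labels[index] = 1

toy_ef = enrichment_factor(toy_scores, toy_labels, top_fraction=0.01)
print(f"hit rate: {hit_rate:.4f}, toy EF at 1%: {toy_ef:.1f}")
```

In this toy case the top 1% holds 10 compounds, 4 of which are active, against a background rate of 20 in 1000, so the enrichment factor is well above the threshold of 2 used in the target selection described above.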
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The repository contains the benchmarking data obtained alongside the first version of DockM8.
The file structure is explained in DockM8_v1_file_structure_explanation.txt
We hope this data is useful for benchmarking scoring functions and machine learning models, as well as being a large repository of pre-docked poses using a variety of algorithms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional digital data to "RASPD+: Fast protein-ligand binding free energy prediction using simplified physicochemical features" (ChemRxiv preprint: https://doi.org/10.26434/chemrxiv.12636704).
Associated code can be found at: https://github.com/HITS-MCM/RASPDplus
Files:
weights.tar.gz: contains the model weights of one random dataset split and its associated crossvalidation folds. Used for standard RASPD+ evaluation.
additional_model_replicates.tar.gz: contains the remaining models trained on the full set of descriptors.
external_test_sets.tar.gz: contains the descriptor tables for all external test sets used.
dude.tar.gz: contains the descriptor tables and several identifier lists for evaluation on the Directory of Useful Decoys - Enhanced (DUD-E).
run_outputs.tar.gz: Performance metric data and predicted values created during the model training and evaluation runs. Basis for the figures and metrics in the manuscript.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
This dataset provides information about the number of properties, residents, and average property values for Dude Hadley Road cross streets in Perdido, AL.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ligand-only CNN models that achieved high AUC (greater than 0.9) for COMT.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All of the individual docking results and ESSENCE-Dock consensus results for 21 diverse DUD-E targets as presented in the paper "ESSENCE-Dock: A Consensus-Based Approach to Enhance Virtual Screening Enrichment in Drug Discovery".
Docking calculations were performed using:
The consensus calculations were performed using ESSENCE-Dock, available via Metascreener.
ESSENCE-Dock preprint: https://doi.org/10.26434/chemrxiv-2023-21wtv
Paper Abstract
Developing new drugs is an expensive and lengthy endeavor, partly due to the reliance on high-throughput screening (HTS), which involves significant costs and is time-consuming. Virtual screening, particularly molecular docking, offers a more cost-effective and faster alternative for identifying promising drug candidates. However, the effectiveness of molecular docking can vary greatly, which has led to the use of consensus docking approaches. These approaches combine results from different docking methods to improve the identification of active compounds and can reduce the occurrence of false positives. However, many of these methods do not fully leverage the latest advancements in docking technology. In response, we present ESSENCE-Dock (Effective Structural Screening ENrichment ConsEnsus Dock), a new consensus docking workflow aimed at decreasing false positives and increasing the discovery of active compounds. By utilizing a combination of novel docking algorithms, we improve the selection process for potential active compounds. ESSENCE-Dock has been made to be user-friendly, requiring only a few simple commands to perform a complete screening, while also being designed for use in high-performance computing (HPC) environments.
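The general idea of combining results from different docking methods can be sketched with a simple rank-averaging scheme. This is a generic illustration only and is not the ESSENCE-Dock algorithm, which the abstract does not specify; the function names, the compound identifiers, and the scores are all hypothetical.

```python
def rank_by_score(scores, lower_is_better=True):
    """Map each compound to its rank (1 = best) within one docking method."""
    ordered = sorted(scores, key=scores.get, reverse=not lower_is_better)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def consensus_rank(score_tables):
    """Order compounds by their mean rank across several docking methods.

    score_tables is a list of (scores, lower_is_better) pairs, where scores
    maps a compound name to that method's score. Only compounds scored by
    every method are kept. Ties are broken alphabetically by compound name.
    """
    per_method = [rank_by_score(scores, lower) for scores, lower in score_tables]
    shared = set(per_method[0]).intersection(*per_method[1:])
    mean_rank = {
        name: sum(ranks[name] for ranks in per_method) / len(per_method)
        for name in shared
    }
    return sorted(shared, key=lambda name: (mean_rank[name], name))


# Hypothetical scores from two methods that use different conventions.
method_a = {"cpd1": -9.1, "cpd2": -7.4, "cpd3": -8.0}  # energy-like, lower is better
method_b = {"cpd1": 0.62, "cpd2": 0.91, "cpd3": 0.55}  # probability-like, higher is better

consensus = consensus_rank([(method_a, True), (method_b, False)])
print(consensus)
```

Averaging ranks rather than raw scores avoids having to put scores from different methods on a common scale, which is one reason rank-based schemes are a common starting point for consensus approaches.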
This dataset provides information about the number of properties, residents, and average property values for Dude Waters Drive cross streets in Mulberry, FL.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The mean and SD of the AUC values across three target groups.
ADMMR map collection: Dude Mining Claims, Claim Map; 1 in. to 200 feet; 22 x 17 in.
http://www.apache.org/licenses/LICENSE-2.0
This compressed file contains all datasets made for the validation of MUBDsyn.
All these datasets can be used for the reproduction of validation performed in the manuscript or to benchmark various virtual screening methods.
https://whoisdatacenter.com/terms-of-use/
Explore the historical Whois records related to dude.com (Domain). Get insights into ownership history and changes over time.
This dataset provides information on 67 in India as of March, 2025. It includes details such as email addresses (where publicly available), phone numbers (where publicly available), and geocoded addresses. Explore market trends, identify potential business partners, and gain valuable insights into the industry. Download a complimentary sample of 10 records to see what's included.
This dataset provides information about the number of properties, residents, and average property values for Dude Street cross streets in Sullivan, IN.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dude Perfect is an American sports and comedy group of internet content creators. Founded on March 19, 2009, the group all
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects and is filtered where the book is Dude gun, featuring 10 columns including authors, average publication date, book publishers, book subject, and books. The preview is ordered by number of books (descending).