2 datasets found

Data for: Advances and critical assessment of machine learning techniques...
zenodo.org
data.niaid.nih.gov
bin, csv
Updated Sep 5, 2023
Cite
Lukas Bucinsky; Marián Gall; Ján Matúška; Michal Pitoňák; Marek Štekláč (2023). Data for: Advances and critical assessment of machine learning techniques for prediction of docking scores [Dataset]. http://doi.org/10.5061/dryad.zgmsbccg7
Available download formats
bin, csv
Unique identifier
https://doi.org/10.5061/dryad.zgmsbccg7
Dataset updated
Sep 5, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lukas Bucinsky; Marián Gall; Ján Matúška; Michal Pitoňák; Marek Štekláč
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Semi-flexible docking was performed with AutoDock Vina 1.2.2 on the SARS-CoV-2 main protease, M^pro (PDB ID: 6WQF).

Two data sets containing the AutoDock Vina docking scores are provided in the xyz format. These files were used as input and/or reference for machine learning models built with TensorFlow, XGBoost, and SchNetPack to study how well the models predict docking scores. The first data set originally contained 60,411 compounds labeled in-vivo, selected for the training of the ML models. The second data set, denoted in-vitro-only, originally contained 175,696 compounds active or assumed to be active at 10 μM or less in a direct binding assay. Both sets were downloaded from the ZINC15 database on 10 December 2021. Four compounds in the in-vivo set and 12 in the in-vitro-only set were left out of consideration because they contain Si atoms. Compounds with no charges assigned in their mol2 files were excluded as well (523 compounds in the in-vivo set and 1,666 in the in-vitro-only set). Gasteiger charges were reassigned to the remaining compounds using OpenBabel. In addition, four in-vitro-only compounds with docking scores greater than 1 kcal/mol were rejected.
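The filtering steps above can be checked with simple bookkeeping. The sketch below only restates the counts quoted in the description; the dictionary layout and key names are illustrative and do not come from the dataset itself.

```python
# Original set sizes and exclusion counts, as quoted in the description.
original = {"in-vivo": 60_411, "in-vitro-only": 175_696}
removed = {
    "in-vivo": {
        "contains Si": 4,
        "no charges in mol2": 523,
    },
    "in-vitro-only": {
        "contains Si": 12,
        "no charges in mol2": 1_666,
        "docking score > 1 kcal/mol": 4,
    },
}

# Subtract every exclusion from the original size of each set.
remaining = {
    name: original[name] - sum(removed[name].values())
    for name in original
}

for name, count in remaining.items():
    print(f"{name}: {count:,} compounds remain")
```

The results, 59,884 and 174,014, agree with the sizes stated for the provided in-vivo.xyz and in-vitro-only.xyz files.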

The provided in-vivo and the in-vitro-only sets contain 59,884 (in-vivo.xyz) and 174,014 (in-vitro-only.xyz) compounds, respectively. Compounds in both sets contain the following elements: H, C, N, O, F, P, S, Cl, Br, and I. The in-vivo compound set was used as the primary data set for the training of the ML models in the referencing study.
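A minimal sketch of how such a file might be read and checked against the element list, assuming the files follow the common multi-frame XYZ layout (atom count, one comment line, then one line per atom starting with the element symbol and three coordinates). The description does not say where the docking score is stored, so the comment line is returned unparsed; `read_xyz_frames` and `unexpected_elements` are hypothetical helper names, and the demo input is made up.

```python
import io
from typing import Iterator, List, NamedTuple, Set, Tuple

# Elements listed in the description for both compound sets.
ALLOWED_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


class Frame(NamedTuple):
    comment: str  # raw second line of the frame, left unparsed
    atoms: List[Tuple[str, float, float, float]]


def read_xyz_frames(handle) -> Iterator[Frame]:
    """Yield one Frame per molecule from a concatenated XYZ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate blank lines between frames
            continue
        n_atoms = int(header.split()[0])
        comment = handle.readline().rstrip("\n")
        atoms = []
        for _ in range(n_atoms):
            fields = handle.readline().split()
            # Only the first four columns are used; extra columns are ignored.
            atoms.append(
                (fields[0], float(fields[1]), float(fields[2]), float(fields[3]))
            )
        yield Frame(comment, atoms)


def unexpected_elements(frame: Frame) -> Set[str]:
    """Return element symbols in the frame that are not in the allowed list."""
    return {element for element, _, _, _ in frame.atoms} - ALLOWED_ELEMENTS


# Made-up two-atom frame, used only to show the call pattern.
demo = io.StringIO("2\nplaceholder comment\nH 0.0 0.0 0.0\nCl 0.0 0.0 1.27\n")
for frame in read_xyz_frames(demo):
    print(len(frame.atoms), unexpected_elements(frame))
```

For the real files, the `io.StringIO` object would be replaced by an opened file handle, for example `open("in-vivo.xyz")`.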

The file in-vivo-splits-data.csv contains the exact composition of all (random) 80-5-15 train-validation-test splits used in the study, labeled I, II, III, IV, and V. Within each in-vivo 80-5-15 split, eight additional random subsets were created to monitor the convergence of the training process. Each subset contains all compounds from the previous subset (starting with the 10-5-15 subset) and is enlarged by one eighth of the full (80-5-15) train set of the given split. These subsets are referred to as in_vivo_10_(I, II, ..., V), in_vivo_20_(I, II, ..., V), ..., in_vivo_80_(I, II, ..., V).
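The nested-subset construction can be illustrated as follows. This is a sketch under stated assumptions, not the procedure used to produce in-vivo-splits-data.csv: the seed, the rounding of set sizes, and the function names `make_split` and `nested_train_subsets` are all hypothetical, and the validation and test parts are assumed to stay fixed across subsets.

```python
import random


def make_split(n_compounds: int, seed: int):
    """Randomly split compound indices 80-5-15 into train, validation, test."""
    rng = random.Random(seed)
    indices = list(range(n_compounds))
    rng.shuffle(indices)
    n_train = n_compounds * 80 // 100
    n_val = n_compounds * 5 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def nested_train_subsets(train, label: str, n_steps: int = 8):
    """Build growing prefixes of the train set: 1/8, 2/8, ..., 8/8 of it.

    Each subset contains the previous one and adds one eighth of the
    full train set, mirroring the in_vivo_10 ... in_vivo_80 naming.
    """
    return {
        f"in_vivo_{10 * k}_{label}": train[: len(train) * k // n_steps]
        for k in range(1, n_steps + 1)
    }


# Example with the size of the provided in-vivo set and an arbitrary seed.
train, val, test = make_split(59_884, seed=0)
subsets = nested_train_subsets(train, label="I")
for name, members in subsets.items():
    print(name, len(members))
```

Taking prefixes of a single shuffled list is one simple way to guarantee that every subset contains its predecessor; the largest subset is the full train set of the split.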

Global Ai Development Platform Market Research Report: By Deployment...

wiseguyreports.com

Updated Jul 18, 2024

Cite

Wiseguy Research Consultants Pvt Ltd (2024). Global Ai Development Platform Market Research Report: By Deployment (Cloud-based, On-premises), By Use Case (Natural Language Processing, Machine Learning, Deep Learning, Computer Vision, Predictive Analytics), By Vertical (Healthcare, Retail, Manufacturing, Banking and Financial Services, Transportation) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/es/reports/ai-development-platform-market

Dataset updated

Jul 18, 2024

Dataset authored and provided by

Wiseguy Research Consultants Pvt Ltd

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Jan 7, 2024

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2024
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023	6.24 (USD Billion)
MARKET SIZE 2024	7.56 (USD Billion)
MARKET SIZE 2032	35.0 (USD Billion)
SEGMENTS COVERED	Deployment, Use Case, Vertical, Regional
COUNTRIES COVERED	North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS	1. Rising Demand for AI-powered Solutions; 2. Adoption of Cloud-based AI Platforms; 3. Growing Importance of Data Ownership and Security; 4. Emergence of Low-code/No-code AI Development Tools; 5. Increasing Competition among AI Platform Providers
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Tensorflow, Domino Data Lab, Keras, Iguazio, RapidMiner, CatBoost, LightGBM, PyTorch, DataRobot, Scikit-learn, XGBoost, H2O.ai, Algorithmia, C3 ai, Determined AI
MARKET FORECAST PERIOD	2024 - 2032
KEY MARKET OPPORTUNITIES	1. AI Platform as a Service (PaaS) Adoption; 2. Increasing Demand for Conversational AI; 3. Rise of Low-Code/No-Code AI Development; 4. Integration with Cloud and Edge Computing; 5. Growing Adoption in Healthcare and Finance
COMPOUND ANNUAL GROWTH RATE (CAGR)	21.12% (2024 - 2032)
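
The reported growth rate can be cross-checked against the market sizes in the table using the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes the rate is measured from the 2024 market size to the 2032 market size, which is how the forecast period reads; the variable names are illustrative.

```python
# Market sizes in USD billion, taken from the table above.
size_2024 = 7.56
size_2032 = 35.0
years = 2032 - 2024

# Compound annual growth rate over the forecast period.
cagr = (size_2032 / size_2024) ** (1 / years) - 1
print(f"CAGR 2024-2032: {cagr:.2%}")
```

The result is about 21.1%, in line with the 21.12% quoted in the report summary; the small difference is presumably due to rounding of the published market sizes.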
