2 datasets found
  1. Data for: Advances and critical assessment of machine learning techniques...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv
    Updated Sep 5, 2023
    Citation: Lukas Bucinsky; Marián Gall; Ján Matúška; Michal Pitoňák; Marek Štekláč (2023). Data for: Advances and critical assessment of machine learning techniques for prediction of docking scores [Dataset]. http://doi.org/10.5061/dryad.zgmsbccg7
    Available download formats: bin, csv
    Dataset updated: Sep 5, 2023
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Lukas Bucinsky; Marián Gall; Ján Matúška; Michal Pitoňák; Marek Štekláč
    License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Semi-flexible docking was performed using AutoDock Vina 1.2.2 software on the SARS-CoV-2 main protease Mpro (PDB ID: 6WQF).

    Two data sets containing the AutoDock Vina docking scores are provided in the xyz format. These files served as input and/or reference data for machine learning models built with TensorFlow, XGBoost, and SchNetPack, which were used to study how well docking scores can be predicted. The first data set originally contained 60,411 compounds labeled "in-vivo" and was selected for training the ML models. The second data set, denoted as in-vitro-only, originally contained 175,696 compounds that are active, or assumed to be active, at 10 μM or less in a direct binding assay. Both sets were downloaded from the ZINC15 database on 10 December 2021. Four compounds in the in-vivo set and 12 in the in-vitro-only set were excluded because they contain Si atoms. Compounds with no charges assigned in their mol2 files were excluded as well (523 compounds in the in-vivo set and 1,666 in the in-vitro-only set). Gasteiger charges were reassigned to the remaining compounds using OpenBabel. In addition, four in-vitro-only compounds with docking scores greater than 1 kcal/mol were rejected.
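
The exclusion steps above can be sketched as a sequential filter, followed by a check that the quoted removal counts add up to the final set sizes. This is a minimal illustration, not the authors' pipeline: the `Compound` record, its field names, and the toy values are hypothetical, and the OpenBabel Gasteiger-charge reassignment and the Vina docking run are not reproduced. Only the three exclusion rules and the counts come from the description.

```python
from dataclasses import dataclass

# Hypothetical record type: field names are illustrative, not the dataset's schema.
@dataclass(frozen=True)
class Compound:
    zinc_id: str
    elements: frozenset   # element symbols present in the molecule
    has_charges: bool     # whether the mol2 file carried partial charges
    docking_score: float  # AutoDock Vina score, kcal/mol

SCORE_LIMIT = 1.0  # compounds scoring above 1 kcal/mol are rejected

def apply_filters(compounds):
    """Apply the exclusion rules in order; return kept compounds and per-rule counts."""
    removed = {"silicon": 0, "no_charges": 0, "score_above_limit": 0}
    kept = []
    for c in compounds:
        if "Si" in c.elements:
            removed["silicon"] += 1
        elif not c.has_charges:
            removed["no_charges"] += 1
        elif c.docking_score > SCORE_LIMIT:
            removed["score_above_limit"] += 1
        else:
            kept.append(c)
    return kept, removed

# Toy input (made-up IDs and values) exercising each rule once.
demo = [
    Compound("A", frozenset({"C", "H", "O"}), True, -7.2),
    Compound("B", frozenset({"C", "H", "Si"}), True, -5.0),
    Compound("C", frozenset({"C", "H", "N"}), False, -6.1),
    Compound("D", frozenset({"C", "H"}), True, 1.3),
]
kept, removed = apply_filters(demo)
print([c.zinc_id for c in kept], removed)

# Bookkeeping check with the counts quoted in the description.
in_vivo_final = 60_411 - 4 - 523
in_vitro_final = 175_696 - 12 - 1_666 - 4
print(in_vivo_final, in_vitro_final)  # should match 59,884 and 174,014
```

The rules are applied with `elif` so that each compound is counted against the first rule it fails, mirroring the order in which the exclusions are described.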

    The provided in-vivo and in-vitro-only sets contain 59,884 (in-vivo.xyz) and 174,014 (in-vitro-only.xyz) compounds, respectively. The compounds in both sets are composed of the following elements: H, C, N, O, F, P, S, Cl, Br, and I. The in-vivo set was used as the primary data set for training the ML models in the associated study.
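
A minimal reader for multi-molecule XYZ files, with a check against the element list above, might look like the following. It assumes the conventional XYZ layout (atom count, comment line, then one `symbol x y z` line per atom) with molecules concatenated back to back. The description does not say where in each record the docking score is stored, so this sketch returns the raw comment line instead of parsing a score. `read_xyz_frames`, `unexpected_elements`, and the toy sample are hypothetical.

```python
ALLOWED_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}

def read_xyz_frames(text):
    """Yield (comment, symbols, coords) for each molecule in a multi-frame XYZ string."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        n_atoms = int(lines[i].split()[0])
        comment = lines[i + 1]  # free-form; score location not specified in the description
        atom_lines = lines[i + 2 : i + 2 + n_atoms]
        if len(atom_lines) != n_atoms:
            raise ValueError(f"truncated frame starting at line {i + 1}")
        symbols = [ln.split()[0] for ln in atom_lines]
        coords = [tuple(float(v) for v in ln.split()[1:4]) for ln in atom_lines]
        yield comment, symbols, coords
        i += 2 + n_atoms

def unexpected_elements(symbols):
    """Element symbols outside the set listed for both data sets."""
    return set(symbols) - ALLOWED_ELEMENTS

# Toy two-molecule input; the second frame deliberately contains Si.
sample = """3
frame one
O 0.000 0.000 0.000
H 0.957 0.000 0.000
H -0.240 0.927 0.000
2
frame two
C 0.000 0.000 0.000
Si 1.870 0.000 0.000
"""
frames = list(read_xyz_frames(sample))
for comment, symbols, _ in frames:
    print(comment, len(symbols), unexpected_elements(symbols) or "ok")
```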

    The file in-vivo-splits-data.csv contains the exact composition of all five random 80-5-15 train-validation-test splits used in the study, labeled I, II, III, IV, and V. Within each in-vivo 80-5-15 split, eight additional random subsets were created to monitor the convergence of the training process. The subsets are nested: starting with the 10-5-15 subset, each subset contains all compounds of the previous one and is enlarged by one eighth of the full (80-5-15) training set of the given split. These subsets are referred to as in_vivo_10_(I, II, ..., V), in_vivo_20_(I, II, ..., V), ..., in_vivo_80_(I, II, ..., V).
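
The nested construction of the convergence subsets can be sketched by taking growing prefixes of one shuffled copy of a split's training set. This is an assumption about how the random draws could be made, not a reconstruction of the authors' code; the actual subsets are the ones recorded in in-vivo-splits-data.csv, whose column layout is not described here, so the sketch does not read that file. The function name and the `seed` parameter are hypothetical.

```python
import random

def nested_training_subsets(train_ids, n_steps=8, seed=0):
    """Sketch: subset k is the first k/n_steps of one shuffled copy of the training set."""
    order = list(train_ids)
    random.Random(seed).shuffle(order)  # one random permutation per split (assumption)
    n = len(order)
    return {
        # With an 80% training set, each step adds 10% of the full data set,
        # hence the labels in_vivo_10 ... in_vivo_80 (split suffix I to V omitted).
        f"in_vivo_{10 * k}": order[: round(k * n / n_steps)]
        for k in range(1, n_steps + 1)
    }

subsets = nested_training_subsets(range(80), seed=42)
print({name: len(ids) for name, ids in subsets.items()})
```

Because every subset is a prefix of the same permutation, each one automatically contains all compounds of the previous one, and the last prefix is the full training set of the split.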

  2. Global Ai Development Platform Market Research Report: By Deployment...

    • wiseguyreports.com
    Updated Jul 18, 2024
    Citation: Wiseguy Research Consultants Pvt Ltd (2024). Global Ai Development Platform Market Research Report: By Deployment (Cloud-based, On-premises), By Use Case (Natural Language Processing, Machine Learning, Deep Learning, Computer Vision, Predictive Analytics), By Vertical (Healthcare, Retail, Manufacturing, Banking and Financial Services, Transportation) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/es/reports/ai-development-platform-market
    Dataset updated: Jul 18, 2024
    Dataset authored and provided by: Wiseguy Research Consultants Pvt Ltd
    License: https://www.wiseguyreports.com/pages/privacy-policy
    Time period covered: Jan 7, 2024
    Area covered: Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 6.24 (USD Billion)
    MARKET SIZE 2024: 7.56 (USD Billion)
    MARKET SIZE 2032: 35.0 (USD Billion)
    SEGMENTS COVERED: Deployment, Use Case, Vertical, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: (1) Rising Demand for AI-powered Solutions; (2) Adoption of Cloud-based AI Platforms; (3) Growing Importance of Data Ownership and Security; (4) Emergence of Low-code/No-code AI Development Tools; (5) Increasing Competition among AI Platform Providers
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: TensorFlow, Domino Data Lab, Keras, Iguazio, RapidMiner, CatBoost, LightGBM, PyTorch, DataRobot, Scikit-learn, XGBoost, H2O.ai, Algorithmia, C3.ai, Determined AI
    MARKET FORECAST PERIOD: 2024 - 2032
    KEY MARKET OPPORTUNITIES: (1) AI Platform as a Service (PaaS) Adoption; (2) Increasing Demand for Conversational AI; (3) Rise of Low-code/No-code AI Development; (4) Integration with Cloud and Edge Computing; (5) Growing Adoption in Healthcare and Finance
    COMPOUND ANNUAL GROWTH RATE (CAGR): 21.12% (2024 - 2032)
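
The quoted growth rate can be cross-checked against the listed market sizes with the standard CAGR formula, CAGR = (V_end / V_start)^(1/n) - 1, using the 2024 and 2032 values (n = 8 years). The sketch below is an illustrative consistency check, not part of the report; the helper names are hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

def project(start_value, rate, years):
    """Value after compounding start_value at `rate` for `years` years."""
    return start_value * (1 + rate) ** years

# Market sizes in USD billion, as listed in the report summary.
size_2024, size_2032 = 7.56, 35.0
rate = cagr(size_2024, size_2032, 2032 - 2024)
print(f"implied CAGR 2024-2032: {rate:.2%}")
print(f"2032 size at 21.12%: {project(size_2024, 0.2112, 8):.2f} USD billion")
```

With the rounded market sizes, the implied rate comes out at about 21.1%, and compounding 7.56 at 21.12% for eight years gives roughly 35.0 USD billion, so the summary figures are mutually consistent.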