100+ datasets found
  1. Machine learning code and best models.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Apr 17, 2024
    Cite
    Qingxin Yang; Li Luo; Zhangpeng Lin; Wei Wen; Wenbo Zeng; Hong Deng (2024). Machine learning code and best models. [Dataset]. http://doi.org/10.1371/journal.pone.0300662.s002
    Available download formats: zip
    Dataset updated
    Apr 17, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Qingxin Yang; Li Luo; Zhangpeng Lin; Wei Wen; Wenbo Zeng; Hong Deng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The machine learning code and best models are available at https://github.com/nerdyqx/ML. (ZIP)

  2. Notable AI Models

    • epoch.ai
    csv
    Updated Jul 24, 2025
    Cite
    Epoch AI (2025). Notable AI Models [Dataset]. https://epoch.ai/data/ai-models
    Available download formats: csv
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    Epoch AI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Global
    Variables measured
    https://epoch.ai/data/ai-models-documentation#records
    Measurement technique
    https://epoch.ai/data/ai-models-documentation#records
    Description

    Our most comprehensive database of AI models, containing over 800 models that are state of the art, highly cited, or otherwise historically notable. It tracks key factors driving machine learning progress and includes over 300 training compute estimates.
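
    As a quick illustration (not part of the dataset documentation), a locally downloaded copy of the CSV can be explored with pandas; the file name and column names below are assumptions, so check the records documentation linked above for the actual fields:

      # Minimal sketch: rank models in a local copy of the Notable AI Models CSV
      # by training compute. Column names ("Model", "Publication date",
      # "Training compute (FLOP)") are illustrative guesses, not the documented schema.
      import pandas as pd

      df = pd.read_csv("notable_ai_models.csv")  # assumed local filename

      compute_col = "Training compute (FLOP)"   # placeholder column name
      df[compute_col] = pd.to_numeric(df[compute_col], errors="coerce")

      # Keep rows with a compute estimate and show the ten largest training runs.
      top10 = (
          df.dropna(subset=[compute_col])
            .sort_values(compute_col, ascending=False)
            .head(10)[["Model", "Publication date", compute_col]]
      )
      print(top10.to_string(index=False))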

  3. Data from: MLOmics: Cancer Multi-Omics Database for Machine Learning

    • figshare.com
    bin
    Updated May 25, 2025
    + more versions
    Cite
    Rikuto Kotoge (2025). MLOmics: Cancer Multi-Omics Database for Machine Learning [Dataset]. http://doi.org/10.6084/m9.figshare.28729127.v2
    Available download formats: bin
    Dataset updated
    May 25, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rikuto Kotoge
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Framing the investigation of diverse cancers as a machine learning problem has recently shown significant potential in multi-omics analysis and cancer research. These successful machine learning models are empowered by high-quality training datasets with sufficient data volume and adequate preprocessing. However, while several public data portals exist, including The Cancer Genome Atlas (TCGA) multi-omics initiative and open databases such as LinkedOmics, these databases are not off-the-shelf for existing machine learning models. We propose MLOmics, an open cancer multi-omics database that aims to better serve the development and evaluation of bioinformatics and machine learning models. MLOmics contains 8,314 patient samples covering all 32 cancer types with four omics types, stratified features, and extensive baselines. Complementary support for downstream analysis and bio-knowledge linking is also included to support interdisciplinary analysis.

  4. In-Database Machine Learning Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jun 28, 2025
    Cite
    Dataintelo (2025). In-Database Machine Learning Market Research Report 2033 [Dataset]. https://dataintelo.com/report/in-database-machine-learning-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    In-Database Machine Learning Market Outlook




    According to our latest research, the global In-Database Machine Learning market size reached USD 2.77 billion in 2024. The market is exhibiting robust momentum, with a compound annual growth rate (CAGR) of 28.4% projected over the forecast period. By 2033, the In-Database Machine Learning market is expected to escalate to USD 21.13 billion globally, driven by increasing enterprise adoption of advanced analytics and artificial intelligence embedded directly within databases. This exponential growth is fueled by the surging demand for real-time data processing, operational efficiency, and the seamless integration of machine learning (ML) models within business-critical applications.




    A significant growth factor in the In-Database Machine Learning market is the rising need for organizations to derive actionable insights from massive volumes of data in real time. Traditional machine learning workflows often require extracting data from databases, leading to latency, security risks, and operational bottlenecks. In-database machine learning addresses these challenges by enabling ML algorithms to operate directly where the data resides, eliminating the need for data movement. This approach not only accelerates the analytics lifecycle but also enhances data security and compliance, which is particularly crucial in regulated industries such as banking, healthcare, and finance. Organizations are increasingly recognizing the strategic value of embedding ML capabilities within their database environments to unlock deeper insights, automate decision-making, and drive competitive advantage.




    Another pivotal driver is the evolution of database technologies and the proliferation of cloud-based database platforms. Modern relational and NoSQL databases are now equipped with native machine learning functionalities, making it easier for enterprises to deploy, train, and operationalize ML models at scale. The shift towards cloud-based and hybrid database infrastructures further amplifies the adoption of in-database ML, as organizations seek scalable and flexible solutions that can handle diverse data types and workloads. Vendors are responding by offering integrated ML toolkits and APIs, lowering the entry barrier for data scientists and business analysts. Furthermore, the convergence of big data, artificial intelligence, and advanced analytics is fostering innovation, enabling organizations to tackle complex use cases such as fraud detection, predictive maintenance, and personalized customer experiences.




    The increasing emphasis on digital transformation across industries is also propelling the growth of the In-Database Machine Learning market. Enterprises are under pressure to modernize their data architectures and leverage AI-driven insights to optimize operations, reduce costs, and enhance customer engagement. In-database ML empowers organizations to streamline their analytics workflows, achieve real-time intelligence, and respond swiftly to market changes. The technology’s ability to scale across large datasets and integrate seamlessly with existing business processes makes it an attractive proposition for both large enterprises and small and medium-sized enterprises (SMEs). As a result, investments in in-database ML solutions are expected to surge, with vendors continuously innovating to deliver enhanced performance, automation, and explainability.




    From a regional perspective, North America currently leads the global In-Database Machine Learning market, accounting for the largest revenue share in 2024. This dominance is attributed to the region’s advanced IT infrastructure, high adoption of cloud technologies, and the strong presence of leading technology vendors. Europe follows closely, driven by stringent data privacy regulations and growing investments in AI-driven analytics across sectors such as BFSI, healthcare, and manufacturing. The Asia Pacific region is emerging as a high-growth market, propelled by rapid digitalization, expanding enterprise data volumes, and government initiatives to foster AI innovation. Latin America and the Middle East & Africa are also witnessing increased adoption, albeit at a slower pace, as organizations in these regions gradually embrace data-driven decision-making and cloud-based analytics platforms.



    Component Analysis




    The In-Database Machine Learning market is segmented by component into Software and S

  5. Large-Scale AI Models

    • epoch.ai
    csv
    Updated Jul 24, 2025
    Cite
    Epoch AI (2025). Large-Scale AI Models [Dataset]. https://epoch.ai/data/ai-models
    Available download formats: csv
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    Epoch AI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Global
    Variables measured
    https://epoch.ai/data/ai-models-documentation
    Measurement technique
    https://epoch.ai/data/ai-models-documentation
    Description

    The Large-Scale AI Models database documents over 200 models trained with more than 10²³ floating point operations, at the leading edge of scale and capabilities.

  6. The total features of HF patients.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    + more versions
    Cite
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang (2023). The total features of HF patients. [Dataset]. http://doi.org/10.1371/journal.pone.0276835.t003
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Heart failure (HF) is the final stage of many developing heart diseases. Mortality rates among HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is therefore an important means of avoiding death and improving patient health. In practice, however, machine learning models struggle to achieve good results on HF data with missing values, high dimensionality, and class imbalance. We therefore propose a deep learning system. In this system, an indicator vector marks whether each value is observed or padded, which quickly handles missing values and helps expand the data dimensions. A convolutional neural network with different kernel sizes then extracts feature information, and a multi-head self-attention mechanism captures whole-channel information, which is essential for improving the system's performance. In addition, the focal loss function is introduced to better handle the class imbalance. The experimental data come from the public MIMIC-III database and contain valid records for 10,311 patients. The proposed system effectively and quickly predicts four death types: death within 30 days, within 180 days, within 365 days, and after 365 days. Our study uses Deep SHAP to interpret the deep learning model and identifies the top 15 characteristics, which further confirm the effectiveness and rationality of the system and help provide better medical service.
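
    For context, the focal loss mentioned above is a standard technique for class imbalance; a minimal PyTorch-style sketch (not the authors' implementation, with assumed default parameters) looks like this:

      # Illustrative binary focal loss (Lin et al., 2017); gamma/alpha defaults are assumptions.
      import torch
      import torch.nn.functional as F

      def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
          """Down-weights easy examples so training focuses on hard, minority-class samples."""
          bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
          p_t = torch.exp(-bce)                      # probability assigned to the true class
          alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
          return (alpha_t * (1 - p_t) ** gamma * bce).mean()

      # Example with heavily imbalanced labels
      logits = torch.randn(8)
      targets = torch.tensor([0., 0., 0., 0., 0., 0., 0., 1.])
      print(binary_focal_loss(logits, targets))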

  7. Machine learning for corrosion database

    • data.mendeley.com
    Updated Oct 26, 2021
    Cite
    Leonardo Bertolucci Coelho (2021). Machine learning for corrosion database [Dataset]. http://doi.org/10.17632/jfn8yhrphd.1
    Dataset updated
    Oct 26, 2021
    Authors
    Leonardo Bertolucci Coelho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database was first created for the scientific article entitled "Reviewing Machine Learning of corrosion prediction: a data-oriented perspective".

    L.B. Coelho 1 , D. Zhang 2 , Y.V. Ingelgem 1 , D. Steckelmacher 3 , A. Nowé 3 , H.A. Terryn 1

    1 Department of Materials and Chemistry, Research Group Electrochemical and Surface Engineering, Vrije Universiteit Brussel, Brussels, Belgium 2 A Beijing Advanced Innovation Center for Materials Genome Engineering, National Materials Corrosion and Protection Data Center, Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, China 3 VUB Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium

    Different metrics can be used to evaluate the prediction accuracy of regression models; however, only papers providing relative metrics (MAPE, R²) were included in this database. We tried as much as possible to include descriptors of all major ML procedure steps, including data collection (“Data acquisition”), data cleaning and feature engineering (“Feature reduction”), model validation (“Train-Test split”*), etc.

    *The total dataset is typically split into training sets and testing (unknown data) sets for performance evaluation of the model. Nonetheless, sometimes only the training or the testing performance was reported (“?” marks were added in the respective evaluation metric field(s)). The “Average R²” was sometimes considered for studies employing “CV” (cross-validation) on the dataset. For a detailed description of the basic ML procedures, the reader can refer to the References topic in the review article.
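
    For reference, the two relative metrics used as inclusion criteria can be computed with scikit-learn; this small sketch uses made-up numbers and is independent of the database itself:

      # Quick illustration of the two relative metrics: R^2 and MAPE.
      from sklearn.metrics import r2_score, mean_absolute_percentage_error

      y_true = [0.12, 0.45, 0.80, 1.10]   # e.g. measured corrosion rates (illustrative values)
      y_pred = [0.10, 0.50, 0.75, 1.20]   # model predictions (illustrative values)

      print("R^2 :", r2_score(y_true, y_pred))
      print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))  # returned as a fraction, not %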

  8. Group and number of experiments.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang (2023). Group and number of experiments. [Dataset]. http://doi.org/10.1371/journal.pone.0276835.t002
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Heart failure (HF) is the final stage of many developing heart diseases. Mortality rates among HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is therefore an important means of avoiding death and improving patient health. In practice, however, machine learning models struggle to achieve good results on HF data with missing values, high dimensionality, and class imbalance. We therefore propose a deep learning system. In this system, an indicator vector marks whether each value is observed or padded, which quickly handles missing values and helps expand the data dimensions. A convolutional neural network with different kernel sizes then extracts feature information, and a multi-head self-attention mechanism captures whole-channel information, which is essential for improving the system's performance. In addition, the focal loss function is introduced to better handle the class imbalance. The experimental data come from the public MIMIC-III database and contain valid records for 10,311 patients. The proposed system effectively and quickly predicts four death types: death within 30 days, within 180 days, within 365 days, and after 365 days. Our study uses Deep SHAP to interpret the deep learning model and identifies the top 15 characteristics, which further confirm the effectiveness and rationality of the system and help provide better medical service.

  9. Performance of the machine learning models.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Nov 11, 2024
    + more versions
    Cite
    Wenchi Liu; Xing Yu; Jinhong Chen; Weizhi Chen; Qiaoyi Wu (2024). Performance of the machine learning models. [Dataset]. http://doi.org/10.1371/journal.pone.0313132.t003
    Available download formats: xls
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Wenchi Liu; Xing Yu; Jinhong Chen; Weizhi Chen; Qiaoyi Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: People with traumatic brain injury (TBI) are at high risk of infection and sepsis. The aim of the study was to develop and validate an explainable machine learning (ML) model based on clinical features for early prediction of the risk of sepsis in TBI patients. Methods: We enrolled all patients with TBI in the Medical Information Mart for Intensive Care IV database from 2008 to 2019. All patients were randomly divided into a training set (70%) and a test set (30%). Univariate and multivariate regression analyses were used for feature selection. Six ML methods were applied to develop the model. The predictive performance of the different models was determined based on the area under the curve (AUC) and calibration curves in the test cohort. In addition, we selected the eICU Collaborative Research Database version 1.2 as the external validation dataset. Finally, we used the Shapley additive explanation method to account for the effects of the features attributed to the model. Results: Of the 1555 patients enrolled in the final cohort, 834 (53.6%) developed sepsis after TBI. Six variables were associated with concomitant sepsis and were used to develop the ML models. Of the six models constructed, the Extreme Gradient Boosting (XGB) model achieved the best performance, with an AUC of 0.807 and an accuracy of 74.5% in the internal validation cohort, and an AUC of 0.762 in the external validation. Feature importance analysis revealed that use of mechanical ventilation, SAPS II score, use of intravenous pressors, blood transfusion on admission, history of diabetes, and presence of post-stroke sequelae were the top six most influential features of the XGB model. Conclusion: As shown in the study, the ML model can be used to predict the occurrence of sepsis in patients with TBI in the intensive care unit.
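
    The workflow described (gradient-boosted trees plus Shapley attribution) can be sketched generically as below; this uses synthetic data and default hyperparameters, not the study's MIMIC-IV pipeline or its six selected variables:

      # Hedged sketch of an XGBoost classifier with SHAP feature attribution on synthetic data.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import roc_auc_score
      from xgboost import XGBClassifier
      import shap

      X, y = make_classification(n_samples=1500, n_features=6, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

      model = XGBClassifier(eval_metric="logloss").fit(X_tr, y_tr)
      print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

      # Shapley additive explanations for per-feature contributions
      explainer = shap.TreeExplainer(model)
      shap_values = explainer.shap_values(X_te)
      print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))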

  10. ΔG-RDKit: Solvation Free Energy Database

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 7, 2023
    Cite
    José Ferraz-Caetano; Filipe Teixeira; M. Natália D. S. Cordeiro (2023). ΔG-RDKit: Solvation Free Energy Database [Dataset]. http://doi.org/10.5281/zenodo.8121619
    Available download formats: csv
    Dataset updated
    Jul 7, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    José Ferraz-Caetano; Filipe Teixeira; M. Natália D. S. Cordeiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present the full database of the article "Explainable Supervised Machine Learning Model to Predict Solvation Free Energy".

    This is the database used for an ML model; it contains a variety of solvent-solute pairs with known experimental solvation free energy (ΔGsolv) values. Data entries were collected from two separate databases: the FreeSolv library, with 642 experimental aqueous ΔGsolv determinations, and the Solv@TUM database, with 5597 entries for non-aqueous solvents. Both databases were selected for their wide range of solute/solvent pairs, together amassing 6239 experimental values across light- and heavy-atom solutes with diverse solvent structures and small value uncertainties.

    Experimental ΔGsolv values range from -14 to 4 kcal mol⁻¹, and each solute/solvent pair is represented by its chemical family, SMILES string, and InChIKey. We generated 213 chemical descriptors for every solvent and solute in each entry using the RDKit software, version 2022.09.4, running on top of Python 3.9. Descriptors were calculated from molecules built with the “MolFromSmiles” function in “RDKit.Chem”, and descriptors with non-numerical values were removed. The descriptors encode significant chemical information and represent physicochemical characteristics of the compounds, building a relationship between structure and ΔGsolv.
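
    A minimal sketch of this descriptor-generation step, assuming RDKit's standard descriptor list (the exact 213-descriptor set used in the database may differ from what this produces):

      # Build a molecule from SMILES and evaluate RDKit's descriptor list, keeping numeric values only.
      from rdkit import Chem
      from rdkit.Chem import Descriptors

      def numeric_descriptors(smiles: str) -> dict:
          mol = Chem.MolFromSmiles(smiles)
          if mol is None:
              raise ValueError(f"Could not parse SMILES: {smiles}")
          values = {}
          for name, fn in Descriptors.descList:      # (name, function) pairs
              try:
                  v = fn(mol)
              except Exception:
                  continue                            # skip descriptors that fail for this molecule
              if isinstance(v, (int, float)):
                  values[name] = v
          return values

      print(len(numeric_descriptors("CCO")))          # ethanol as a small solute example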

    Through Machine Learning regression algorithms, our models were able to make ΔGsolv predictions with high accuracy, based on the information encoded in each chemical feature.

  11. Vector Database Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Sep 20, 2025
    Cite
    Data Insights Market (2025). Vector Database Software Report [Dataset]. https://www.datainsightsmarket.com/reports/vector-database-software-529421
    Available download formats: pdf, ppt, doc
    Dataset updated
    Sep 20, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Vector Database Software market is poised for substantial growth, projected to reach an estimated $XXX million in 2025, with an impressive Compound Annual Growth Rate (CAGR) of XX% during the forecast period of 2025-2033. This rapid expansion is fueled by the increasing adoption of AI and machine learning across industries, necessitating efficient storage and retrieval of unstructured data like images, audio, and text. The burgeoning demand for enhanced search capabilities, personalized recommendations, and advanced anomaly detection is driving the market forward. Key market drivers include the widespread implementation of large language models (LLMs), the growing need for semantic search functionalities, and the continuous innovation in AI-powered applications. The market is segmenting into applications catering to both Small and Medium-sized Enterprises (SMEs) and Large Enterprises, with a clear shift towards Cloud-based solutions owing to their scalability, cost-effectiveness, and ease of deployment.

    The vector database landscape is characterized by dynamic innovation and fierce competition, with prominent players like Pinecone, Weaviate, Supabase, and Zilliz Cloud leading the charge. Emerging trends such as the development of hybrid search capabilities, integration with existing data infrastructure, and enhanced security features are shaping the market's trajectory. While the market shows immense promise, certain restraints, including the complexity of data integration and the need for specialized technical expertise, may pose challenges. Geographically, North America is expected to dominate the market share due to its early adoption of AI technologies and robust R&D investments, followed closely by Asia Pacific, which is witnessing rapid digital transformation and a surge in AI startups. Europe and other emerging regions are also anticipated to contribute significantly to market growth as AI adoption becomes more widespread.

    This report delves into the rapidly evolving Vector Database Software Market, providing a detailed analysis of its landscape from 2019 to 2033. With a Base Year of 2025, the report offers crucial insights for the Estimated Year of 2025 and projects market dynamics through the Forecast Period of 2025-2033, building upon the Historical Period of 2019-2024. The global vector database software market is poised for significant expansion, with an estimated market size projected to reach hundreds of millions of dollars by 2025, and anticipated to grow exponentially in the coming years. This growth is fueled by the increasing adoption of AI and machine learning across various industries, necessitating efficient storage and retrieval of high-dimensional vector data.

  12. Raw data used to build models in SIMON Automated Machine Learning

    • zenodo.org
    Updated Jan 21, 2020
    Cite
    Adriana Tomic (2020). Raw data used to build models in SIMON Automated Machine Learning [Dataset]. http://doi.org/10.5281/zenodo.1324553
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Adriana Tomic
    Description

    Here you can find all the data and information regarding each generated dataset.
    For each dataset there are four files:

    json_info: the number of features, their names, and the number of subjects available for the dataset
    data_testing: data frame with the data used to test the trained model
    data_training: data frame with the data used to train the models
    results: direct, unfiltered data from the database


    Files are written in feather format.

    Here is an example of the data structure for each file in the repository.
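
    A minimal sketch of inspecting such files with pandas, assuming local copies; the file names below mirror the four roles listed above and may not match the actual names used in the repository:

      # Read the Feather-format data frames and report their shapes and column types.
      import pandas as pd

      training = pd.read_feather("data_training.feather")   # assumed filename
      testing = pd.read_feather("data_testing.feather")     # assumed filename

      print(training.shape, testing.shape)
      print(training.dtypes.head())
      print(training.head())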

  13. Ultimate_Analysis

    • data.mendeley.com
    Updated Jan 28, 2022
    + more versions
    Cite
    Akara Kijkarncharoensin (2022). Ultimate_Analysis [Dataset]. http://doi.org/10.17632/t8x96g88p3.2
    Dataset updated
    Jan 28, 2022
    Authors
    Akara Kijkarncharoensin
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This database studies performance inconsistency in biomass higher heating value (HHV) models based on ultimate analysis. The research null hypothesis is that the ranking of a biomass HHV model is consistent. Fifteen biomass models are trained and tested on four datasets; within each dataset, the rank invariability of these 15 models indicates performance consistency.

    The database includes the datasets and source code used to analyze the performance consistency of the biomass HHV models. The datasets are stored in tabular form in an Excel workbook. The source code implements the biomass HHV machine learning models with MATLAB object-oriented programming (OOP). These machine learning models consist of eight regression models, four supervised learning models, and three neural networks.

    An Excel workbook, "BiomassDataSetUltimate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Ultimate," contains 908 HHV data points from 20 pieces of literature. The worksheet's column names indicate the elements of the ultimate analysis on a % dry basis, and the HHV column refers to the higher heating value in MJ/kg. The next worksheet, "Full Residuals," backs up the model-testing residuals based on the 20-fold cross-validations. The article (Kijkarncharoensin & Innet, 2021) verifies the performance consistency through these residuals. The other worksheets present the literature datasets used to train and test model performance across many pieces of literature.

    A file named "SourceCodeUltimate.rar" collects the MATLAB machine learning models implemented in the article. The folder structure of this file reflects the class structure of the machine learning models. These classes extend the features of MATLAB's Statistics and Machine Learning Toolbox to support, e.g., k-fold cross-validation. The MATLAB script named "runStudyUltimate.m" is the article's main program for analyzing the performance consistency of the biomass HHV models through the ultimate analysis. The script loads the datasets from the Excel workbook and automatically fits the biomass models through the OOP classes.

    The first section of the MATLAB script generates the most accurate model by optimizing the model's hyperparameters. The first run takes a few hours to train the machine learning models through this trial-and-error process. The trained models can be saved to a MATLAB .mat file and loaded back into the MATLAB workspace. The remainder of the script, separated by section breaks, performs the residual analysis to inspect the performance consistency. In addition, a 3D scatter plot of the biomass data and box plots of the prediction residuals are produced. Finally, the interpretation of these results is examined in the authors' article.

    Reference : Kijkarncharoensin, A., & Innet, S. (2022). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Ultimate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.

  14. Data Modeling Tool Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated May 30, 2025
    Cite
    Market Research Forecast (2025). Data Modeling Tool Report [Dataset]. https://www.marketresearchforecast.com/reports/data-modeling-tool-542143
    Available download formats: doc, pdf, ppt
    Dataset updated
    May 30, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The data modeling tool market is experiencing robust growth, driven by the increasing demand for efficient data management and the rise of big data analytics. The market, estimated at $5 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $15 billion by 2033. This expansion is fueled by several key factors, including the growing adoption of cloud-based data modeling solutions, the increasing need for data governance and compliance, and the expanding use of data visualization and business intelligence tools that rely on well-structured data models. The market is segmented by tool type (e.g., ER diagramming tools, UML modeling tools), deployment mode (cloud, on-premise), and industry vertical (e.g., BFSI, healthcare, retail). Competition is intense, with established players like IBM, Oracle, and SAP vying for market share alongside numerous specialized vendors offering niche solutions. The market's growth is being further accelerated by the adoption of agile methodologies and DevOps practices that necessitate faster and more iterative data modeling processes.

    The major restraints impacting market growth include the high cost of advanced data modeling software, the complexity associated with implementing and maintaining these solutions, and the lack of skilled professionals adept at data modeling techniques. The increasing availability of open-source tools, coupled with the growth of professional training programs focused on data modeling, are gradually alleviating this constraint. Future growth will likely be shaped by innovations in artificial intelligence (AI) and machine learning (ML) that are being integrated into data modeling tools to automate aspects of model creation and validation. The trend towards data mesh architecture and the growing importance of data literacy are also driving demand for user-friendly and accessible data modeling tools. Furthermore, the development of integrated platforms that combine data modeling with other data management functions is a key market trend that is likely to significantly impact future growth.

  15. Feature correspondence q.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang (2023). Feature correspondence q. [Dataset]. http://doi.org/10.1371/journal.pone.0276835.t010
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Heart failure (HF) is the final stage of many developing heart diseases. Mortality rates among HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is therefore an important means of avoiding death and improving patient health. In practice, however, machine learning models struggle to achieve good results on HF data with missing values, high dimensionality, and class imbalance. We therefore propose a deep learning system. In this system, an indicator vector marks whether each value is observed or padded, which quickly handles missing values and helps expand the data dimensions. A convolutional neural network with different kernel sizes then extracts feature information, and a multi-head self-attention mechanism captures whole-channel information, which is essential for improving the system's performance. In addition, the focal loss function is introduced to better handle the class imbalance. The experimental data come from the public MIMIC-III database and contain valid records for 10,311 patients. The proposed system effectively and quickly predicts four death types: death within 30 days, within 180 days, within 365 days, and after 365 days. Our study uses Deep SHAP to interpret the deep learning model and identifies the top 15 characteristics, which further confirm the effectiveness and rationality of the system and help provide better medical service.

  16. Data from: A Database of Ultrastable MOFs Reassembled from Stable Fragments with Machine Learning Models

    • zenodo.org
    zip
    Updated Feb 22, 2023
    Cite
    Aditya Nandy; Shuwen Yue; Changhwan Oh; Chenru Duan; Gianmarco Terrones; Yongchul G. Chung; Heather J. Kulik (2023). A Database of Ultrastable MOFs Reassembled from Stable Fragments with Machine Learning Models [Dataset]. http://doi.org/10.5281/zenodo.7091192
    Available download formats: zip
    Dataset updated
    Feb 22, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Aditya Nandy; Shuwen Yue; Changhwan Oh; Chenru Duan; Gianmarco Terrones; Yongchul G. Chung; Heather J. Kulik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of MOFs constructed from building blocks of stable MOFs.

  17. DocTOR models and cross-validation dataset

    • data.niaid.nih.gov
    Updated Mar 23, 2022
    Cite
    Galletti Cristiano (2022). DocTOR models and cross-validation dataset [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6337103
    Dataset updated
    Mar 23, 2022
    Dataset authored and provided by
    Galletti Cristiano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset necessary for DocTOR utility.

    DocTOR (Direct fOreCast Target On Reaction) is a utility written in Python 3.9 (using a conda environment) that allows the user to upload a list of UniProt IDs and adverse reactions (from the available models) in order to study the relationship between the two.

    As output, the program assigns a positive or negative class to each protein, assessing its possible involvement in the onset of the selected ADRs.

    DocTOR exploits the data coming from T-ARDIS [https://doi.org/10.1093/database/baab068] to train different Machine Learning approaches (SVM, RF, NN) using network topological measurements as features.

    The predictions from the individual trained models are combined in a meta-predictor that uses three different voting systems.
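
    A generic illustration of this kind of meta-predictor, combining SVM, RF, and NN outputs by majority voting with scikit-learn; DocTOR's own feature set (network topological measurements) and its three voting schemes are not reproduced here:

      # Hard-voting ensemble over SVM, random forest, and a small neural network, on synthetic data.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier, VotingClassifier
      from sklearn.neural_network import MLPClassifier
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=400, n_features=20, random_state=1)

      meta = VotingClassifier(
          estimators=[
              ("svm", SVC(random_state=1)),
              ("rf", RandomForestClassifier(random_state=1)),
              ("nn", MLPClassifier(max_iter=1000, random_state=1)),
          ],
          voting="hard",            # simple majority vote over the three models
      )
      print(cross_val_score(meta, X, y, cv=5).mean())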

    The results of the meta-predictor, together with those from the individual ML methods, are available in the output log file (named "predictions_community" or "predictions_curated", depending on the database type).

    The DocTOR utility is available at https://github.com/cristian931/DocTOR

  18. Improving machine-learning models in materials science through large datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 23, 2024
    Cite
    Cerqueira, Tiago F.T. (2024). Improving machine-learning models in materials science through large datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12582649
    Dataset updated
    Oct 23, 2024
    Dataset provided by
    Jonathan, Schmidt
    Botti, Silvana
    Jaeger, Fabian
    Wang, Hai-Chen
    Marques, Miguel A.L.
    Loew, Antoine
    Cerqueira, Tiago F.T.
    Romero, Aldo H.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    1. Image of the Alexandria database state corresponding to the paper "Improving machine-learning models in materials science through large datasets".

    Static PBE calculations for 1D, 2D, and 3D compounds can be found in 1D_pbe.tar.gz, 2D_pbe.tar.gz, and 3D_pbe.tar.gz, in batches of 100k materials. The latter also contains a separate convex hull pickle with all compounds on the PBE convex hull (convex_hull_pbe_2023.12.29.json.bz2) and a list of prototypes in the database (prototypes.json.bz2). The systematic 3D calculations performed for the article "Improving machine-learning models in materials science through large datasets" (referred to in the paper as rounds 2 and 3) can be identified by the "location" keyword in the data dictionary of each ComputedStructureEntry, containing "cgat_comp/quaternaries" (round 2) or "cgat_comp2/" (round 3). Round 1 (10.1002/adma.202210788) can be found under "cgat_comp/ternaries" and "cgat_comp/binaries".
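
    A hedged sketch of opening one of the compressed convex hull files with the Python standard library; the internal schema (pymatgen-style ComputedStructureEntry dictionaries with a "location" entry in their data) is assumed from the description above, not verified:

      # Open the bzip2-compressed JSON convex-hull file and look for the "location" keyword.
      import bz2
      import json

      with bz2.open("convex_hull_pbe_2023.12.29.json.bz2", "rt") as fh:
          entries = json.load(fh)

      print(type(entries), len(entries) if hasattr(entries, "__len__") else "n/a")

      # If entries are serialized ComputedStructureEntry dicts, round-2 calculations
      # should carry "cgat_comp/quaternaries" in their data["location"] (assumed schema).
      round2 = [
          e for e in entries
          if isinstance(e, dict)
          and "cgat_comp/quaternaries" in str(e.get("data", {}).get("location", ""))
      ]
      print(len(round2), "round-2 entries (under the assumed schema)")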

    Static pbesol calculations for 3D compounds can be found in 3D_ps.tar (still zip compressed) in batches of 100k materials. The folder also contains a separate convex hull pickle with all compounds on the pbesol convex hull (convex_hull_ps_2023.12.29.json.bz2).

    Static scan calculations for 3D compounds can be found in 3D_scan.tar (still zip compressed) in batches of 100k materials. The folder also contains a separate convex hull pickle with all compounds on the scan convex hull (convex_hull_scan_2023.12.29.json.bz2).

    Geometry relaxation curves for 1D, 2D, and 3D compounds calculated with PBE can be found in geo_opt_1D.tar.gz, geo_opt_2D.tar.gz, and geo_opt_3D.tar. Each file in each folder contains a batch of up to 10k relaxation trajectories.

    PBESOL relaxation trajectories for 3D compounds can be found in geo_opt_ps.tar

    2. Crystal graph attention networks to predict the volume (volume_round_3.tar.gz) and the distance to the convex hull (e_above_hull_round_3.tar.gz), trained for the paper "Improving machine-learning models in materials science through large datasets".

    These models can be used with the code at https://github.com/hyllios/CGAT/tree/main/CGAT. Note: when using the code from the GitHub repository, the models predict the distance to the convex hull not normalized per atom.

    3. ALIGNN models, as well as M3GNet and MACE models corresponding to the publication, can be found in alexandria_v2.tar.gz.

    4. scripts.tar.gz: some scripts used for generating CGAT input data, performing parallel predictions, and running relaxations with M3GNet/MACE force fields.

  19. MNIST-Handwritten Digit Recognition Problem

    • kaggle.com
    Updated Dec 18, 2017
    Cite
    dillsunnyb11 (2017). MNIST-Handwritten Digit Recognition Problem [Dataset]. https://www.kaggle.com/dillsunnyb11/digit-recognizer/discussion
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 18, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    dillsunnyb11
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by dillsunnyb11

    Released under: Database: Open Database License, Contents: Database Contents License


  20. Data from: A Database of Ultrastable MOFs Reassembled from Stable Fragments with Machine Learning Models

    • zenodo.org
    txt, zip
    Updated Apr 22, 2024
    Cite
    Aditya Nandy; Shuwen Yue; Changhwan Oh; Chenru Duan; Gianmarco Terrones; Yongchul G. Chung; Heather J. Kulik (2024). A Database of Ultrastable MOFs Reassembled from Stable Fragments with Machine Learning Models [Dataset]. http://doi.org/10.5281/zenodo.7666081
    Available download formats: zip, txt
    Dataset updated
    Apr 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Aditya Nandy; Shuwen Yue; Changhwan Oh; Chenru Duan; Gianmarco Terrones; Yongchul G. Chung; Heather J. Kulik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of MOFs constructed from building blocks of stable MOFs.

    Note: the columns labeled "rho" in features_and_properties are actually cell volume and not density.
