Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
They are available at https://github.com/nerdyqx/ML. (ZIP)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our most comprehensive database of AI models, containing over 800 models that are state of the art, highly cited, or otherwise historically notable. It tracks key factors driving machine learning progress and includes over 300 training compute estimates.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Framing the investigation of diverse cancers as a machine learning problem has recently shown significant potential in multi-omics analysis and cancer research. These successful machine learning models are empowered by high-quality training datasets with sufficient data volume and adequate preprocessing. However, while several public data portals exist, including The Cancer Genome Atlas (TCGA) multi-omics initiative and open databases such as LinkedOmics, these databases are not off-the-shelf resources for existing machine learning models. We propose MLOmics, an open cancer multi-omics database that aims to better serve the development and evaluation of bioinformatics and machine learning models. MLOmics contains 8,314 patient samples covering all 32 cancer types, with four omics types, stratified features, and extensive baselines. Complementary support for downstream analysis and bio-knowledge linking is also included to support interdisciplinary analysis.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global In-Database Machine Learning market size reached USD 2.77 billion in 2024. The market is exhibiting robust momentum, with a compound annual growth rate (CAGR) of 28.4% projected over the forecast period. By 2033, the In-Database Machine Learning market is expected to escalate to USD 21.13 billion globally, driven by increasing enterprise adoption of advanced analytics and artificial intelligence embedded directly within databases. This exponential growth is fueled by the surging demand for real-time data processing, operational efficiency, and the seamless integration of machine learning (ML) models within business-critical applications.
A significant growth factor in the In-Database Machine Learning market is the rising need for organizations to derive actionable insights from massive volumes of data in real time. Traditional machine learning workflows often require extracting data from databases, leading to latency, security risks, and operational bottlenecks. In-database machine learning addresses these challenges by enabling ML algorithms to operate directly where the data resides, eliminating the need for data movement. This approach not only accelerates the analytics lifecycle but also enhances data security and compliance, which is particularly crucial in regulated industries such as banking, healthcare, and finance. Organizations are increasingly recognizing the strategic value of embedding ML capabilities within their database environments to unlock deeper insights, automate decision-making, and drive competitive advantage.
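The core idea described above — running the model where the data resides instead of exporting rows first — can be shown with a toy sketch using SQLite and hypothetical, pre-trained linear-model coefficients (illustrative only, not any vendor's actual in-database ML API): the scoring expression is pushed into the SQL query, so no data leaves the database before prediction.

```python
import sqlite3

# Hypothetical coefficients from a model trained offline (illustrative only).
W_AMOUNT, W_HOUR, BIAS = 0.004, 0.12, -3.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, amount REAL, hour INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 120.0, 14), (2, 980.0, 3), (3, 45.5, 9)],
)

# Score every row with a linear decision function directly in SQL:
# the raw rows are never extracted before scoring.
rows = conn.execute(
    "SELECT id, ? * amount + ? * hour + ? AS score "
    "FROM transactions ORDER BY id",
    (W_AMOUNT, W_HOUR, BIAS),
).fetchall()
for rid, score in rows:
    print(rid, round(score, 3))
```

Only the per-row scores (or an aggregate of them) cross the database boundary, which is the latency and security argument made above.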
Another pivotal driver is the evolution of database technologies and the proliferation of cloud-based database platforms. Modern relational and NoSQL databases are now equipped with native machine learning functionalities, making it easier for enterprises to deploy, train, and operationalize ML models at scale. The shift towards cloud-based and hybrid database infrastructures further amplifies the adoption of in-database ML, as organizations seek scalable and flexible solutions that can handle diverse data types and workloads. Vendors are responding by offering integrated ML toolkits and APIs, lowering the entry barrier for data scientists and business analysts. Furthermore, the convergence of big data, artificial intelligence, and advanced analytics is fostering innovation, enabling organizations to tackle complex use cases such as fraud detection, predictive maintenance, and personalized customer experiences.
The increasing emphasis on digital transformation across industries is also propelling the growth of the In-Database Machine Learning market. Enterprises are under pressure to modernize their data architectures and leverage AI-driven insights to optimize operations, reduce costs, and enhance customer engagement. In-database ML empowers organizations to streamline their analytics workflows, achieve real-time intelligence, and respond swiftly to market changes. The technology’s ability to scale across large datasets and integrate seamlessly with existing business processes makes it an attractive proposition for both large enterprises and small and medium-sized enterprises (SMEs). As a result, investments in in-database ML solutions are expected to surge, with vendors continuously innovating to deliver enhanced performance, automation, and explainability.
From a regional perspective, North America currently leads the global In-Database Machine Learning market, accounting for the largest revenue share in 2024. This dominance is attributed to the region’s advanced IT infrastructure, high adoption of cloud technologies, and the strong presence of leading technology vendors. Europe follows closely, driven by stringent data privacy regulations and growing investments in AI-driven analytics across sectors such as BFSI, healthcare, and manufacturing. The Asia Pacific region is emerging as a high-growth market, propelled by rapid digitalization, expanding enterprise data volumes, and government initiatives to foster AI innovation. Latin America and the Middle East & Africa are also witnessing increased adoption, albeit at a slower pace, as organizations in these regions gradually embrace data-driven decision-making and cloud-based analytics platforms.
The In-Database Machine Learning market is segmented by component into Software and S
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Large-Scale AI Models database documents over 200 models trained with more than 10²³ floating point operations, at the leading edge of scale and capabilities.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Heart failure (HF) is the final stage of many heart diseases. Mortality rates among HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is an important means of preventing death and improving patient health. In practice, however, machine learning models struggle to achieve good results on HF data with missing values, high dimensionality, and class imbalance. We therefore propose a deep learning system. In this system, an indicator vector marks whether each value is observed or padded, which quickly handles missing values and helps expand the data dimensions. A convolutional neural network with different kernel sizes then extracts feature information, and a multi-head self-attention mechanism captures whole-channel information, which is essential for improving the system's performance. In addition, a focal loss function is introduced to better handle the class imbalance. The experimental data come from the public MIMIC-III database and contain valid records for 10,311 patients. The proposed system effectively and quickly predicts four death types: death within 30 days, within 180 days, within 365 days, and after 365 days. Our study uses Deep SHAP to interpret the deep learning model and obtains the top 15 characteristics. These characteristics further confirm the effectiveness and rationality of the system and can help provide better medical care.
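The indicator-vector idea from the abstract can be sketched in a few lines (an illustrative reconstruction, not the authors' code): each feature is paired with a 0/1 flag marking whether it was observed or padded, which both fills the gap and expands the feature dimension.

```python
def encode_with_indicator(record, pad_value=0.0):
    """Replace missing values (None) with pad_value and append a
    0/1 indicator per feature: 1 = observed, 0 = padded."""
    values = [pad_value if v is None else float(v) for v in record]
    indicators = [0.0 if v is None else 1.0 for v in record]
    return values + indicators

# A toy patient record with two missing lab values.
encoded = encode_with_indicator([72.0, None, 1.3, None])
print(encoded)  # [72.0, 0.0, 1.3, 0.0, 1.0, 0.0, 1.0, 0.0]
```

The downstream network can then learn to treat padded positions differently from genuine zeros, which is what makes this simple scheme effective.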
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database was first created for the scientific article entitled "Reviewing Machine Learning of corrosion prediction: a data-oriented perspective"
L.B. Coelho 1, D. Zhang 2, Y.V. Ingelgem 1, D. Steckelmacher 3, A. Nowé 3, H.A. Terryn 1
1 Department of Materials and Chemistry, Research Group Electrochemical and Surface Engineering, Vrije Universiteit Brussel, Brussels, Belgium
2 Beijing Advanced Innovation Center for Materials Genome Engineering, National Materials Corrosion and Protection Data Center, Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, China
3 VUB Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium
Several metrics can be used to evaluate the prediction accuracy of regression models; however, only papers providing relative metrics (MAPE, R²) were included in this database. We tried as much as possible to include descriptors of all major ML procedure steps, including data collection ("Data acquisition"), data cleaning and feature engineering ("Feature reduction"), model validation ("Train-Test split"*), etc.
*The total dataset is typically split into a training set and a testing set (unseen data) for performance evaluation of the model. Nonetheless, sometimes only the training or the testing performance was reported ("?" marks were added in the respective evaluation metric field(s)). The "Average R²" was sometimes considered for studies employing cross-validation ("CV") on the dataset. For a detailed description of the basic ML procedures, the reader can refer to the References topic in the Review article.
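For reference, the two relative metrics used as inclusion criteria (MAPE and R²) can be computed as follows (a self-contained sketch with toy numbers):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy measured vs. predicted corrosion rates.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.1, 1.9, 4.2]
print(round(mape(y_true, y_pred), 3), round(r_squared(y_true, y_pred), 4))
```

Both are relative (scale-free) measures, which is why they allow comparison across studies that report corrosion rates in different units — the reason given above for restricting the database to them.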
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: People with traumatic brain injury (TBI) are at high risk for infection and sepsis. The aim of the study was to develop and validate an explainable machine learning (ML) model based on clinical features for early prediction of the risk of sepsis in TBI patients. Methods: We enrolled all patients with TBI in the Medical Information Mart for Intensive Care IV database from 2008 to 2019. All patients were randomly divided into a training set (70%) and a test set (30%). Univariate and multivariate regression analyses were used for feature selection. Six ML methods were applied to develop the model. The predictive performance of the different models was determined based on the area under the curve (AUC) and calibration curves in the test cohort. In addition, we selected the eICU Collaborative Research Database version 1.2 as the external validation dataset. Finally, we used Shapley additive explanations to account for the effects of the features attributed to the model. Results: Of the 1555 patients enrolled in the final cohort, 834 (53.6%) developed sepsis after TBI. Six variables were associated with concomitant sepsis and were used to develop the ML models. Of the six models constructed, the Extreme Gradient Boosting (XGB) model achieved the best performance, with an AUC of 0.807 and an accuracy of 74.5% in the internal validation cohort, and an AUC of 0.762 in external validation. Feature importance analysis revealed that use of mechanical ventilation, SAPS II score, use of intravenous pressors, blood transfusion on admission, history of diabetes, and presence of post-stroke sequelae were the six most influential features of the XGB model. Conclusion: As shown in the study, the ML model could be used to predict the occurrence of sepsis in patients with TBI in the intensive care unit.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present the full database of the article "Explainable Supervised Machine Learning Model to Predict Solvation Free Energy".
This is the database used for the ML model, containing a variety of solvent-solute pairs with known experimental solvation free energy (ΔGsolv) values. Data entries were collected from two separate databases: the FreeSolv library, with 642 experimental aqueous ΔGsolv determinations, and the Solv@TUM database, with 5597 entries for non-aqueous solvents. Both databases were selected for their wide range of solute/solvent pairs, amassing 6239 experimental values across light- and heavy-atom solutes with diverse solvent structures and small value uncertainties.
Experimental ΔGsolv values range from -14 to 4 kcal mol-1, and each solute/solvent pair is represented by its chemical family, SMILES string, and InChIKey. We generated 213 chemical descriptors for every solvent and solute in each entry using the RDKit software, version 2022.09.4, running on top of Python 3.9. Descriptors were calculated from the "MolFromSmiles" function in "rdkit.Chem", and descriptors with non-numerical values were removed. The descriptors encode significant chemical information and are used to represent physicochemical characteristics of the compounds, building a relationship between structure and ΔGsolv.
Through Machine Learning regression algorithms, our models were able to make ΔGsolv predictions with high accuracy, based on the information encoded in each chemical feature.
https://www.datainsightsmarket.com/privacy-policy
The global Vector Database Software market is poised for substantial growth, projected to reach an estimated $XXX million in 2025, with an impressive Compound Annual Growth Rate (CAGR) of XX% during the forecast period of 2025-2033. This rapid expansion is fueled by the increasing adoption of AI and machine learning across industries, necessitating efficient storage and retrieval of unstructured data like images, audio, and text. The burgeoning demand for enhanced search capabilities, personalized recommendations, and advanced anomaly detection is driving the market forward. Key market drivers include the widespread implementation of large language models (LLMs), the growing need for semantic search functionalities, and the continuous innovation in AI-powered applications. The market is segmenting into applications catering to both Small and Medium-sized Enterprises (SMEs) and Large Enterprises, with a clear shift towards Cloud-based solutions owing to their scalability, cost-effectiveness, and ease of deployment.
The vector database landscape is characterized by dynamic innovation and fierce competition, with prominent players like Pinecone, Weaviate, Supabase, and Zilliz Cloud leading the charge. Emerging trends such as the development of hybrid search capabilities, integration with existing data infrastructure, and enhanced security features are shaping the market's trajectory. While the market shows immense promise, certain restraints, including the complexity of data integration and the need for specialized technical expertise, may pose challenges.
Geographically, North America is expected to dominate the market share due to its early adoption of AI technologies and robust R&D investments, followed closely by Asia Pacific, which is witnessing rapid digital transformation and a surge in AI startups. Europe and other emerging regions are also anticipated to contribute significantly to market growth as AI adoption becomes more widespread.
This report delves into the rapidly evolving Vector Database Software Market, providing a detailed analysis of its landscape from 2019 to 2033. With a Base Year of 2025, the report offers crucial insights for the Estimated Year of 2025 and projects market dynamics through the Forecast Period of 2025-2033, building upon the Historical Period of 2019-2024. The global vector database software market is poised for significant expansion, with an estimated market size projected to reach hundreds of millions of dollars by 2025, and anticipated to grow exponentially in the coming years. This growth is fueled by the increasing adoption of AI and machine learning across various industries, necessitating efficient storage and retrieval of high-dimensional vector data.
Here you can find all data and all information regarding each generated dataset.
For each dataset there are 4 files:
json_info: contains the number of features (with their names) and the number of subjects available for the dataset
data_testing: data frame with the data used to test the trained models
data_training: data frame with the data used to train the models
results: direct, unfiltered data from the database
Files are written in the Feather format.
Here is an example of the data structure for each file in the repository.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This database studies performance inconsistency in biomass HHV models based on ultimate analysis. The research null hypothesis is consistency in the rank of a biomass HHV model. Fifteen biomass models are trained and tested on four datasets. In each dataset, the rank invariability of these 15 models indicates performance consistency.
The database includes the datasets and source codes used to analyze the performance consistency of biomass HHV models. The datasets are stored in tabular form in an Excel workbook. The source codes implement the biomass HHV machine learning models through MATLAB object-oriented programming (OOP). These machine learning models consist of eight regressions, four supervised learning methods, and three neural networks.
An Excel workbook, "BiomassDataSetUltimate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Ultimate," contains 908 HHV data points from 20 pieces of literature. The worksheet column names indicate the elements of the ultimate analysis on a % dry basis. The HHV column refers to the higher heating value in MJ/kg. The following worksheet, "Full Residuals," backs up the model testing residuals based on the 20-fold cross-validations. The article (Kijkarncharoensin & Innet, 2021) verifies the performance consistency through these residuals. The other worksheets present the literature datasets used to train and test model performance in many pieces of literature.
A file named "SourceCodeUltimate.rar" collects the MATLAB machine learning models implemented in the article. The list of folders in this file reflects the class structure of the machine learning models. These classes extend the features of MATLAB's Statistics and Machine Learning Toolbox to support, e.g., k-fold cross-validation. The MATLAB script named "runStudyUltimate.m" is the article's main program for analyzing the performance consistency of the biomass HHV models through the ultimate analysis. The script loads the datasets from the Excel workbook and automatically fits the biomass models through the OOP classes.
The first section of the MATLAB script generates the most accurate model by optimizing the model's hyperparameters. The first run takes a few hours to train the machine learning models via a trial-and-error process. The trained models can be saved in a MATLAB .mat file and loaded back into the MATLAB workspace. The remaining script, separated by a script section break, performs the residual analysis to inspect the performance consistency. Furthermore, a 3D scatter plot of the biomass data and box plots of the prediction residuals are exhibited. Finally, the interpretations of these results are examined in the author's article.
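The rank-invariability check at the heart of the study can be illustrated outside MATLAB (a Python sketch with invented RMSE values, not the OOP classes shipped in the archive): rank the models by test error within each dataset and compare the orderings.

```python
def rank_models(rmse_by_model):
    """Return model names sorted from best (lowest RMSE) to worst."""
    return sorted(rmse_by_model, key=rmse_by_model.get)

# Invented RMSE results for three models on two datasets (illustrative only).
dataset_a = {"linreg": 1.8, "svm": 1.2, "ann": 1.5}
dataset_b = {"linreg": 2.1, "svm": 1.4, "ann": 1.7}

ranks_a = rank_models(dataset_a)
ranks_b = rank_models(dataset_b)
consistent = ranks_a == ranks_b  # rank invariability across datasets
print(ranks_a, consistent)
```

If the orderings disagree on any dataset, the null hypothesis of rank consistency is rejected for that model set.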
Reference: Kijkarncharoensin, A., & Innet, S. (2022). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Ultimate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.
https://www.marketresearchforecast.com/privacy-policy
The data modeling tool market is experiencing robust growth, driven by the increasing demand for efficient data management and the rise of big data analytics. The market, estimated at $5 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $15 billion by 2033. This expansion is fueled by several key factors, including the growing adoption of cloud-based data modeling solutions, the increasing need for data governance and compliance, and the expanding use of data visualization and business intelligence tools that rely on well-structured data models. The market is segmented by tool type (e.g., ER diagramming tools, UML modeling tools), deployment mode (cloud, on-premise), and industry vertical (e.g., BFSI, healthcare, retail). Competition is intense, with established players like IBM, Oracle, and SAP vying for market share alongside numerous specialized vendors offering niche solutions. The market's growth is being further accelerated by the adoption of agile methodologies and DevOps practices that necessitate faster and more iterative data modeling processes.
The major restraints impacting market growth include the high cost of advanced data modeling software, the complexity associated with implementing and maintaining these solutions, and the lack of skilled professionals adept at data modeling techniques. The increasing availability of open-source tools, coupled with the growth of professional training programs focused on data modeling, is gradually alleviating this constraint.
Future growth will likely be shaped by innovations in artificial intelligence (AI) and machine learning (ML) that are being integrated into data modeling tools to automate aspects of model creation and validation. The trend towards data mesh architecture and the growing importance of data literacy are also driving demand for user-friendly and accessible data modeling tools.
Furthermore, the development of integrated platforms that combine data modeling with other data management functions is a key market trend that is likely to significantly impact future growth.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of MOFs constructed from building blocks of stable MOFs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset necessary for the DocTOR utility.
DocTOR (Direct fOreCast Target On Reaction) is a utility written in Python 3.9 (using the conda framework) that allows the user to upload a list of UniProt IDs and adverse reactions (from the available models) in order to study the relationship between the two.
On output, the program assigns a positive or negative class to each protein, assessing its possible involvement in the onset of the selected ADRs.
DocTOR exploits data from T-ARDIS [https://doi.org/10.1093/database/baab068] to train different machine learning approaches (SVM, RF, NN) using network topological measurements as features.
The predictions from the individual trained models are combined in a meta-predictor exploiting three different voting systems.
The results of the meta-predictor, together with those from the individual ML methods, are available in the output log file (named "predictions_community" or "predictions_curated" depending on the database type).
The DocTOR utility is available at https://github.com/cristian931/DocTOR
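The voting step of the meta-predictor can be illustrated with a simple majority vote (a sketch only; the three voting systems actually used in DocTOR may differ, and the UniProt IDs below are just examples):

```python
def majority_vote(predictions):
    """Combine per-model 0/1 class calls into one consensus call
    (1 = involved in the ADR, 0 = not involved)."""
    return int(sum(predictions) >= (len(predictions) + 1) // 2)

# Hypothetical calls from three trained models (e.g. SVM, RF, NN) per protein.
per_protein = {
    "P04637": [1, 1, 0],  # two of three models flag involvement
    "P38398": [0, 0, 1],
}
consensus = {prot: majority_vote(calls) for prot, calls in per_protein.items()}
print(consensus)  # {'P04637': 1, 'P38398': 0}
```

A meta-predictor of this kind trades the variance of any single model for the agreement of the ensemble, which is the rationale for combining SVM, RF, and NN outputs.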
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Static PBE calculations for 1D, 2D, and 3D compounds can be found in 1D_pbe.tar.gz, 2D_pbe.tar.gz, and 3D_pbe.tar.gz, in batches of 100k materials. The latter also contains a separate convex hull pickle with all compounds on the PBE convex hull (convex_hull_pbe_2023.12.29.json.bz2) and a list of prototypes in the database (prototypes.json.bz2). The systematic 3D calculations performed for the article "Improving machine-learning models in materials science through large datasets" (referred to in the paper as rounds 2 and 3) can be found via the location keyword in the data dictionary of each ComputedStructureEntry, containing "cgat_comp/quaternaries" (round 2) and "cgat_comp2/" (round 3). Round 1 (10.1002/adma.202210788) can be found under "cgat_comp/ternaries" and "cgat_comp/binaries".
Static PBEsol calculations for 3D compounds can be found in 3D_ps.tar (still zip-compressed), in batches of 100k materials. The folder also contains a separate convex hull pickle with all compounds on the PBEsol convex hull (convex_hull_ps_2023.12.29.json.bz2).
Static SCAN calculations for 3D compounds can be found in 3D_scan.tar (still zip-compressed), in batches of 100k materials. The folder also contains a separate convex hull pickle with all compounds on the SCAN convex hull (convex_hull_scan_2023.12.29.json.bz2).
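The convex-hull files above are bz2-compressed JSON, so they can be opened with the Python standard library alone (a minimal sketch; the internal field layout of the entries is an assumption, so inspect the keys after loading):

```python
import bz2
import json

def load_hull(path):
    """Load a bz2-compressed JSON convex-hull file and return the parsed object."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)

# e.g. entries = load_hull("convex_hull_pbe_2023.12.29.json.bz2")
# print(type(entries), len(entries))
```

For full ComputedStructureEntry objects, pymatgen's deserialization is the natural next step, but plain JSON inspection as above is enough to see what each record contains.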
Geometry relaxation curves for 1D, 2D, and 3D compounds calculated with PBE can be found in geo_opt_1D.tar.gz, geo_opt_2D.tar.gz, and geo_opt_3D.tar. Each file in each folder contains a batch of up to 10k relaxation trajectories.
PBEsol relaxation trajectories for 3D compounds can be found in geo_opt_ps.tar.
The data can be used with the code at https://github.com/hyllios/CGAT/tree/main/CGAT. Note: when using the code on GitHub, it will predict the distance to the convex hull not normalized per atom.
ALIGNN models, as well as M3GNet and MACE models corresponding to the publication, can be found in alexandria_v2.tar.gz.
scripts.tar.gz: some scripts used for generating CGAT input data, performing parallel predictions, and running relaxations with M3GNet/MACE force fields.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by dillsunnyb11
Released under: Database — Open Database License; Contents — Database Contents License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of MOFs constructed from building blocks of stable MOFs.
Note: the columns labeled "rho" in features_and_properties are actually cell volume, not density.