68 datasets found

Knowledge Discovery in Databases Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Aug 22, 2025
Cite
Growth Market Reports (2025). Knowledge Discovery in Databases Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/knowledge-discovery-in-databases-market
Explore at:
Available download formats: pptx, csv, pdf
Dataset updated
Aug 22, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Knowledge Discovery in Databases Market Outlook

According to our latest research, the global Knowledge Discovery in Databases (KDD) market size reached USD 8.7 billion in 2024, driven by the exponential growth of data across industries and increasing demand for advanced analytics solutions. The market is experiencing a robust expansion, registering a CAGR of 18.5% during the forecast period. By 2033, the Knowledge Discovery in Databases market is projected to attain a value of USD 44.9 billion. This remarkable growth is primarily attributed to the rising adoption of artificial intelligence (AI), machine learning (ML), and big data analytics, which are transforming how organizations extract actionable insights from vast and complex datasets.
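The three figures quoted above (base value, CAGR, and 2033 value) are tied together by the standard compound-growth formula. The following is a minimal sketch of that arithmetic, assuming nine compounding years (2024 to 2033); the report does not state its compounding window, and the function names are illustrative rather than taken from the report.

```python
def project(start_value: float, rate: float, years: int) -> float:
    """Compound start_value at a constant annual rate for the given number of years."""
    return start_value * (1.0 + rate) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant annual growth rate that takes start_value to end_value in `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumption: nine compounding years between the 2024 base and the 2033 target.
YEARS = 2033 - 2024

# Figures quoted in the description (USD billions, CAGR as a fraction).
projected_2033 = project(8.7, 0.185, YEARS)
implied_rate = implied_cagr(8.7, 44.9, YEARS)

print(f"Value implied by the stated CAGR: USD {projected_2033:.1f} billion")
print(f"CAGR implied by the stated endpoints: {implied_rate:.1%}")
```

If the two printed values do not line up with the quoted figures, the report is presumably using a different base year or forecast window than the one assumed here.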

The surge in data generation from digital transformation initiatives, IoT devices, and cloud-based applications is a major growth driver for the Knowledge Discovery in Databases market. As organizations increasingly digitize their operations and customer interactions, the volume, variety, and velocity of data have soared, making traditional data analysis methods insufficient. KDD platforms and solutions are essential for uncovering hidden patterns, correlations, and trends within large datasets, enabling businesses to make data-driven decisions and gain a competitive edge. Furthermore, the proliferation of unstructured data from sources such as social media, emails, and multimedia content has heightened the need for advanced mining techniques, further fueling market growth.

Another significant factor propelling the Knowledge Discovery in Databases market is the integration of AI and ML technologies into KDD solutions. These intelligent algorithms enhance the automation, accuracy, and scalability of data mining processes, allowing organizations to extract deeper insights in real time. The increasing availability of cloud-based KDD solutions has democratized access to advanced analytics, enabling small and medium enterprises (SMEs) to leverage sophisticated tools without the need for extensive infrastructure investments. Additionally, the growing emphasis on regulatory compliance, risk management, and fraud detection in sectors such as BFSI and healthcare is driving the adoption of KDD technologies to ensure data integrity and security.

The evolving landscape of digital businesses and the rising importance of customer-centric strategies have also contributed to the expansion of the Knowledge Discovery in Databases market. Enterprises across retail, telecommunications, and manufacturing are harnessing KDD tools to personalize offerings, optimize supply chains, and enhance operational efficiency. The ability of KDD platforms to handle diverse data types, including text, images, and video, has broadened their applicability across various domains. Moreover, the increasing focus on predictive analytics and real-time decision-making is encouraging organizations to invest in KDD solutions that provide timely and actionable insights, thereby driving sustained market growth through 2033.

From a regional perspective, North America continues to dominate the Knowledge Discovery in Databases market, supported by the presence of leading technology vendors, high digital adoption rates, and substantial investments in AI and analytics infrastructure. However, the Asia Pacific region is witnessing the fastest growth, propelled by rapid digitalization, expanding IT ecosystems, and government initiatives promoting data-driven innovation. Europe remains a significant market, characterized by strong regulatory frameworks and a focus on data privacy and security. Latin America and the Middle East & Africa are also emerging as promising markets, driven by increasing awareness of the benefits of KDD and growing investments in digital transformation across industries.

Component Analysis

The Knowledge Discovery in Databases market is segmented by component into Software, Services, and Platforms, each playing a crucial role in the overall ecosystem. Software solutions form the backbone of the KDD ma
Data from: Results obtained in a data mining process applied to a database...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Cite
E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.20011798.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
E.M. Ruiz Lobaina; C. P. Romero Suárez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: The objective of this work is to improve the quality of the information in the CubaCiencia database of the Institute of Scientific and Technological Information. This database holds bibliographic information on four segments of science and is the main database of the Library Management System. The methodology applied Decision Trees, the Correlation Matrix, the 3D Scatter Plot, and other data mining techniques used to study large volumes of information. The results not only made it possible to improve the information in the database, but also provided useful patterns for meeting the proposed objectives.
Knowledge Discovery In Databases Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 30, 2025
Cite
Dataintelo (2025). Knowledge Discovery In Databases Market Research Report 2033 [Dataset]. https://dataintelo.com/report/knowledge-discovery-in-databases-market
Explore at:
Available download formats: pptx, csv, pdf
Dataset updated
Sep 30, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Knowledge Discovery in Databases (KDD) Market Outlook

According to our latest research, the global Knowledge Discovery in Databases (KDD) market size reached USD 9.6 billion in 2024, propelled by the growing demand for advanced data analytics and intelligent decision-making across industries. The market is expanding at a robust CAGR of 18.7% and is forecasted to reach USD 53.2 billion by 2033. This remarkable growth is driven primarily by the exponential rise in data generation, the adoption of artificial intelligence and machine learning, and the increasing need for actionable insights in real-time environments. As per our latest research, organizations worldwide are leveraging KDD solutions to extract valuable information from massive datasets, thereby fostering innovation, operational efficiency, and competitive advantage.

A significant growth factor for the Knowledge Discovery in Databases market is the rapid digital transformation witnessed across various sectors. Enterprises are increasingly migrating their core operations to digital platforms, resulting in the accumulation of vast amounts of structured and unstructured data. This surge in data volume necessitates advanced analytics tools capable of sifting through complex datasets to uncover hidden patterns, correlations, and anomalies. KDD solutions, encompassing data mining, machine learning algorithms, and visualization tools, are being widely deployed to convert raw data into strategic assets. Furthermore, the integration of KDD with emerging technologies such as big data analytics, Internet of Things (IoT), and cloud computing is further amplifying its adoption, enabling organizations to harness data-driven insights for enhanced decision-making and innovation.

Another major driver fueling the growth of the KDD market is the increasing emphasis on fraud detection, risk management, and regulatory compliance, particularly in sectors like BFSI, healthcare, and government. The proliferation of cyber threats, financial crimes, and regulatory mandates has compelled organizations to invest in sophisticated KDD platforms that can proactively identify suspicious activities and ensure compliance with evolving standards. These solutions leverage advanced algorithms to analyze transactional data in real-time, flagging anomalies and potential risks before they escalate. As a result, businesses are able to mitigate financial losses, safeguard sensitive information, and uphold their reputational integrity in an increasingly complex regulatory landscape.

The widespread adoption of KDD solutions is also being driven by the growing demand for personalized customer experiences and predictive analytics. In highly competitive markets such as retail, e-commerce, and telecommunications, organizations are leveraging KDD to analyze customer behavior, preferences, and purchasing patterns. This enables them to tailor their offerings, optimize marketing strategies, and enhance customer engagement. The ability to anticipate market trends, forecast demand, and identify emerging opportunities is proving invaluable for businesses seeking to maintain a competitive edge. Additionally, the shift towards cloud-based KDD solutions is making advanced analytics accessible to small and medium enterprises, democratizing the benefits of knowledge discovery and leveling the playing field.

From a regional perspective, North America continues to dominate the Knowledge Discovery in Databases market, accounting for the largest share in 2024. This leadership can be attributed to the strong presence of technology giants, advanced IT infrastructure, and early adoption of analytics solutions across key industries. However, the Asia Pacific region is emerging as the fastest-growing market, driven by rapid digitization, government initiatives promoting data-driven innovation, and the proliferation of SMEs embracing cloud-based KDD platforms. Europe also represents a significant market, characterized by stringent data protection regulations and a focus on industrial automation. Meanwhile, Latin America and the Middle East & Africa are witnessing steady growth, supported by increasing investments in digital infrastructure and a growing recognition of the value of data analytics.

Component Analysis

The component segment of the Knowledge Discovery in Databases market is categorized into software, services, and platforms, each playing a pivotal role in the
Replication Data for: "Unraveling spatial, structural, and social...
search.dataone.org
dataverse.harvard.edu
Updated Nov 9, 2023
Cite
PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo (2023). Replication Data for: "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS" [Dataset]. http://doi.org/10.7910/DVN/SFT3RT
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SFT3RT
Dataset updated
Nov 9, 2023
Dataset provided by
Harvard Dataverse
Authors
PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo
Description
Data from the article "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS", by Agustin Pájaro, Ignacio J. Duran and Pablo Rodrigo, published in Revista DADOS, v. 65, n. 3, 2022.
Additional file 1 of Learning from biomedical linked data to suggest valid...
springernature.figshare.com
datasetcatalog.nlm.nih.gov
txt
Updated Jun 1, 2023
Cite
Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet (2023). Additional file 1 of Learning from biomedical linked data to suggest valid pharmacogenes [Dataset]. http://doi.org/10.6084/m9.figshare.c.3747806_D1.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3747806_D1.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SPARQL query example 1. This text file contains the SPARQL query we apply on our PGx linked data to obtain the data graph represented in Fig. 3. This query includes the definition of prefixes mentioned in Figs. 2 and 3. This query takes about 30 s on our https://pgxlod.loria.fr server. (TXT 2 kb)
Additional file 2 of Learning from biomedical linked data to suggest valid...
springernature.figshare.com
datasetcatalog.nlm.nih.gov
txt
Updated May 31, 2023
Cite
Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet (2023). Additional file 2 of Learning from biomedical linked data to suggest valid pharmacogenes [Dataset]. http://doi.org/10.6084/m9.figshare.c.3747806_D2.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3747806_D2.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SPARQL query example 2. This text file contains an example SPARQL query for exploring the vicinity of an entity. This particular query returns the RDF graph surrounding, within a length of 4, the node pharmgkb:PA451906, which represents warfarin, an anticoagulant drug. (TXT 392 bytes)
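To illustrate what "the graph surrounding a node within a length of 4" means, here is a minimal Python sketch of a breadth-first search over a toy set of triples. It is not the SPARQL query from the file: only the node identifier pharmgkb:PA451906 comes from the description, while the `ex:` nodes, the `ex:linked` predicate, and the choice to ignore edge direction are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def neighborhood(triples: Iterable[Triple], start: str, max_hops: int) -> Dict[str, int]:
    """Return every node within max_hops of start, mapped to its hop distance.

    Edges are treated as undirected here, which is an assumption of this sketch.
    """
    adjacency: Dict[str, Set[str]] = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical chain of five nodes hanging off the warfarin node.
toy_triples = [
    ("pharmgkb:PA451906", "ex:linked", "ex:a"),
    ("ex:a", "ex:linked", "ex:b"),
    ("ex:b", "ex:linked", "ex:c"),
    ("ex:c", "ex:linked", "ex:d"),
    ("ex:d", "ex:linked", "ex:e"),
]

within_four = neighborhood(toy_triples, "pharmgkb:PA451906", max_hops=4)
print(sorted(within_four.items(), key=lambda item: item[1]))
```

In this toy chain, `ex:d` sits exactly four hops away and is included, while `ex:e` at five hops is left out.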
Data from: Towards open data blockchain analytics: a Bitcoin perspective
search.dataone.org
data.niaid.nih.gov
Updated Jun 12, 2025
Cite
Dan McGinn; Douglas McIlwraith; Yike Guo (2025). Towards open data blockchain analytics: a Bitcoin perspective [Dataset]. http://doi.org/10.5061/dryad.h9r0p65
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.h9r0p65
Dataset updated
Jun 12, 2025
Dataset provided by
Dryad Digital Repository
Authors
Dan McGinn; Douglas McIlwraith; Yike Guo
Time period covered
Jul 9, 2018
Description
Bitcoin is the first implementation of a technology that has become known as a 'public permissionless' blockchain. Such systems allow public read/write access to an append-only blockchain database without the need for any mediating central authority. Instead they guarantee access, security and protocol conformity through an elegant combination of cryptographic assurances and game theoretic economic incentives. Not until the advent of the Bitcoin blockchain has such a trusted, transparent, comprehensive and granular data set of digital economic behaviours been available for public network analysis. In this article, by translating the cumbersome binary data structure of the Bitcoin blockchain into a high fidelity graph model, we demonstrate through various analyses the often overlooked social and econometric benefits of employing such a novel open data architecture. Specifically we show (a) how repeated patterns of transaction behaviours can be revealed to link user activity across t...
Data Analysis for the Systematic Literature Review of DL4SE
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jul 19, 2024
Cite
Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4768586
Explore at:
Dataset updated
Jul 19, 2024
Dataset provided by
College of William and Mary
Washington and Lee University
Authors
Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong; 2014). An Exploratory Data Analysis (EDA) comprises a set of statistical and data mining procedures to describe data. We ran EDA to provide statistical facts and inform conclusions. The mined facts support arguments that inform the Systematic Literature Review of DL4SE.

The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers for the proposed research questions and formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships among Deep Learning reported literature in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.

Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad, et al; 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD involves five stages:

Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organize the data into 35 features or attributes that you find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.

Preprocessing. The preprocessing applied was transforming the features into the correct type (nominal), removing outliers (papers that do not belong to the DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalize the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. Similarly, the same normalization was applied to other features like “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the paper by the data mining tasks or methods.
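The normalization of the “metrics” feature can be pictured as a lookup from raw metric names to the canonical classes listed above. The sketch below is only an illustration under stated assumptions: the authors report using RapidMiner and manual re-inspection, so the token-matching rule, the `CANONICAL` table keys, and the raw example strings are hypothetical; only the class labels come from the description.

```python
import re

# Canonical class labels are taken from the description; the lowercase keys
# used to recognise them are assumptions made for this sketch.
CANONICAL = {
    "mrr": "MRR",
    "roc": "ROC or AUC",
    "auc": "ROC or AUC",
    "bleu": "BLEU Score",
    "accuracy": "Accuracy",
    "precision": "Precision",
    "recall": "Recall",
    "f1": "F1 Measure",
}


def normalize_metric(raw: str) -> str:
    """Map a raw metric name to a canonical class, or to "Other Metrics"."""
    # Whole-token matching avoids false hits such as "roc" inside "reciprocal".
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    for token in tokens:
        if token in CANONICAL:
            return CANONICAL[token]
    return "Other Metrics"


# Hypothetical raw values as they might appear in extracted paper data.
raw_values = ["F1-score", "AUC", "BLEU-4", "Top-1 Accuracy", "Perplexity"]
normalized = [normalize_metric(value) for value in raw_values]
print(normalized)
```

Anything the table does not recognise falls into “Other Metrics”, mirroring the catch-all class for unconventional metrics.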

Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis. We performed a Principal Component Analysis (PCA) to reduce 35 features into 2 components for visualization purposes. Furthermore, PCA also allowed us to identify the number of clusters that exhibit the maximum reduction in variance. In other words, it helped us to identify the number of clusters to be used when tuning the explainable models.

Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented to uncover hidden relationships among the extracted features (Correlations and Association Rules) and to categorize the DL4SE papers for a better segmentation of the state-of-the-art (Clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.

Interpretation/Evaluation. We used Knowledge Discovery to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes. This reasoning process produces an argument support analysis (see this link).

We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.

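Support = the number of occurrences in which the statement is true divided by the total number of statements.
Confidence = the number of occurrences in which the statement is true divided by the number of occurrences of the premise (equivalently, the support of the statement divided by the support of the premise).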
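The two measures can be computed directly from a list of records. The following is a minimal sketch, assuming each paper is represented as a set of attribute values; the four toy records and the helper names are hypothetical and are not taken from the DL4SE data or from the authors' RapidMiner pipelines.

```python
from typing import FrozenSet, Iterable, List


def support(records: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of records that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for record in records if wanted <= record) / len(records)


def confidence(records: List[FrozenSet[str]],
               premise: Iterable[str],
               conclusion: Iterable[str]) -> float:
    """Among records containing the premise, the fraction that also contain the conclusion."""
    premise_set = frozenset(premise)
    rule_set = premise_set | frozenset(conclusion)
    premise_count = sum(1 for record in records if premise_set <= record)
    if premise_count == 0:
        return 0.0  # the rule never fires, so report zero rather than divide by zero
    rule_count = sum(1 for record in records if rule_set <= record)
    return rule_count / premise_count


# Hypothetical records: one set of attribute values per paper.
papers = [
    frozenset({"supervised", "irreproducible"}),
    frozenset({"supervised", "irreproducible"}),
    frozenset({"supervised", "reproducible"}),
    frozenset({"unsupervised", "reproducible"}),
]

rule_support = support(papers, {"supervised", "irreproducible"})
rule_confidence = confidence(papers, {"supervised"}, {"irreproducible"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

On these toy records the rule "Supervised Learning implies irreproducible" holds in two of four papers (support 0.5) and in two of the three papers that use supervised learning (confidence about 0.67).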
Data from: Evaluation of classification techniques for identifying fake...
scielo.figshare.com
jpeg
Updated May 30, 2023
Cite
Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda (2023). Evaluation of classification techniques for identifying fake reviews about products and services on the internet [Dataset]. http://doi.org/10.6084/m9.figshare.14283143.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.14283143.v1
Dataset updated
May 30, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: With the growth of e-commerce, more people are buying products over the internet. To increase customer satisfaction, merchants provide spaces for product and service reviews. Products with positive reviews attract customers, while products with negative reviews lose customers. Following this idea, some individuals and corporations write fake reviews to promote their products and services or defame their competitors. The difficulty in finding these reviews lies in the large amount of information available. One solution is to use data mining techniques and tools, such as the classification function. Exploring this situation, the present work evaluates classification techniques to identify fake reviews about products and services on the Internet. The research also presents a systematic literature review on fake reviews. The research used 8 classification algorithms. The algorithms were trained and tested with a hotel database. The CONCENSO algorithm presented the best result, with 88% in the precision indicator. After the first test, the algorithms classified reviews from another hotel database. To compare the results of this new classification, the Review Skeptic algorithm was used. The SVM and GLMNET algorithms presented the highest convergence with the Review Skeptic algorithm, classifying 83% of reviews with the same result. The research contributes by demonstrating the algorithms' ability to understand consumers’ real reviews of products and services on the Internet. Another contribution is being the pioneer in the investigation of fake reviews in Brazil and in production engineering.
Knowledge Discovery in Biological Databases for Revealing Candidate Genes...
ckan.grassroots.tools
html, pdf
Updated Aug 7, 2019
Cite
Rothamsted Research (2019). Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes [Dataset]. https://ckan.grassroots.tools/bg/dataset/bf47bbcd-d26b-40a1-a86b-144f37570967
Explore at:
Available download formats: pdf, html
Dataset updated
Aug 7, 2019
Dataset provided by
Rothamsted Research
License
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
Description
Abstract: Genetics and “omics” studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
Data Mining and Unsupervised Machine Learning in Canadian In Situ Oil Sands...
data.mendeley.com
Updated Feb 10, 2021
Cite
Minxing Si (2021). Data Mining and Unsupervised Machine Learning in Canadian In Situ Oil Sands Database for Knowledge Discovery and Carbon Cost Analysis [Dataset]. http://doi.org/10.17632/8ngkgz69zb.4
Explore at:
Unique identifier
https://doi.org/10.17632/8ngkgz69zb.4
Dataset updated
Feb 10, 2021
Authors
Minxing Si
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Canada
Description
A better understanding of greenhouse gas (GHG) emissions resulting from oil sands (bitumen) extraction can help to meet global oil demands, identify potential mitigation measures, and design effective carbon policies. While several studies have attempted to model GHG emissions from oil sands extractions, these studies have encountered data availability challenges, particularly with respect to actual fuel use data, and have thus struggled to accurately quantify GHG emissions. This dataset contains actual operational data from 20 in-situ oil sands operations, including information for fuel gas, flare gas, vented gas, production, steam injection, gas injection, condensate injection, and C3 injection.
Data from: Historical Data Mining Deep Dive into Machine Learning-Aided 2D...
acs.figshare.com
figshare.com
xlsx
Updated Jun 23, 2025
Cite
Krittapong Deshsorn; Panwad Chavalekvirat; Somrudee Deepaisarn; Ho-Chiao Chuang; Pawin Iamprasertkun (2025). Historical Data Mining Deep Dive into Machine Learning-Aided 2D Materials Research in Electrochemical Applications [Dataset]. http://doi.org/10.1021/acsmaterialsau.5c00030.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acsmaterialsau.5c00030.s001
Dataset updated
Jun 23, 2025
Dataset provided by
ACS Publications
Authors
Krittapong Deshsorn; Panwad Chavalekvirat; Somrudee Deepaisarn; Ho-Chiao Chuang; Pawin Iamprasertkun
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Machine learning transforms the landscape of 2D materials design, particularly in accelerating discovery, optimization, and screening processes. This review has delved into the historical and ongoing integration of machine learning in 2D materials for electrochemical energy applications, using the Knowledge Discovery in Databases (KDD) approach to guide the research through data mining from the Scopus database using analysis of citations, keywords, and trends. The topics will first focus on a “macro” scope, where hundreds of literature reports are computer analyzed for key insights, such as year analysis, publication origin, and word co-occurrence using heat maps and network graphs. Afterward, the focus will be narrowed down into a more specific “micro” scope obtained from the “macro” overview, which is intended to dive deep into machine learning usage. From the gathered insights, this work highlights how machine learning, density functional theory (DFT), and traditional experimentation are jointly advancing the field of materials science. Overall, the resulting review offers a comprehensive analysis, touching on essential applications such as batteries, fuel cells, supercapacitors, and synthesis processes while showcasing machine learning techniques that enhance the identification of critical material properties.
Portuguese Examples (Semantic Migration)
data.mendeley.com
Updated Jul 14, 2022
Cite
Dora Melo (2022). Portuguese Examples (Semantic Migration) [Dataset]. http://doi.org/10.17632/t2cx9stwfb.1
Explore at:
Unique identifier
https://doi.org/10.17632/t2cx9stwfb.1
Dataset updated
Jul 14, 2022
Authors
Dora Melo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An excerpt of the CIDOC-CRM Ontology Representation of the DigitArq records from Bragança District Archive. The dataset also includes two SPARQL query examples - "What are the locals and their parishes located in the county 'Bragança' between 1900 and 1910?" and "What is the number of children per couple, between 1800 and 1850?", to facilitate the ontology exploration. This dataset is part of the results obtained from the semantic migration process of DigitArq - Portuguese Archive Database - metadata into CIDOC-CRM Ontology representation. This work is done in the context of the R&D EPISA project (Entity and Property Inference for Semantic Archives), a research project financed by National Funds through the Portuguese funding agency, FCT (Fundação para a Ciência e a Tecnologia) - DSAIPA/DS/0023/2018.
Additional file 3 of The Aliment to Bodily Condition knowledgebase (ABCkb):...
springernature.figshare.com
xlsx
Updated Jun 8, 2023
Cite
Aaron Trautman; Richard Linchangco; Rachel Walstead; Jeremy J. Jay; Cory Brouwer (2023). Additional file 3 of The Aliment to Bodily Condition knowledgebase (ABCkb): a database connecting plants and human health [Dataset]. http://doi.org/10.6084/m9.figshare.17087941.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.17087941.v1
Dataset updated
Jun 8, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Aaron Trautman; Richard Linchangco; Rachel Walstead; Jeremy J. Jay; Cory Brouwer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 3: File S3. Query node and relationship information. The query from the application portion from Avena sativa to Heart Disease and Diabetes resulted in the nodes and relationships as previously discussed. This file contains the more detailed information contained in the Neo4j database about the nodes and the relationship connections.
kdd cyberattack
kaggle.com
zip
Updated Jul 28, 2018
Ziyad Mestour (2018). kdd cyberattack [Dataset]. https://www.kaggle.com/slashtea/kdd-cyberattack
Available download formats: zip (2298343 bytes)
Dataset updated
Jul 28, 2018
Authors
Ziyad Mestour
License
Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Context

This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

Content

For more information about the contents, see http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

Acknowledgements

The dataset is shared on Kaggle on behalf of KDD's work.

Inspiration

Build a classifier capable of distinguishing between attacks and normal connections.
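
A first step toward the attack-versus-normal task described above is collapsing the per-record labels into two classes. The sketch below is only an illustration under stated assumptions: it assumes the label is the last comma-separated field of each record and may carry a trailing period (as in the original KDD Cup 99 files); the Kaggle copy may differ. The helper names and the shortened sample rows are hypothetical, and real records carry many more feature columns.

```python
import csv
import io
from collections import Counter


def binarize_label(raw_label: str) -> str:
    """Collapse a connection label into 'normal' or 'attack'."""
    # Assumption: labels may end with a period, e.g. "normal." or "smurf."
    label = raw_label.strip().rstrip(".")
    return "normal" if label == "normal" else "attack"


def count_classes(csv_text: str) -> Counter:
    """Count binary classes, assuming the label is the last field of each row."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip empty lines
            counts[binarize_label(row[-1])] += 1
    return counts


# Hypothetical, shortened rows for illustration only.
SAMPLE = (
    "0,tcp,http,SF,181,5450,normal.\n"
    "0,icmp,ecr_i,SF,1032,0,smurf.\n"
    "0,tcp,private,S0,0,0,neptune.\n"
)

if __name__ == "__main__":
    print(count_classes(SAMPLE))
```

Checking the class balance this way before training is useful because a heavily skewed split between attacks and normal traffic affects which evaluation metric is meaningful.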
CODE dataset
data.europa.eu
figshare.scilifelab.se
+1 more
unknown
Updated Nov 22, 2021
Uppsala universitet (2021). CODE dataset [Dataset]. https://data.europa.eu/data/datasets/https-doi-org-10-17044-scilifelab-15169716?locale=ga
Available download formats: unknown
Dataset updated
Nov 22, 2021
Dataset authored and provided by
Uppsala universitet
Description
Dataset with annotated 12-lead ECG records. The exams were taken in 811 counties in the state of Minas Gerais, Brazil, by the Telehealth Network of Minas Gerais (TNMG) between 2010 and 2016, and organized by the CODE (Clinical Outcomes in Digital Electrocardiography) group.

Requesting access

Researchers affiliated with educational or research institutions may request access to this dataset. Requests will be analyzed on an individual basis and should contain: the name of the PI and host organisation; contact details (including your name and email); and the scientific purpose of the data access request. If approved, a data user agreement will be forwarded to the researcher who made the request (through the email address provided). After the agreement has been signed (by the researcher or by the research institution), access to the dataset will be granted.

Openly available subset

A subset of this dataset (with 15% of the patients) is openly available. See: "CODE-15%: a large scale annotated dataset of 12-lead ECGs", https://doi.org/10.5281/zenodo.4916206.

Content

The folder contains:
- A column separated file containing basic patient attributes.
- The ECG waveforms in the wfdb format.

Additional references

The dataset is described in the paper "Automatic diagnosis of the 12-lead ECG using a deep neural network", https://www.nature.com/articles/s41467-020-15432-4. Related publications also using this dataset are:
- [1] G. Paixao et al., “Validation of a Deep Neural Network Electrocardiographic-Age as a Mortality Predictor: The CODE Study,” Circulation, vol. 142, no. Suppl_3, pp. A16883–A16883, Nov. 2020, doi: 10.1161/circ.142.suppl_3.16883.
- [2] A. L. P. Ribeiro et al., “Tele-electrocardiography and bigdata: The CODE (Clinical Outcomes in Digital Electrocardiography) study,” Journal of Electrocardiology, Sep. 2019, doi: 10/gf7pwg.
- [3] D. M. Oliveira, A. H. Ribeiro, J. A. O. Pedrosa, G. M. M. Paixao, A. L. P. Ribeiro, and W. Meira Jr, “Explaining end-to-end ECG automated diagnosis using contextual features,” in Machine Learning and Knowledge Discovery in Databases. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Ghent, Belgium, Sep. 2020, vol. 12461, pp. 204–219. doi: 10.1007/978-3-030-67670-4_13.
- [4] D. M. Oliveira, A. H. Ribeiro, J. A. O. Pedrosa, G. M. M. Paixao, A. L. Ribeiro, and W. M. Jr, “Explaining black-box automated electrocardiogram classification to cardiologists,” in 2020 Computing in Cardiology (CinC), 2020, vol. 47. doi: 10.22489/CinC.2020.452.
- [5] G. M. M. Paixão et al., “Evaluation of mortality in bundle branch block patients from an electronic cohort: Clinical Outcomes in Digital Electrocardiography (CODE) study,” Journal of Electrocardiology, Sep. 2019, doi: 10/dcgk.
- [6] G. M. M. Paixão et al., “Evaluation of Mortality in Atrial Fibrillation: Clinical Outcomes in Digital Electrocardiography (CODE) Study,” Global Heart, vol. 15, no. 1, p. 48, Jul. 2020, doi: 10.5334/gh.772.
- [7] G. M. M. Paixão et al., “Electrocardiographic Predictors of Mortality: Data from a Primary Care Tele-Electrocardiography Cohort of Brazilian Patients,” Hearts, vol. 2, no. 4, Art. no. 4, Dec. 2021, doi: 10.3390/hearts2040035.
- [8] G. M. Paixão et al., “ECG-AGE FROM ARTIFICIAL INTELLIGENCE: A NEW PREDICTOR FOR MORTALITY? THE CODE (CLINICAL OUTCOMES IN DIGITAL ELECTROCARDIOGRAPHY) STUDY,” Journal of the American College of Cardiology, vol. 75, no. 11 Supplement 1, p. 3672, 2020, doi: 10.1016/S0735-1097(20)34299-6.
- [9] E. M. Lima et al., “Deep neural network estimated electrocardiographic-age as a mortality predictor,” Nature Communications, vol. 12, 2021, doi: 10.1038/s41467-021-25351-7.
- [10] W. Meira Jr, A. L. P. Ribeiro, D. M. Oliveira, and A. H. Ribeiro, “Contextualized Interpretable Machine Learning for Medical Diagnosis,” Communications of the ACM, 2020, doi: 10.1145/3416965.
- [11] A. H. Ribeiro et al., “Automatic diagnosis of the 12-lead ECG using a deep neural network,” Nature Communications, vol. 11, no. 1, p. 1760, 2020, doi: 10/drkd.
- [12] A. H. Ribeiro et al., “Automatic Diagnosis of Short-Duration 12-Lead ECG using a Deep Convolutional Network,” Machine Learning for Health (ML4H) Workshop at NeurIPS, 2018.
- [13] A. H. Ribeiro et al., “Automatic 12-lead ECG classification using a convolutional network ensemble,” 2020. doi: 10.22489/CinC.2020.130.
- [14] V. Sangha et al., “Automated Multilabel Diagnosis on Electrocardiographic Images and Signals,” medRxiv, Sep. 2021, doi: 10.1101/2021.09.22.21263926.
- [15] S. Biton et al., “Atrial fibrillation risk prediction from the 12-lead ECG using digital biomarkers and deep representation learning,” European Heart Journal - Digital Health, 2021, doi: 10.1093/ehjdh/ztab071.

Code

The following GitHub repositories perform analyses that use this dataset:
- https://github.com/antonior92/automatic-ecg-diagnosis
- https://github.com/antonior92/ecg-age-prediction

Related Datasets

- CODE-test: An annotated 12-lea
Data from: Identification of patterns for increasing production with...
scielo.figshare.com
jpeg
Updated Jun 1, 2023
Paulo Rodrigues Peloia; Felipe Ferreira Bocca; Luiz Henrique Antunes Rodrigues (2023). Identification of patterns for increasing production with decision trees in sugarcane mill data [Dataset]. http://doi.org/10.6084/m9.figshare.7899809.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.7899809.v1
Dataset updated
Jun 1, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Paulo Rodrigues Peloia; Felipe Ferreira Bocca; Luiz Henrique Antunes Rodrigues
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT: Sugarcane mills in Brazil collect a vast amount of production data every year. Analyzing this type of database is complex, especially when factors relating to varieties, climate, detailed management techniques, and edaphic conditions are taken into account. The aim of this paper was to perform a decision tree analysis of a detailed database from a production unit and to evaluate the actionable patterns found in terms of their usefulness for increasing production. The decision tree revealed interpretable patterns relating to sugarcane yield (R² = 0.617), some of which were actionable and had been previously studied and reported in the literature. Based on two actionable patterns relating to soil chemistry, interventions expected to increase production by almost 2% were suitable for recommendation. The method was successful in reproducing expert knowledge of the factors that influence sugarcane yield, and the decision trees can support decision-making in the context of production and the formulation of hypotheses for specific experiments.
Ramakrishnan: Semantics on the Web
catalog.data.gov
s.cnmilf.com
+3 more
Updated Apr 10, 2025
Dashlink (2025). Ramakrishnan: Semantics on the Web [Dataset]. https://catalog.data.gov/dataset/ramakrishnan-semantics-on-the-web
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
It is becoming increasingly clear that the next generation of web search and advertising will rely on a deeper understanding of user intent and task modeling, and a correspondingly richer interpretation of content on the web. How we get there, in particular, how we understand web content in richer terms than bags of words and links, is a wide open and fascinating question. I will discuss some of the options here, and look closely at the role that information extraction can play.

Speaker Bio

Raghu Ramakrishnan is Chief Scientist for Audience and Cloud Computing at Yahoo!, and is a Research Fellow, heading the Community Systems area in Yahoo! Research. He was Professor of Computer Sciences at the University of Wisconsin-Madison, and was founder and CTO of QUIQ, a company that pioneered question-answering communities, powering Ask Jeeves' AnswerPoint as well as customer support for companies such as Compaq. His research has influenced query optimization in commercial database systems, and the design of window functions in SQL:1999. His paper on the BIRCH clustering algorithm received the SIGMOD 10-Year Test-of-Time award, and he has written the widely used text "Database Management Systems" (with Johannes Gehrke). He is Chair of ACM SIGMOD, on the Board of Directors of ACM SIGKDD and the Board of Trustees of the VLDB Endowment, and has served as editor-in-chief of the Journal of Data Mining and Knowledge Discovery, associate editor of ACM Transactions on Database Systems, and the Database area editor of the Journal of Logic Programming. Ramakrishnan is a Fellow of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE), and has received several awards, including a Distinguished Alumnus Award from IIT Madras, a Packard Foundation Fellowship in Science and Engineering, an NSF Presidential Young Investigator Award, and an ACM SIGMOD Contributions Award.
Data from: DB-PABP: a database of polyanion binding proteins
neuinfo.org
rrid.site
+2 more
Updated Oct 23, 2025
(2025). DB-PABP: a database of polyanion binding proteins [Dataset]. http://identifiers.org/RRID:SCR_007603/resolver/mentions?q=&i=rrid
Unique identifier
https://identifiers.org/RRID:SCR_007603 https://identifiers.org/RRID:SCR_007603/resolver/mentions?q=&i=rrid
Dataset updated
Oct 23, 2025
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 23, 2016. DB-PABP is an attempt to document the publicly available, experimentally determined polyanion binding proteins (PABPs). The purpose of the database is to provide life scientists who are interested in PA/PABP interactions with a comprehensive data repository, and to provide computer scientists with a publicly available dataset for knowledge discovery and data mining studies. The database is manually curated. It uses protein annotations from the NCBI protein database, and literature information is retrieved from PubMed. Whenever applicable, links to the NCBI protein database and PubMed are provided so users can access additional information available in these public databases.
Crop trait regulating-genes knowledge graph dataset
scidb.cn
Updated Jan 3, 2025
zhang dan dan (2025). Crop trait regulating-genes knowledge graph dataset [Dataset]. http://doi.org/10.57760/sciencedb.agriculture.00175
Available in Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.agriculture.00175
Dataset updated
Jan 3, 2025
Dataset provided by
Science Data Bank
Authors
zhang dan dan
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
In crop breeding research, developing new varieties with multiple excellent traits has always been the goal of breeders. With the accelerating application of information technology in crop breeding, the multi-dimensional scientific data related to crop breeding has grown exponentially. These structured and semi-structured data are distributed across scientific databases in different fields and lack cross-species association and fusion. This hinders the transfer and reuse of existing crop breeding knowledge and limits the value that can be drawn from crop breeding data, which poses challenges for knowledge discovery of crop trait-regulating genes. Therefore, more and more crop breeding research is based on the reorganization, correlation, analysis and utilization of existing breeding data, so as to discover knowledge about crop trait-regulating genes.

The crop trait-regulating gene knowledge graph dataset draws on the following data sources: the PubMed literature database; Phytozome (genomic information of 4 species); Ensembl Plants, from the European Molecular Biology Laboratory's European Bioinformatics Institute (genome information of 4 species); UniProt (Universal Protein; protein annotation information of 4 species); the Rice Genome Annotation Project (RGAP); STRING (protein interaction information for 4 species); Pfam (protein family analysis and modeling; protein family information for 4 species); KEGG (Kyoto Encyclopedia of Genes and Genomes; pathway annotation information of the 4 species); and the GO (Gene Ontology) domain scientific database. The entities and relationships of this multi-source scientific data, which comes in different data formats, were extracted as follows. For structured data, mapping-based knowledge extraction is used. For XML semi-structured data, knowledge extraction based on Kettle data analysis is adopted. For FASTA semi-structured data, knowledge extraction based on the BLAST model is adopted. For unstructured text data, knowledge extraction based on a large language model is adopted. On the basis of this entity and relationship extraction, the association and fusion of multi-source crop breeding knowledge was realized through entity mapping and specific attribute association.

Finally, the crop trait-regulating gene knowledge graph dataset was formed, consisting of 13 entity datasets and 16 entity relationship datasets. The dataset provides a key semantic model and an important data basis for crop breeding knowledge discovery, such as the discovery of excellent pleiotropic genes, cross-species gene function prediction, and the discovery of potential pathway gene networks.


Growth Market Reports (2025). Knowledge Discovery in Databases Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/knowledge-discovery-in-databases-market

Knowledge Discovery in Databases Market Research Report 2033

Available download formats: pptx, csv, pdf

Dataset updated

Aug 22, 2025

Dataset authored and provided by

Growth Market Reports

Time period covered

2024 - 2032

Area covered

Global

Description

Knowledge Discovery in Databases Market Outlook

According to our latest research, the global Knowledge Discovery in Databases (KDD) market size reached USD 8.7 billion in 2024, driven by the exponential growth of data across industries and increasing demand for advanced analytics solutions. The market is experiencing a robust expansion, registering a CAGR of 18.5% during the forecast period. By 2033, the Knowledge Discovery in Databases market is projected to attain a value of USD 44.9 billion. This remarkable growth is primarily attributed to the rising adoption of artificial intelligence (AI), machine learning (ML), and big data analytics, which are transforming how organizations extract actionable insights from vast and complex datasets.
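
The relationship between the starting market size, the growth rate, and the projected value quoted above follows the standard compound-growth formula. The sketch below is a minimal illustration, not the report's methodology: the function names are hypothetical, and the nine-year compounding window (2024 to 2033) is an assumption, since the description does not state which base year the forecast period starts from. With a different window the results will differ, so they may not line up exactly with the report's figures.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


def project_value(start_value: float, rate: float, years: int) -> float:
    """Compound start_value at a fixed annual rate for the given number of years."""
    return start_value * (1 + rate) ** years


# Figures quoted in the description, in USD billions.
START_2024 = 8.7
TARGET_2033 = 44.9
STATED_CAGR = 0.185
YEARS = 2033 - 2024  # assumption: compounding runs from 2024 to 2033

if __name__ == "__main__":
    print(f"Implied CAGR over {YEARS} years: "
          f"{implied_cagr(START_2024, TARGET_2033, YEARS):.1%}")
    print(f"Value after {YEARS} years at the stated CAGR: "
          f"{project_value(START_2024, STATED_CAGR, YEARS):.1f}")
```

Running both directions (rate from endpoints, endpoint from rate) is a quick way to see how sensitive a headline projection is to the assumed forecast window.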

The surge in data generation from digital transformation initiatives, IoT devices, and cloud-based applications is a major growth driver for the Knowledge Discovery in Databases market. As organizations increasingly digitize their operations and customer interactions, the volume, variety, and velocity of data have soared, making traditional data analysis methods insufficient. KDD platforms and solutions are essential for uncovering hidden patterns, correlations, and trends within large datasets, enabling businesses to make data-driven decisions and gain a competitive edge. Furthermore, the proliferation of unstructured data from sources such as social media, emails, and multimedia content has heightened the need for advanced mining techniques, further fueling market growth.

Another significant factor propelling the Knowledge Discovery in Databases market is the integration of AI and ML technologies into KDD solutions. These intelligent algorithms enhance the automation, accuracy, and scalability of data mining processes, allowing organizations to extract deeper insights in real time. The increasing availability of cloud-based KDD solutions has democratized access to advanced analytics, enabling small and medium enterprises (SMEs) to leverage sophisticated tools without the need for extensive infrastructure investments. Additionally, the growing emphasis on regulatory compliance, risk management, and fraud detection in sectors such as BFSI and healthcare is driving the adoption of KDD technologies to ensure data integrity and security.

The evolving landscape of digital businesses and the rising importance of customer-centric strategies have also contributed to the expansion of the Knowledge Discovery in Databases market. Enterprises across retail, telecommunications, and manufacturing are harnessing KDD tools to personalize offerings, optimize supply chains, and enhance operational efficiency. The ability of KDD platforms to handle diverse data types, including text, images, and video, has broadened their applicability across various domains. Moreover, the increasing focus on predictive analytics and real-time decision-making is encouraging organizations to invest in KDD solutions that provide timely and actionable insights, thereby driving sustained market growth through 2033.

From a regional perspective, North America continues to dominate the Knowledge Discovery in Databases market, supported by the presence of leading technology vendors, high digital adoption rates, and substantial investments in AI and analytics infrastructure. However, the Asia Pacific region is witnessing the fastest growth, propelled by rapid digitalization, expanding IT ecosystems, and government initiatives promoting data-driven innovation. Europe remains a significant market, characterized by strong regulatory frameworks and a focus on data privacy and security. Latin America and the Middle East & Africa are also emerging as promising markets, driven by increasing awareness of the benefits of KDD and growing investments in digital transformation across industries.

Component Analysis

The Knowledge Discovery in Databases market is segmented by component into Software, Services, and Platforms, each playing a crucial role in the overall ecosystem. Software solutions form the backbone of the KDD ma
