100+ datasets found

f
Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf
frontiersin.figshare.com
pdf
Updated Jun 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xin Qiao; Hong Jiao (2023). Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2018.02231.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fpsyg.2018.02231.s001
Dataset updated
Jun 7, 2023
Dataset provided by
Frontiers
Authors
Xin Qiao; Hong Jiao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Due to increasing use of technology-enhanced educational assessment, data mining methods have been explored to analyse process data in log files from such assessment. However, most studies were limited to one data mining technique under one specific scenario. The current study demonstrates the usage of four frequently used supervised techniques, including Classification and Regression Trees (CART), gradient boosting, random forest, support vector machine (SVM), and two unsupervised methods, Self-organizing Map (SOM) and k-means, fitted to one assessment data. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA) responding to problem-solving items is extracted to demonstrate the methods. After concrete feature generation and feature selection, classifier development procedures are implemented using the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for the selection of classifiers are presented based on the research questions, the interpretability and the simplicity of the classifiers. Interpretations for the results from both supervised and unsupervised learning methods are provided.
d
Data from: Data Mining at NASA: From Theory to Applications
catalog.data.gov
data.nasa.gov
+2more
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Data Mining at NASA: From Theory to Applications [Dataset]. https://catalog.data.gov/dataset/data-mining-at-nasa-from-theory-to-applications
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
NASA has some of the largest and most complex data sources in the world, with data sources ranging from the earth sciences, space sciences, and massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.
D
Data Mining Software Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2025). Data Mining Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-mining-software-market
Explore at:
pdf, pptx, csvAvailable download formats
Dataset updated
Jan 7, 2025
Authors
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining Software Market Outlook

The global data mining software market size was valued at USD 7.2 billion in 2023 and is projected to reach USD 15.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 8.7% during the forecast period. This growth is driven primarily by the increasing adoption of big data analytics and the rising demand for business intelligence across various industries. As businesses increasingly recognize the value of data-driven decision-making, the market is expected to witness substantial growth.

One of the significant growth factors for the data mining software market is the exponential increase in data generation. With the proliferation of internet-enabled devices and the rapid advancement of technologies such as the Internet of Things (IoT), there is a massive influx of data. Organizations are now more focused than ever on harnessing this data to gain insights, improve operations, and create a competitive advantage. This has led to a surge in demand for advanced data mining tools that can process and analyze large datasets efficiently.

Another driving force is the growing need for personalized customer experiences. In industries such as retail, healthcare, and BFSI, understanding customer behavior and preferences is crucial. Data mining software enables organizations to analyze customer data, segment their audience, and deliver personalized offerings, ultimately enhancing customer satisfaction and loyalty. This drive towards personalization is further fueling the adoption of data mining solutions, contributing significantly to market growth.

The integration of artificial intelligence (AI) and machine learning (ML) technologies with data mining software is also a key growth factor. These advanced technologies enhance the capabilities of data mining tools by enabling them to learn from data patterns and make more accurate predictions. The convergence of AI and data mining is opening new avenues for businesses, allowing them to automate complex tasks, predict market trends, and make informed decisions more swiftly. The continuous advancements in AI and ML are expected to propel the data mining software market over the forecast period.

Regionally, North America holds a significant share of the data mining software market, driven by the presence of major technology companies and the early adoption of advanced analytics solutions. The Asia Pacific region is also expected to witness substantial growth due to the rapid digital transformation across various industries and the increasing investments in data infrastructure. Additionally, the growing awareness and implementation of data-driven strategies in emerging economies are contributing to the market expansion in this region.

Text Mining Software is becoming an integral part of the data mining landscape, offering unique capabilities to analyze unstructured data. As organizations generate vast amounts of textual data from various sources such as social media, emails, and customer feedback, the need for specialized tools to extract meaningful insights is growing. Text Mining Software enables businesses to process and analyze this data, uncovering patterns and trends that were previously hidden. This capability is particularly valuable in industries like marketing, customer service, and research, where understanding the nuances of language can lead to more informed decision-making. The integration of text mining with traditional data mining processes is enhancing the overall analytical capabilities of organizations, allowing them to derive comprehensive insights from both structured and unstructured data.

Component Analysis

The data mining software market is segmented by components, which primarily include software and services. The software segment encompasses various types of data mining tools that are used for analyzing and extracting valuable insights from raw data. These tools are designed to handle large volumes of data and provide advanced functionalities such as predictive analytics, data visualization, and pattern recognition. The increasing demand for sophisticated data analysis tools is driving the growth of the software segment. Enterprises are investing in these tools to enhance their data processing capabilities and derive actionable insights.

Within the software segment, the emergence of cloud-based data mining solutions is a notable trend. Cloud-based solutions offer several advantages, including s
d
Privacy Preserving Distributed Data Mining
catalog.data.gov
datadiscoverystudio.org
+2more
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Privacy Preserving Distributed Data Mining [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-distributed-data-mining
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an airline manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircrafts, during their operation, generate huge amount of data. Mining this data can reveal useful information regarding the health and operability of the aircraft which can be useful for disaster management and prediction of efficient operating regimes. Now if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircrafts of model-type [tex]$A$[/tex] belonging to different airlines then central collection of data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:
d
Data from: Discovering System Health Anomalies using Data Mining Techniques
catalog.data.gov
s.cnmilf.com
+2more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Discovering System Health Anomalies using Data Mining Techniques [Dataset]. https://catalog.data.gov/dataset/discovering-system-health-anomalies-using-data-mining-techniques
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
We discuss a statistical framework that underlies envelope detection schemes as well as dynamical models based on Hidden Markov Models (HMM) that can encompass both discrete and continuous sensor measurements for use in Integrated System Health Management (ISHM) applications. The HMM allows for the rapid assimilation, analysis, and discovery of system anomalies. We motivate our work with a discussion of an aviation problem where the identification of anomalous sequences is essential for safety reasons. The data in this application are discrete and continuous sensor measurements and can be dealt with seamlessly using the methods described here to discover anomalous flights. We specifically treat the problem of discovering anomalous features in the time series that may be hidden from the sensor suite and compare those methods to standard envelope detection methods on test data designed to accentuate the differences between the two methods. Identification of these hidden anomalies is crucial to building stable, reusable, and cost-efficient systems. We also discuss a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an ISHM system. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.
v
Data Mining Tools Market Size, Share & Growth Report, 2033
valuemarketresearch.com
Updated Jan 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Value Market Research (2024). Data Mining Tools Market Size, Share & Growth Report, 2033 [Dataset]. https://www.valuemarketresearch.com/report/data-mining-tools-market
Explore at:
electronic (pdf), ms excelAvailable download formats
Dataset updated
Jan 24, 2024
Dataset authored and provided by
Value Market Research
License
https://www.valuemarketresearch.com/privacy-policyhttps://www.valuemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Description
The forecast for the global Data Mining Tools market predicts substantial growth, with market size projected to soar to USD 5.08 Billion by 2033, a significant increase from the USD 1.78 Billion recorded in 2024. This expansion reflects an impressive compound annual growth rate (CAGR) of 12.32% anticipated between 2025 and 2033.

The Global Data Mining Tools market size to cross USD 5.08 Billion b
D
Data Mining Tools Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Apr 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2024). Data Mining Tools Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-mining-tools-market
Explore at:
pdf, pptx, csvAvailable download formats
Dataset updated
Apr 1, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining Tools Market Outlook 2032

The global data mining tools market size was USD 932 Million in 2023 and is projected to reach USD 2,584.7 Million by 2032, expanding at a CAGR of 12% during 2024–2032. The market is fueled by the rising demand for big data analytics across various industries and the increasing need for AI-integrated data mining tools for insightful decision-making.

Increasing adoption of cloud-based platforms in data mining tools fuels the market. This enhances scalability, flexibility, and cost-efficiency in data handling processes. Major tech companies are launching cloud-based data mining solutions, enabling businesses to analyze vast datasets effectively. This trend reflects the shift toward agile and scalable data analysis methods, meeting the dynamic needs of modern enterprises.

In July 2023, Microsoft launched Power Automate Process Mining. This tool, powered by advanced AI, allows companies to gain deep insights into their operations, streamline processes, and foster ongoing improvement through automation and low-code applications, marking a new era in business efficiency and process optimization.

Rising focus on predictive analytics propels the development of advanced data mining tools capable of forecasting future trends and behaviors. Industries such as finance, healthcare, and retail invest significantly in predictive analytics to gain a competitive edge, driving demand for sophisticated data mining technologies. This trend underscores the strategic importance of foresight in decision-making processes.

Visual data mining tools are gaining traction in the market, offering intuitive data exploration and interpretation capabilities. These tools enable users to uncover patterns and insights through graphical representations, making data analysis accessible to a broader audience. The launch of user-friendly visual data mining applications marks a significant step toward democratizing data analytics.

Impact of Artificial Intelligence (
S
Predictive data analysis techniques for higher education students dropout
scidb.cn
Updated Apr 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cindy (2023). Predictive data analysis techniques for higher education students dropout [Dataset]. http://doi.org/10.57760/sciencedb.07894
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.07894
Dataset updated
Apr 10, 2023
Dataset provided by
Science Data Bank
Authors
Cindy
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In this research, we have generated student retention alerts. The alerts are classified into two types: preventive and corrective. This classification varies according to the level of maturity of the data systematization process. Therefore, to systematize the data, data mining techniques have been applied. The experimental analytical method has been used, with a population of 13,715 students with 62 sociological, academic, family, personal, economic, psychological, and institutional variables, and factors such as academic follow-up and performance, financial situation, and personal information. In particular, information is collected on each of the problems or a combination of problems that could affect dropout rates. Following the methodology, the information has been generated through an abstract data model to reflect the profile of the dropout student. As advancement from previous research, this proposal will create preventive and corrective alternatives to avoid dropout higher education. Also, in contrast to previous work, we generated corrective warnings with the application of data mining techniques such as neural networks until reaching a precision of 97% and losses of 0.1052. In conclusion, this study pretends to analyze the behavior of students who drop out the university through the evaluation of predictive patterns. The overall objective is to predict the profile of student dropout, considering reasons such as admission to higher education and career changes. Consequently, using a data systematization process promotes the permanence of students in higher education. Once the profile of the dropout has been identified, student retention strategies have been approached, according to the time of its appearance and the point of view of the institution.
f
Data_Sheet_1_Process Mining of Football Event Data: A Novel Approach for...
frontiersin.figshare.com
docx
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pavlina Kröckel; Freimut Bodendorf (2023). Data_Sheet_1_Process Mining of Football Event Data: A Novel Approach for Tactical Insights Into the Game.docx [Dataset]. http://doi.org/10.3389/frai.2020.00047.s001
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/frai.2020.00047.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Pavlina Kröckel; Freimut Bodendorf
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The paper explores process mining and its usefulness for analyzing football event data. We work with professional event data provided by OPTA Sports from the European Championship in 2016. We analyze one game of a favorite team (England) against an underdog team (Iceland). The success of the underdog teams in the Euro 2016 was remarkable, and it is what made the event special. For this reason, it is interesting to compare the performance of a favorite and an underdog team by applying process mining. The goal is to show the options that these types of algorithms and visual analytics offer for the interpretation of event data in football and discuss how the gained insights can support decision makers not only in pre- and post-match analysis but also during live games as well. We show process mining techniques which can be used to gain team or individual player insights by considering the types of actions, the sequence of actions, and the order of player involvement in each sequence. Finally, we also demonstrate the detection of typical or unusual behavior by trace and sequence clustering.
Privacy‑Preserving Data Mining Tools Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Growth Market Reports (2025). Privacy‑Preserving Data Mining Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/privacypreserving-data-mining-tools-market
Explore at:
pdf, pptx, csvAvailable download formats
Dataset updated
Jun 27, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Privacy‑Preserving Data Mining Tools Market Outlook

According to our latest research, the global Privacy‑Preserving Data Mining Tools market size reached USD 1.42 billion in 2024, reflecting robust adoption across diverse industries. The market is expected to exhibit a CAGR of 22.8% during the forecast period, propelling the market to USD 10.98 billion by 2033. This remarkable growth is driven by the increasing need for secure data analytics, stringent data protection regulations, and the rising frequency of data breaches, all of which are pushing organizations to adopt advanced privacy solutions.

One of the primary growth factors for the Privacy‑Preserving Data Mining Tools market is the exponential rise in data generation and the parallel escalation of privacy concerns. As organizations collect vast amounts of sensitive information, especially in sectors like healthcare and BFSI, the risk of data exposure and misuse grows. Governments worldwide are enacting stricter data protection laws, such as the GDPR in Europe and CCPA in California, compelling enterprises to integrate privacy‑preserving technologies into their analytics workflows. These regulations not only mandate compliance but also foster consumer trust, making privacy‑preserving data mining tools a strategic investment for businesses aiming to maintain a competitive edge while safeguarding user data.

Another significant driver is the rapid digital transformation across industries, which necessitates the extraction of actionable insights from large, distributed data sets without compromising privacy. Privacy‑preserving techniques, such as federated learning, homomorphic encryption, and differential privacy, are gaining traction as they allow organizations to collaborate and analyze data securely. The advent of cloud computing and the proliferation of connected devices further amplify the demand for scalable and secure data mining solutions. As enterprises embrace cloud-based analytics, the need for robust privacy-preserving mechanisms becomes paramount, fueling the adoption of advanced tools that can operate seamlessly in both on-premises and cloud environments.

Moreover, the increasing sophistication of cyber threats and the growing awareness of the potential reputational and financial damage caused by data breaches are prompting organizations to prioritize data privacy. High-profile security incidents have underscored the vulnerabilities inherent in traditional data mining approaches, accelerating the shift towards privacy-preserving alternatives. The integration of artificial intelligence and machine learning with privacy-preserving technologies is also opening new avenues for innovation, enabling more granular and context-aware data analytics. This technological convergence is expected to further catalyze market growth, as organizations seek to harness the full potential of their data assets while maintaining stringent privacy standards.

From a regional perspective, North America currently commands the largest share of the Privacy‑Preserving Data Mining Tools market, driven by the presence of leading technology vendors, high awareness levels, and a robust regulatory framework. Europe follows closely, propelled by stringent data privacy laws and increasing investments in secure analytics infrastructure. The Asia Pacific region is witnessing the fastest growth, fueled by rapid digitalization, expanding IT ecosystems, and rising cybersecurity concerns in emerging economies such as China and India. Latin America and the Middle East & Africa are also experiencing steady growth, albeit from a smaller base, as organizations in these regions increasingly recognize the importance of privacy in data-driven decision-making.

Component Analysis

The Privacy‑Preserving Data Mining Tools market is segmented by component into software and services, each playing a pivotal role in shaping the industry landscape. The software segment dominates the market, accounting for the majority of revenue in 2024. Organizations are increasingly investing in advanced software so
d
Data from: Community-Scale Attic Retrofit and Home Energy Upgrade Data...
catalog.data.gov
data.openei.org
+3more
Updated Nov 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Davis Energy (2023). Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry Climate [Dataset]. https://catalog.data.gov/dataset/community-scale-attic-retrofit-and-home-energy-upgrade-data-mining-hot-dry-climate
Explore at:
Dataset updated
Nov 2, 2023
Dataset provided by
Davis Energy
Description
Retrofitting is an essential element of any comprehensive strategy for improving residential energy efficiency. The residential retrofit market is still developing, and program managers must develop innovative strategies to increase uptake and promote economies of scale. Residential retrofitting remains a challenging proposition to sell to homeowners, because awareness levels are low and financial incentives are lacking. The U.S. Department of Energy's Building America research team, Alliance for Residential Building Innovation (ARBI), implemented a project to increase residential retrofits in Davis, California. The project used a neighborhood-focused strategy for implementation and a low-cost retrofit program that focused on upgraded attic insulation and duct sealing. ARBI worked with a community partner, the not-for-profit Cool Davis Initiative, as well as selected area contractors to implement a strategy that sought to capitalize on the strong local expertise of partners and the unique aspects of the Davis, California, community. Working with community partners also allowed ARBI to collect and analyze data about effective messaging tactics for community-based retrofit programs. ARBI expected this project, called Retrofit Your Attic, to achieve higher uptake than other retrofit projects, because it emphasized a low-cost, one-measure retrofit program. However, this was not the case. The program used a strategy that focused on attics-including air sealing, duct sealing, and attic insulation-as a low-cost entry for homeowners to complete home retrofits. The price was kept below $4,000 after incentives; both contractors in the program offered the same price. The program completed only five retrofits. Interestingly, none of those homeowners used the one-measure strategy. All five homeowners were concerned about cost, comfort, and energy savings and included additional measures in their retrofits. The low-cost, one-measure strategy did not increase the uptake among homeowners, even in a well-educated, affluent community such as Davis. This project has two primary components. One is to complete attic retrofits on a community scale in the hot-dry climate on Davis, CA. Sufficient data will be collected on these projects to include them in the BAFDR. Additionally, ARBI is working with contractors to obtain building and utility data from a large set of retrofit projects in CA (hot-dry). These projects are to be uploaded into the BAFDR.
c
Global Data Mining Software Market Report 2025 Edition, Market Size, Share,...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated May 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cognitive Market Research (2025). Global Data Mining Software Market Report 2025 Edition, Market Size, Share, CAGR, Forecast, Revenue [Dataset]. https://www.cognitivemarketresearch.com/data-mining-software-market-report
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset updated
May 15, 2025
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policyhttps://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global Data Mining Software market size will be USD XX million in 2025. It will expand at a compound annual growth rate (CAGR) of XX% from 2025 to 2031.

North America held the major market share for more than XX% of the global revenue with a market size of USD XX million in 2025 and will grow at a CAGR of XX% from 2025 to 2031. Europe accounted for a market share of over XX% of the global revenue with a market size of USD XX million in 2025 and will grow at a CAGR of XX% from 2025 to 2031. Asia Pacific held a market share of around XX% of the global revenue with a market size of USD XX million in 2025 and will grow at a CAGR of XX% from 2025 to 2031. Latin America had a market share of more than XX% of the global revenue with a market size of USD XX million in 2025 and will grow at a CAGR of XX% from 2025 to 2031. Middle East and Africa had a market share of around XX% of the global revenue and was estimated at a market size of USD XX million in 2025 and will grow at a CAGR of XX% from 2025 to 2031. KEY DRIVERS

Increasing Focus on Customer Satisfaction to Drive Data Mining Software Market Growth

In today’s hyper-competitive and digitally connected marketplace, customer satisfaction has emerged as a critical factor for business sustainability and growth. The growing focus on enhancing customer satisfaction is proving to be a significant driver in the expansion of the data mining software market. Organizations are increasingly leveraging data mining tools to sift through vast volumes of customer data—ranging from transactional records and website activity to social media engagement and call center logs—to uncover insights that directly influence customer experience strategies. Data mining software empowers companies to analyze customer behavior patterns, identify dissatisfaction triggers, and predict future preferences. Through techniques such as classification, clustering, and association rule mining, businesses can break down large datasets to understand what customers want, what they are likely to purchase next, and how they feel about the brand. These insights not only help in refining customer service but also in shaping product development, pricing strategies, and promotional campaigns. For instance, Netflix uses data mining to recommend personalized content by analyzing a user's viewing history, ratings, and preferences. This has led to increased user engagement and retention, highlighting how a deep understanding of customer preferences—made possible through data mining—can translate into competitive advantage. Moreover, companies are increasingly using these tools to create highly targeted and customer-specific marketing campaigns. By mining data from e-commerce transactions, browsing behavior, and demographic profiles, brands can tailor their offerings and communications to suit individual customer segments. For Instance Amazon continuously mines customer purchasing and browsing data to deliver personalized product recommendations, tailored promotions, and timely follow-ups. This not only enhances customer satisfaction but also significantly boosts conversion rates and average order value. According to a report by McKinsey, personalization can deliver five to eight times the ROI on marketing spend and lift sales by 10% or more—a powerful incentive for companies to adopt data mining software as part of their customer experience toolkit. (Source: https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/personalizing-at-scale#/) The utility of data mining tools extends beyond e-commerce and streaming platforms. In the banking and financial services industry, for example, institutions use data mining to analyze customer feedback, call center transcripts, and usage data to detect pain points and improve service delivery. Bank of America, for instance, utilizes data mining and predictive analytics to monitor customer interactions and provide proactive service suggestions or fraud alerts, significantly improving user satisfaction and trust. (Source: https://futuredigitalfinance.wbresearch.com/blog/bank-of-americas-erica-client-interactions-future-ai-in-banking) Similarly, telecom companies like Vodafone use data mining to understand customer churn behavior and implement retention strategies based on insights drawn from service usage patterns and complaint histories. In addition to p...
o
Data from: Identifying Missing Data Handling Methods with Text Mining
openicpsr.org
delimited
Updated Mar 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Krisztián Boros; Zoltán Kmetty (2023). Identifying Missing Data Handling Methods with Text Mining [Dataset]. http://doi.org/10.3886/E185961V1
Explore at:
delimitedAvailable download formats
Unique identifier
https://doi.org/10.3886/E185961V1
Dataset updated
Mar 8, 2023
Dataset provided by
Hungarian Academy of Sciences
Authors
Krisztián Boros; Zoltán Kmetty
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1999 - Dec 31, 2016
Description
Missing data is an inevitable aspect of every empirical research. Researchers developed several techniques to handle missing data to avoid information loss and biases. Over the past 50 years, these methods have become more and more efficient and also more complex. Building on previous review studies, this paper aims to analyze what kind of missing data handling methods are used among various scientific disciplines. For the analysis, we used nearly 50.000 scientific articles that were published between 1999 and 2016. JSTOR provided the data in text format. Furthermore, we utilized a text-mining approach to extract the necessary information from our corpus. Our results show that the usage of advanced missing data handling methods such as Multiple Imputation or Full Information Maximum Likelihood estimation is steadily growing in the examination period. Additionally, simpler methods, like listwise and pairwise deletion, are still in widespread use.
Z
Data mining and sentiment analysis on Twitter and Facebook
data.niaid.nih.gov
zenodo.org
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chevalley, Coralie (2023). Data mining and sentiment analysis on Twitter and Facebook [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7979377
Explore at:
Dataset updated
Jun 1, 2023
Dataset provided by
Chevalley, Coralie
Halili, Admira
Covaleda, Felipe
Tavares Da Costa, Erisvelton
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Les données récoltées sont sur le sujet "Data mining and sentiment analysis on Twitter and Facebook". Ce jeu de donnée contient la liste des attributs principaux suivants :

titles, titre du fichier PDF,

authors, auteurs du fichier PDF,

years, année de création du fichier PDF,

ncitedby, nombre de citation,

linkfiles, liens du fichier PDF,

mais également des métadonnées.

La récupération du jeu de données a été récolté sur Google Scholar. Plusieurs recherches sur Google Scholar ont été faites pour ce dernier (voir liens ci-dessous) :

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=twitter+data+mining+filetype%3Apdf&btnG=

https://scholar.google.com/scholar?start=490&q=facebook+data+mining+-Twitter+filetype:pdf&hl=en&as_sdt=0,5

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=seniment+analyse+twitter+filetype%3Apdf&btnG=
d
Data from: Privacy Preserving Outlier Detection through Random Nonlinear...
catalog.data.gov
data.amerigeoss.org
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Privacy Preserving Outlier Detection through Random Nonlinear Data Distortion [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-outlier-detection-through-random-nonlinear-data-distortion
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
Consider a scenario in which the data owner has some private/sensitive data and wants a data miner to access it for studying important patterns without revealing the sensitive information. Privacy preserving data mining aims to solve this problem by randomly transforming the data prior to its release to data miners. Previous work only considered the case of linear data perturbations — additive, multiplicative or a combination of both for studying the usefulness of the perturbed output. In this paper, we discuss nonlinear data distortion using potentially nonlinear random data transformation and show how it can be useful for privacy preserving anomaly detection from sensitive datasets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy by using standard definitions. The highlight of this approach is to allow a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that for specific cases it is distance preserving. A main contribution of this paper is the discussion between the invertibility of a transformation and privacy preservation and the application of these techniques to outlier detection. Experiments conducted on real-life datasets demonstrate the effectiveness of the approach.
L
Life Sciences Data Mining and Visualization Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Jun 16, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Life Sciences Data Mining and Visualization Software Report [Dataset]. https://www.marketresearchforecast.com/reports/life-sciences-data-mining-and-visualization-software-542790
Explore at:
pdf, doc, pptAvailable download formats
Dataset updated
Jun 16, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Life Sciences Data Mining and Visualization Software market is experiencing robust growth, driven by the increasing volume of biological data generated through genomics, proteomics, and clinical trials. The market's expansion is fueled by the urgent need for efficient tools to analyze this complex data, enabling faster drug discovery, personalized medicine initiatives, and improved patient outcomes. Companies are increasingly investing in advanced analytics solutions to gain actionable insights from their data, leading to improved operational efficiency and reduced research and development costs. The integration of artificial intelligence (AI) and machine learning (ML) capabilities within these software solutions is a significant trend, enhancing the ability to identify patterns and make predictions from large datasets. This market is segmented by software type (e.g., data mining, visualization, integrated solutions), deployment mode (cloud, on-premise), and end-user (pharmaceutical companies, biotechnology firms, research institutions). Competition is fierce, with established players like IBM, Microsoft, and SAS competing with specialized life sciences focused companies and emerging innovative startups. While the market faces challenges such as the high cost of implementation and the need for specialized expertise, the long-term prospects remain positive. The continuous advancements in data generation technologies and the growing demand for data-driven decision-making in the life sciences sector will continue to fuel market growth. Furthermore, the increasing adoption of cloud-based solutions is expected to lower the barrier to entry for smaller companies and research institutions, further expanding the market. This makes the Life Sciences Data Mining and Visualization Software market a particularly attractive investment opportunity with high potential for both established players and new entrants. The market's estimated size in 2025 is $10 Billion, with a projected Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033.
Data from: An Empirical Study of Activity, Popularity, Size, Testing, and...
zenodo.org
csv
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aakash Gautam; Saket Vishwasrao; Francisco Servant; Aakash Gautam; Saket Vishwasrao; Francisco Servant (2020). An Empirical Study of Activity, Popularity, Size, Testing, and Stability in Continuous Integration [Dataset]. http://doi.org/10.5281/zenodo.439362
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.439362
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Aakash Gautam; Saket Vishwasrao; Francisco Servant; Aakash Gautam; Saket Vishwasrao; Francisco Servant
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A good understanding of the practices followed by software development projects can positively impact their success --- particularly for attracting talent and on-boarding new members. In this paper, we perform a cluster analysis to classify software projects that follow continuous integration in terms of their activity, popularity, size, testing, and stability. Based on this analysis, we identify and discuss four different groups of repositories that have distinct characteristics that separates them from the other groups. With this new understanding, we encourage open source projects to acknowledge and advertise their preferences according to these defining characteristics, so that they can recruit developers who share similar values.
Data from: CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2...
zenodo.org
bin, png, zip
Updated Jul 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Berta Ibáñez-Beroiz; Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ignacio Oscoz-Villanueva; Ibai Tamayo; Ibai Tamayo; Julián Librero; Julián Librero; Mónica Enguita-Germán; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Francisco Estupiñán-Romero; Enrique Bernal-Delgado; Enrique Bernal-Delgado (2024). CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2 DIABETES [Dataset]. http://doi.org/10.5281/zenodo.7778291
Explore at:
bin, png, zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7778291
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Berta Ibáñez-Beroiz; Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ignacio Oscoz-Villanueva; Ibai Tamayo; Ibai Tamayo; Julián Librero; Julián Librero; Mónica Enguita-Germán; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Francisco Estupiñán-Romero; Enrique Bernal-Delgado; Enrique Bernal-Delgado
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Technical notes and documentation on the common data model of the project CONCEPT-DM2.

This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.

Aims of the CONCEPT-DM2 project:

General aim: To analyse chronic care effectiveness and efficiency of care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes using data from real life world (RWD) from five Spanish Regional Health Systems.

Main specific aims:

To characterize the care pathways in patients with diabetes through the whole care system in terms of process indicators and pharmacologic recommendations

To compare these observed care pathways with the theoretical clinical pathways derived from the clinical practice guidelines

To assess if the adherence to clinical guidelines influence on important health outcomes, such as cardiovascular hospitalizations.

To compare the traditional analytical methods with process mining methods in terms of modeling quality, prediction performance and information provided.

Study Design: It is a population-based retrospective observational study centered on all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all the contacts of these patients with the health services using the electronic medical record systems including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data, Pharmacy Claims, and also other registers such as the mortality and the population register.

Cohort definition: All patients with code of Type 2 Diabetes in the clinical health records

Inclusion criteria: patients that, at 01/01/2017 or during the follow-up from 01/01/2017 to 31/12/2022 had active health card (active TIS - tarjeta sanitaria activa) and code of type 2 diabetes (T2D, DM2 in spanish) in the clinical records of primary care (CIAP2 T90 in case of using CIAP code system)

Exclusion criteria:

patients with no contact with the health system from 01/01/2017 to 31/12/2022

patients that had a T1D (DM1) code opened after the T2D code during the follow-up.

Study period. From 01/01/2017 to 31/12/2022

Files included in this publication:

Datamodel_CONCEPT_DM2_diagram.png

Common data model specification (Datamodel_CONCEPT_DM2_v.0.1.0.xlsx)

Synthetic datasets (Datamodel_CONCEPT_DM2_sample_data)

sample_data1_dm_patient.csv

sample_data2_dm_param.csv

sample_data3_dm_patient.csv

sample_data4_dm_param.csv

sample_data5_dm_patient.csv

sample_data6_dm_param.csv

sample_data7_dm_param.csv

sample_data8_dm_param.csv

Datamodel_CONCEPT_DM2_explanation.pptx
v
Global Data Warehouse Market Size By Offering Type (ETL Solutions,...
verifiedmarketresearch.com
pdf,excel,csv,ppt
Updated Dec 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Verified Market Research (2024). Global Data Warehouse Market Size By Offering Type (ETL Solutions, Statistical Analysis, Data Mining), By Deployment Mode (Cloud, On-Premises, Hybrid), By Data Type (Unstructured, Semi-Structured, Structured), By End-User Industry (Banking, Financial Services And Insurance (BFSI), Healthcare, IT And Telecom, Retail, Manufacturing, Government, Media And Entertainment), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/data-warehouse-market/
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset updated
Dec 21, 2024
Dataset authored and provided by
Verified Market Research
License
https://www.verifiedmarketresearch.com/privacy-policy/https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2026 - 2032
Area covered
Global
Description
Data Warehouse Market size was valued at USD 27.68 Billion in 2024 and is projected to reach USD 63.9 Billion by 2032, growing at a CAGR of 11% from 2026 to 2032.

Key Market Drivers: Increasing Volume of Data Generated across Industries: The exponential expansion of data generation is increasing the demand for robust data warehouse solutions. According to the International Data Corporation (IDC), the global datasphere is expected to increase from 33 zettabytes in 2018 to 175 zettabytes by 2025. This tremendous rise in data volume demands sophisticated data warehousing capabilities to ensure efficient storage, administration, and analysis.

Growing Adoption of Cloud-based Data Warehousing: The shift to cloud-based solutions is a significant driver of the Data Warehouse Market.
f
Data from: Enhancing the Human Health Status Prediction: The ATHLOS Project
tandf.figshare.com
xls
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos (2023). Enhancing the Human Health Status Prediction: The ATHLOS Project [Dataset]. http://doi.org/10.6084/m9.figshare.14798079.v1
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14798079.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Taylor & Francis
Authors
P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Preventive healthcare is a crucial pillar of health as it contributes to staying healthy and having immediate treatment when needed. Mining knowledge from longitudinal studies has the potential to significantly contribute to the improvement of preventive healthcare. Unfortunately, data originated from such studies are characterized by high complexity, huge volume, and a plethora of missing values. Machine Learning, Data Mining and Data Imputation models are utilized a part of solving these challenges, respectively. Toward this direction, we focus on the development of a complete methodology for the ATHLOS Project – funded by the European Union’s Horizon 2020 Research and Innovation Program, which aims to achieve a better interpretation of the impact of aging on health. The inherent complexity of the provided dataset lies in the fact that the project includes 15 independent European and international longitudinal studies of aging. In this work, we mainly focus on the HealthStatus (HS) score, an index that estimates the human status of health, aiming to examine the effect of various data imputation models to the prediction power of classification and regression models. Our results are promising, indicating the critical importance of data imputation in enhancing preventive medicine’s crucial role.

Facebook

Twitter

Click to copy link

Link copied

Cite

Xin Qiao; Hong Jiao (2023). Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2018.02231.s001

Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf

Explore at:

pdfAvailable download formats

Unique identifier

https://doi.org/10.3389/fpsyg.2018.02231.s001

Dataset updated

Jun 7, 2023

Dataset provided by

Frontiers

Authors

Xin Qiao; Hong Jiao

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Due to increasing use of technology-enhanced educational assessment, data mining methods have been explored to analyse process data in log files from such assessment. However, most studies were limited to one data mining technique under one specific scenario. The current study demonstrates the usage of four frequently used supervised techniques, including Classification and Regression Trees (CART), gradient boosting, random forest, support vector machine (SVM), and two unsupervised methods, Self-organizing Map (SOM) and k-means, fitted to one assessment data. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA) responding to problem-solving items is extracted to demonstrate the methods. After concrete feature generation and feature selection, classifier development procedures are implemented using the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for the selection of classifiers are presented based on the research questions, the interpretability and the simplicity of the classifiers. Interpretations for the results from both supervised and unsupervised learning methods are provided.

Clear search

Close search

Google apps

Main menu

Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf

Data from: Data Mining at NASA: From Theory to Applications

Data Mining Software Market Report | Global Forecast From 2025 To 2033

Data Mining Software Market Outlook

Component Analysis

Privacy Preserving Distributed Data Mining

Data from: Discovering System Health Anomalies using Data Mining Techniques

Data Mining Tools Market Size, Share & Growth Report, 2033

Data Mining Tools Market Report | Global Forecast From 2025 To 2033

Data Mining Tools Market Outlook 2032

Impact of Artificial Intelligence (

Predictive data analysis techniques for higher education students dropout

Data_Sheet_1_Process Mining of Football Event Data: A Novel Approach for...

Privacy‑Preserving Data Mining Tools Market Research Report 2033

Privacy‑Preserving Data Mining Tools Market Outlook

Component Analysis

Data from: Community-Scale Attic Retrofit and Home Energy Upgrade Data...

Global Data Mining Software Market Report 2025 Edition, Market Size, Share,...

Data from: Identifying Missing Data Handling Methods with Text Mining

Data mining and sentiment analysis on Twitter and Facebook

Data from: Privacy Preserving Outlier Detection through Random Nonlinear...

Life Sciences Data Mining and Visualization Software Report

Data from: An Empirical Study of Activity, Popularity, Size, Testing, and...

Data from: CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2...

Global Data Warehouse Market Size By Offering Type (ETL Solutions,...

Data from: Enhancing the Human Health Status Prediction: The ATHLOS Project

Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf