100+ datasets found

f
Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf
frontiersin.figshare.com
pdf
Updated Jun 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xin Qiao; Hong Jiao (2023). Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2018.02231.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fpsyg.2018.02231.s001
Dataset updated
Jun 7, 2023
Dataset provided by
Frontiers
Authors
Xin Qiao; Hong Jiao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Due to increasing use of technology-enhanced educational assessment, data mining methods have been explored to analyse process data in log files from such assessment. However, most studies were limited to one data mining technique under one specific scenario. The current study demonstrates the usage of four frequently used supervised techniques, including Classification and Regression Trees (CART), gradient boosting, random forest, support vector machine (SVM), and two unsupervised methods, Self-organizing Map (SOM) and k-means, fitted to one assessment data. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA) responding to problem-solving items is extracted to demonstrate the methods. After concrete feature generation and feature selection, classifier development procedures are implemented using the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for the selection of classifiers are presented based on the research questions, the interpretability and the simplicity of the classifiers. Interpretations for the results from both supervised and unsupervised learning methods are provided.
d
Data from: Discovering System Health Anomalies using Data Mining Techniques
catalog.data.gov
data.nasa.gov
+2more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Discovering System Health Anomalies using Data Mining Techniques [Dataset]. https://catalog.data.gov/dataset/discovering-system-health-anomalies-using-data-mining-techniques
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
We discuss a statistical framework that underlies envelope detection schemes as well as dynamical models based on Hidden Markov Models (HMM) that can encompass both discrete and continuous sensor measurements for use in Integrated System Health Management (ISHM) applications. The HMM allows for the rapid assimilation, analysis, and discovery of system anomalies. We motivate our work with a discussion of an aviation problem where the identification of anomalous sequences is essential for safety reasons. The data in this application are discrete and continuous sensor measurements and can be dealt with seamlessly using the methods described here to discover anomalous flights. We specifically treat the problem of discovering anomalous features in the time series that may be hidden from the sensor suite and compare those methods to standard envelope detection methods on test data designed to accentuate the differences between the two methods. Identification of these hidden anomalies is crucial to building stable, reusable, and cost-efficient systems. We also discuss a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an ISHM system. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.
D
Data Mining Tools Market Report
marketresearchforecast.com
doc, pdf, ppt
Updated Feb 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Data Mining Tools Market Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-tools-market-1722
Explore at:
pdf, ppt, docAvailable download formats
Dataset updated
Feb 3, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Mining Tools Market size was valued at USD 1.01 USD billion in 2023 and is projected to reach USD 1.99 USD billion by 2032, exhibiting a CAGR of 10.2 % during the forecast period. The growing adoption of data-driven decision-making and the increasing need for business intelligence are major factors driving market growth. Data mining refers to filtering, sorting, and classifying data from larger datasets to reveal subtle patterns and relationships, which helps enterprises identify and solve complex business problems through data analysis. Data mining software tools and techniques allow organizations to foresee future market trends and make business-critical decisions at crucial times. Data mining is an essential component of data science that employs advanced data analytics to derive insightful information from large volumes of data. Businesses rely heavily on data mining to undertake analytics initiatives in the organizational setup. The analyzed data sourced from data mining is used for varied analytics and business intelligence (BI) applications, which consider real-time data analysis along with some historical pieces of information. Recent developments include: May 2023 – WiMi Hologram Cloud Inc. introduced a new data interaction system developed by combining neural network technology and data mining. Using real-time interaction, the system can offer reliable and safe information transmission., May 2023 – U.S. Data Mining Group, Inc., operating in bitcoin mining site, announced a hosting contract to deploy 150,000 bitcoins in partnership with major companies such as TeslaWatt, Sphere 3D, Marathon Digital, and more. The company is offering industry turn-key solutions for curtailment, accounting, and customer relations., April 2023 – Artificial intelligence and single-cell biotech analytics firm, One Biosciences, launched a single cell data mining algorithm called ‘MAYA’. The algorithm is for cancer patients to detect therapeutic vulnerabilities., May 2022 – Europe-based Solarisbank, a banking-as-a-service provider, announced its partnership with Snowflake to boost its cloud data strategy. Using the advanced cloud infrastructure, the company can enhance data mining efficiency and strengthen its banking position.. Key drivers for this market are: Increasing Focus on Customer Satisfaction to Drive Market Growth. Potential restraints include: Requirement of Skilled Technical Resources Likely to Hamper Market Growth. Notable trends are: Incorporation of Data Mining and Machine Learning Solutions to Propel Market Growth.
d
Data from: Peer-to-Peer Data Mining, Privacy Issues, and Games
catalog.data.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
+2more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Peer-to-Peer Data Mining, Privacy Issues, and Games [Dataset]. https://catalog.data.gov/dataset/peer-to-peer-data-mining-privacy-issues-and-games
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
Peer-to-Peer (P2P) networks are gaining increasing popularity in many distributed applications such as file-sharing, network storage, web caching, sear- ching and indexing of relevant documents and P2P network-threat analysis. Many of these applications require scalable analysis of data over a P2P network. This paper starts by offering a brief overview of distributed data mining applications and algorithms for P2P environments. Next it discusses some of the privacy concerns with P2P data mining and points out the problems of existing privacy-preserving multi-party data mining techniques. It further points out that most of the nice assumptions of these existing privacy preserving techniques fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). The paper offers a more realistic formulation of the PPDM problem as a multi-party game and points out some recent results.
Ensemble Data Mining Methods - Dataset - NASA Open Data Portal
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Mar 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). Ensemble Data Mining Methods - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/ensemble-data-mining-methods
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
w
Data mining methods for knowledge discovery
workwithdata.com
Updated Jan 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Data mining methods for knowledge discovery [Dataset]. https://www.workwithdata.com/object/data-mining-methods-for-knowledge-discovery-book-by-krzysztof-j-cios-0000
Explore at:
Dataset updated
Jan 22, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Explore Data mining methods for knowledge discovery through data • Key facts: author, publication date, book publisher, book series, book subjects • Real-time news, visualizations and datasets
d
Ensemble Data Mining Methods
catalog-dev.data.gov
data.wu.ac.at
Updated Feb 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Ensemble Data Mining Methods [Dataset]. https://catalog-dev.data.gov/dataset/ensemble-data-mining-methods
Explore at:
Dataset updated
Feb 22, 2025
Dataset provided by
Dashlink
Description
Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
i
A novel fusion Python application of data mining techniques to evaluate...
ieee-dataport.org
Updated Jun 8, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Kayode (2020). A novel fusion Python application of data mining techniques to evaluate airborne magnetic datasets [Dataset]. https://ieee-dataport.org/open-access/novel-fusion-python-application-data-mining-techniques-evaluate-airborne-magnetic
Explore at:
Dataset updated
Jun 8, 2020
Authors
John Kayode
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Depths to the various subsurface anomalies have been the primary interest in all the applications of magnetic methods of geophysical prospection. Depths to the subsurface geologic features of interest are more valuable and superior to all other properties in any correct subsurface geologic structural interpretations.
d
Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...
catalog.data.gov
datadiscoverystudio.org
+5more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. https://catalog.data.gov/dataset/discovering-anomalous-aviation-safety-events-using-scalable-data-mining-algorithms
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.
s
Online Feature Selection and Its Applications
researchdata.smu.edu.sg
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN (2023). Online Feature Selection and Its Applications [Dataset]. http://doi.org/10.25440/smu.12062733.v1
Explore at:
Unique identifier
https://doi.org/10.25440/smu.12062733.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN
License
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
Description
Feature selection is an important technique for data mining before a machine learning algorithm is applied. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS) in which an online learner is only allowed to maintain a classifier involved only a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate prediction using a small and fixed number of active features. This is in contrast to the classical setup of online learning where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: (1) learning with full input where an learner is allowed to access all the features to decide the subset of active features, and (2) learning with partial input where only a limited number of features is allowed to be accessed for each instance by the learner. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public datasets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.Related Publication: Hoi, S. C., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 93-100). ACM. http://dx.doi.org/10.1145/2351316.2351329 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2402/ Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2014). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698-710. http://dx.doi.org/10.1109/TKDE.2013.32 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2277/
Lifesciences Data Mining and Visualization Market Report | Global Forecast...
dataintelo.com
csv, pdf, pptx
Updated Sep 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2024). Lifesciences Data Mining and Visualization Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-lifesciences-data-mining-and-visualization-market
Explore at:
pptx, pdf, csvAvailable download formats
Dataset updated
Sep 5, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Lifesciences Data Mining and Visualization Market Outlook

The global market size for Lifesciences Data Mining and Visualization was valued at approximately USD 1.5 billion in 2023 and is projected to reach around USD 4.3 billion by 2032, growing at a compound annual growth rate (CAGR) of 12.5% during the forecast period. The growth of this market is driven by the increasing demand for sophisticated data analysis tools in the life sciences sector, advancements in analytical technologies, and the rising volume of complex biological data generated from research and clinical trials.

One of the primary growth factors for the Lifesciences Data Mining and Visualization market is the burgeoning amount of data generated from various life sciences applications, such as genomics, proteomics, and clinical trials. With the advent of high-throughput technologies, researchers and healthcare professionals are now capable of generating vast amounts of data, which necessitates the use of advanced data mining and visualization tools to derive actionable insights. These tools not only help in managing and interpreting large datasets but also in uncovering hidden patterns and relationships, thereby accelerating research and development processes.

Another significant driver is the increasing adoption of artificial intelligence (AI) and machine learning (ML) algorithms in the life sciences domain. These technologies have proven to be invaluable in enhancing data analysis capabilities, enabling more precise and predictive modeling of biological systems. By integrating AI and ML with data mining and visualization platforms, researchers can achieve higher accuracy in identifying potential drug targets, understanding disease mechanisms, and personalizing treatment plans. This trend is expected to continue, further propelling the market's growth.

Moreover, the rising emphasis on personalized medicine and the need for precision in healthcare is fueling the demand for data mining and visualization tools. Personalized medicine relies heavily on the analysis of individual genetic, proteomic, and metabolomic profiles to tailor treatments specifically to patients' unique characteristics. The ability to visualize these complex datasets in an understandable and actionable manner is critical for the successful implementation of personalized medicine strategies, thereby boosting the demand for advanced data analysis tools.

From a regional perspective, North America is anticipated to dominate the Lifesciences Data Mining and Visualization market, owing to the presence of a robust healthcare infrastructure, significant investments in research and development, and a high adoption rate of advanced technologies. The European market is also expected to witness substantial growth, driven by increasing government initiatives to support life sciences research and the presence of leading biopharmaceutical companies. The Asia Pacific region is projected to experience the fastest growth, attributed to the expanding healthcare sector, rising investments in biotechnology research, and the increasing adoption of data analytics solutions.

Component Analysis

The Lifesciences Data Mining and Visualization market is segmented by component into software and services. The software segment is expected to hold a significant share of the market, driven by the continuous advancements in data mining algorithms and visualization techniques. Software solutions are critical in processing large volumes of complex biological data, facilitating real-time analysis, and providing intuitive visual representations that aid in decision-making. The increasing integration of AI and ML into these software solutions is further enhancing their capabilities, making them indispensable tools in life sciences research.

The services segment, on the other hand, is projected to grow at a considerable rate, as organizations seek specialized expertise to manage and interpret their data. Services include consulting, implementation, and maintenance, as well as training and support. The demand for these services is driven by the need to ensure optimal utilization of data mining software and to keep up with the rapid pace of technological advancements. Moreover, many life sciences organizations lack the in-house expertise required to handle large-scale data analytics projects, thereby turning to external service providers for assistance.

Within the software segment, there is a growing trend towards the development of integrated platforms that combine multiple functionalities, such as data collection, pre
m
Educational Attainment in North Carolina Public Schools: Use of statistical...
data.mendeley.com
Updated Nov 14, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
Explore at:
Unique identifier
https://doi.org/10.17632/6cm9wyd5g5.1
Dataset updated
Nov 14, 2018
Authors
Scott Herford
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The purpose of data mining analysis is always to find patterns of the data using certain kind of techiques such as classification or regression. It is not always feasible to apply classification algorithms directly to dataset. Before doing any work on the data, the data has to be pre-processed and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. Based on our project, after using clustering prior to classification, the performance has not improved much. The reason why it has not improved could be the features we selected to perform clustering are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics. From the dimensionality reduction perspective: It is different from Principle Component Analysis which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters as a technique of reducing the data dimension will lose a lot of information since clustering techniques are based a metric of 'distance'. At high dimensions euclidean distance loses pretty much all meaning. Therefore using clustering as a "Reducing" dimensionality by mapping data points to cluster numbers is not always good since you may lose almost all the information. From the creating new features perspective: Clustering analysis creates labels based on the patterns of the data, it brings uncertainties into the data. By using clustering prior to classification, the decision on the number of clusters will highly affect the performance of the clustering, then affect the performance of classification. If the part of features we use clustering techniques on is very suited for it, it might increase the overall performance on classification. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We did not lock in the clustering outputs using a random_state in the effort to see if they were stable. Our assumption was that if the results vary highly from run to run which they definitely did, maybe the data just does not cluster well with the methods selected at all. Basically, the ramification we saw was that our results are not much better than random when applying clustering to the data preprocessing. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the model real world effectiveness and also to continue to revise the models from time to time as things change.
o
Data from: Identifying Missing Data Handling Methods with Text Mining
openicpsr.org
delimited
Updated Mar 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Krisztián Boros; Zoltán Kmetty (2023). Identifying Missing Data Handling Methods with Text Mining [Dataset]. http://doi.org/10.3886/E185961V1
Explore at:
delimitedAvailable download formats
Unique identifier
https://doi.org/10.3886/E185961V1
Dataset updated
Mar 8, 2023
Dataset provided by
Hungarian Academy of Sciences
Authors
Krisztián Boros; Zoltán Kmetty
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1999 - Dec 31, 2016
Description
Missing data is an inevitable aspect of every empirical research. Researchers developed several techniques to handle missing data to avoid information loss and biases. Over the past 50 years, these methods have become more and more efficient and also more complex. Building on previous review studies, this paper aims to analyze what kind of missing data handling methods are used among various scientific disciplines. For the analysis, we used nearly 50.000 scientific articles that were published between 1999 and 2016. JSTOR provided the data in text format. Furthermore, we utilized a text-mining approach to extract the necessary information from our corpus. Our results show that the usage of advanced missing data handling methods such as Multiple Imputation or Full Information Maximum Likelihood estimation is steadily growing in the examination period. Additionally, simpler methods, like listwise and pairwise deletion, are still in widespread use.
S
Predictive data analysis techniques for higher education students dropout
scidb.cn
Updated Apr 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cindy (2023). Predictive data analysis techniques for higher education students dropout [Dataset]. http://doi.org/10.57760/sciencedb.07894
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.07894
Dataset updated
Apr 10, 2023
Dataset provided by
Science Data Bank
Authors
Cindy
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In this research, we have generated student retention alerts. The alerts are classified into two types: preventive and corrective. This classification varies according to the level of maturity of the data systematization process. Therefore, to systematize the data, data mining techniques have been applied. The experimental analytical method has been used, with a population of 13,715 students with 62 sociological, academic, family, personal, economic, psychological, and institutional variables, and factors such as academic follow-up and performance, financial situation, and personal information. In particular, information is collected on each of the problems or a combination of problems that could affect dropout rates. Following the methodology, the information has been generated through an abstract data model to reflect the profile of the dropout student. As advancement from previous research, this proposal will create preventive and corrective alternatives to avoid dropout higher education. Also, in contrast to previous work, we generated corrective warnings with the application of data mining techniques such as neural networks until reaching a precision of 97% and losses of 0.1052. In conclusion, this study pretends to analyze the behavior of students who drop out the university through the evaluation of predictive patterns. The overall objective is to predict the profile of student dropout, considering reasons such as admission to higher education and career changes. Consequently, using a data systematization process promotes the permanence of students in higher education. Once the profile of the dropout has been identified, student retention strategies have been approached, according to the time of its appearance and the point of view of the institution.
f
FITTING Data Mining Settings for Ranking Seed Lots
scielo.figshare.com
jpeg
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ruan Bernardy; Gizele I. Gadotti; Rita de C. M. Monteiro; Karine Von Ahn Pinto; Romário de M. Pinheiro (2023). FITTING Data Mining Settings for Ranking Seed Lots [Dataset]. http://doi.org/10.6084/m9.figshare.22785544.v1
Explore at:
jpegAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.22785544.v1
Dataset updated
May 30, 2023
Dataset provided by
SciELO journals
Authors
Ruan Bernardy; Gizele I. Gadotti; Rita de C. M. Monteiro; Karine Von Ahn Pinto; Romário de M. Pinheiro
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT To enhance speed and agility in interpreting physiological quality tests of seeds, The use of algorithms has emerged. This study aimed to identify suitable machine learning models to assist in the precise management of seed lot quality. Soybean lots from two companies were assessed using the Supplied Test Set, Cross-Validation (with 8, 10, and 12 folds), and Percentage Split (with 66% and 70%) methods. Variables analyzed through Tetrazolium tests included vigor, viability, mechanical damage, moisture damage, bed bug damage, and water content. Method performance was determined by Kappa, Precision, and ROC Area metrics. Classification Via Regression and J48 algorithms were employed. The technique utilizing 66% of data for training achieved 93.55% accuracy, with Precision and ROC Area reaching 94.50% for the J48 algorithm. Applying the cross-validation method with 10 folds resulted in 90.22% of correctly classified instances, with a ROC Area outcome like the previous method. Tetrazolium Vigor was the primary attribute used. However, these results are specific to this study's database, and careful planning is necessary to select the most effective application methods.
Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2024). Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-mining-and-modeling-market
Explore at:
csv, pdf, pptxAvailable download formats
Dataset updated
Sep 23, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining and Modeling Market Outlook

The global data mining and modeling market size was valued at approximately $28.5 billion in 2023 and is projected to reach $70.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 10.5% during the forecast period. This remarkable growth can be attributed to the increasing complexity and volume of data generated across various industries, necessitating robust tools and techniques for effective data analysis and decision-making processes.

One of the primary growth factors driving the data mining and modeling market is the exponential increase in data generation owing to advancements in digital technology. Modern enterprises generate extensive data from numerous sources such as social media platforms, IoT devices, and transactional databases. The need to make sense of this vast information trove has led to a surge in the adoption of data mining and modeling tools. These tools help organizations uncover hidden patterns, correlations, and insights, thereby enabling more informed decision-making and strategic planning.

Another significant growth driver is the increasing adoption of artificial intelligence (AI) and machine learning (ML) technologies. Data mining and modeling are critical components of AI and ML algorithms, which rely on large datasets to learn and make predictions. As businesses strive to stay competitive, they are increasingly investing in AI-driven analytics solutions. This trend is particularly prevalent in sectors such as healthcare, finance, and retail, where predictive analytics can provide a substantial competitive edge. Moreover, advancements in big data technologies are further bolstering the capabilities of data mining and modeling solutions, making them more effective and efficient.

The burgeoning demand for business intelligence (BI) and analytics solutions is also a major factor propelling the market. Organizations are increasingly recognizing the value of data-driven insights in identifying market trends, customer preferences, and operational inefficiencies. Data mining and modeling tools form the backbone of sophisticated BI platforms, enabling companies to transform raw data into actionable intelligence. This demand is further amplified by the growing importance of regulatory compliance and risk management, particularly in highly regulated industries such as banking, financial services, and healthcare.

From a regional perspective, North America currently dominates the data mining and modeling market, owing to the early adoption of advanced technologies and the presence of major market players. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid digital transformation initiatives and increasing investments in AI and big data technologies. Europe also holds a significant market share, supported by stringent data protection regulations and a strong focus on innovation.

Component Analysis

The data mining and modeling market by component is broadly segmented into software and services. The software segment encompasses various tools and platforms that facilitate data mining and modeling processes. These software solutions range from basic data analysis tools to advanced platforms integrated with AI and ML capabilities. The increasing complexity of data and the need for real-time analytics are driving the demand for sophisticated software solutions. Companies are investing in custom and off-the-shelf software to enhance their data handling and analytical capabilities, thereby gaining a competitive edge.

The services segment includes consulting, implementation, training, and support services. As organizations strive to leverage data mining and modeling tools effectively, the demand for professional services is on the rise. Consulting services help businesses identify the right tools and strategies for their specific needs, while implementation services ensure the seamless integration of these tools into existing systems. Training services are crucial for building in-house expertise, enabling teams to maximize the benefits of data mining and modeling solutions. Support services ensure the ongoing maintenance and optimization of these tools, addressing any technical issues that may arise.

The software segment is expected to dominate the market throughout the forecast period, driven by continuous advancements in te
f
Toolbox: Mining chemical activity status from high-throughput screening...
figshare.com
application/gzip
Updated Jan 20, 2016
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Othman Soufan (2016). Toolbox: Mining chemical activity status from high-throughput screening assays [Dataset]. http://doi.org/10.6084/m9.figshare.1601833.v1
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.1601833.v1
Dataset updated
Jan 20, 2016
Dataset provided by
figshare
Authors
Othman Soufan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
High-throughput screening (HTS) experiments provide a valuable resource that reports biological activity of numerous chemical compounds relative to their molecular targets. Building computational models that accurately predict such activity status (active vs. inactive) in specific assays is a challenging task given the large volume of data and frequently small proportion of active compounds relative to the inactive ones. We developed a method, DRAMOTE, to predict activity status of chemical compounds in HTP activity assays. For a class of HTP assays, our method achieves considerably better results than the current state-of-the-art-solutions. We achieved this by modification of a minority oversampling technique. To demonstrate that DRAMOTE is performing better than the other methods, we performed a comprehensive comparison analysis with several other methods and evaluated them on data from 11 PubChem assays through 1,350 experiments that involved approximately 500,000 interactions between chemicals and their target proteins. As an example of potential use, we applied DRAMOTE to develop robust models for predicting FDA approved drugs that have high probability to interact with the thyroid stimulating hormone receptor (TSHR) in humans. Our findings are further partially and indirectly supported by 3D docking results and literature information. The results based on approximately 500,000 interactions suggest that DRAMOTE has performed the best and that it can be used for developing robust virtual screening models. The datasets and implementation of all solutions are available as a MATLAB toolbox online at www.cbrc.kaust.edu.sa/dramote
Data Mining Tools Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Apr 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2024). Data Mining Tools Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-mining-tools-market
Explore at:
pdf, pptx, csvAvailable download formats
Dataset updated
Apr 1, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining Tools Market Outlook 2032

The global data mining tools market size was USD 932 Million in 2023 and is projected to reach USD 2,584.7 Million by 2032, expanding at a CAGR of 12% during 2024–2032. The market is fueled by the rising demand for big data analytics across various industries and the increasing need for AI-integrated data mining tools for insightful decision-making.

Increasing adoption of cloud-based platforms in data mining tools fuels the market. This enhances scalability, flexibility, and cost-efficiency in data handling processes. Major tech companies are launching cloud-based data mining solutions, enabling businesses to analyze vast datasets effectively. This trend reflects the shift toward agile and scalable data analysis methods, meeting the dynamic needs of modern enterprises.

In July 2023, Microsoft launched Power Automate Process Mining. This tool, powered by advanced AI, allows companies to gain deep insights into their operations, streamline processes, and foster ongoing improvement through automation and low-code applications, marking a new era in business efficiency and process optimization.

Rising focus on predictive analytics propels the development of advanced data mining tools capable of forecasting future trends and behaviors. Industries such as finance, healthcare, and retail invest significantly in predictive analytics to gain a competitive edge, driving demand for sophisticated data mining technologies. This trend underscores the strategic importance of foresight in decision-making processes.

Visual data mining tools are gaining traction in the market, offering intuitive data exploration and interpretation capabilities. These tools enable users to uncover patterns and insights through graphical representations, making data analysis accessible to a broader audience. The launch of user-friendly visual data mining applications marks a significant step toward democratizing data analytics.

Impact of Artificial Intelligence (
sequence data
figshare.com
7z
Updated Jul 2, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jerry Chun-Wei Lin (2017). sequence data [Dataset]. http://doi.org/10.6084/m9.figshare.5165638.v1
Explore at:
7zAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5165638.v1
Dataset updated
Jul 2, 2017
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Jerry Chun-Wei Lin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
datasets that we used for mining sequential patterns
D
Data Mining Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Data Mining Software Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-software-41235
Explore at:
ppt, pdf, docAvailable download formats
Dataset updated
Mar 19, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Data Mining Software market is experiencing robust growth, driven by the increasing need for businesses to extract valuable insights from massive datasets. The market, estimated at $15 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033, reaching an estimated $45 billion by 2033. This expansion is fueled by several key factors. The burgeoning adoption of cloud-based solutions offers scalability and cost-effectiveness, attracting both large enterprises and SMEs. Furthermore, advancements in machine learning and artificial intelligence algorithms are enhancing the accuracy and efficiency of data mining processes, leading to better decision-making across various sectors like finance, healthcare, and marketing. The rise of big data analytics and the increasing availability of affordable, high-powered computing resources are also significant contributors to market growth. However, the market faces certain challenges. Data security and privacy concerns remain paramount, especially with the increasing volume of sensitive information being processed. The complexity of data mining software and the need for skilled professionals to operate and interpret the results present a barrier to entry for some businesses. The high initial investment cost associated with implementing sophisticated data mining solutions can also deter smaller organizations. Nevertheless, the ongoing technological advancements and the growing recognition of the strategic value of data-driven decision-making are expected to overcome these restraints and propel the market toward continued expansion. The market segmentation reveals a strong preference for cloud-based solutions, reflecting the industry's trend toward flexible and scalable IT infrastructure. Large enterprises currently dominate the market share, but SMEs are rapidly adopting data mining software, indicating promising future growth in this segment. Geographic analysis shows that North America and Europe are currently leading the market, but the Asia-Pacific region is poised for significant growth due to increasing digitalization and economic expansion in countries like China and India.

Facebook

Twitter

Click to copy link

Link copied

Cite

Xin Qiao; Hong Jiao (2023). Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2018.02231.s001

Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf

Explore at:

pdfAvailable download formats

Unique identifier

https://doi.org/10.3389/fpsyg.2018.02231.s001

Dataset updated

Jun 7, 2023

Dataset provided by

Frontiers

Authors

Xin Qiao; Hong Jiao

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Due to increasing use of technology-enhanced educational assessment, data mining methods have been explored to analyse process data in log files from such assessment. However, most studies were limited to one data mining technique under one specific scenario. The current study demonstrates the usage of four frequently used supervised techniques, including Classification and Regression Trees (CART), gradient boosting, random forest, support vector machine (SVM), and two unsupervised methods, Self-organizing Map (SOM) and k-means, fitted to one assessment data. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA) responding to problem-solving items is extracted to demonstrate the methods. After concrete feature generation and feature selection, classifier development procedures are implemented using the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for the selection of classifiers are presented based on the research questions, the interpretability and the simplicity of the classifiers. Interpretations for the results from both supervised and unsupervised learning methods are provided.

Clear search

Close search

Google apps

Main menu

Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf

Data from: Discovering System Health Anomalies using Data Mining Techniques

Data Mining Tools Market Report

Data from: Peer-to-Peer Data Mining, Privacy Issues, and Games

Ensemble Data Mining Methods - Dataset - NASA Open Data Portal

Data mining methods for knowledge discovery

Ensemble Data Mining Methods

A novel fusion Python application of data mining techniques to evaluate...

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...

Online Feature Selection and Its Applications

Lifesciences Data Mining and Visualization Market Report | Global Forecast...

Lifesciences Data Mining and Visualization Market Outlook

Component Analysis

Educational Attainment in North Carolina Public Schools: Use of statistical...

Data from: Identifying Missing Data Handling Methods with Text Mining

Predictive data analysis techniques for higher education students dropout

FITTING Data Mining Settings for Ranking Seed Lots

Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033

Data Mining and Modeling Market Outlook

Component Analysis

Toolbox: Mining chemical activity status from high-throughput screening...

Data Mining Tools Market Report | Global Forecast From 2025 To 2033

Data Mining Tools Market Outlook 2032

Impact of Artificial Intelligence (

sequence data

Data Mining Software Report

Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf