100+ datasets found

w
Dataset of book subjects that contain Learning Data Mining with Python
workwithdata.com
Updated Nov 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2024). Dataset of book subjects that contain Learning Data Mining with Python [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Learning+Data+Mining+with+Python&j=1&j0=books
Explore at:
Dataset updated
Nov 7, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book subjects. It has 2 rows and is filtered where the books is Learning Data Mining with Python. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
m
Educational Attainment in North Carolina Public Schools: Use of statistical...
data.mendeley.com
Updated Nov 14, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
Explore at:
Unique identifier
https://doi.org/10.17632/6cm9wyd5g5.1
Dataset updated
Nov 14, 2018
Authors
Scott Herford
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
North Carolina
Description
The purpose of data mining analysis is always to find patterns of the data using certain kind of techiques such as classification or regression. It is not always feasible to apply classification algorithms directly to dataset. Before doing any work on the data, the data has to be pre-processed and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. Based on our project, after using clustering prior to classification, the performance has not improved much. The reason why it has not improved could be the features we selected to perform clustering are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics. From the dimensionality reduction perspective: It is different from Principle Component Analysis which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters as a technique of reducing the data dimension will lose a lot of information since clustering techniques are based a metric of 'distance'. At high dimensions euclidean distance loses pretty much all meaning. Therefore using clustering as a "Reducing" dimensionality by mapping data points to cluster numbers is not always good since you may lose almost all the information. From the creating new features perspective: Clustering analysis creates labels based on the patterns of the data, it brings uncertainties into the data. By using clustering prior to classification, the decision on the number of clusters will highly affect the performance of the clustering, then affect the performance of classification. If the part of features we use clustering techniques on is very suited for it, it might increase the overall performance on classification. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We did not lock in the clustering outputs using a random_state in the effort to see if they were stable. Our assumption was that if the results vary highly from run to run which they definitely did, maybe the data just does not cluster well with the methods selected at all. Basically, the ramification we saw was that our results are not much better than random when applying clustering to the data preprocessing. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the model real world effectiveness and also to continue to revise the models from time to time as things change.
s
Online Feature Selection and Its Applications
researchdata.smu.edu.sg
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN (2023). Online Feature Selection and Its Applications [Dataset]. http://doi.org/10.25440/smu.12062733.v1
Explore at:
Unique identifier
https://doi.org/10.25440/smu.12062733.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN
License
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
Description
Feature selection is an important technique for data mining before a machine learning algorithm is applied. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS) in which an online learner is only allowed to maintain a classifier involved only a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate prediction using a small and fixed number of active features. This is in contrast to the classical setup of online learning where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: (1) learning with full input where an learner is allowed to access all the features to decide the subset of active features, and (2) learning with partial input where only a limited number of features is allowed to be accessed for each instance by the learner. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public datasets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.Related Publication: Hoi, S. C., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 93-100). ACM. http://dx.doi.org/10.1145/2351316.2351329 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2402/ Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2014). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698-710. http://dx.doi.org/10.1109/TKDE.2013.32 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2277/
e
Data from: Supervised Learning for classification
paper.erudition.co.in
html
Updated Aug 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Einetic (2025). Supervised Learning for classification [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering-artificial-intelligence-and-machine-learning/6/data-mining
Explore at:
htmlAvailable download formats
Dataset updated
Aug 1, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/termshttps://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Supervised Learning for classification of Data Mining, 6th Semester , B.Tech in Computer Science & Engineering (Artificial Intelligence and Machine Learning)
e
Overview of data mining and predictive analytics
paper.erudition.co.in
html
Updated Aug 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Einetic (2025). Overview of data mining and predictive analytics [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering-artificial-intelligence-and-machine-learning/6/data-mining
Explore at:
htmlAvailable download formats
Dataset updated
Aug 1, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/termshttps://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Overview of data mining and predictive analytics of Data Mining, 6th Semester , B.Tech in Computer Science & Engineering (Artificial Intelligence and Machine Learning)
Data from: Classification of word levels with usage frequency, expert...
zenodo.org
explore.openaire.eu
application/gzip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Onur Guzey; Gihad Sohsah; Muhammed Unal; Onur Guzey; Gihad Sohsah; Muhammed Unal (2020). Classification of word levels with usage frequency, expert opinions and machine learning [Dataset]. http://doi.org/10.5281/zenodo.12501
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.12501
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Onur Guzey; Gihad Sohsah; Muhammed Unal; Onur Guzey; Gihad Sohsah; Muhammed Unal
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes classification of English words according to CEFR language levels. It can be used in various educational applications including determining levels of text that is appropriate for students learning English.

For each word, part-of-speech, the word lemma and usage frequency is provided. For words that have no survey results, a machine learning based methodology is used to predict levels. These predictions are also included as a separate file. This data is released as part of the submission process to British Journal of Educational Technology Special Issue on Open Data.

The included readme.pdf file contains a detailed description of data.
MOESM4 of Deep-learning: investigating deep neural networks hyper-parameters...
springernature.figshare.com
xlsx
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alexios Koutsoukas; Keith Monaghan; Xiaoli Li; Jun Huan (2023). MOESM4 of Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data [Dataset]. http://doi.org/10.6084/m9.figshare.c.3814018_D4.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3814018_D4.v1
Dataset updated
Jun 2, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Alexios Koutsoukas; Keith Monaghan; Xiaoli Li; Jun Huan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 4. Results of parameter selection.
D
Data Mining Tools Market Report
marketresearchforecast.com
doc, pdf, ppt
Updated Feb 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Data Mining Tools Market Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-tools-market-1722
Explore at:
pdf, ppt, docAvailable download formats
Dataset updated
Feb 3, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Mining Tools Market size was valued at USD 1.01 USD billion in 2023 and is projected to reach USD 1.99 USD billion by 2032, exhibiting a CAGR of 10.2 % during the forecast period. The growing adoption of data-driven decision-making and the increasing need for business intelligence are major factors driving market growth. Data mining refers to filtering, sorting, and classifying data from larger datasets to reveal subtle patterns and relationships, which helps enterprises identify and solve complex business problems through data analysis. Data mining software tools and techniques allow organizations to foresee future market trends and make business-critical decisions at crucial times. Data mining is an essential component of data science that employs advanced data analytics to derive insightful information from large volumes of data. Businesses rely heavily on data mining to undertake analytics initiatives in the organizational setup. The analyzed data sourced from data mining is used for varied analytics and business intelligence (BI) applications, which consider real-time data analysis along with some historical pieces of information. Recent developments include: May 2023 – WiMi Hologram Cloud Inc. introduced a new data interaction system developed by combining neural network technology and data mining. Using real-time interaction, the system can offer reliable and safe information transmission., May 2023 – U.S. Data Mining Group, Inc., operating in bitcoin mining site, announced a hosting contract to deploy 150,000 bitcoins in partnership with major companies such as TeslaWatt, Sphere 3D, Marathon Digital, and more. The company is offering industry turn-key solutions for curtailment, accounting, and customer relations., April 2023 – Artificial intelligence and single-cell biotech analytics firm, One Biosciences, launched a single cell data mining algorithm called ‘MAYA’. The algorithm is for cancer patients to detect therapeutic vulnerabilities., May 2022 – Europe-based Solarisbank, a banking-as-a-service provider, announced its partnership with Snowflake to boost its cloud data strategy. Using the advanced cloud infrastructure, the company can enhance data mining efficiency and strengthen its banking position.. Key drivers for this market are: Increasing Focus on Customer Satisfaction to Drive Market Growth. Potential restraints include: Requirement of Skilled Technical Resources Likely to Hamper Market Growth. Notable trends are: Incorporation of Data Mining and Machine Learning Solutions to Propel Market Growth.
Video-to-Model Data Set
figshare.com
commons.datacite.org
xml
Updated Mar 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz (2020). Video-to-Model Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.12026850.v1
Explore at:
xmlAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12026850.v1
Dataset updated
Mar 24, 2020
Dataset provided by
Figsharehttp://figshare.com/
Authors
Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set belongs to the paper "Video-to-Model: Unsupervised Trace Extraction from Videos for Process Discovery and Conformance Checking in Manual Assembly", submitted on March 24, 2020, to the 18th International Conference on Business Process Management (BPM).Abstract: Manual activities are often hidden deep down in discrete manufacturing processes. For the elicitation and optimization of process behavior, complete information about the execution of Manual activities are required. Thus, an approach is presented on how execution level information can be extracted from videos in manual assembly. The goal is the generation of a log that can be used in state-of-the-art process mining tools. The test bed for the system was lightweight and scalable consisting of an assembly workstation equipped with a single RGB camera recording only the hand movements of the worker from top. A neural network based real-time object classifier was trained to detect the worker’s hands. The hand detector delivers the input for an algorithm, which generates trajectories reflecting the movement paths of the hands. Those trajectories are automatically assigned to work steps using the position of material boxes on the assembly shelf as reference points and hierarchical clustering of similar behaviors with dynamic time warping. The system has been evaluated in a task-based study with ten participants in a laboratory, but under realistic conditions. The generated logs have been loaded into the process mining toolkit ProM to discover the underlying process model and to detect deviations from both, instructions and ground truth, using conformance checking. The results show that process mining delivers insights about the assembly process and the system’s precision.The data set contains the generated and the annotated logs based on the video material gathered during the user study. In addition, the petri nets from the process discovery and conformance checking conducted with ProM (http://www.promtools.org) and the reference nets modeled with Yasper (http://www.yasper.org/) are provided.
i
A novel fusion Python application of data mining techniques to evaluate...
ieee-dataport.org
Updated Jun 8, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Kayode (2020). A novel fusion Python application of data mining techniques to evaluate airborne magnetic datasets [Dataset]. https://ieee-dataport.org/open-access/novel-fusion-python-application-data-mining-techniques-evaluate-airborne-magnetic
Explore at:
Dataset updated
Jun 8, 2020
Authors
John Kayode
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Depths to the various subsurface anomalies have been the primary interest in all the applications of magnetic methods of geophysical prospection. Depths to the subsurface geologic features of interest are more valuable and superior to all other properties in any correct subsurface geologic structural interpretations.
D
Data Mining Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jun 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Insights Market (2025). Data Mining Software Report [Dataset]. https://www.datainsightsmarket.com/reports/data-mining-software-1423491
Explore at:
doc, ppt, pdfAvailable download formats
Dataset updated
Jun 8, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Mining Software market is experiencing robust growth, driven by the increasing need for businesses to extract actionable insights from massive datasets. The market's expansion is fueled by several key factors: the proliferation of big data, advancements in machine learning algorithms, and the growing adoption of cloud-based data analytics solutions. Businesses across various sectors, including finance, healthcare, and retail, are leveraging data mining software to improve operational efficiency, enhance customer experience, and gain a competitive edge. The market is segmented by software type (e.g., predictive analytics, text mining, etc.), deployment model (cloud, on-premise), and industry vertical. While the competitive landscape is crowded with both established players like SAS and IBM, and emerging niche providers, the market is expected to consolidate somewhat as larger companies acquire smaller, specialized firms. This consolidation will likely lead to more integrated and comprehensive data mining solutions. The projected Compound Annual Growth Rate (CAGR) suggests a significant increase in market size over the forecast period (2025-2033). While precise figures are unavailable, assuming a conservative CAGR of 15% and a 2025 market size of $5 billion (a reasonable estimate given the size and growth of related markets), we can project substantial growth. Challenges remain, however, including the need for skilled data scientists to manage and interpret the results, as well as concerns about data security and privacy. Addressing these challenges will be crucial for continued market expansion. The increasing availability of open-source tools also presents a challenge to established vendors, demanding innovation and competitive pricing strategies.
d
Ensemble Data Mining Methods
catalog.data.gov
cloud.csiss.gmu.edu
+1more
Updated Apr 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Ensemble Data Mining Methods [Dataset]. https://catalog.data.gov/dataset/ensemble-data-mining-methods
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
e
Unsupervised learning
paper.erudition.co.in
html
Updated Aug 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Einetic (2025). Unsupervised learning [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering-artificial-intelligence-and-machine-learning/6/data-mining
Explore at:
htmlAvailable download formats
Dataset updated
Aug 1, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/termshttps://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Unsupervised learning of Data Mining, 6th Semester , B.Tech in Computer Science & Engineering (Artificial Intelligence and Machine Learning)
T
Text and Data Mining (TDM) Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Insights Market (2025). Text and Data Mining (TDM) Report [Dataset]. https://www.datainsightsmarket.com/reports/text-and-data-mining-tdm-1403420
Explore at:
ppt, pdf, docAvailable download formats
Dataset updated
Jul 27, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Text and Data Mining (TDM) market is experiencing robust growth, driven by the increasing volume of unstructured data generated across various sectors and the need for businesses to extract actionable insights. The market's expansion is fueled by advancements in Natural Language Processing (NLP), machine learning algorithms, and cloud computing capabilities, which enable efficient processing and analysis of large datasets. Organizations across diverse industries, including finance, healthcare, and marketing, are adopting TDM solutions to improve customer experience, enhance operational efficiency, and gain a competitive edge through data-driven decision-making. The rising demand for sentiment analysis, topic modeling, and text summarization is further contributing to the market's expansion. While data privacy and security concerns represent a significant restraint, the development of robust and compliant solutions is mitigating these risks, fostering market growth. Considering a plausible CAGR of 15% and a 2025 market size of $5 billion (a reasonable estimation based on industry reports showing similar growth in related AI segments), we can project substantial growth throughout the forecast period. Leading players like IBM, SAS, and others are continuously innovating and expanding their offerings, solidifying their market positions. The competitive landscape is characterized by a mix of established vendors and emerging players offering diverse solutions ranging from enterprise-level platforms to specialized APIs. The market is witnessing a shift towards cloud-based TDM solutions due to their scalability, cost-effectiveness, and ease of deployment. Furthermore, the increasing adoption of open-source tools like Apache Mahout is making TDM technology more accessible to a wider range of users. However, challenges remain in terms of data standardization, integrating TDM with existing business processes, and ensuring the accuracy and reliability of insights generated. Future growth will be significantly influenced by the adoption of advanced analytics techniques such as deep learning and the development of more sophisticated NLP models capable of understanding context and nuances in human language. Continued investment in research and development, coupled with industry collaborations, will be crucial in overcoming these challenges and unlocking the full potential of TDM.
D
Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2024). Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-mining-and-modeling-market
Explore at:
csv, pdf, pptxAvailable download formats
Dataset updated
Sep 23, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining and Modeling Market Outlook

The global data mining and modeling market size was valued at approximately $28.5 billion in 2023 and is projected to reach $70.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 10.5% during the forecast period. This remarkable growth can be attributed to the increasing complexity and volume of data generated across various industries, necessitating robust tools and techniques for effective data analysis and decision-making processes.

One of the primary growth factors driving the data mining and modeling market is the exponential increase in data generation owing to advancements in digital technology. Modern enterprises generate extensive data from numerous sources such as social media platforms, IoT devices, and transactional databases. The need to make sense of this vast information trove has led to a surge in the adoption of data mining and modeling tools. These tools help organizations uncover hidden patterns, correlations, and insights, thereby enabling more informed decision-making and strategic planning.

Another significant growth driver is the increasing adoption of artificial intelligence (AI) and machine learning (ML) technologies. Data mining and modeling are critical components of AI and ML algorithms, which rely on large datasets to learn and make predictions. As businesses strive to stay competitive, they are increasingly investing in AI-driven analytics solutions. This trend is particularly prevalent in sectors such as healthcare, finance, and retail, where predictive analytics can provide a substantial competitive edge. Moreover, advancements in big data technologies are further bolstering the capabilities of data mining and modeling solutions, making them more effective and efficient.

The burgeoning demand for business intelligence (BI) and analytics solutions is also a major factor propelling the market. Organizations are increasingly recognizing the value of data-driven insights in identifying market trends, customer preferences, and operational inefficiencies. Data mining and modeling tools form the backbone of sophisticated BI platforms, enabling companies to transform raw data into actionable intelligence. This demand is further amplified by the growing importance of regulatory compliance and risk management, particularly in highly regulated industries such as banking, financial services, and healthcare.

From a regional perspective, North America currently dominates the data mining and modeling market, owing to the early adoption of advanced technologies and the presence of major market players. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid digital transformation initiatives and increasing investments in AI and big data technologies. Europe also holds a significant market share, supported by stringent data protection regulations and a strong focus on innovation.

Component Analysis

The data mining and modeling market by component is broadly segmented into software and services. The software segment encompasses various tools and platforms that facilitate data mining and modeling processes. These software solutions range from basic data analysis tools to advanced platforms integrated with AI and ML capabilities. The increasing complexity of data and the need for real-time analytics are driving the demand for sophisticated software solutions. Companies are investing in custom and off-the-shelf software to enhance their data handling and analytical capabilities, thereby gaining a competitive edge.

The services segment includes consulting, implementation, training, and support services. As organizations strive to leverage data mining and modeling tools effectively, the demand for professional services is on the rise. Consulting services help businesses identify the right tools and strategies for their specific needs, while implementation services ensure the seamless integration of these tools into existing systems. Training services are crucial for building in-house expertise, enabling teams to maximize the benefits of data mining and modeling solutions. Support services ensure the ongoing maintenance and optimization of these tools, addressing any technical issues that may arise.

The software segment is expected to dominate the market throughout the forecast period, driven by continuous advancements in te
m
pinterest_dataset
data.mendeley.com
Updated Oct 27, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Juan Carlos Gomez (2017). pinterest_dataset [Dataset]. http://doi.org/10.17632/fs4k2zc5j5.2
Explore at:
Unique identifier
https://doi.org/10.17632/fs4k2zc5j5.2
Dataset updated
Oct 27, 2017
Authors
Juan Carlos Gomez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset with 72000 pins from 117 users in Pinterest. Each pin contains a short raw text and an image. The images are processed using a pretrained Convolutional Neural Network and transformed into a vector of 4096 features.

This dataset was used in the paper "User Identification in Pinterest Through the Refinement of a Cascade Fusion of Text and Images" to idenfity specific users given their comments. The paper is publishe in the Research in Computing Science Journal, as part of the LKE 2017 conference. The dataset includes the splits used in the paper.

There are nine files. text_test, text_train and text_val, contain the raw text of each pin in the corresponding split of the data. imag_test, imag_train and imag_val contain the image features of each pin in the corresponding split of the data. train_user and val_test_users contain the index of the user of each pin (between 0 and 116). There is a correspondance one-to-one among the test, train and validation files for images, text and users. There are 400 pins per user in the train set, and 100 pins per user in the validation and test sets each one.

If you have questions regarding the data, write to: jc dot gomez at ugto dot mx
e
Data from: Validation strategies
paper.erudition.co.in
html
Updated Aug 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Einetic (2025). Validation strategies [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering-artificial-intelligence-and-machine-learning/6/data-mining
Explore at:
htmlAvailable download formats
Dataset updated
Aug 1, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/termshttps://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Validation strategies of Data Mining, 6th Semester , B.Tech in Computer Science & Engineering (Artificial Intelligence and Machine Learning)
Additional file 1 of Educational data mining: prediction of students'...
springernature.figshare.com
xlsx
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mustafa Yağcı (2023). Additional file 1 of Educational data mining: prediction of students' academic performance using machine learning algorithms [Dataset]. http://doi.org/10.6084/m9.figshare.19304996.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.19304996.v1
Dataset updated
Jun 2, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Mustafa Yağcı
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 1: Dataset.
s
Data and source code for "Automating Intention Mining"
researchdata.smu.edu.sg
zip
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Qiao HUANG; Xin XIA; David LO; Gail C. MURPHY (2023). Data and source code for "Automating Intention Mining" [Dataset]. http://doi.org/10.25440/smu.21261408.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.25440/smu.21261408.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
Qiao HUANG; Xin XIA; David LO; Gail C. MURPHY
License
http://rightsstatements.org/vocab/InC/1.0/http://rightsstatements.org/vocab/InC/1.0/
Description
The dataset and source code for paper "Automating Intention Mining".

The code is based on dennybritz's implementation of Yoon Kim's paper Convolutional Neural Networks for Sentence Classification.

By default, the code uses Tensorflow 0.12. Some errors might be reported when using other versions of Tensorflow due to the incompatibility of some APIs.

Running 'online_prediction.py', you can input any sentence and check the classification result produced by a pre-trained CNN model. The model uses all sentences of the four Github projects as training data.

Running 'play.py', you can get the evaluation result of cross-project prediction. Please check the code for more details of the configuration. By default, it will use the four Github projects as training data to predict the sentences in DECA dataset, and in this setting, the category 'aspect evaluation' and 'others' are dropped since DECA dataset does not contain these two categories.
m
Data for: A Realistic and Public Dataset with Rare Undesirable Real Events...
data.mendeley.com
Updated Jul 15, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ricardo Vargas (2019). Data for: A Realistic and Public Dataset with Rare Undesirable Real Events in Oil Wells [Dataset]. http://doi.org/10.17632/r7774rwc7v.1
Explore at:
Unique identifier
https://doi.org/10.17632/r7774rwc7v.1
Dataset updated
Jul 15, 2019
Authors
Ricardo Vargas
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the first realistic and public dataset with rare undesirable real events in oil wells as far as the authors of this work know. It can be used in development of several kinds of techniques and methods for different tasks associated with undesirable events in oil and gas wells.

Facebook

Twitter

Click to copy link

Link copied

Cite

Work With Data (2024). Dataset of book subjects that contain Learning Data Mining with Python [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Learning+Data+Mining+with+Python&j=1&j0=books

Dataset of book subjects that contain Learning Data Mining with Python

Explore at:

Dataset updated

Nov 7, 2024

Dataset authored and provided by

Work With Data

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset is about book subjects. It has 2 rows and is filtered where the books is Learning Data Mining with Python. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

Clear search

Close search

Google apps

Main menu

Dataset of book subjects that contain Learning Data Mining with Python

Educational Attainment in North Carolina Public Schools: Use of statistical...

Online Feature Selection and Its Applications

Data from: Supervised Learning for classification

Overview of data mining and predictive analytics

Data from: Classification of word levels with usage frequency, expert...

MOESM4 of Deep-learning: investigating deep neural networks hyper-parameters...

Data Mining Tools Market Report

Video-to-Model Data Set

A novel fusion Python application of data mining techniques to evaluate...

Data Mining Software Report

Ensemble Data Mining Methods

Unsupervised learning

Text and Data Mining (TDM) Report

Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033

Data Mining and Modeling Market Outlook

Component Analysis

pinterest_dataset

Data from: Validation strategies

Additional file 1 of Educational data mining: prediction of students'...

Data and source code for "Automating Intention Mining"

Data for: A Realistic and Public Dataset with Rare Undesirable Real Events...

Dataset of book subjects that contain Learning Data Mining with Python