100+ datasets found
  1. Data Mining Tools Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 4, 2025
    Cite
    Growth Market Reports (2025). Data Mining Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-mining-tools-market
    Available download formats: pdf, csv, pptx
    Dataset updated
    Aug 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Mining Tools Market Outlook




    According to our latest research, the global Data Mining Tools market size reached USD 1.93 billion in 2024, reflecting robust industry momentum. The market is expected to grow at a CAGR of 12.7% from 2025 to 2033, reaching a projected value of USD 5.69 billion by 2033. This growth is primarily driven by the increasing adoption of advanced analytics across diverse industries, rapid digital transformation, and the necessity for actionable insights from massive data volumes.
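The projection above compounds a base value at a constant annual rate. A minimal sketch of that arithmetic follows. The helper names `cagr` and `project` are hypothetical, and the only inputs are the figures quoted in the paragraph.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding start_value at `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


# Figures quoted above: USD 1.93 billion (2024) and USD 5.69 billion (2033).
implied = cagr(1.93, 5.69, 2033 - 2024)
print(f"implied CAGR over 9 years: {implied:.2%}")
print(f"1.93 compounded at 12.7% for 9 years: {project(1.93, 0.127, 9):.2f}")
```

Compounded over the nine years from 2024 to 2033, the quoted endpoints imply about 12.8% per year, and 12.7% over the same nine years gives about USD 5.66 billion. The report does not state its base year or rounding, so this small gap is only indicative.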




    One of the pivotal growth factors propelling the Data Mining Tools market is the exponential rise in data generation, particularly through digital channels, IoT devices, and enterprise applications. Organizations across sectors are leveraging data mining tools to extract meaningful patterns, trends, and correlations from structured and unstructured data. The need for improved decision-making, operational efficiency, and competitive advantage has made data mining an essential component of modern business strategies. Furthermore, advancements in artificial intelligence and machine learning are enhancing the capabilities of these tools, enabling predictive analytics, anomaly detection, and automation of complex analytical tasks, which further fuels market expansion.




    Another significant driver is the growing demand for customer-centric solutions in industries such as retail, BFSI, and healthcare. Data mining tools are increasingly being used for customer relationship management, targeted marketing, fraud detection, and risk management. By analyzing customer behavior and preferences, organizations can personalize their offerings, optimize marketing campaigns, and mitigate risks. The integration of data mining tools with cloud platforms and big data technologies has also simplified deployment and scalability, making these solutions accessible to small and medium-sized enterprises (SMEs) as well as large organizations. This democratization of advanced analytics is creating new growth avenues for vendors and service providers.




    The regulatory landscape and the increasing emphasis on data privacy and security are also shaping the development and adoption of Data Mining Tools. Compliance with frameworks such as GDPR, HIPAA, and CCPA necessitates robust data governance and transparent analytics processes. Vendors are responding by incorporating features like data masking, encryption, and audit trails into their solutions, thereby enhancing trust and adoption among regulated industries. Additionally, the emergence of industry-specific data mining applications, such as fraud detection in BFSI and predictive diagnostics in healthcare, is expanding the addressable market and fostering innovation.




    From a regional perspective, North America currently dominates the Data Mining Tools market owing to the early adoption of advanced analytics, strong presence of leading technology vendors, and high investments in digital transformation. However, the Asia Pacific region is emerging as a lucrative market, driven by rapid industrialization, expansion of IT infrastructure, and growing awareness of data-driven decision-making in countries like China, India, and Japan. Europe, with its focus on data privacy and digital innovation, also represents a significant market share, while Latin America and the Middle East & Africa are witnessing steady growth as organizations in these regions modernize their operations and adopt cloud-based analytics solutions.





    Component Analysis




    The Component segment of the Data Mining Tools market is bifurcated into Software and Services. Software remains the dominant segment, accounting for the majority of the market share in 2024. This dominance is attributed to the continuous evolution of data mining algorithms, the proliferation of user-friendly graphical interfaces, and the integration of advanced analytics capabilities such as machine learning, artificial intelligence, and natural language pro

  2. Data Mining in Systems Health Management

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Data Mining in Systems Health Management [Dataset]. https://catalog.data.gov/dataset/data-mining-in-systems-health-management
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    This chapter presents theoretical and practical aspects associated with the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms. In this approach, the current estimate of the state PDF is used to determine the operating condition of the system and to predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. Estimating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the prediction step, based on the process model, and the update step, which incorporates the new measurement into the a priori state estimate. This framework allows the probability of failure at future time instants (the RUL PDF) to be estimated in real time, providing information about time-to-failure (TTF) expectations, statistical confidence intervals, and long-term predictions. For this purpose it uses empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for improving system reliability and the cost-effective operation of critical assets, as shown in a case study where feedback correction strategies (based on uncertainty measures) were implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feedback loop is implemented using simple linear relationships, it provides quick insight into how the system's predicted RUL reacts to changes in its input signals. The method can handle non-Gaussian PDFs because its formulation includes concepts such as nonlinear state estimation and confidence intervals.
Real data from a seeded-fault test showed that the proposed framework was able to anticipate modifications to the system input that lengthen its RUL. The results of this test indicate that the method successfully suggested the correction the system required. Future work will therefore focus on developing and testing similar strategies using different input-output uncertainty metrics.
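The predict/update cycle and the hazard-zone RUL estimate described above can be sketched with a generic bootstrap (sampling-importance-resampling) particle filter. This is not the chapter's implementation. The linear crack-growth model, the noise levels, the hazard threshold of 3.0, and the function names `predict`, `update`, and `rul_samples` are all hypothetical, and the feedback correction loop is not modeled.

```python
import math
import random
import statistics


def predict(particles, rng, q=0.02):
    """Prediction step (hypothetical process model): each particle is
    (length, rate) and evolves as length += rate, with Gaussian noise."""
    return [(x + r + rng.gauss(0.0, q), max(r + rng.gauss(0.0, q / 10), 0.0))
            for x, r in particles]


def update(particles, y, rng, sigma=0.1):
    """Update step: weight particles by the Gaussian likelihood of the new
    measurement y of the fault indicator, then resample with replacement."""
    weights = [math.exp(-0.5 * ((y - x) / sigma) ** 2) for x, _ in particles]
    if sum(weights) == 0.0:
        return particles  # all weights underflowed; keep the prior particles
    return rng.choices(particles, weights=weights, k=len(particles))


def rul_samples(particles, hazard, rng, horizon=500):
    """Propagate each particle with the process model alone until its fault
    indicator enters the hazard zone (length >= hazard). The step counts
    form an empirical, possibly non-Gaussian, RUL distribution."""
    samples = []
    for particle in particles:
        state, steps = particle, 0
        while state[0] < hazard and steps < horizon:
            state = predict([state], rng)[0]
            steps += 1
        samples.append(steps)
    return samples


# Hypothetical scenario: the true crack length grows by 0.05 per step from
# 1.0 and is observed with Gaussian noise (std 0.1); hazard zone starts at 3.0.
rng = random.Random(7)
particles = [(rng.uniform(0.8, 1.2), rng.uniform(0.0, 0.1)) for _ in range(500)]
true_length = 1.0
for _ in range(20):
    true_length += 0.05
    measurement = true_length + rng.gauss(0.0, 0.1)
    particles = update(predict(particles, rng), measurement, rng)

rul = rul_samples(particles, hazard=3.0, rng=rng)
cuts = statistics.quantiles(rul, n=20)
print("median RUL (steps):", statistics.median(rul))
print("90% interval:", cuts[0], "to", cuts[-1])
```

Because the RUL estimate is a set of samples rather than a mean and a variance, it can represent skewed, non-Gaussian distributions, which is the property the abstract emphasizes.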

  3. Video-to-Model Data Set

    • figshare.com
    • commons.datacite.org
    xml
    Updated Mar 24, 2020
    Cite
    Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz (2020). Video-to-Model Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.12026850.v1
    Available download formats: xml
    Dataset updated
    Mar 24, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set belongs to the paper "Video-to-Model: Unsupervised Trace Extraction from Videos for Process Discovery and Conformance Checking in Manual Assembly", submitted on March 24, 2020, to the 18th International Conference on Business Process Management (BPM). Abstract: Manual activities are often hidden deep down in discrete manufacturing processes. Eliciting and optimizing process behavior requires complete information about the execution of manual activities. Thus, an approach is presented for extracting execution-level information from videos of manual assembly. The goal is to generate a log that can be used in state-of-the-art process mining tools. The test bed for the system was lightweight and scalable, consisting of an assembly workstation equipped with a single RGB camera that records only the worker's hand movements from above. A neural-network-based real-time object classifier was trained to detect the worker's hands. The hand detector delivers the input for an algorithm that generates trajectories reflecting the movement paths of the hands. Those trajectories are automatically assigned to work steps using the positions of material boxes on the assembly shelf as reference points and hierarchical clustering of similar behaviors with dynamic time warping. The system was evaluated in a task-based study with ten participants in a laboratory, but under realistic conditions. The generated logs were loaded into the process mining toolkit ProM to discover the underlying process model and to detect deviations from both the instructions and the ground truth using conformance checking. The results show that process mining delivers insights into the assembly process and the system's precision. The data set contains the generated and the annotated logs based on the video material gathered during the user study.
In addition, the Petri nets from the process discovery and conformance checking conducted with ProM (http://www.promtools.org) and the reference nets modeled with Yasper (http://www.yasper.org/) are provided.
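The abstract groups hand trajectories with dynamic time warping (DTW) before hierarchical clustering. A minimal sketch of textbook DTW between 2D trajectories follows. The paper's exact DTW variant, point cost, and linkage are not given, so the Euclidean point cost, the function names `dtw_distance` and `distance_matrix`, and the toy trajectories are assumptions.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two trajectories, each a list
    of (x, y) points, using Euclidean point cost and the classic O(n*m) table."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # a advances, b repeats
                                 d[i][j - 1],      # b advances, a repeats
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]


def distance_matrix(trajectories):
    """Pairwise DTW distances, e.g. as input to hierarchical clustering."""
    k = len(trajectories)
    return [[dtw_distance(trajectories[i], trajectories[j]) for j in range(k)]
            for i in range(k)]


# Hypothetical hand trajectories: the same reach at two speeds, plus a
# different movement.
reach = [(0, 0), (1, 1), (2, 2)]
slow_reach = [(0, 0), (0, 0), (1, 1), (1, 1), (2, 2)]
other = [(0, 0), (0, 2), (0, 4)]
for row in distance_matrix([reach, slow_reach, other]):
    print([round(v, 3) for v in row])
```

DTW lets the slower execution of the same reach align with the faster one at zero cost, which is why it suits grouping work steps performed at different speeds. The square matrix could be converted with SciPy's `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage`; whether the authors used SciPy is not stated.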

  4. Data from: Results obtained in a data mining process applied to a database...

    • scielo.figshare.com
    jpeg
    Updated Jun 4, 2023
    Cite
    E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
    Available download formats: jpeg
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    E.M. Ruiz Lobaina; C. P. Romero Suárez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: The objective of this work is to improve the quality of the information in the CubaCiencia database of the Institute of Scientific and Technological Information. This database holds bibliographic information on four segments of science and is the main database of the Library Management System. The methodology applied data mining techniques for studying large volumes of information, including decision trees, the correlation matrix, and the 3D scatter plot. The results not only made it possible to improve the information in the database but also revealed useful patterns for achieving the proposed objectives.
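Of the techniques listed, the correlation matrix is the simplest to show. A minimal sketch of a Pearson correlation matrix over numeric columns follows. The field names and the four records are hypothetical toy values, because the abstract does not describe the CubaCiencia schema.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical numeric fields derived from four bibliographic records.
records = {
    "year": [2001, 2005, 2010, 2015],
    "authors": [1, 2, 3, 5],
    "pages": [30, 22, 15, 8],
}
for (a, b), r in sorted(correlation_matrix(records).items()):
    print(f"{a:>8} ~ {b:<8} {r:+.3f}")
```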

  5. Distributed Data Mining in Peer-to-Peer Networks

    • catalog.data.gov
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Distributed Data Mining in Peer-to-Peer Networks [Dataset]. https://catalog.data.gov/dataset/distributed-data-mining-in-peer-to-peer-networks
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data, computing nodes, and users. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.
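As an illustration of a local, communication-light primitive, the sketch below uses randomized pairwise gossip averaging, in which each peer only ever exchanges values with a direct neighbour. It is a classic approximate technique chosen for illustration, not one of the specific algorithms surveyed in the article. The ring topology, the local values, the round count, and the name `gossip_average` are hypothetical.

```python
import random


def gossip_average(values, edges, rounds, seed=0):
    """Randomized pairwise gossip: each round one random edge (i, j) is
    picked and both endpoints replace their values with the pair's average.
    Only neighbours communicate; on a connected graph every peer's value
    approaches the global mean, and the sum is preserved at every round."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x


# Hypothetical ring of 8 peers, each holding one local value.
peers = 8
ring = [(k, (k + 1) % peers) for k in range(peers)]
local = [float(k) for k in range(peers)]   # global mean is 3.5
estimates = gossip_average(local, ring, rounds=2000)
print([round(e, 4) for e in estimates])
```

The estimates only approach the mean as rounds accumulate, which places this example on the approximate side of the exact/approximate distinction the abstract draws.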

  6. Distributed Data Mining in Peer-to-Peer Networks - Dataset - NASA Open Data...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Distributed Data Mining in Peer-to-Peer Networks - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/distributed-data-mining-in-peer-to-peer-networks
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data, computing nodes, and users. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.

  7. Educational Attainment in North Carolina Public Schools: Use of statistical...

    • data.mendeley.com
    Updated Nov 14, 2018
    Cite
    Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
    Dataset updated
    Nov 14, 2018
    Authors
    Scott Herford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The purpose of data mining analysis is always to find patterns in the data using techniques such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset. Before any modeling, the data has to be pre-processed, which normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and to create new features. In our project, using clustering prior to classification did not improve performance much. One possible reason is that the features we selected for clustering are not well suited to it. Because of the nature of the data, classification tasks provide more information to work with in terms of improving knowledge and overall performance metrics. From the dimensionality reduction perspective: clustering differs from Principal Component Analysis, which is guaranteed to find the linear transformation that reduces the number of dimensions with the minimum loss of information. Using clusters to reduce the data dimension loses a lot of information, because clustering techniques are based on a distance metric, and in high dimensions Euclidean distance loses nearly all meaning. Therefore, reducing dimensionality by mapping data points to cluster numbers is not always a good idea, because you may lose almost all of the information. From the feature creation perspective: clustering creates labels based on patterns in the data, which introduces uncertainty into the data. When clustering is used prior to classification, the choice of the number of clusters strongly affects the clustering, which in turn affects classification performance. If the subset of features we cluster on is well suited to clustering, it might increase overall classification performance.
For example, if the features we run k-means on are numerical and low-dimensional, overall classification performance may improve. We deliberately did not fix the clustering outputs with a random_state, in order to see whether they were stable. Our assumption was that if the results varied greatly from run to run, which they definitely did, the data may simply not cluster well with the selected methods. In short, the consequence we observed was that applying clustering during data preprocessing gave results not much better than random. Finally, it is important to have a feedback loop in place that continuously collects the same data, in the same format, from which the models were created. This feedback loop can be used to measure the models' real-world effectiveness and to revise the models periodically as things change.
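The claim that Euclidean distance loses meaning in high dimensions can be checked with a small simulation of relative contrast, (max distance - min distance) / min distance, measured from a random query point. A minimal sketch, assuming points drawn uniformly from the unit hypercube; real feature sets with structure may concentrate less severely, and `relative_contrast` is a hypothetical helper.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from one random query point
    to n_points random points, all drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

As the dimension grows, the contrast shrinks toward zero, so the nearest and farthest points become nearly equidistant. That is what makes distance-based cluster assignments less informative as features.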

  8. Data Mining Tools Market Size, Share, Growth, Forecast, By Component...

    • verifiedmarketresearch.com
    Updated Jun 13, 2025
    Cite
    VERIFIED MARKET RESEARCH (2025). Data Mining Tools Market Size, Share, Growth, Forecast, By Component (Software, Services), By Deployment Mode (On-Premise, Cloud-Based), By Function (Data Cleaning, Data Integration, Data Transformation, Data Visualization), By Application (Marketing, Fraud Detection & Risk Management, Cybersecurity, Customer Relationship Management (CRM)) [Dataset]. https://www.verifiedmarketresearch.com/product/data-mining-tools-market/
    Dataset updated
    Jun 13, 2025
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2026 - 2032
    Area covered
    Global
    Description

    Data Mining Tools Market size was valued at USD 915.42 Million in 2024 and is projected to reach USD 2171.21 Million by 2032, growing at a CAGR of 11.40% from 2026 to 2032.
    • Big Data Explosion: Exponential growth in data generation from IoT devices, social media, mobile applications, and digital transactions is creating massive datasets requiring advanced mining tools for analysis. Organizations need sophisticated solutions to extract meaningful insights from structured and unstructured data sources for competitive advantage.
    • Digital Transformation Initiatives: Accelerating digital transformation across industries is driving demand for data mining tools that enable data-driven decision making and business intelligence. Companies are investing in analytics capabilities to optimize operations, improve customer experiences, and develop new revenue streams through data monetization strategies.

  9. Data Mining Project - Boston

    • kaggle.com
    zip
    Updated Nov 25, 2019
    Cite
    SophieLiu (2019). Data Mining Project - Boston [Dataset]. https://www.kaggle.com/sliu65/data-mining-project-boston
    Available download formats: zip (59,313,797 bytes)
    Dataset updated
    Nov 25, 2019
    Authors
    SophieLiu
    Area covered
    Boston
    Description

    Context

    To make this a seamless process, I cleaned the data and deleted many variables that I thought were not important to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data file contains both Lyft and Uber rides, but it is still a cleaned version of the dataset we downloaded from Kaggle.

    Use of Data Files

    You can easily subset the data to the car types you will be modeling by first loading the CSV file into R. The following line loads the file into a data frame:

    df <- read.csv('uber.csv')

    The next line subsets the data to a specific car type. The example below keeps only Uber 'Black' rides:

    df_black <- subset(df, name == 'Black')

    To keep the subset for later use, write the data frame to a CSV file on your computer (row.names = FALSE stops R from adding a row-number column):

    write.csv(df_black, "nameofthefileyouwanttosaveas.csv", row.names = FALSE)

    The file will appear in your working directory. If you are not sure where your working directory is, run this code:

    getwd()

    The output is the file path to your working directory. You will find the file you just created in that folder.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  10. Data from: Multi-objective optimization based privacy preserving distributed...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 19, 2025
    Cite
    Dashlink (2025). Multi-objective optimization based privacy preserving distributed data mining in Peer-to-Peer networks [Dataset]. https://catalog.data.gov/dataset/multi-objective-optimization-based-privacy-preserving-distributed-data-mining-in-peer-to-p
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Dashlink
    Description

    This paper proposes a scalable, local privacy preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner through local interactions and it is highly scalable. It particularly deals with the distributed computation of the sum of a set of numbers stored at different peers in a P2P network in the context of a P2P web mining application. The proposed optimization based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have significant impact on many data mining tasks such as multi-party privacy-preserving clustering, frequent itemset mining, and statistical aggregate computation.
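Secure distributed summation is the primitive the paper builds on. The sketch below shows the classic ring-based secure sum with a random mask, a simpler textbook technique swapped in for illustration. It is not the paper's multi-objective optimization approach and does not support per-peer privacy requirements. The function name `ring_secure_sum`, the modulus, and the values are hypothetical, and the peers are simulated in one process.

```python
import secrets


def ring_secure_sum(private_values, modulus=2**61 - 1):
    """Ring-based secure sum (simulated in one process). The initiator adds a
    random mask to its own value; each following peer adds its value to the
    masked running total it receives, modulo `modulus`; the initiator then
    removes the mask. Peers only ever see masked totals, never raw values.
    Assumes non-negative integers whose true sum is below `modulus`."""
    mask = secrets.randbelow(modulus)
    running = (mask + private_values[0]) % modulus   # initiator's message
    for value in private_values[1:]:                 # passed peer to peer
        running = (running + value) % modulus
    return (running - mask) % modulus


# Hypothetical per-peer counts.
print(ring_secure_sum([12, 7, 30, 1]))
```

This prints 50 whatever mask is drawn. Two neighbours of a peer that collude can recover its value from the totals they see, which is one weakness that more elaborate privacy-preserving schemes address.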

  11. Data Mining Software Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    Cite
    Growth Market Reports (2025). Data Mining Software Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-mining-software-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Mining Software Market Outlook



    According to our latest research, the global Data Mining Software market size in 2024 stands at USD 12.7 billion. This market is experiencing robust expansion, driven by the growing demand for actionable insights across industries, and is expected to reach USD 38.1 billion by 2033, registering a remarkable CAGR of 13.1% during the forecast period. The proliferation of big data, increasing adoption of artificial intelligence, and the need for advanced analytics are the primary growth factors propelling the market forward.




    The accelerating digitization across sectors is a key factor fueling the growth of the Data Mining Software market. Organizations are generating and collecting vast amounts of data at unprecedented rates, requiring sophisticated tools to extract meaningful patterns and actionable intelligence. The rise of Internet of Things (IoT) devices, social media platforms, and connected infrastructure has further intensified the need for robust data mining solutions. Businesses are leveraging data mining software to enhance decision-making, optimize operations, and gain a competitive edge. The integration of machine learning and artificial intelligence algorithms into data mining tools is enabling organizations to automate complex analytical tasks, uncover hidden trends, and predict future outcomes with greater accuracy. As enterprises continue to recognize the value of data-driven strategies, the demand for advanced data mining software is poised for sustained growth.




    Another significant factor contributing to the market’s expansion is the increasing regulatory pressure on data management and security. Regulatory frameworks such as GDPR, HIPAA, and CCPA are compelling organizations to adopt comprehensive data management practices, which include advanced data mining software for compliance monitoring and risk assessment. These regulations are driving investments in software that can efficiently process, analyze, and secure large data sets while ensuring transparency and accountability. Additionally, the surge in cyber threats and data breaches has heightened the importance of robust analytics solutions for anomaly detection, fraud prevention, and real-time threat intelligence. As a result, sectors such as BFSI, healthcare, and government are prioritizing the deployment of data mining solutions to safeguard sensitive information and maintain regulatory compliance.




    The growing emphasis on customer-centric strategies is also playing a pivotal role in the expansion of the Data Mining Software market. Organizations across retail, telecommunications, and financial services are utilizing data mining tools to personalize customer experiences, enhance marketing campaigns, and improve customer retention rates. By analyzing customer behavior, preferences, and feedback, businesses can tailor their offerings and communication strategies to meet evolving consumer demands. The ability to derive granular insights from vast customer data sets enables companies to innovate rapidly and stay ahead of market trends. Furthermore, the integration of data mining with customer relationship management (CRM) and enterprise resource planning (ERP) systems is streamlining business processes and fostering a culture of data-driven decision-making.




    From a regional perspective, North America currently dominates the Data Mining Software market, supported by a mature technological infrastructure, high adoption of cloud-based analytics, and a strong presence of leading software vendors. Europe follows closely, driven by stringent data privacy regulations and increasing investments in digital transformation initiatives. The Asia Pacific region is emerging as a high-growth market, fueled by rapid industrialization, expanding IT sectors, and the proliferation of digital services across economies such as China, India, and Japan. Latin America and the Middle East & Africa are also witnessing increasing adoption, particularly in sectors like banking, telecommunications, and government, as organizations seek to harness the power of data for strategic growth.






  12. Google Scholar Article Listing(Data Mining)

    • kaggle.com
    zip
    Updated Apr 21, 2023
    Cite
    Muhammad Anas Mahmood (2023). Google Scholar Article Listing(Data Mining) [Dataset]. https://www.kaggle.com/muhammadanasmahmood/google-scholar-article-listingdata-mining
    Available download formats: zip (155,055 bytes)
    Dataset updated
    Apr 21, 2023
    Authors
    Muhammad Anas Mahmood
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset lists Google Scholar articles on data mining and can be helpful for many educational research projects. It contains 936 unique entries, each including the title, description, author names, article link, cited-by information, and related articles.

  13. Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...

    • catalog.data.gov
    • s.cnmilf.com
    • +3 more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. https://catalog.data.gov/dataset/discovering-anomalous-aviation-safety-events-using-scalable-data-mining-algorithms
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    The worldwide civilian aviation system is one of the most complex dynamical systems ever created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1 Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis, which fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.
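Multiple-kernel methods combine several base kernels, for example one for discrete switch sequences and one for continuous sensor readings, into a single similarity. The abstract does not specify the kernels or the detector, so the sketch below is an assumption throughout: a normalized longest-common-subsequence kernel, a Gaussian (RBF) kernel, a fixed weight `eta`, and a simple mean-similarity score in place of a trained anomaly detector. The function names and the toy flights are hypothetical.

```python
import math


def lcs_len(a, b):
    """Length of the longest common subsequence of two symbol sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def nlcs(a, b):
    """Normalized LCS similarity in [0, 1] for discrete sequences."""
    return lcs_len(a, b) / math.sqrt(len(a) * len(b))


def rbf(u, v, gamma=0.5):
    """Gaussian (RBF) kernel on continuous feature vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(u, v)))


def combined_kernel(f1, f2, eta=0.5):
    """Weighted sum of a discrete and a continuous base kernel; each flight
    is a dict with a 'switches' sequence and a 'sensors' vector."""
    return (eta * nlcs(f1["switches"], f2["switches"])
            + (1 - eta) * rbf(f1["sensors"], f2["sensors"]))


def anomaly_scores(flights, eta=0.5):
    """Mean combined similarity to all other flights; lower = more anomalous."""
    scores = []
    for i, fi in enumerate(flights):
        sims = [combined_kernel(fi, fj, eta) for j, fj in enumerate(flights) if j != i]
        scores.append(sum(sims) / len(sims))
    return scores


# Hypothetical toy flights: three similar, one with reversed switch order.
flights = [
    {"switches": "ABCDE", "sensors": [1.0, 2.0]},
    {"switches": "ABCDE", "sensors": [1.1, 2.1]},
    {"switches": "ABDCE", "sensors": [0.9, 1.9]},
    {"switches": "EDCBA", "sensors": [3.0, 0.5]},
]
scores = anomaly_scores(flights)
print([round(s, 3) for s in scores])
```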

  14. Data from: Scaling data mining in massively parallel dataflow systems

    • resodate.org
    Updated Feb 5, 2016
    Cite
    Sebastian Schelter (2016). Scaling data mining in massively parallel dataflow systems [Dataset]. http://doi.org/10.14279/depositonce-4982
    Dataset updated
    Feb 5, 2016
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Sebastian Schelter
    Description

    This thesis lays the groundwork for enabling scalable data mining on large datasets in massively parallel dataflow systems. Such datasets have become ubiquitous. We illustrate common fallacies with respect to scalable data mining: it is in no way sufficient to naively implement textbook algorithms on parallel systems, because bottlenecks on all layers of the stack prevent such naive implementations from scaling. We argue that scalability in data mining is a multi-level problem and must therefore be approached through the interplay of algorithms, systems, and applications. We therefore discuss a selection of scalability problems on these different levels. We investigate algorithm-specific scalability aspects of collaborative filtering algorithms for computing recommendations, a popular data mining use case with many industry deployments. We show how to efficiently execute the two most common approaches, namely neighborhood methods and latent factor models, on MapReduce, and describe a specialized architecture for scaling collaborative filtering to extremely large datasets, which we implemented at Twitter. We then turn to system-specific scalability aspects, where we improve system performance during the distributed execution of a special class of iterative algorithms by drastically reducing the overhead required for guaranteeing fault tolerance. To this end, we propose a novel optimistic approach to fault tolerance which exploits the robust convergence properties of a large class of fixpoint algorithms and does not incur measurable overhead in failure-free cases. Finally, we present work on an application-specific scalability aspect of scalable data mining. A common problem when deploying machine learning applications in real-world scenarios is that the prediction quality of ML models depends heavily on hyperparameters that have to be chosen in advance. We propose an algorithmic framework for an important subproblem occurring during hyperparameter search at scale: efficiently generating samples from block-partitioned matrices in a shared-nothing environment. For every selected problem, we show how to execute the resulting computation automatically in a parallel and scalable manner, and evaluate our proposed solution on large datasets with billions of data points.
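
The abstract describes an optimistic approach to fault tolerance that relies on the convergence properties of fixpoint algorithms instead of checkpointing. A minimal single-process sketch of that idea, assuming PageRank as the fixpoint algorithm, a graph without dangling nodes, and a hypothetical `compensate` function that resets lost entries to 1/n and renormalizes: after a simulated failure, the iteration simply continues from the repaired state. This illustrates the principle only; it is not the thesis's implementation or API.

```python
from typing import Dict, List, Sequence

Graph = Dict[int, List[int]]  # node -> out-neighbours; every node needs >= 1 out-link


def pagerank_step(graph: Graph, ranks: List[float], damping: float = 0.85) -> List[float]:
    """One synchronous PageRank update, i.e. one application of the fixpoint operator."""
    n = len(ranks)
    new = [(1.0 - damping) / n] * n
    for src, targets in graph.items():
        share = damping * ranks[src] / len(targets)
        for dst in targets:
            new[dst] += share
    return new


def compensate(ranks: List[float], lost: Sequence[int]) -> List[float]:
    """Hypothetical compensation: re-initialise lost entries, then renormalise."""
    n = len(ranks)
    repaired = list(ranks)
    for i in lost:
        repaired[i] = 1.0 / n
    total = sum(repaired)
    return [r / total for r in repaired]


def run(graph: Graph, n: int, iterations: int = 200,
        fail_at: int = -1, lost: Sequence[int] = ()) -> List[float]:
    """Iterate to (near) convergence; optionally lose `lost` entries at step `fail_at`."""
    ranks = [1.0 / n] * n  # no checkpoints are ever written
    for step in range(iterations):
        if step == fail_at:
            # Simulated worker failure: the lost partition is compensated
            # and the iteration carries on from the repaired state.
            ranks = compensate(ranks, lost)
        ranks = pagerank_step(graph, ranks)
    return ranks
```

For example, on the graph `{0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}`, a failure-free `run(graph, 4)` and `run(graph, 4, fail_at=5, lost=[0, 2])` should agree to within floating-point tolerance: the damped update is a contraction with a unique fixpoint, so the perturbation introduced by the failure decays geometrically.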

  15. Data from: Mining Distance-Based Outliers in Near Linear Time

    • catalog.data.gov
    • datasets.ai
    Updated Apr 11, 2025
    + more versions
    Dashlink (2025). Mining Distance-Based Outliers in Near Linear Time [Dataset]. https://catalog.data.gov/dataset/mining-distance-based-outliers-in-near-linear-time
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

    Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested-loop algorithm that is quadratic in the worst case can give near-linear-time performance when the data are in random order and a simple pruning rule is used. We test our algorithm on real, high-dimensional data sets with millions of examples and show that the near-linear scaling holds over several orders of magnitude. Our average-case analysis suggests that much of the efficiency arises because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
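
The nested loop with randomization and pruning can be sketched as follows, assuming for illustration that the outlier score is the Euclidean distance to the k-th nearest neighbour. Points are visited in random order; the cutoff is the score of the weakest of the current top-n outliers, and a point is abandoned as soon as the k-th nearest neighbour found so far is closer than that cutoff. The function name and parameters are illustrative, not the authors' code.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def top_outliers(data: List[Point], k: int = 5, n: int = 3,
                 seed: int = 0) -> List[Tuple[float, int]]:
    """Return the n points with the largest k-th nearest-neighbour distance,
    as (score, index) pairs sorted by descending score. Needs len(data) > k."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)        # random order makes early pruning likely
    outliers: List[Tuple[float, int]] = []    # min-heap of (score, index), size <= n
    cutoff = 0.0                              # score of the weakest current outlier
    for i in order:
        neighbours: List[float] = []          # max-heap (negated) of k smallest distances
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(data[i], data[j])
            if len(neighbours) < k:
                heapq.heappush(neighbours, -d)
            elif d < -neighbours[0]:
                heapq.heapreplace(neighbours, -d)
            # Pruning rule: the true k-th NN distance can only shrink further,
            # so once it is below the cutoff, i cannot be a top-n outlier.
            if len(neighbours) == k and -neighbours[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        score = -neighbours[0]                # exact k-th NN distance (full scan)
        if len(outliers) < n:
            heapq.heappush(outliers, (score, i))
        elif score > outliers[0][0]:
            heapq.heapreplace(outliers, (score, i))
        if len(outliers) == n:
            cutoff = outliers[0][0]
    return sorted(outliers, reverse=True)


if __name__ == "__main__":
    grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
    data = grid + [(50.0, 50.0), (-40.0, 10.0)]  # planted outliers at indices 25, 26
    print([idx for _, idx in top_outliers(data, k=3, n=2)])  # [25, 26]
```

Pruning does not change the result: a pruned point's true k-th nearest-neighbour distance is at most its partial one, which is already below the cutoff. The worst case stays quadratic, as the abstract notes.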

  16. Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry...

    • catalog.data.gov
    • data.openei.org
    • +3 more
    Updated Nov 2, 2023
    Davis Energy (2023). Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry Climate [Dataset]. https://catalog.data.gov/dataset/community-scale-attic-retrofit-and-home-energy-upgrade-data-mining-hot-dry-climate
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    Davis Energy
    Description

    Retrofitting is an essential element of any comprehensive strategy for improving residential energy efficiency. The residential retrofit market is still developing, and program managers must develop innovative strategies to increase uptake and promote economies of scale. Residential retrofits remain a challenging proposition to sell to homeowners, because awareness levels are low and financial incentives are lacking. The U.S. Department of Energy's Building America research team, the Alliance for Residential Building Innovation (ARBI), implemented a project to increase residential retrofits in Davis, California. The project used a neighborhood-focused implementation strategy and a low-cost retrofit program focused on upgraded attic insulation and duct sealing. ARBI worked with a community partner, the not-for-profit Cool Davis Initiative, as well as selected area contractors, to implement a strategy that sought to capitalize on the strong local expertise of partners and the unique aspects of the Davis community. Working with community partners also allowed ARBI to collect and analyze data about effective messaging tactics for community-based retrofit programs. ARBI expected this project, called Retrofit Your Attic, to achieve higher uptake than other retrofit projects because it emphasized a low-cost, one-measure retrofit program. However, this was not the case. The program focused on attics (including air sealing, duct sealing, and attic insulation) as a low-cost entry point for homeowners to complete home retrofits. The price was kept below $4,000 after incentives, and both contractors in the program offered the same price. The program completed only five retrofits. Interestingly, none of those homeowners used the one-measure strategy: all five were concerned about cost, comfort, and energy savings and included additional measures in their retrofits. The low-cost, one-measure strategy did not increase uptake among homeowners, even in a well-educated, affluent community such as Davis. This project has two primary components. The first is to complete attic retrofits on a community scale in the hot-dry climate of Davis, CA, collecting sufficient data on these projects to include them in the BAFDR. The second is ARBI's work with contractors to obtain building and utility data from a large set of retrofit projects in California's hot-dry climate; these projects are to be uploaded into the BAFDR.

  17. Data Mining in Systems Health Management - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    nasa.gov (2025). Data Mining in Systems Health Management - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/data-mining-in-systems-health-management
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This chapter presents theoretical and practical aspects associated with the implementation of a combined model-based/data-driven approach to failure prognostics based on particle filtering algorithms, in which the current estimate of the state PDF is used to determine the operating condition of the system and to predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of estimating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: a prediction step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows the probability of failure at future time instants (the remaining useful life, or RUL, PDF) to be estimated in real time, providing information about time-to-failure (TTF) expectations, statistical confidence intervals, and long-term predictions, using for this purpose empirical knowledge about critical conditions for the system (also referred to as hazard zones). This information is of paramount significance for improving system reliability and the cost-effective operation of critical assets, as shown in a case study where feedback correction strategies (based on uncertainty measures) were implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feedback loop is implemented using simple linear relationships, it provides quick insight into how the system reacts to changes in its input signals, in terms of its predicted RUL. The method can handle non-Gaussian PDFs, since its formulation includes concepts such as nonlinear state estimation and confidence intervals. Real data from a seeded-fault test showed that the proposed framework was able to anticipate modifications to the system input that lengthen its RUL. The results of this test indicate that the method successfully suggested the correction the system required. Accordingly, future work will focus on developing and testing similar strategies using different input-output uncertainty metrics.
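
The prediction/update cycle and the RUL estimate described above can be illustrated with a bootstrap particle filter. Everything model-specific here is an assumption made for the sketch: a hypothetical linear crack-growth model with Gaussian noise, a Gaussian measurement likelihood, and a fixed critical length `HAZARD` standing in for the hazard zone. The chapter's actual state model and its feedback correction are not reproduced.

```python
import math
import random
from typing import List, Tuple

# Hypothetical fault-growth model (not the chapter's): crack length grows by
# GROWTH per step plus Gaussian process noise; sensors add Gaussian noise.
GROWTH, PROCESS_SD, MEAS_SD = 0.05, 0.01, 0.05
HAZARD = 3.0  # hypothetical critical crack length standing in for the hazard zone


def predict(particles: List[float], rng: random.Random) -> List[float]:
    """Prediction step: push every particle through the process model."""
    return [x + GROWTH + rng.gauss(0.0, PROCESS_SD) for x in particles]


def update(particles: List[float], measurement: float,
           rng: random.Random) -> List[float]:
    """Update step: weight by measurement likelihood, then resample."""
    log_w = [-0.5 * ((measurement - x) / MEAS_SD) ** 2 for x in particles]
    top = max(log_w)  # subtract the max log-weight to avoid underflow
    weights = [math.exp(lw - top) for lw in log_w]
    return rng.choices(particles, weights=weights, k=len(particles))


def rul_samples(particles: List[float], rng: random.Random,
                max_steps: int = 1000) -> List[int]:
    """Propagate each particle until it reaches HAZARD; return sorted step counts."""
    ruls = []
    for x in particles:
        steps = 0
        while x < HAZARD and steps < max_steps:
            x += GROWTH + rng.gauss(0.0, PROCESS_SD)
            steps += 1
        ruls.append(steps)
    return sorted(ruls)


def prognose(measurements: List[float], n: int = 500,
             seed: int = 1) -> Tuple[int, int, int]:
    """Filter the measurements; return (5th pct, median, 95th pct) RUL in steps."""
    rng = random.Random(seed)
    particles = [rng.gauss(measurements[0], MEAS_SD) for _ in range(n)]
    for y in measurements[1:]:
        particles = update(predict(particles, rng), y, rng)
    ruls = rul_samples(particles, rng)
    return ruls[int(0.05 * n)], ruls[n // 2], ruls[int(0.95 * n)]


if __name__ == "__main__":
    # Hypothetical noise-free readings of a crack growing 0.05 per step from 1.0.
    readings = [1.0 + 0.05 * t for t in range(21)]
    print(prognose(readings))
```

Propagating every particle to the threshold yields an empirical RUL distribution; its median and 5th/95th percentiles play the role of a point estimate and confidence bounds, and nothing in the procedure requires that distribution to be Gaussian.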

  18. International Journal of Engineering and Advanced Technology FAQ -...

    • researchhelpdesk.org
    Updated May 28, 2022
    + more versions
    Research Help Desk (2022). International Journal of Engineering and Advanced Technology FAQ - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/faq/552/international-journal-of-engineering-and-advanced-technology
    Dataset updated
    May 28, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open-access, double-blind, peer-reviewed international journal. It aims to publish original theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering, and all interdisciplinary streams of Engineering Sciences. All submitted papers are reviewed by the IJEAT review board.

    Aims of IJEAT: to disseminate original scientific, theoretical, or applied research in Engineering and allied fields; to provide a platform for publishing results and research with a strong empirical component; to bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research; and to solicit original, unpublished research papers based on theoretical or experimental work for publication globally.

    Scope of IJEAT: IJEAT covers all branches of engineering, including Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering, and all interdisciplinary streams of Engineering Sciences. The main topics include, but are not limited to:

    1. Smart Computing and Information Processing: Signal and Speech Processing; Image Processing and Pattern Recognition; WSN; Artificial Intelligence and machine learning; Data mining and warehousing; Data Analytics; Deep learning; Bioinformatics; High Performance computing; Advanced Computer networking; Cloud Computing; IoT; Parallel Computing on GPU; Human Computer Interactions
    2. Recent Trends in Microelectronics and VLSI Design: Process & Device Technologies; Low-power design; Nanometer-scale integrated circuits; Application specific ICs (ASICs); FPGAs; Nanotechnology; Nano electronics and Quantum Computing
    3. Challenges of Industry and their Solutions, Communications: Advanced Manufacturing Technologies; Artificial Intelligence; Autonomous Robots; Augmented Reality; Big Data Analytics and Business Intelligence; Cyber Physical Systems (CPS); Digital Clone or Simulation; Industrial Internet of Things (IIoT); Manufacturing IOT; Plant Cyber security; Smart Solutions: Wearable Sensors and Smart Glasses; System Integration; Small Batch Manufacturing; Visual Analytics; Virtual Reality; 3D Printing
    4. Internet of Things (IoT): Internet of Things (IoT) & IoE & Edge Computing; Distributed Mobile Applications Utilizing IoT; Security, Privacy and Trust in IoT & IoE; Standards for IoT Applications; Ubiquitous Computing; Block Chain-enabled IoT Device and Data Security and Privacy; Application of WSN in IoT; Cloud Resources Utilization in IoT; Wireless Access Technologies for IoT; Mobile Applications and Services for IoT; Machine/Deep Learning with IoT & IoE; Smart Sensors and Internet of Things for Smart City; Logic, Functional programming and Microcontrollers for IoT; Sensor Networks, Actuators for Internet of Things; Data Visualization using IoT; IoT Application and Communication Protocol; Big Data Analytics for Social Networking using IoT; IoT Applications for Smart Cities; Emulation and Simulation Methodologies for IoT; IoT Applied for Digital Contents
    5. Microwaves and Photonics: Microwave filter; Micro Strip antenna; Microwave Link design; Microwave oscillator; Frequency selective surface; Microwave Antenna; Microwave Photonics; Radio over fiber; Optical communication; Optical oscillator; Optical Link design; Optical phase lock loop; Optical devices
    6. Computation Intelligence and Analytics: Soft Computing; Advance Ubiquitous Computing; Parallel Computing; Distributed Computing; Machine Learning; Information Retrieval; Expert Systems; Data Mining; Text Mining; Data Warehousing; Predictive Analysis; Data Management; Big Data Analytics; Big Data Security
    7. Energy Harvesting and Wireless Power Transmission: Energy harvesting and transfer for wireless sensor networks; Economics of energy harvesting communications; Waveform optimization for wireless power transfer; RF Energy Harvesting; Wireless Power Transmission; Microstrip Antenna design and application; Wearable Textile Antenna; Luminescence; Rectenna
    8. Advance Concept of Networking and Database: Computer Network; Mobile Adhoc Network; Image Security Application; Artificial Intelligence and machine learning in the Field of Network and Database; Data Analytic; High performance computing; Pattern Recognition
    9. Machine Learning (ML) and Knowledge Mining (KM): Regression and prediction; Problem solving and planning; Clustering; Classification; Neural information processing; Vision and speech perception; Heterogeneous and streaming data; Natural language processing; Probabilistic Models and Methods; Reasoning and inference; Marketing and social sciences; Data mining; Knowledge Discovery; Web mining; Information retrieval; Design and diagnosis; Game playing; Streaming data; Music Modelling and Analysis; Robotics and control; Multi-agent systems; Bioinformatics; Social sciences; Industrial, financial and scientific applications of all kinds
    10. Advanced Computer networking: Computational Intelligence; Data Management, Exploration, and Mining; Robotics; Artificial Intelligence and Machine Learning; Computer Architecture and VLSI; Computer Graphics, Simulation, and Modelling; Digital System and Logic Design; Natural Language Processing and Machine Translation; Parallel and Distributed Algorithms; Pattern Recognition and Analysis; Systems and Software Engineering; Nature Inspired Computing; Signal and Image Processing; Reconfigurable Computing; Cloud, Cluster, Grid and P2P Computing; Biomedical Computing; Advanced Bioinformatics; Green Computing; Mobile Computing; Nano Ubiquitous Computing; Context Awareness and Personalization, Autonomic and Trusted Computing; Cryptography and Applied Mathematics; Security, Trust and Privacy; Digital Rights Management; Networked-Driven Multicourse Chips; Internet Computing; Agricultural Informatics and Communication; Community Information Systems; Computational Economics, Digital Photogrammetric; Remote Sensing, GIS and GPS; Disaster Management; e-governance, e-Commerce, e-business, e-Learning; Forest Genomics and Informatics; Healthcare Informatics; Information Ecology and Knowledge Management; Irrigation Informatics; Neuro-Informatics; Open Source: Challenges and opportunities; Web-Based Learning: Innovation and Challenges; Soft computing; Signal and Speech Processing; Natural Language Processing
    11. Communications: Microstrip Antenna; Microwave; Radar and Satellite; Smart Antenna; MIMO Antenna; Wireless Communication; RFID Network and Applications; 5G Communication; 6G Communication
    12. Algorithms and Complexity: Sequential, Parallel and Distributed Algorithms and Data Structures; Approximation and Randomized Algorithms; Graph Algorithms and Graph Drawing; On-Line and Streaming Algorithms; Analysis of Algorithms and Computational Complexity; Algorithm Engineering; Web Algorithms; Exact and Parameterized Computation; Algorithmic Game Theory; Computational Biology; Foundations of Communication Networks; Computational Geometry; Discrete Optimization
    13. Software Engineering and Knowledge Engineering: Software Engineering Methodologies; Agent-based software engineering; Artificial intelligence approaches to software engineering; Component-based software engineering; Embedded and ubiquitous software engineering; Aspect-based software engineering; Empirical software engineering; Search-Based Software engineering; Automated software design and synthesis; Computer-supported cooperative work; Automated software specification; Reverse engineering; Software Engineering Techniques and Production Perspectives; Requirements engineering; Software analysis, design and modelling; Software maintenance and evolution; Software engineering tools and environments; Software engineering decision support; Software design patterns; Software product lines; Process and workflow management; Reflection and metadata approaches; Program understanding and system maintenance; Software domain modelling and analysis; Software economics; Multimedia and hypermedia software engineering; Software engineering case study and experience reports; Enterprise software, middleware, and tools; Artificial intelligent methods, models, techniques; Artificial life and societies; Swarm intelligence; Smart Spaces; Autonomic computing and agent-based systems; Autonomic computing; Adaptive Systems; Agent architectures, ontologies, languages and protocols; Multi-agent systems; Agent-based learning and knowledge discovery; Interface agents; Agent-based auctions and marketplaces; Secure mobile and multi-agent systems; Mobile agents; SOA and Service-Oriented Systems; Service-centric software engineering; Service oriented requirements engineering; Service oriented architectures; Middleware for service based systems; Service discovery and composition; Service level agreements (drafting,

  19. Data Mining Tools Market Size, Share, Trend Analysis by 2033

    • emergenresearch.com
    pdf, excel, csv, ppt
    Updated Dec 19, 2024
    Emergen Research (2024). Data Mining Tools Market Size, Share, Trend Analysis by 2033 [Dataset]. https://www.emergenresearch.com/industry-report/data-mining-tools-market
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Dec 19, 2024
    Dataset authored and provided by
    Emergen Research
    License

    https://www.emergenresearch.com/privacy-policy

    Area covered
    Global
    Variables measured
    Base Year, No. of Pages, Growth Drivers, Forecast Period, Segments covered, Historical Data for, Pitfalls Challenges, 2033 Value Projection, Tables, Charts, and Figures, Forecast Period 2024 - 2033 CAGR, and 1 more
    Description

    The Data Mining Tools market is expected to reach a valuation of USD 3.33 billion in 2033, growing at a CAGR of 12.50%. The report classifies the market by share, trend, demand, and forecast, and by segment.

  20. Data from: Time in market: using data mining technologies to measure product...

    • resodate.org
    Updated Sep 13, 2022
    Erik Poppe (2022). Time in market: using data mining technologies to measure product lifecycles [Dataset]. http://doi.org/10.14279/depositonce-16226
    Dataset updated
    Sep 13, 2022
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Erik Poppe
    Description

    Time in Market (TIM) is a metric describing the period from a product's market entry to its decline and disappearance from the market. The concept is often used implicitly to describe the acceleration of product life cycles and innovation cycles, and it is an essential part of the product life cycle concept. Time in market can be assumed to be an important indicator for manufacturers and marketers to plan and evaluate their market success. Moreover, times in market are necessary to measure the speed of product life cycles and their implications for the general development of product lifetimes. This article's major contributions are (1) presenting time in market as a highly relevant concept for assessing product life cycles, although the indicator has so far received little attention, (2) explaining an automated, internet-based data mining approach that gathers semi-structured product data from five German internet shops for consumer electronics, and (3) presenting initial insights from half a year to one year of market data for smartphones. Longer periods are needed to obtain significant data on time in market; nevertheless, initial results show a high product rollover rate of 40-45% within one year and a time in market below 100 days for at least 16% of the captured products. Given the current state of the work, this article is addressed to researchers already engaged in data mining or interested in applying it.
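
A minimal sketch of the TIM computation, assuming the crawler yields one (product id, date) observation each time a product is found listed in a shop. TIM is approximated as the number of days between the first and last observation, and the share of products below a threshold such as 100 days follows directly. Products still listed when the observation window ends are right-censored, so their TIM is only a lower bound; this is one reason longer observation periods help. The function names and sample data are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

Observation = Tuple[str, date]  # (hypothetical product id, crawl date it was listed)


def time_in_market(observations: Iterable[Observation]) -> Dict[str, int]:
    """Days between the first and last crawl in which each product was listed."""
    first: Dict[str, date] = {}
    last: Dict[str, date] = {}
    for product, seen in observations:
        if product not in first or seen < first[product]:
            first[product] = seen
        if product not in last or seen > last[product]:
            last[product] = seen
    return {p: (last[p] - first[p]).days for p in first}


def share_below(tim: Dict[str, int], days: int) -> float:
    """Fraction of products whose observed time in market is below `days`."""
    return sum(1 for v in tim.values() if v < days) / len(tim)


if __name__ == "__main__":
    crawl = [("phone-a", date(2021, 1, 1)), ("phone-a", date(2021, 3, 1)),
             ("phone-b", date(2021, 1, 1)), ("phone-b", date(2021, 12, 1)),
             ("phone-c", date(2021, 2, 1))]
    tim = time_in_market(crawl)
    print(tim)                    # {'phone-a': 59, 'phone-b': 334, 'phone-c': 0}
    print(share_below(tim, 100))  # 0.6666666666666666
```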

Growth Market Reports (2025). Data Mining Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-mining-tools-market

Data Mining Tools Market Research Report 2033

Available download formats: pdf, csv, pptx
Dataset updated
Aug 4, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description

Data Mining Tools Market Outlook

According to our latest research, the global Data Mining Tools market size reached USD 1.93 billion in 2024, reflecting robust industry momentum. The market is expected to grow at a CAGR of 12.7% from 2025 to 2033, reaching a projected value of USD 5.69 billion by 2033. This growth is primarily driven by the increasing adoption of advanced analytics across diverse industries, rapid digital transformation, and the necessity for actionable insights from massive data volumes.
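
The headline figures can be cross-checked with the standard CAGR formula, CAGR = (end / start)^(1/years) - 1. This sketch assumes the USD 1.93 billion 2024 value is the base and that the rate compounds over the nine years 2025 to 2033; the report does not state its exact convention.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `start_value` at `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years
```

Under that assumption, `cagr(1.93, 5.69, 9)` comes out at roughly 12.8% and `project(1.93, 0.127, 9)` at about USD 5.66 billion, so the stated 12.7% and USD 5.69 billion agree up to rounding.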

One of the pivotal growth factors propelling the Data Mining Tools market is the exponential rise in data generation, particularly through digital channels, IoT devices, and enterprise applications. Organizations across sectors are leveraging data mining tools to extract meaningful patterns, trends, and correlations from structured and unstructured data. The need for improved decision-making, operational efficiency, and competitive advantage has made data mining an essential component of modern business strategies. Furthermore, advancements in artificial intelligence and machine learning are enhancing the capabilities of these tools, enabling predictive analytics, anomaly detection, and automation of complex analytical tasks, which further fuels market expansion.

Another significant driver is the growing demand for customer-centric solutions in industries such as retail, BFSI, and healthcare. Data mining tools are increasingly being used for customer relationship management, targeted marketing, fraud detection, and risk management. By analyzing customer behavior and preferences, organizations can personalize their offerings, optimize marketing campaigns, and mitigate risks. The integration of data mining tools with cloud platforms and big data technologies has also simplified deployment and scalability, making these solutions accessible to small and medium-sized enterprises (SMEs) as well as large organizations. This democratization of advanced analytics is creating new growth avenues for vendors and service providers.

The regulatory landscape and the increasing emphasis on data privacy and security are also shaping the development and adoption of Data Mining Tools. Compliance with frameworks such as GDPR, HIPAA, and CCPA necessitates robust data governance and transparent analytics processes. Vendors are responding by incorporating features like data masking, encryption, and audit trails into their solutions, thereby enhancing trust and adoption among regulated industries. Additionally, the emergence of industry-specific data mining applications, such as fraud detection in BFSI and predictive diagnostics in healthcare, is expanding the addressable market and fostering innovation.

From a regional perspective, North America currently dominates the Data Mining Tools market owing to the early adoption of advanced analytics, strong presence of leading technology vendors, and high investments in digital transformation. However, the Asia Pacific region is emerging as a lucrative market, driven by rapid industrialization, expansion of IT infrastructure, and growing awareness of data-driven decision-making in countries like China, India, and Japan. Europe, with its focus on data privacy and digital innovation, also represents a significant market share, while Latin America and the Middle East & Africa are witnessing steady growth as organizations in these regions modernize their operations and adopt cloud-based analytics solutions.

Component Analysis

The Component segment of the Data Mining Tools market is bifurcated into Software and Services. Software remains the dominant segment, accounting for the majority of the market share in 2024. This dominance is attributed to the continuous evolution of data mining algorithms, the proliferation of user-friendly graphical interfaces, and the integration of advanced analytics capabilities such as machine learning, artificial intelligence, and natural language processing.
