100+ datasets found
  1. d

    Data from: Peer-to-Peer Data Mining, Privacy Issues, and Games

    • catalog.data.gov
    • data.nasa.gov
    • +2more
    Updated Apr 10, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Peer-to-Peer Data Mining, Privacy Issues, and Games [Dataset]. https://catalog.data.gov/dataset/peer-to-peer-data-mining-privacy-issues-and-games
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Peer-to-Peer (P2P) networks are gaining increasing popularity in many distributed applications such as file-sharing, network storage, web caching, sear- ching and indexing of relevant documents and P2P network-threat analysis. Many of these applications require scalable analysis of data over a P2P network. This paper starts by offering a brief overview of distributed data mining applications and algorithms for P2P environments. Next it discusses some of the privacy concerns with P2P data mining and points out the problems of existing privacy-preserving multi-party data mining techniques. It further points out that most of the nice assumptions of these existing privacy preserving techniques fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). The paper offers a more realistic formulation of the PPDM problem as a multi-party game and points out some recent results.

  2. d

    Ensemble Data Mining Methods

    • catalog.data.gov
    • data.wu.ac.at
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Ensemble Data Mining Methods [Dataset]. https://catalog.data.gov/dataset/ensemble-data-mining-methods
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.

  3. f

    Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf

    • frontiersin.figshare.com
    pdf
    Updated Jun 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xin Qiao; Hong Jiao (2023). Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2018.02231.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    Frontiers
    Authors
    Xin Qiao; Hong Jiao
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Due to increasing use of technology-enhanced educational assessment, data mining methods have been explored to analyse process data in log files from such assessment. However, most studies were limited to one data mining technique under one specific scenario. The current study demonstrates the usage of four frequently used supervised techniques, including Classification and Regression Trees (CART), gradient boosting, random forest, support vector machine (SVM), and two unsupervised methods, Self-organizing Map (SOM) and k-means, fitted to one assessment data. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA) responding to problem-solving items is extracted to demonstrate the methods. After concrete feature generation and feature selection, classifier development procedures are implemented using the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for the selection of classifiers are presented based on the research questions, the interpretability and the simplicity of the classifiers. Interpretations for the results from both supervised and unsupervised learning methods are provided.

  4. d

    Data from: Discovering System Health Anomalies using Data Mining Techniques

    • catalog.data.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Apr 10, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Discovering System Health Anomalies using Data Mining Techniques [Dataset]. https://catalog.data.gov/dataset/discovering-system-health-anomalies-using-data-mining-techniques
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    We discuss a statistical framework that underlies envelope detection schemes as well as dynamical models based on Hidden Markov Models (HMM) that can encompass both discrete and continuous sensor measurements for use in Integrated System Health Management (ISHM) applications. The HMM allows for the rapid assimilation, analysis, and discovery of system anomalies. We motivate our work with a discussion of an aviation problem where the identification of anomalous sequences is essential for safety reasons. The data in this application are discrete and continuous sensor measurements and can be dealt with seamlessly using the methods described here to discover anomalous flights. We specifically treat the problem of discovering anomalous features in the time series that may be hidden from the sensor suite and compare those methods to standard envelope detection methods on test data designed to accentuate the differences between the two methods. Identification of these hidden anomalies is crucial to building stable, reusable, and cost-efficient systems. We also discuss a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an ISHM system. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.

  5. Ensemble Data Mining Methods - Dataset - NASA Open Data Portal

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Mar 31, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nasa.gov (2025). Ensemble Data Mining Methods - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/ensemble-data-mining-methods
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.

  6. d

    Data Mining in Systems Health Management

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +1more
    Updated Apr 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Data Mining in Systems Health Management [Dataset]. https://catalog.data.gov/dataset/data-mining-in-systems-health-management
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    This chapter presents theoretical and practical aspects associated to the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current esti- mate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of es- timating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the predic- tion step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows to estimate of the probability of failure at future time instants (RUL PDF) in real-time, providing information about time-to- failure (TTF) expectations, statistical confidence intervals, long-term predic- tions; using for this purpose empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for the improvement of the system reliability and cost-effective operation of critical assets, as it has been shown in a case study where feedback correction strategies (based on uncertainty measures) have been implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feed- back loop is implemented using simple linear relationships, it is helpful to provide a quick insight into the manner that the system reacts to changes on its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian pdf’s since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault seeded test showed that the proposed framework was able to anticipate modifications on the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will be focused on the development and testing of similar strategies using different input-output uncertainty metrics.

  7. f

    Data from: Results obtained in a data mining process applied to a database...

    • scielo.figshare.com
    jpeg
    Updated Jun 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
    Explore at:
    jpegAvailable download formats
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELO journals
    Authors
    E.M. Ruiz Lobaina; C. P. Romero Suárez
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract The objective of this work is to improve the quality of the information that belongs to the database CubaCiencia, of the Institute of Scientific and Technological Information. This database has bibliographic information referring to four segments of science and is the main database of the Library Management System. The applied methodology was based on the Decision Trees, the Correlation Matrix, the 3D Scatter Plot, etc., which are techniques used by data mining, for the study of large volumes of information. The results achieved not only made it possible to improve the information in the database, but also provided truly useful patterns in the solution of the proposed objectives.

  8. i

    A novel fusion Python application of data mining techniques to evaluate...

    • ieee-dataport.org
    Updated Jun 8, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    John Kayode (2020). A novel fusion Python application of data mining techniques to evaluate airborne magnetic datasets [Dataset]. https://ieee-dataport.org/open-access/novel-fusion-python-application-data-mining-techniques-evaluate-airborne-magnetic
    Explore at:
    Dataset updated
    Jun 8, 2020
    Authors
    John Kayode
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Depths to the various subsurface anomalies have been the primary interest in all the applications of magnetic methods of geophysical prospection. Depths to the subsurface geologic features of interest are more valuable and superior to all other properties in any correct subsurface geologic structural interpretations.

  9. D

    Data Mining Tools Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Feb 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market Research Forecast (2025). Data Mining Tools Market Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-tools-market-1722
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Mining Tools Market size was valued at USD 1.01 USD billion in 2023 and is projected to reach USD 1.99 USD billion by 2032, exhibiting a CAGR of 10.2 % during the forecast period. The growing adoption of data-driven decision-making and the increasing need for business intelligence are major factors driving market growth. Data mining refers to filtering, sorting, and classifying data from larger datasets to reveal subtle patterns and relationships, which helps enterprises identify and solve complex business problems through data analysis. Data mining software tools and techniques allow organizations to foresee future market trends and make business-critical decisions at crucial times. Data mining is an essential component of data science that employs advanced data analytics to derive insightful information from large volumes of data. Businesses rely heavily on data mining to undertake analytics initiatives in the organizational setup. The analyzed data sourced from data mining is used for varied analytics and business intelligence (BI) applications, which consider real-time data analysis along with some historical pieces of information. Recent developments include: May 2023 – WiMi Hologram Cloud Inc. introduced a new data interaction system developed by combining neural network technology and data mining. Using real-time interaction, the system can offer reliable and safe information transmission., May 2023 – U.S. Data Mining Group, Inc., operating in bitcoin mining site, announced a hosting contract to deploy 150,000 bitcoins in partnership with major companies such as TeslaWatt, Sphere 3D, Marathon Digital, and more. The company is offering industry turn-key solutions for curtailment, accounting, and customer relations., April 2023 – Artificial intelligence and single-cell biotech analytics firm, One Biosciences, launched a single cell data mining algorithm called ‘MAYA’. The algorithm is for cancer patients to detect therapeutic vulnerabilities., May 2022 – Europe-based Solarisbank, a banking-as-a-service provider, announced its partnership with Snowflake to boost its cloud data strategy. Using the advanced cloud infrastructure, the company can enhance data mining efficiency and strengthen its banking position.. Key drivers for this market are: Increasing Focus on Customer Satisfaction to Drive Market Growth. Potential restraints include: Requirement of Skilled Technical Resources Likely to Hamper Market Growth. Notable trends are: Incorporation of Data Mining and Machine Learning Solutions to Propel Market Growth.

  10. D

    Lifesciences Data Mining and Visualization Market Report | Global Forecast...

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2024). Lifesciences Data Mining and Visualization Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-lifesciences-data-mining-and-visualization-market
    Explore at:
    pptx, pdf, csvAvailable download formats
    Dataset updated
    Sep 5, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Lifesciences Data Mining and Visualization Market Outlook



    The global market size for Lifesciences Data Mining and Visualization was valued at approximately USD 1.5 billion in 2023 and is projected to reach around USD 4.3 billion by 2032, growing at a compound annual growth rate (CAGR) of 12.5% during the forecast period. The growth of this market is driven by the increasing demand for sophisticated data analysis tools in the life sciences sector, advancements in analytical technologies, and the rising volume of complex biological data generated from research and clinical trials.



    One of the primary growth factors for the Lifesciences Data Mining and Visualization market is the burgeoning amount of data generated from various life sciences applications, such as genomics, proteomics, and clinical trials. With the advent of high-throughput technologies, researchers and healthcare professionals are now capable of generating vast amounts of data, which necessitates the use of advanced data mining and visualization tools to derive actionable insights. These tools not only help in managing and interpreting large datasets but also in uncovering hidden patterns and relationships, thereby accelerating research and development processes.



    Another significant driver is the increasing adoption of artificial intelligence (AI) and machine learning (ML) algorithms in the life sciences domain. These technologies have proven to be invaluable in enhancing data analysis capabilities, enabling more precise and predictive modeling of biological systems. By integrating AI and ML with data mining and visualization platforms, researchers can achieve higher accuracy in identifying potential drug targets, understanding disease mechanisms, and personalizing treatment plans. This trend is expected to continue, further propelling the market's growth.



    Moreover, the rising emphasis on personalized medicine and the need for precision in healthcare is fueling the demand for data mining and visualization tools. Personalized medicine relies heavily on the analysis of individual genetic, proteomic, and metabolomic profiles to tailor treatments specifically to patients' unique characteristics. The ability to visualize these complex datasets in an understandable and actionable manner is critical for the successful implementation of personalized medicine strategies, thereby boosting the demand for advanced data analysis tools.



    From a regional perspective, North America is anticipated to dominate the Lifesciences Data Mining and Visualization market, owing to the presence of a robust healthcare infrastructure, significant investments in research and development, and a high adoption rate of advanced technologies. The European market is also expected to witness substantial growth, driven by increasing government initiatives to support life sciences research and the presence of leading biopharmaceutical companies. The Asia Pacific region is projected to experience the fastest growth, attributed to the expanding healthcare sector, rising investments in biotechnology research, and the increasing adoption of data analytics solutions.



    Component Analysis



    The Lifesciences Data Mining and Visualization market is segmented by component into software and services. The software segment is expected to hold a significant share of the market, driven by the continuous advancements in data mining algorithms and visualization techniques. Software solutions are critical in processing large volumes of complex biological data, facilitating real-time analysis, and providing intuitive visual representations that aid in decision-making. The increasing integration of AI and ML into these software solutions is further enhancing their capabilities, making them indispensable tools in life sciences research.



    The services segment, on the other hand, is projected to grow at a considerable rate, as organizations seek specialized expertise to manage and interpret their data. Services include consulting, implementation, and maintenance, as well as training and support. The demand for these services is driven by the need to ensure optimal utilization of data mining software and to keep up with the rapid pace of technological advancements. Moreover, many life sciences organizations lack the in-house expertise required to handle large-scale data analytics projects, thereby turning to external service providers for assistance.



    Within the software segment, there is a growing trend towards the development of integrated platforms that combine multiple functionalities, such as data collection, pre

  11. Video-to-Model Data Set

    • figshare.com
    • commons.datacite.org
    xml
    Updated Mar 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz (2020). Video-to-Model Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.12026850.v1
    Explore at:
    xmlAvailable download formats
    Dataset updated
    Mar 24, 2020
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set belongs to the paper "Video-to-Model: Unsupervised Trace Extraction from Videos for Process Discovery and Conformance Checking in Manual Assembly", submitted on March 24, 2020, to the 18th International Conference on Business Process Management (BPM).Abstract: Manual activities are often hidden deep down in discrete manufacturing processes. For the elicitation and optimization of process behavior, complete information about the execution of Manual activities are required. Thus, an approach is presented on how execution level information can be extracted from videos in manual assembly. The goal is the generation of a log that can be used in state-of-the-art process mining tools. The test bed for the system was lightweight and scalable consisting of an assembly workstation equipped with a single RGB camera recording only the hand movements of the worker from top. A neural network based real-time object classifier was trained to detect the worker’s hands. The hand detector delivers the input for an algorithm, which generates trajectories reflecting the movement paths of the hands. Those trajectories are automatically assigned to work steps using the position of material boxes on the assembly shelf as reference points and hierarchical clustering of similar behaviors with dynamic time warping. The system has been evaluated in a task-based study with ten participants in a laboratory, but under realistic conditions. The generated logs have been loaded into the process mining toolkit ProM to discover the underlying process model and to detect deviations from both, instructions and ground truth, using conformance checking. The results show that process mining delivers insights about the assembly process and the system’s precision.The data set contains the generated and the annotated logs based on the video material gathered during the user study. In addition, the petri nets from the process discovery and conformance checking conducted with ProM (http://www.promtools.org) and the reference nets modeled with Yasper (http://www.yasper.org/) are provided.

  12. D

    Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2024). Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-mining-and-modeling-market
    Explore at:
    csv, pdf, pptxAvailable download formats
    Dataset updated
    Sep 23, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Mining and Modeling Market Outlook




    The global data mining and modeling market size was valued at approximately $28.5 billion in 2023 and is projected to reach $70.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 10.5% during the forecast period. This remarkable growth can be attributed to the increasing complexity and volume of data generated across various industries, necessitating robust tools and techniques for effective data analysis and decision-making processes.




    One of the primary growth factors driving the data mining and modeling market is the exponential increase in data generation owing to advancements in digital technology. Modern enterprises generate extensive data from numerous sources such as social media platforms, IoT devices, and transactional databases. The need to make sense of this vast information trove has led to a surge in the adoption of data mining and modeling tools. These tools help organizations uncover hidden patterns, correlations, and insights, thereby enabling more informed decision-making and strategic planning.




    Another significant growth driver is the increasing adoption of artificial intelligence (AI) and machine learning (ML) technologies. Data mining and modeling are critical components of AI and ML algorithms, which rely on large datasets to learn and make predictions. As businesses strive to stay competitive, they are increasingly investing in AI-driven analytics solutions. This trend is particularly prevalent in sectors such as healthcare, finance, and retail, where predictive analytics can provide a substantial competitive edge. Moreover, advancements in big data technologies are further bolstering the capabilities of data mining and modeling solutions, making them more effective and efficient.




    The burgeoning demand for business intelligence (BI) and analytics solutions is also a major factor propelling the market. Organizations are increasingly recognizing the value of data-driven insights in identifying market trends, customer preferences, and operational inefficiencies. Data mining and modeling tools form the backbone of sophisticated BI platforms, enabling companies to transform raw data into actionable intelligence. This demand is further amplified by the growing importance of regulatory compliance and risk management, particularly in highly regulated industries such as banking, financial services, and healthcare.




    From a regional perspective, North America currently dominates the data mining and modeling market, owing to the early adoption of advanced technologies and the presence of major market players. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid digital transformation initiatives and increasing investments in AI and big data technologies. Europe also holds a significant market share, supported by stringent data protection regulations and a strong focus on innovation.



    Component Analysis




    The data mining and modeling market by component is broadly segmented into software and services. The software segment encompasses various tools and platforms that facilitate data mining and modeling processes. These software solutions range from basic data analysis tools to advanced platforms integrated with AI and ML capabilities. The increasing complexity of data and the need for real-time analytics are driving the demand for sophisticated software solutions. Companies are investing in custom and off-the-shelf software to enhance their data handling and analytical capabilities, thereby gaining a competitive edge.




    The services segment includes consulting, implementation, training, and support services. As organizations strive to leverage data mining and modeling tools effectively, the demand for professional services is on the rise. Consulting services help businesses identify the right tools and strategies for their specific needs, while implementation services ensure the seamless integration of these tools into existing systems. Training services are crucial for building in-house expertise, enabling teams to maximize the benefits of data mining and modeling solutions. Support services ensure the ongoing maintenance and optimization of these tools, addressing any technical issues that may arise.




    The software segment is expected to dominate the market throughout the forecast period, driven by continuous advancements in te

  13. c

    Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...

    • s.cnmilf.com
    • cloud.csiss.gmu.edu
    • +6more
    Updated Apr 10, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/discovering-anomalous-aviation-safety-events-using-scalable-data-mining-algorithms
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent _domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.

  14. o

    Data from: Identifying Missing Data Handling Methods with Text Mining

    • openicpsr.org
    delimited
    Updated Mar 8, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Krisztián Boros; Zoltán Kmetty (2023). Identifying Missing Data Handling Methods with Text Mining [Dataset]. http://doi.org/10.3886/E185961V1
    Explore at:
    delimitedAvailable download formats
    Dataset updated
    Mar 8, 2023
    Dataset provided by
    Hungarian Academy of Sciences
    Authors
    Krisztián Boros; Zoltán Kmetty
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1999 - Dec 31, 2016
    Description

    Missing data is an inevitable aspect of every empirical research. Researchers developed several techniques to handle missing data to avoid information loss and biases. Over the past 50 years, these methods have become more and more efficient and also more complex. Building on previous review studies, this paper aims to analyze what kind of missing data handling methods are used among various scientific disciplines. For the analysis, we used nearly 50.000 scientific articles that were published between 1999 and 2016. JSTOR provided the data in text format. Furthermore, we utilized a text-mining approach to extract the necessary information from our corpus. Our results show that the usage of advanced missing data handling methods such as Multiple Imputation or Full Information Maximum Likelihood estimation is steadily growing in the examination period. Additionally, simpler methods, like listwise and pairwise deletion, are still in widespread use.

  15. Data from: CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2...

    • zenodo.org
    bin, png, zip
    Updated Jul 12, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Berta Ibáñez-Beroiz; Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ignacio Oscoz-Villanueva; Ibai Tamayo; Ibai Tamayo; Julián Librero; Julián Librero; Mónica Enguita-Germán; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Francisco Estupiñán-Romero; Enrique Bernal-Delgado; Enrique Bernal-Delgado (2024). CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2 DIABETES [Dataset]. http://doi.org/10.5281/zenodo.7778291
    Explore at:
    bin, png, zipAvailable download formats
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Berta Ibáñez-Beroiz; Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ignacio Oscoz-Villanueva; Ibai Tamayo; Ibai Tamayo; Julián Librero; Julián Librero; Mónica Enguita-Germán; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Francisco Estupiñán-Romero; Enrique Bernal-Delgado; Enrique Bernal-Delgado
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Technical notes and documentation on the common data model of the project CONCEPT-DM2.

    This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.

    Aims of the CONCEPT-DM2 project:

    General aim: To analyse chronic care effectiveness and efficiency of care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes using data from real life world (RWD) from five Spanish Regional Health Systems.

    Main specific aims:

    • To characterize the care pathways in patients with diabetes through the whole care system in terms of process indicators and pharmacologic recommendations
    • To compare these observed care pathways with the theoretical clinical pathways derived from the clinical practice guidelines
    • To assess if the adherence to clinical guidelines influence on important health outcomes, such as cardiovascular hospitalizations.
    • To compare the traditional analytical methods with process mining methods in terms of modeling quality, prediction performance and information provided.

    Study Design: It is a population-based retrospective observational study centered on all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all the contacts of these patients with the health services using the electronic medical record systems including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data, Pharmacy Claims, and also other registers such as the mortality and the population register.

    Cohort definition: All patients with code of Type 2 Diabetes in the clinical health records

    • Inclusion criteria: patients that, at 01/01/2017 or during the follow-up from 01/01/2017 to 31/12/2022 had active health card (active TIS - tarjeta sanitaria activa) and code of type 2 diabetes (T2D, DM2 in spanish) in the clinical records of primary care (CIAP2 T90 in case of using CIAP code system)
    • Exclusion criteria:
      • patients with no contact with the health system from 01/01/2017 to 31/12/2022
      • patients that had a T1D (DM1) code opened after the T2D code during the follow-up.
    • Study period. From 01/01/2017 to 31/12/2022

    Files included in this publication:

    • Datamodel_CONCEPT_DM2_diagram.png
    • Common data model specification (Datamodel_CONCEPT_DM2_v.0.1.0.xlsx)
    • Synthetic datasets (Datamodel_CONCEPT_DM2_sample_data)
      • sample_data1_dm_patient.csv
      • sample_data2_dm_param.csv
      • sample_data3_dm_patient.csv
      • sample_data4_dm_param.csv
      • sample_data5_dm_patient.csv
      • sample_data6_dm_param.csv
      • sample_data7_dm_param.csv
      • sample_data8_dm_param.csv
    • Datamodel_CONCEPT_DM2_explanation.pptx
  16. w

    Dataset of books called Data mining techniques in CRM : inside customer...

    • workwithdata.com
    Updated Apr 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Work With Data (2025). Dataset of books called Data mining techniques in CRM : inside customer segmentation [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Data+mining+techniques+in+CRM+%3A+inside+customer+segmentation
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Data mining techniques in CRM : inside customer segmentation. It features 7 columns including author, publication date, language, and book publisher.

  17. D

    Data Mining Tools Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Apr 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2024). Data Mining Tools Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-mining-tools-market
    Explore at:
    pdf, pptx, csvAvailable download formats
    Dataset updated
    Apr 1, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Mining Tools Market Outlook 2032



    The global data mining tools market size was USD 932 Million in 2023 and is projected to reach USD 2,584.7 Million by 2032, expanding at a CAGR of 12% during 2024–2032. The market is fueled by the rising demand for big data analytics across various industries and the increasing need for AI-integrated data mining tools for insightful decision-making.



    Increasing adoption of cloud-based platforms in data mining tools fuels the market. This enhances scalability, flexibility, and cost-efficiency in data handling processes. Major tech companies are launching cloud-based data mining solutions, enabling businesses to analyze vast datasets effectively. This trend reflects the shift toward agile and scalable data analysis methods, meeting the dynamic needs of modern enterprises.





    • In July 2023, Microsoft launched Power Automate Process Mining. This tool, powered by advanced AI, allows companies to gain deep insights into their operations, streamline processes, and foster ongoing improvement through automation and low-code applications, marking a new era in business efficiency and process optimization.







    Rising focus on predictive analytics propels the development of advanced data mining tools capable of forecasting future trends and behaviors. Industries such as finance, healthcare, and retail invest significantly in predictive analytics to gain a competitive edge, driving demand for sophisticated data mining technologies. This trend underscores the strategic importance of foresight in decision-making processes.



    Visual data mining tools are gaining traction in the market, offering intuitive data exploration and interpretation capabilities. These tools enable users to uncover patterns and insights through graphical representations, making data analysis accessible to a broader audience. The launch of user-friendly visual data mining applications marks a significant step toward democratizing data analytics.



    Impact of Artificial Intelligence (

  18. Data Mining Software Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2025). Data Mining Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-mining-software-market
    Explore at:
    pdf, pptx, csvAvailable download formats
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Authors
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Mining Software Market Outlook



    The global data mining software market size was valued at USD 7.2 billion in 2023 and is projected to reach USD 15.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 8.7% during the forecast period. This growth is driven primarily by the increasing adoption of big data analytics and the rising demand for business intelligence across various industries. As businesses increasingly recognize the value of data-driven decision-making, the market is expected to witness substantial growth.



    One of the significant growth factors for the data mining software market is the exponential increase in data generation. With the proliferation of internet-enabled devices and the rapid advancement of technologies such as the Internet of Things (IoT), there is a massive influx of data. Organizations are now more focused than ever on harnessing this data to gain insights, improve operations, and create a competitive advantage. This has led to a surge in demand for advanced data mining tools that can process and analyze large datasets efficiently.



    Another driving force is the growing need for personalized customer experiences. In industries such as retail, healthcare, and BFSI, understanding customer behavior and preferences is crucial. Data mining software enables organizations to analyze customer data, segment their audience, and deliver personalized offerings, ultimately enhancing customer satisfaction and loyalty. This drive towards personalization is further fueling the adoption of data mining solutions, contributing significantly to market growth.



    The integration of artificial intelligence (AI) and machine learning (ML) technologies with data mining software is also a key growth factor. These advanced technologies enhance the capabilities of data mining tools by enabling them to learn from data patterns and make more accurate predictions. The convergence of AI and data mining is opening new avenues for businesses, allowing them to automate complex tasks, predict market trends, and make informed decisions more swiftly. The continuous advancements in AI and ML are expected to propel the data mining software market over the forecast period.



    Regionally, North America holds a significant share of the data mining software market, driven by the presence of major technology companies and the early adoption of advanced analytics solutions. The Asia Pacific region is also expected to witness substantial growth due to the rapid digital transformation across various industries and the increasing investments in data infrastructure. Additionally, the growing awareness and implementation of data-driven strategies in emerging economies are contributing to the market expansion in this region.



    Text Mining Software is becoming an integral part of the data mining landscape, offering unique capabilities to analyze unstructured data. As organizations generate vast amounts of textual data from various sources such as social media, emails, and customer feedback, the need for specialized tools to extract meaningful insights is growing. Text Mining Software enables businesses to process and analyze this data, uncovering patterns and trends that were previously hidden. This capability is particularly valuable in industries like marketing, customer service, and research, where understanding the nuances of language can lead to more informed decision-making. The integration of text mining with traditional data mining processes is enhancing the overall analytical capabilities of organizations, allowing them to derive comprehensive insights from both structured and unstructured data.



    Component Analysis



    The data mining software market is segmented by components, which primarily include software and services. The software segment encompasses various types of data mining tools that are used for analyzing and extracting valuable insights from raw data. These tools are designed to handle large volumes of data and provide advanced functionalities such as predictive analytics, data visualization, and pattern recognition. The increasing demand for sophisticated data analysis tools is driving the growth of the software segment. Enterprises are investing in these tools to enhance their data processing capabilities and derive actionable insights.



    Within the software segment, the emergence of cloud-based data mining solutions is a notable trend. Cloud-based solutions offer several advantages, including s

  19. S

    Predictive data analysis techniques for higher education students dropout

    • scidb.cn
    Updated Apr 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cindy (2023). Predictive data analysis techniques for higher education students dropout [Dataset]. http://doi.org/10.57760/sciencedb.07894
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 10, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Cindy
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    In this research, we have generated student retention alerts. The alerts are classified into two types: preventive and corrective. This classification varies according to the level of maturity of the data systematization process. Therefore, to systematize the data, data mining techniques have been applied. The experimental analytical method has been used, with a population of 13,715 students with 62 sociological, academic, family, personal, economic, psychological, and institutional variables, and factors such as academic follow-up and performance, financial situation, and personal information. In particular, information is collected on each of the problems or a combination of problems that could affect dropout rates. Following the methodology, the information has been generated through an abstract data model to reflect the profile of the dropout student. As advancement from previous research, this proposal will create preventive and corrective alternatives to avoid dropout higher education. Also, in contrast to previous work, we generated corrective warnings with the application of data mining techniques such as neural networks until reaching a precision of 97% and losses of 0.1052. In conclusion, this study pretends to analyze the behavior of students who drop out the university through the evaluation of predictive patterns. The overall objective is to predict the profile of student dropout, considering reasons such as admission to higher education and career changes. Consequently, using a data systematization process promotes the permanence of students in higher education. Once the profile of the dropout has been identified, student retention strategies have been approached, according to the time of its appearance and the point of view of the institution.

  20. Application of data analytics and mining across procurement process globally...

    • statista.com
    Updated Jul 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Application of data analytics and mining across procurement process globally 2017 [Dataset]. https://www.statista.com/statistics/728137/worldwide-application-of-data-analytics-and-mining-across-procurement-process/
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2017
    Area covered
    Worldwide
    Description

    This statistic displays the various applications of data analytics and mining across procurement processes, according to chief procurement officers (CPOs) worldwide, as of 2017. Fifty-seven percent of the CPOs asked agreed that data analytics and mining had been applied to intelligent and advanced analytics for negotiations, and ** percent of them indicated data analytics and mining had been applied to supplier portfolio optimization processes.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Dashlink (2025). Peer-to-Peer Data Mining, Privacy Issues, and Games [Dataset]. https://catalog.data.gov/dataset/peer-to-peer-data-mining-privacy-issues-and-games

Data from: Peer-to-Peer Data Mining, Privacy Issues, and Games

Related Article
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description

Peer-to-Peer (P2P) networks are gaining increasing popularity in many distributed applications such as file-sharing, network storage, web caching, sear- ching and indexing of relevant documents and P2P network-threat analysis. Many of these applications require scalable analysis of data over a P2P network. This paper starts by offering a brief overview of distributed data mining applications and algorithms for P2P environments. Next it discusses some of the privacy concerns with P2P data mining and points out the problems of existing privacy-preserving multi-party data mining techniques. It further points out that most of the nice assumptions of these existing privacy preserving techniques fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). The paper offers a more realistic formulation of the PPDM problem as a multi-party game and points out some recent results.

Search
Clear search
Close search
Google apps
Main menu