100+ datasets found

d
Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...
datadiscoverystudio.org
cloud.csiss.gmu.edu
+6more
Updated Sep 8, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2014). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/ee5b868c86f7498ab5e1473e8d908629/html
Explore at:
Dataset updated
Sep 8, 2014
Description
The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.
D
Data Mining Tools Market Report
marketresearchforecast.com
doc, pdf, ppt
Updated Feb 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Data Mining Tools Market Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-tools-market-1722
Explore at:
pdf, ppt, docAvailable download formats
Dataset updated
Feb 3, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Mining Tools Market size was valued at USD 1.01 USD billion in 2023 and is projected to reach USD 1.99 USD billion by 2032, exhibiting a CAGR of 10.2 % during the forecast period. The growing adoption of data-driven decision-making and the increasing need for business intelligence are major factors driving market growth. Data mining refers to filtering, sorting, and classifying data from larger datasets to reveal subtle patterns and relationships, which helps enterprises identify and solve complex business problems through data analysis. Data mining software tools and techniques allow organizations to foresee future market trends and make business-critical decisions at crucial times. Data mining is an essential component of data science that employs advanced data analytics to derive insightful information from large volumes of data. Businesses rely heavily on data mining to undertake analytics initiatives in the organizational setup. The analyzed data sourced from data mining is used for varied analytics and business intelligence (BI) applications, which consider real-time data analysis along with some historical pieces of information. Recent developments include: May 2023 – WiMi Hologram Cloud Inc. introduced a new data interaction system developed by combining neural network technology and data mining. Using real-time interaction, the system can offer reliable and safe information transmission., May 2023 – U.S. Data Mining Group, Inc., operating in bitcoin mining site, announced a hosting contract to deploy 150,000 bitcoins in partnership with major companies such as TeslaWatt, Sphere 3D, Marathon Digital, and more. The company is offering industry turn-key solutions for curtailment, accounting, and customer relations., April 2023 – Artificial intelligence and single-cell biotech analytics firm, One Biosciences, launched a single cell data mining algorithm called ‘MAYA’. The algorithm is for cancer patients to detect therapeutic vulnerabilities., May 2022 – Europe-based Solarisbank, a banking-as-a-service provider, announced its partnership with Snowflake to boost its cloud data strategy. Using the advanced cloud infrastructure, the company can enhance data mining efficiency and strengthen its banking position.. Key drivers for this market are: Increasing Focus on Customer Satisfaction to Drive Market Growth. Potential restraints include: Requirement of Skilled Technical Resources Likely to Hamper Market Growth. Notable trends are: Incorporation of Data Mining and Machine Learning Solutions to Propel Market Growth.
c
Data from: Discovering System Health Anomalies using Data Mining Techniques
s.cnmilf.com
data.staging.idas-ds1.appdat.jsc.nasa.gov
+2more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Discovering System Health Anomalies using Data Mining Techniques [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/discovering-system-health-anomalies-using-data-mining-techniques
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
We discuss a statistical framework that underlies envelope detection schemes as well as dynamical models based on Hidden Markov Models (HMM) that can encompass both discrete and continuous sensor measurements for use in Integrated System Health Management (ISHM) applications. The HMM allows for the rapid assimilation, analysis, and discovery of system anomalies. We motivate our work with a discussion of an aviation problem where the identification of anomalous sequences is essential for safety reasons. The data in this application are discrete and continuous sensor measurements and can be dealt with seamlessly using the methods described here to discover anomalous flights. We specifically treat the problem of discovering anomalous features in the time series that may be hidden from the sensor suite and compare those methods to standard envelope detection methods on test data designed to accentuate the differences between the two methods. Identification of these hidden anomalies is crucial to building stable, reusable, and cost-efficient systems. We also discuss a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an ISHM system. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.
f
Video-to-Model Data Set
figshare.com
commons.datacite.org
xml
Updated Mar 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz (2020). Video-to-Model Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.12026850.v1
Explore at:
xmlAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12026850.v1
Dataset updated
Mar 24, 2020
Dataset provided by
figshare
Authors
Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set belongs to the paper "Video-to-Model: Unsupervised Trace Extraction from Videos for Process Discovery and Conformance Checking in Manual Assembly", submitted on March 24, 2020, to the 18th International Conference on Business Process Management (BPM).Abstract: Manual activities are often hidden deep down in discrete manufacturing processes. For the elicitation and optimization of process behavior, complete information about the execution of Manual activities are required. Thus, an approach is presented on how execution level information can be extracted from videos in manual assembly. The goal is the generation of a log that can be used in state-of-the-art process mining tools. The test bed for the system was lightweight and scalable consisting of an assembly workstation equipped with a single RGB camera recording only the hand movements of the worker from top. A neural network based real-time object classifier was trained to detect the worker’s hands. The hand detector delivers the input for an algorithm, which generates trajectories reflecting the movement paths of the hands. Those trajectories are automatically assigned to work steps using the position of material boxes on the assembly shelf as reference points and hierarchical clustering of similar behaviors with dynamic time warping. The system has been evaluated in a task-based study with ten participants in a laboratory, but under realistic conditions. The generated logs have been loaded into the process mining toolkit ProM to discover the underlying process model and to detect deviations from both, instructions and ground truth, using conformance checking. The results show that process mining delivers insights about the assembly process and the system’s precision.The data set contains the generated and the annotated logs based on the video material gathered during the user study. In addition, the petri nets from the process discovery and conformance checking conducted with ProM (http://www.promtools.org) and the reference nets modeled with Yasper (http://www.yasper.org/) are provided.
ViolenceAgainstWomen.json
figshare.com
json
Updated Sep 30, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stephanie Mora Andrade (2022). ViolenceAgainstWomen.json [Dataset]. http://doi.org/10.6084/m9.figshare.21252987.v1
Explore at:
jsonAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21252987.v1
Dataset updated
Sep 30, 2022
Dataset provided by
Figsharehttp://figshare.com/
Authors
Stephanie Mora Andrade
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are the data used for the development of the investigation. This file was extracted from our mongoDB database. The data set contains real news of violence against women, which were organized with their date, the title and the body of the news.
The utility of the image analytics tool developed within the Orange Data...
figshare.com
zip
Updated Dec 16, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dongsu Park; Maurizio Zuccotti; Silvia Garagna; Riccardo Bellazzi; Uroš Petrovič; Gad Shaulsky; Blaž Zupan (2019). The utility of the image analytics tool developed within the Orange Data Mining suite [Dataset]. http://doi.org/10.6084/m9.figshare.9632276.v2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.9632276.v2
Dataset updated
Dec 16, 2019
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Dongsu Park; Maurizio Zuccotti; Silvia Garagna; Riccardo Bellazzi; Uroš Petrovič; Gad Shaulsky; Blaž Zupan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data in this repository comprises four different image sets to showcase the transfer learning approach and illustrate the utility of the image analytics tool developed within the Orange Data Mining suite (http://orange.biolab.si). Each of the four image sets is available in a separate zip file. List of images with additional annotations and information on the image classification is also available in the four Excel files. The images include progenitor cells in mouse bone healing, identification of developmental competence in mouse oocytes, sub-cellular protein localization in yeast, and developmental morphology of social amoebae.
d
Anomaly Detection with Text Mining
catalog.data.gov
s.cnmilf.com
+2more
Updated Apr 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Anomaly Detection with Text Mining [Dataset]. https://catalog.data.gov/dataset/anomaly-detection-with-text-mining
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Many existing complex space systems have a significant amount of historical maintenance and problem data bases that are stored in unstructured text forms. The problem that we address in this paper is the discovery of recurring anomalies and relationships between problem reports that may indicate larger systemic problems. We will illustrate our techniques on data from discrepancy reports regarding software anomalies in the Space Shuttle. These free text reports are written by a number of different people, thus the emphasis and wording vary considerably. With Mehran Sahami from Stanford University, I'm putting together a book on text mining called "Text Mining: Theory and Applications" to be published by Taylor and Francis.
Data Mining Tools Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Jul 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Growth Market Reports (2025). Data Mining Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-mining-tools-market
Explore at:
pptx, pdf, csvAvailable download formats
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining Tools Market Outlook

According to our latest research, the global Data Mining Tools market size reached USD 1.93 billion in 2024, reflecting robust industry momentum. The market is expected to grow at a CAGR of 12.7% from 2025 to 2033, reaching a projected value of USD 5.69 billion by 2033. This growth is primarily driven by the increasing adoption of advanced analytics across diverse industries, rapid digital transformation, and the necessity for actionable insights from massive data volumes.

One of the pivotal growth factors propelling the Data Mining Tools market is the exponential rise in data generation, particularly through digital channels, IoT devices, and enterprise applications. Organizations across sectors are leveraging data mining tools to extract meaningful patterns, trends, and correlations from structured and unstructured data. The need for improved decision-making, operational efficiency, and competitive advantage has made data mining an essential component of modern business strategies. Furthermore, advancements in artificial intelligence and machine learning are enhancing the capabilities of these tools, enabling predictive analytics, anomaly detection, and automation of complex analytical tasks, which further fuels market expansion.

Another significant driver is the growing demand for customer-centric solutions in industries such as retail, BFSI, and healthcare. Data mining tools are increasingly being used for customer relationship management, targeted marketing, fraud detection, and risk management. By analyzing customer behavior and preferences, organizations can personalize their offerings, optimize marketing campaigns, and mitigate risks. The integration of data mining tools with cloud platforms and big data technologies has also simplified deployment and scalability, making these solutions accessible to small and medium-sized enterprises (SMEs) as well as large organizations. This democratization of advanced analytics is creating new growth avenues for vendors and service providers.

The regulatory landscape and the increasing emphasis on data privacy and security are also shaping the development and adoption of Data Mining Tools. Compliance with frameworks such as GDPR, HIPAA, and CCPA necessitates robust data governance and transparent analytics processes. Vendors are responding by incorporating features like data masking, encryption, and audit trails into their solutions, thereby enhancing trust and adoption among regulated industries. Additionally, the emergence of industry-specific data mining applications, such as fraud detection in BFSI and predictive diagnostics in healthcare, is expanding the addressable market and fostering innovation.

From a regional perspective, North America currently dominates the Data Mining Tools market owing to the early adoption of advanced analytics, strong presence of leading technology vendors, and high investments in digital transformation. However, the Asia Pacific region is emerging as a lucrative market, driven by rapid industrialization, expansion of IT infrastructure, and growing awareness of data-driven decision-making in countries like China, India, and Japan. Europe, with its focus on data privacy and digital innovation, also represents a significant market share, while Latin America and the Middle East & Africa are witnessing steady growth as organizations in these regions modernize their operations and adopt cloud-based analytics solutions.

Component Analysis

The Component segment of the Data Mining Tools market is bifurcated into Software and Services. Software remains the dominant segment, accounting for the majority of the market share in 2024. This dominance is attributed to the continuous evolution of data mining algorithms, the proliferation of user-friendly graphical interfaces, and the integration of advanced analytics capabilities such as machine learning, artificial intelligence, and natural language pro
d
Fuzzy Spatiotemporal Data Mining to Activity Recognition in Smart Homes
catalog.data.gov
data.nasa.gov
+1more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Fuzzy Spatiotemporal Data Mining to Activity Recognition in Smart Homes [Dataset]. https://catalog.data.gov/dataset/fuzzy-spatiotemporal-data-mining-to-activity-recognition-in-smart-homes
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
A primary goal to design smart homes is to provide automatic assistance for the residents to make them able to live independently at home. Activity recognition is done to achieve the mentioned goal and then to provide assistance, we would need three sort of information. First, we would need to know the goal of the resident, then the pattern that the resident should obey to achieve its goal and third sort of needed information is the deviations from the previously known patterns. In the presented paper, spatiotemporal aspects of daily activities are surveyed to mine the patterns of activities realized by the smart homes residents. Necessary data to model the spatiotemporal aspects of daily activities is provided by embedded sensors in the smart home. We believe that to accomplish daily activities, specific objects are applied and by analyzing the movement of objects and resident(s), we would obtain valuable information to model the daily activities of the Smart Home’s residents.
m
Indoor Fire Dataset with Distributed Multi-Sensor Nodes (EN54 Test Room)
data.mendeley.com
Updated Jun 20, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pascal V (2025). Indoor Fire Dataset with Distributed Multi-Sensor Nodes (EN54 Test Room) [Dataset]. http://doi.org/10.17632/npk2zcm85h.2
Explore at:
Unique identifier
https://doi.org/10.17632/npk2zcm85h.2
Dataset updated
Jun 20, 2025
Authors
Pascal V
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset comprises 4 fire experiments (repeated 3 times) and 3 nuisance experiments (Ethanol: repeated 3 times, Deodorant: repeated 2 times, Hairspray: repeated 1 time), with various background sequences interspersed between the conducted experiments. All exeriments were caried out in random order to reduce the influence of prehistory. It consists of a total of 305,304 rows and 16 columns, structured as a continuous multivariate time series. Each row represents the sensor measurements (CO2, CO, H2, humidity, particulate matter of different sizes, air temperature, and UV) from a unique sensor node position in the EN54 test room at a specific timestamp. The columns correspond to the sensor measurements and include additional labels: a scenario-specific label ("scenario_label"), a binary label ("anomaly_label") distinguishing between "Normal" (background) and "Anomaly" (fire or nuisance scenario), a ternary label ("ternary_label") categorizing the data as "Nuisance," "Fire," or "Background," and a progress label ("progress_label") that allows for dividing the event sequences into sub-sequences based on ongoing physical sub-processes. The dataset comprises 82.98% background data points and 17.02% anomaly data points, which can be further divided into 12.50% fire anomaly data points and 4.52% nuisance anomaly data points. The "Sensor_ID" column can be utilized to access data from different sensor node positions.
s
Online Feature Selection and Its Applications
researchdata.smu.edu.sg
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN (2023). Online Feature Selection and Its Applications [Dataset]. http://doi.org/10.25440/smu.12062733.v1
Explore at:
Unique identifier
https://doi.org/10.25440/smu.12062733.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN
License
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
Description
Feature selection is an important technique for data mining before a machine learning algorithm is applied. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS) in which an online learner is only allowed to maintain a classifier involved only a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate prediction using a small and fixed number of active features. This is in contrast to the classical setup of online learning where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: (1) learning with full input where an learner is allowed to access all the features to decide the subset of active features, and (2) learning with partial input where only a limited number of features is allowed to be accessed for each instance by the learner. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public datasets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.Related Publication: Hoi, S. C., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 93-100). ACM. http://dx.doi.org/10.1145/2351316.2351329 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2402/ Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2014). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698-710. http://dx.doi.org/10.1109/TKDE.2013.32 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2277/
Test datasets for the paper "Unsupervised anomaly detection in hourly water...
figshare.com
txt
Updated Sep 2, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jieru Yan (2022). Test datasets for the paper "Unsupervised anomaly detection in hourly water demand data using an asymmetric encoder-decoder model" [Dataset]. http://doi.org/10.6084/m9.figshare.20780389.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.20780389.v1
Dataset updated
Sep 2, 2022
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Jieru Yan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are test datasets for the paper "Unsupervised anomaly detection in hourly water demand data using an asymmetric encoder-decoder model". The names of the datasets are self-explanatory and can be tracked in the paper (Figures 1, 8, 10, 11, and 14).
f
DataSheet1_Outlier detection using iterative adaptive mini-minimum spanning...
frontiersin.figshare.com
pdf
Updated Oct 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jia Li; Jiangwei Li; Chenxu Wang; Fons J. Verbeek; Tanja Schultz; Hui Liu (2023). DataSheet1_Outlier detection using iterative adaptive mini-minimum spanning tree generation with applications on medical data.pdf [Dataset]. http://doi.org/10.3389/fphys.2023.1233341.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fphys.2023.1233341.s001
Dataset updated
Oct 13, 2023
Dataset provided by
Frontiers
Authors
Jia Li; Jiangwei Li; Chenxu Wang; Fons J. Verbeek; Tanja Schultz; Hui Liu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As an important technique for data pre-processing, outlier detection plays a crucial role in various real applications and has gained substantial attention, especially in medical fields. Despite the importance of outlier detection, many existing methods are vulnerable to the distribution of outliers and require prior knowledge, such as the outlier proportion. To address this problem to some extent, this article proposes an adaptive mini-minimum spanning tree-based outlier detection (MMOD) method, which utilizes a novel distance measure by scaling the Euclidean distance. For datasets containing different densities and taking on different shapes, our method can identify outliers without prior knowledge of outlier percentages. The results on both real-world medical data corpora and intuitive synthetic datasets demonstrate the effectiveness of the proposed method compared to state-of-the-art methods.
f
A pre-trained sound event detection neural network
figshare.com
search.datacite.org
bin
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ian McLoughlin (2023). A pre-trained sound event detection neural network [Dataset]. http://doi.org/10.6084/m9.figshare.5245789.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5245789.v1
Dataset updated
Jun 2, 2023
Dataset provided by
figshare
Authors
Ian McLoughlin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a network trained earlier on clean data. It does not give very good results but is enough to show the system working. It can achieve above 90% on clean sounds and probably about 80% accuracy in 0dB SNR.The network was saved manually (using the MATLAB 'save' command) after running the training code, and before running the testing code.
m
Source Code
bridges.monash.edu
researchdata.edu.au
zip
Updated Oct 15, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chang Wei Tan (2017). Source Code [Dataset]. http://doi.org/10.4225/03/59e33dfb920f1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.4225/03/59e33dfb920f1
Dataset updated
Oct 15, 2017
Dataset provided by
Monash University
Authors
Chang Wei Tan
License
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
Description
This is the source code for the paper "Efficient search of the best warping window for Dynamic Time Warping".This work focused on fast learning/searching for the best warping window for Dynamic Time Warping and Time Series Classification.For more info, visit https://github.com/ChangWeiTan/FastWWSearch
D
Data Mining Tools Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jun 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Archive Market Research (2025). Data Mining Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/data-mining-tools-556785
Explore at:
pdf, doc, pptAvailable download formats
Dataset updated
Jun 14, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global market for data mining tools is experiencing robust growth, projected to reach $882.8 million in 2025. While the provided CAGR is missing, considering the rapid advancements in artificial intelligence, machine learning, and big data analytics, a conservative estimate of the Compound Annual Growth Rate (CAGR) for the forecast period (2025-2033) would be around 15%. This signifies a significant expansion of the market, driven by the increasing need for businesses to extract valuable insights from massive datasets for improved decision-making, enhanced operational efficiency, and competitive advantage. Key drivers include the rising adoption of cloud-based data mining solutions, the proliferation of big data, and growing investments in advanced analytics capabilities across various sectors like healthcare, finance, and retail. Furthermore, the continuous development of sophisticated algorithms and user-friendly interfaces is making data mining accessible to a wider range of users, fueling market growth. The market is highly competitive, with established players like IBM, SAS Institute, Oracle, and Microsoft alongside emerging innovative companies like H2O.ai and Dataiku vying for market share. The segmentation of the market is diverse, encompassing various deployment models (cloud, on-premise), application types (predictive modeling, customer segmentation, fraud detection), and industry verticals. While restraints such as the high cost of implementation and the need for specialized skills can hinder wider adoption, the overall market outlook remains positive. The predicted CAGR of 15% suggests the market will likely exceed $3 billion by 2033, driven by continued technological innovation, increasing data volumes, and the growing recognition of data mining's crucial role in achieving business success in an increasingly data-driven world.
A
Anomaly Detection Technology Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Archive Market Research (2025). Anomaly Detection Technology Report [Dataset]. https://www.archivemarketresearch.com/reports/anomaly-detection-technology-55077
Explore at:
ppt, pdf, docAvailable download formats
Dataset updated
Mar 9, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Anomaly Detection Technology market is experiencing robust growth, projected to reach a market size of $6,650.9 million in 2025. While the CAGR isn't explicitly provided, considering the rapid advancements in AI, machine learning, and the increasing need for cybersecurity and predictive maintenance across diverse sectors, a conservative estimate of the CAGR for the forecast period (2025-2033) would be around 15-20%. This growth is fueled by several key drivers. The increasing volume and complexity of data generated by businesses necessitates advanced analytics for identifying unusual patterns indicative of fraud, security breaches, equipment malfunctions, or other critical events. Furthermore, the rising adoption of cloud computing and the expanding deployment of IoT devices are contributing significantly to market expansion. The BFSI, manufacturing, and healthcare sectors are leading adopters, leveraging anomaly detection to improve risk management, optimize operational efficiency, and enhance customer experience. However, challenges remain, including the complexity of implementing and integrating anomaly detection solutions, the need for specialized expertise, and concerns related to data privacy and security. The market is segmented by type (Big Data Analytics, Data Mining and Business Intelligence, Machine Learning and Artificial Intelligence, Others) and application (BFSI, Manufacturing, Retail, Healthcare, Government, IT & Telecom, Others), reflecting the diverse applications of this technology. The competitive landscape is characterized by a mix of established technology giants like IBM, Microsoft, and Cisco, alongside specialized anomaly detection vendors. The future of the Anomaly Detection Technology market is bright, with continued growth driven by technological innovation and increasing adoption across various industries. The development of more sophisticated algorithms, improved data visualization tools, and the integration of anomaly detection into existing business processes will further fuel market expansion. The focus on addressing challenges related to data privacy and security, coupled with the emergence of specialized solutions catering to specific industry needs, will shape the market's trajectory in the coming years. While economic fluctuations and competitive pressures might influence growth rates, the fundamental need for advanced anomaly detection capabilities across multiple sectors guarantees the market's long-term viability and potential for substantial growth.
r
A predictive model for opal exploration in Australia from a data mining...
researchdata.edu.au
Updated May 1, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thomas Landgrebe; Thomas Landgrebe; Adriana Dutkiewicz; Dietmar Muller (2015). A predictive model for opal exploration in Australia from a data mining approach [Dataset]. http://doi.org/10.4227/11/5587A86C0FDF1
Explore at:
Unique identifier
https://doi.org/10.4227/11/5587A86C0FDF1
Dataset updated
May 1, 2015
Dataset provided by
The University of Sydney
Authors
Thomas Landgrebe; Thomas Landgrebe; Adriana Dutkiewicz; Dietmar Muller
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Area covered

Dataset funded by
Australian Research Council
Description
This data collection is associated with the publications: Merdith, A. S., Landgrebe, T. C. W., Dutkiewicz, A., & Müller, R. D. (2013). Towards a predictive model for opal exploration using a spatio-temporal data mining approach. Australian Journal of Earth Sciences, 60(2), 217-229. doi: 10.1080/08120099.2012.754793
and
Landgrebe, T. C. W., Merdith, A., Dutkiewicz, A., & Müller, R. D. (2013). Relationships between palaeogeography and opal occurrence in Australia: A data-mining approach. Computers & Geosciences, 56(0), 76-82. doi: 10.1016/j.cageo.2013.02.002
Publication Abstract - Merdith et al. (2013)
Opal is Australia's national gemstone, however most significant opal discoveries were made in the early 1900's - more than 100 years ago - until recently. Currently there is no formal exploration model for opal, meaning there are no widely accepted concepts or methodologies available to suggest where new opal fields may be found. As a consequence opal mining in Australia is a cottage industry with the majority of opal exploration focused around old opal fields. The EarthByte Group has developed a new opal exploration methodology for the Great Artesian Basin. The work is based on the concept of applying “big data mining” approaches to data sets relevant for identifying regions that are prospective for opal. The group combined a multitude of geological and geophysical data sets that were jointly analysed to establish associations between particular features in the data with known opal mining sites. A “training set” of known opal localities (1036 opal mines) was assembled, using those localities, which were featured in published reports and on maps. The data used include rock types, soil type, regolith type, topography, radiometric data and a stack of digital palaeogeographic maps. The different data layers were analysed via spatio-temporal data mining combining the GPlates PaleoGIS software (www.gplates.org) with the Orange data mining software (orange.biolab.si) to produce the first opal prospectivity map for the Great Artesian Basin. One of the main results of the study is that the geological conditions favourable for opal were found to be related to a particular sequence of surface environments over geological time. These conditions involved alternating shallow seas and river systems followed by uplift and erosion. The approach reduces the entire area of the Great Artesian Basin to a mere 6% that is deemed to be prospective for opal exploration. The work is described in two companion papers in the Australian Journal of Earth Sciences and Computers and Geosciences.
Publication Abstract - Landgrebe et al. (2013)
Age-coded multi-layered geological datasets are becoming increasingly prevalent with the surge in open-access geodata, yet there are few methodologies for extracting geological information and knowledge from these data. We present a novel methodology, based on the open-source GPlates software in which age-coded digital palaeogeographic maps are used to “data-mine” spatio-temporal patterns related to the occurrence of Australian opal. Our aim is to test the concept that only a particular sequence of depositional/erosional environments may lead to conditions suitable for the formation of gem quality sedimentary opal. Time-varying geographic environment properties are extracted from a digital palaeogeographic dataset of the eastern Australian Great Artesian Basin (GAB) at 1036 opal localities. We obtain a total of 52 independent ordinal sequences sampling 19 time slices from the Early Cretaceous to the present-day. We find that 95% of the known opal deposits are tied to only 27 sequences all comprising fluvial and shallow marine depositional sequences followed by a prolonged phase of erosion. We then map the total area of the GAB that matches these 27 opal-specific sequences, resulting in an opal-prospective region of only about 10% of the total area of the basin. The key patterns underlying this association involve only a small number of key environmental transitions. We demonstrate that these key associations are generally absent at arbitrary locations in the basin. This new methodology allows for the simplification of a complex time-varying geological dataset into a single map view, enabling straightforward application for opal exploration and for future co-assessment with other datasets/geological criteria. This approach may help unravel the poorly understood opal formation process using an empirical spatio-temporal data-mining methodology and readily available datasets to aid hypothesis testing.
Authors and Institutions
Andrew Merdith - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-7564-8149
Thomas Landgrebe - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia
Adriana Dutkiewicz - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia
R. Dietmar Müller - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-3334-5764
Overview of Resources Contained
This collection contains geological data from Australia used for data mining in the publications Merdith et al. (2013) and Landgrebe et al. (2013). The resulting maps of opal prospectivity are also included.
List of Resources
Note: For details on the files included in this data collection, see “Description_of_Resources.txt”.
Note: For information on file formats and what programs to use to interact with various file formats, see “File_Formats_and_Recommended_Programs.txt”.
Map of Barfield region, Australia (.jpg, 270 KB)
Map overviewing the Great Artesian basins and main opal mining camps (.png, 82 KB)
Maps showing opal prospectivity data mining results for different geological datasets (.tif, 23.1 MB)
Map of opal prospectivity from palaeogeography data mining (.pdf, 2.6 MB)
Raster of palaeogeography target regions for viewing in Google Earth (.jpg, 418 KB)
Opal mine locations (.gpml, .txt, .kmz, .shp, total 15.6 MB)
Map of opal prospectivity from all data mining results as a Google Earth overlay (.kmz, 12 KB)
Map of probability of opal occurrence in prospective regions from all data mining results (.tif, 5.9 MB)
Paleogeography of Australia (.gpml, .txt, .shp, total 114.2 MB)
Radiometric data showing potassium concentration contrasts (.tif, .kmz, total 311.3 MB)
Regolith data (.gpml, .txt, .kml, .shp, total 7.1 MB)
Soil type data (.gpml, .txt, .kml, .shp, total 7.1 MB)
For more information on this data collection, and links to other datasets from the EarthByte Research Group please visit EarthByte
For more information about using GPlates, including tutorials and a user manual please visit GPlates or EarthByte
D
Data Mining Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Data Mining Software Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-software-41235
Explore at:
ppt, pdf, docAvailable download formats
Dataset updated
Mar 19, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Data Mining Software market is experiencing robust growth, driven by the increasing need for businesses to extract valuable insights from massive datasets. The market, estimated at $15 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033, reaching an estimated $45 billion by 2033. This expansion is fueled by several key factors. The burgeoning adoption of cloud-based solutions offers scalability and cost-effectiveness, attracting both large enterprises and SMEs. Furthermore, advancements in machine learning and artificial intelligence algorithms are enhancing the accuracy and efficiency of data mining processes, leading to better decision-making across various sectors like finance, healthcare, and marketing. The rise of big data analytics and the increasing availability of affordable, high-powered computing resources are also significant contributors to market growth. However, the market faces certain challenges. Data security and privacy concerns remain paramount, especially with the increasing volume of sensitive information being processed. The complexity of data mining software and the need for skilled professionals to operate and interpret the results present a barrier to entry for some businesses. The high initial investment cost associated with implementing sophisticated data mining solutions can also deter smaller organizations. Nevertheless, the ongoing technological advancements and the growing recognition of the strategic value of data-driven decision-making are expected to overcome these restraints and propel the market toward continued expansion. The market segmentation reveals a strong preference for cloud-based solutions, reflecting the industry's trend toward flexible and scalable IT infrastructure. Large enterprises currently dominate the market share, but SMEs are rapidly adopting data mining software, indicating promising future growth in this segment. Geographic analysis shows that North America and Europe are currently leading the market, but the Asia-Pacific region is poised for significant growth due to increasing digitalization and economic expansion in countries like China and India.
m
pinterest_dataset
data.mendeley.com
Updated Oct 27, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Juan Carlos Gomez (2017). pinterest_dataset [Dataset]. http://doi.org/10.17632/fs4k2zc5j5.2
Explore at:
Unique identifier
https://doi.org/10.17632/fs4k2zc5j5.2
Dataset updated
Oct 27, 2017
Authors
Juan Carlos Gomez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset with 72000 pins from 117 users in Pinterest. Each pin contains a short raw text and an image. The images are processed using a pretrained Convolutional Neural Network and transformed into a vector of 4096 features.

This dataset was used in the paper "User Identification in Pinterest Through the Refinement of a Cascade Fusion of Text and Images" to idenfity specific users given their comments. The paper is publishe in the Research in Computing Science Journal, as part of the LKE 2017 conference. The dataset includes the splits used in the paper.

There are nine files. text_test, text_train and text_val, contain the raw text of each pin in the corresponding split of the data. imag_test, imag_train and imag_val contain the image features of each pin in the corresponding split of the data. train_user and val_test_users contain the index of the user of each pin (between 0 and 116). There is a correspondance one-to-one among the test, train and validation files for images, text and users. There are 400 pins per user in the train set, and 100 pins per user in the validation and test sets each one.

If you have questions regarding the data, write to: jc dot gomez at ugto dot mx

Facebook

Twitter

Click to copy link

Link copied

Cite

(2014). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/ee5b868c86f7498ab5e1473e8d908629/html

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms

Explore at:

Dataset updated

Sep 8, 2014

Description

The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.

Clear search

Close search

Google apps

Main menu

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...

Data Mining Tools Market Report

Data from: Discovering System Health Anomalies using Data Mining Techniques

Video-to-Model Data Set

ViolenceAgainstWomen.json

The utility of the image analytics tool developed within the Orange Data...

Anomaly Detection with Text Mining

Data Mining Tools Market Research Report 2033

Data Mining Tools Market Outlook

Component Analysis

Fuzzy Spatiotemporal Data Mining to Activity Recognition in Smart Homes

Indoor Fire Dataset with Distributed Multi-Sensor Nodes (EN54 Test Room)

Online Feature Selection and Its Applications

Test datasets for the paper "Unsupervised anomaly detection in hourly water...

DataSheet1_Outlier detection using iterative adaptive mini-minimum spanning...

A pre-trained sound event detection neural network

Source Code

Data Mining Tools Report

Anomaly Detection Technology Report

A predictive model for opal exploration in Australia from a data mining...

and

Landgrebe, T. C. W., Merdith, A., Dutkiewicz, A., & Müller, R. D. (2013). Relationships between palaeogeography and opal occurrence in Australia: A data-mining approach. Computers & Geosciences, 56(0), 76-82. doi: 10.1016/j.cageo.2013.02.002

Publication Abstract - Merdith et al. (2013)

Publication Abstract - Landgrebe et al. (2013)

Authors and Institutions

Overview of Resources Contained

List of Resources

Data Mining Software Report

pinterest_dataset

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining AlgorithmsSee More Versions

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms