Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.
The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:
[1] Example benchmark of anomaly detection in time series: Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779-1797, 2022. doi:10.14778/3538598.3538602
About Solenix
Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.
There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the 'Commercial Modular Aero-Propulsion System Simulation' (CMAPSS).
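The core idea described above (centralizing only a small sample from each location and scoring data locally against it) can be illustrated with a short sketch. This is not the paper's algorithm: the kNN-distance scoring, sample sizes, and synthetic data below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sample(site_data, n=50):
    """Each site sends only a small random sample to the coordinator."""
    idx = rng.choice(len(site_data), size=min(n, len(site_data)), replace=False)
    return site_data[idx]

def knn_distance_scores(site_data, reference, k=5):
    """Score each local point by its mean distance to its k nearest
    points in the centralized reference sample."""
    d = np.linalg.norm(site_data[:, None, :] - reference[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

# Three "sites", each holding its own partition of the data.
sites = [rng.normal(size=(1000, 3)) for _ in range(3)]
sites[0][:5] += 8.0          # plant a few outliers at site 0

reference = np.vstack([local_sample(s) for s in sites])   # tiny centralized sample
for i, s in enumerate(sites):
    scores = knn_distance_scores(s, reference)
    print(f"site {i}: top score {scores.max():.2f}")
```

Only the small reference sample crosses the network; every site scores its own partition locally, which is the communication saving the abstract refers to.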
https://www.verifiedmarketresearch.com/privacy-policy/
Global Anomaly Detection Solution Market size was valued at USD 6.18 Billion in 2024 and is projected to reach USD 19.99 Billion by 2032, growing at a CAGR of 15.80% from 2026 to 2032.
Global Anomaly Detection Solution Market Dynamics
The key market dynamics that are shaping the global Anomaly Detection Solution Market include:
Key Market Drivers:
Increasing Cybersecurity Threats: The surge in sophisticated cyberattacks and data breaches is a key driver of the Anomaly Detection Solution Market. Cybercriminals are increasingly targeting organizations with innovative tactics for breaching security systems. Anomaly detection solutions are critical for detecting unexpected patterns or behaviors that could indicate a threat, such as unauthorized access or insider threats.
Growing Volume of Data: The exponential rise of data generated by businesses, fueled by digital transformation and IoT devices, necessitates effective anomaly detection.
An adaptation of the MVTec Anomaly Detection dataset, presented in the paper "Domain-independent detection of known anomalies".
There are three different datasets, each covering one specific anomaly type: color, cut and hole. The datasets can be used to evaluate approaches on the hybrid task of detecting known anomalies across different, previously unseen objects: All object types except one are used for training. During testing, the images of the remaining object type should be classified on whether they contain an anomaly.
Note: the authors are not affiliated with MVTec
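A minimal sketch of the leave-one-object-out protocol described above: train on all object types except one, then test on the held-out type. The directory layout and object names are assumptions for illustration, not the adapted dataset's actual structure.

```python
from pathlib import Path

OBJECTS = ["bottle", "cable", "capsule", "hazelnut", "screw"]  # illustrative subset

def leave_one_object_out(root, held_out):
    """Split image paths: all objects except `held_out` for training,
    the held-out object's images (normal + anomalous) for testing."""
    root = Path(root)
    train = [p for obj in OBJECTS if obj != held_out
             for p in (root / obj).rglob("*.png")]
    test = list((root / held_out).rglob("*.png"))
    return train, test

# Hypothetical dataset root; "screw" is the previously unseen object at test time.
train_paths, test_paths = leave_one_object_out("mvtec_known_anomalies", "screw")
print(len(train_paths), "training images,", len(test_paths), "test images")
```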
This paper provides a review of three different advanced machine learning algorithms for anomaly detection in continuous data streams from a ground-test firing of a subscale Solid Rocket Motor (SRM). This study compares Orca, one-class support vector machines, and the Inductive Monitoring System (IMS) for anomaly detection on the data streams. We measure the performance of the algorithms with respect to the detection horizon for situations where fault information is available. These algorithms have also been studied by the present authors (and other co-authors) as applied to liquid propulsion systems. The trade space between these algorithms will be explored for both types of propulsion systems.
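Of the three methods compared, the one-class support vector machine is available in standard libraries; a minimal sketch of how such a detector is trained on nominal data only is shown below. The parameters and synthetic data are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
nominal = rng.normal(size=(500, 4))        # stand-in for nominal sensor frames
test = np.vstack([rng.normal(size=(95, 4)),
                  rng.normal(loc=4.0, size=(5, 4))])   # a few simulated faults

# nu bounds the fraction of training points treated as outliers
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(nominal)
scores = -model.decision_function(test)    # higher = more anomalous
print("flagged:", int((model.predict(test) == -1).sum()), "of", len(test))
```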
We present a set of novel algorithms, which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining system-wide health of a fleet of aircraft. The approach taken uses unsupervised clustering of sequences using the normalized length of the longest common subsequence (nLCS) as a similarity measure, followed by a detailed analysis of outliers to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from a cluster. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithm provides a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. The final section of the paper demonstrates the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard Hidden Markov Models, and show that our methods are superior.
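The similarity measure at the core of the clustering, the normalized length of the longest common subsequence (nLCS), can be sketched with a standard dynamic program. The square-root normalization used below is a common convention assumed here for illustration, and the switch-sequence values are made up.

```python
import math

def lcs_length(a, b):
    """Classic O(len(a)*len(b)) dynamic program for the LCS length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def nlcs(a, b):
    """Normalized LCS similarity in [0, 1] (sqrt-of-lengths normalization assumed)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / math.sqrt(len(a) * len(b))

# Hypothetical cockpit switch sequences.
flight_a = ["GEAR_UP", "FLAPS_0", "AP_ON", "AP_OFF", "GEAR_DOWN"]
flight_b = ["GEAR_UP", "AP_ON", "GEAR_DOWN"]
print(round(nlcs(flight_a, flight_b), 3))   # ~0.775
```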
https://www.apache.org/licenses/LICENSE-2.0.html
The largest real-world dataset for multivariate time series anomaly detection (MTSAD), drawn from the AIOps system of a Real-Time Data Warehouse (RTDW) at a top cloud computing company. All the metrics and labels in our dataset are derived from real-world scenarios. All metrics were obtained from the RTDW instance monitoring system and cover a rich variety of metric types, including CPU usage, queries per second (QPS) and latency, which relate to many important modules within the RTDW. We obtain labels from the ticket system, which integrates three main sources of instance anomalies: user service requests, instance unavailability and fault simulations. User service requests refer to tickets that are submitted directly by users, whereas instance unavailability is typically detected through existing monitoring tools or discovered by Site Reliability Engineers (SREs). Since the system is usually very stable, we augment the anomaly samples by conducting fault simulations. Fault simulation refers to a special type of anomaly, planned beforehand, which is introduced to the system to test its performance under extreme conditions. All records in the ticket system are subject to follow-up processing by engineers, who meticulously mark the start and end times of each ticket. This rigorous approach ensures the accuracy of the labels in our dataset.
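Because labels come from tickets with marked start and end times, turning them into per-timestamp anomaly labels for a metric series is straightforward. The column names, sampling frequency, and values below are illustrative assumptions, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical 1-minute metric series and ticket intervals.
metrics = pd.DataFrame(
    {"cpu_usage": range(60)},
    index=pd.date_range("2024-01-01 00:00", periods=60, freq="min"),
)
tickets = pd.DataFrame({
    "start": pd.to_datetime(["2024-01-01 00:10", "2024-01-01 00:40"]),
    "end":   pd.to_datetime(["2024-01-01 00:15", "2024-01-01 00:45"]),
})

metrics["label"] = 0
for _, t in tickets.iterrows():
    # Mark every timestamp inside the ticketed interval as anomalous.
    metrics.loc[t["start"]:t["end"], "label"] = 1

print(metrics["label"].sum(), "anomalous minutes")
```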
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is the "additional training dataset" for the DCASE 2024 Challenge Task 2.
The data consists of the normal/anomalous operating sounds of nine types of real/toy machines. Each recording is a single-channel audio that includes both a machine's operating sound and environmental noise. The duration of recordings varies from 6 to 10 seconds. The following nine types of real/toy machines are used in this task:
3DPrinter
AirCompressor
BrushlessMotor
HairDryer
HoveringDrone
RoboticArm
Scanner
ToothBrush
ToyCircuit
Overview of the task
Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.
This task is the follow-up from DCASE 2020 Task 2 to DCASE 2023 Task 2. The task this year is to develop an ASD system that meets the following five requirements.
1. Train a model using only normal sound (unsupervised learning scenario)
Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.
2. Detect anomalies regardless of domain shifts (domain generalization task)
In real-world cases, the operational states of a machine or the environmental noise can change, causing domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques to handle these domain shifts. This requirement is the same as in DCASE 2022 Task 2 and DCASE 2023 Task 2.
3. Train a model for a completely new machine type
For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should have the ability to train models without additional hyperparameter tuning. This requirement is the same as in DCASE 2023 Task 2.
4. Train a model using a limited number of machines from its machine type
While sounds from multiple machines of the same machine type can be used to enhance the detection performance, it is often the case that only a limited number of machines are available for a machine type. In such a case, the system should be able to train models using a few machines from a machine type. This requirement is the same as in DCASE 2023 Task 2.
5. Train a model both with and without attribute information
While additional attribute information can help enhance the detection performance, we cannot always obtain such information. Therefore, the system must work well both when attribute information is available and when it is not.
The last requirement is newly introduced in DCASE 2024 Task 2.
Definition
We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."
"Machine type" indicates the type of machine, which in the additional training dataset is one of nine: 3D printer, air compressor, brushless motor, hair dryer, hovering drone, robotic arm, document scanner (scanner), toothbrush, and toy circuit.
A section is defined as a subset of the dataset for calculating performance metrics.
The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.
Attributes are parameters that define states of machines or types of noise. For several machine types, the attributes are hidden.
Dataset
This dataset consists of nine machine types. For each machine type, one section is provided, and the section is a complete set of training data. A set of test data corresponding to this training data will be provided on a separate Zenodo page as the "evaluation dataset" for the DCASE 2024 Challenge Task 2. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training and (ii) ten clips of normal sounds in the target domain for training. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.
File names and attribute csv files
File names and attribute csv files provide reference labels for each clip. The reference labels for each training clip include the machine type, section index, normal/anomaly information, and attributes describing conditions other than normal/anomaly. The machine type is given by the directory name, and the section index is given by the respective file name. For datasets other than the evaluation dataset, the normal/anomaly information and the attributes are also given in the file names. Note that for machine types whose attribute information is hidden, the attribute information in the file names is labeled only as "noAttributes". Attribute csv files allow easy access to the attributes that cause domain shifts. In these files, the file names, the names of the parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv) are listed. Each row takes the following format:
[filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
For machine types that have their attribute information hidden, all columns except the filename column are left blank for each row.
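Since each row carries a variable number of (parameter, value) pairs after the filename, a small parser can collect them into a dictionary per clip. This is a sketch under the stated row format; the file path is hypothetical.

```python
import csv

def load_attributes(csv_path):
    """Map each clip's filename to its {domain-shift parameter: value} pairs.
    Rows with blank attribute columns (hidden attributes) yield an empty dict."""
    attrs = {}
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            filename = row[0]
            rest = [c.strip() for c in row[1:] if c.strip()]
            # Pair up (d1p, d1v), (d2p, d2v), ... however many there are.
            attrs[filename] = dict(zip(rest[0::2], rest[1::2]))
    return attrs

# attrs = load_attributes("attributes_00.csv")   # hypothetical path
```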
Recording procedure
Normal/anomalous operating sounds of machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings from a fixed microphone. We mixed the target machine sound with environmental noise, and only the noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset explaining the details of the recording procedure by the submission deadline.
Directory structure
/eval_data
Baseline system
The baseline system is available on the GitHub repository. The baseline system provides a simple entry-level approach that gives reasonable performance on the Task 2 dataset. It is a good starting point, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
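The baseline follows the common unsupervised recipe of training only on normal sounds and scoring a clip by how poorly a model reconstructs it. The sketch below illustrates that reconstruction-error idea only; it is not the actual baseline implementation, and the network size and feature dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Toy autoencoder over spectrogram-like frames (dimensions are illustrative)."""
    def __init__(self, dim=128, hidden=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, hidden))
        self.dec = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_score(model, frames):
    """Mean squared reconstruction error over a clip's frames: higher = more anomalous."""
    with torch.no_grad():
        recon = model(frames)
    return torch.mean((frames - recon) ** 2).item()

model = TinyAE()                           # would be trained on normal clips only
clip_frames = torch.randn(200, 128)        # stand-in for log-mel frames of one clip
print(round(anomaly_score(model, clip_frames), 4))
```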
Condition of use
This dataset was created jointly by Hitachi, Ltd., NTT Corporation and STMicroelectronics and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Citation
Contact
If there is any problem, please contact us:
Tomoya Nishida, tomoya.nishida.ax@hitachi.com
Keisuke Imoto, keisuke.imoto@ieee.org
Noboru Harada, noboru@ieee.org
Daisuke Niizumi, daisuke.niizumi.dt@hco.ntt.co.jp
Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com
Several different unsupervised anomaly detection algorithms have been applied to Space Shuttle Main Engine (SSME) data to serve the purpose of developing a comprehensive suite of Integrated Systems Health Management (ISHM) tools. As the theoretical bases for these methods vary considerably, it is reasonable to conjecture that the resulting anomalies detected by them may differ quite significantly as well. As such, it would be useful to apply a common metric with which to compare the results. However, for such a quantitative analysis to be statistically significant, a sufficient number of examples of both nominally categorized and anomalous data must be available. Due to the lack of sufficient examples of anomalous data, use of any statistics that rely upon a statistically significant sample of anomalous data is infeasible. Therefore, the main focus of this paper will be to compare actual examples of anomalies detected by the algorithms via the sensors in which they appear, as well as the times at which they appear. We find that there is enough overlap in detection of the anomalies among all of the different algorithms tested for them to corroborate the severity of these anomalies. In certain cases, the severity of these anomalies is supported by their categorization as failures by experts, with realistic physical explanations. For those anomalies that cannot be corroborated by at least one other method, this overlap says less about the severity of the anomalies and more about their technical nuances, which will also be discussed.
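One simple way to quantify the overlap discussed above is to compare the sets of (sensor, time window) pairs flagged by each algorithm. The Jaccard index and sensor identifiers below are illustrative assumptions, not the paper's metric.

```python
def jaccard(a, b):
    """Overlap between two sets of flagged (sensor, time-window) pairs."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical detections from two algorithms: (sensor id, time window index).
orca_hits = {("PID_23", 105), ("PID_23", 106), ("PID_41", 300)}
ims_hits  = {("PID_23", 105), ("PID_41", 300), ("PID_07", 512)}
print(f"agreement: {jaccard(orca_hits, ims_hits):.2f}")   # 0.50
```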
Despite overwhelming successes in recent years, progress in the field of biomedical image computing still largely depends on the availability of annotated training examples. This annotation process is often prohibitively expensive because it requires the valuable time of domain experts. Additionally, this approach simply does not scale well: whenever a new imaging modality is created, acquisition parameters change. Even something as basic as the target demographic is prone to changes, and new annotated cases have to be created to allow methods to cope with the resulting images. Image labeling is thus bound to become the major bottleneck in the coming years. Furthermore, it has been shown that many algorithms used in image analysis are vulnerable to out-of-distribution samples, resulting in wrong and overconfident decisions [20, 21, 22, 23]. In addition, physicians can overlook unexpected conditions in medical images, often termed "inattentional blindness". In [1], Drew et al. noted that 50% of trained radiologists did not notice a gorilla image rendered into a lung CT scan when assessing lung nodules. One approach, which does not require labeled images and can generalize to unseen pathological conditions, is out-of-distribution or anomaly detection (terms used interchangeably in this context). Anomaly detection can recognize and outline conditions that have not been previously encountered during training; it thus circumvents the time-consuming labeling process and can therefore quickly be adapted to new modalities. Additionally, by highlighting such abnormal regions, anomaly detection can guide the physicians' attention to otherwise overlooked abnormalities in a scan and potentially reduce the time required to inspect medical images. However, while there is a lot of recent research on improving anomaly detection [8, 9, 10, 11, 12, 13, 14, 15, 16, 17], especially with a focus on the medical field [4, 5, 6, 7], a common dataset/benchmark to compare different approaches is missing. Thus, it is currently hard to have a fair comparison of different proposed approaches. While common datasets for natural data were proposed in the last few months, such as defect detection [3] or abnormal traffic scene detection [2], we tried to tackle this issue for medical imaging with last year's challenge [25]. In a similar setting to last year's, we suggest the medical out-of-distribution challenge as a standardized dataset and benchmark for anomaly detection. We propose two different tasks. The first is a sample-wise (i.e. patient-wise) analysis, thus detecting out-of-distribution samples, for example samples with a pathological condition or any other condition not seen in the training set. Such samples can pose a problem to classically supervised algorithms, and detecting them could further allow physicians to prioritize different patients. Second, we propose a voxel-wise analysis, i.e. giving a score for each voxel, highlighting abnormal conditions and potentially guiding the physician. However, there are a few aspects to consider when choosing an anomaly detection dataset. First, as in reality, the types of anomalies should not be known beforehand. This can be a particular problem when choosing a dataset and testing on only a single pathological condition, which is vulnerable to exploitation. Even with an educated guess (based on the dataset) and a fully supervised segmentation approach, trained on a separate (disallowed) dataset, one could outperform other rightfully trained anomaly detection approaches.
Furthermore, making the exact types of anomalies known can cause a bias in the evaluation. Studies have shown that proposed anomaly detection algorithms tend to overfit on a given task when the properties of the test set and the kinds of anomalies are known beforehand. This further hinders the comparability of different algorithms [6, 18, 19, 23]. As a second point, combining test sets from different sources with alternative conditions may also cause problems. By definition, the different sources already introduce a distribution shift relative to the training dataset, complicating a clean and meaningful evaluation. To solve these issues we propose to provide two datasets with more than 600 scans each, one brain MRI dataset and one abdominal CT dataset, to allow for a comparison of the generalizability of the approaches. In order to prevent overfitting on the (types of) anomalies existing in our test set, the test set will be kept confidential at all times. The training set consists of hand-selected scans in which no anomalies were identified. The remaining scans will be assigned to the test set. Thus some scans in the test set do not contain anomalies, whilst others contain naturally occurring anomalies. In addition to the natural anomalies, we will add synthetic anomalies. We choose different structured types of synthetic anomalies (e.g. a tumor or an image of a gorilla rendered into a brain scan [1]) to cover a broad var...
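The two tasks are linked in practice: a voxel-wise score map can be reduced to a single sample-wise (patient-wise) score. The aggregation below (averaging the top-scoring voxels) is just one simple, assumed choice, not the challenge's prescribed method.

```python
import numpy as np

def sample_score(voxel_scores, top_fraction=0.001):
    """Reduce a voxel-wise anomaly map to one scalar per scan by averaging
    the highest-scoring voxels (a simple, assumed aggregation)."""
    flat = np.sort(voxel_scores.ravel())[::-1]
    k = max(1, int(top_fraction * flat.size))
    return float(flat[:k].mean())

scan_scores = np.random.rand(64, 64, 32)      # stand-in voxel-wise score map
print(round(sample_score(scan_scores), 3))
```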
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is the "evaluation dataset" for the DCASE 2025 Challenge Task 2.
The data consists of the normal/anomalous operating sounds of eight types of real/toy machines. Each recording is a single-channel, 10-sec or 12-sec audio clip that includes both a machine's operating sound and environmental noise. The following eight types of real/toy machines are used in this task:
AutoTrash
HomeCamera
ToyPet
ToyRCCar
BandSealer
Polisher
ScrewFeeder
CoffeeGrinder
Overview of the task
Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.
This task is the follow-up from DCASE 2020 Task 2 to DCASE 2024 Task 2. The task this year is to develop an ASD system that meets the following five requirements.
1. Train a model using only normal sound (unsupervised learning scenario)
Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data, which is called UASD (unsupervised ASD). This is the same requirement as in the previous tasks.
2. Detect anomalies regardless of domain shifts (domain generalization task)
In real-world cases, the operational states of a machine or the environmental noise can change, causing domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques to handle these domain shifts. This requirement has remained the same since DCASE 2022 Task 2.
3. Train a model for a completely new machine type
For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should have the ability to train models without additional hyperparameter tuning. This requirement has remained the same since DCASE 2023 Task 2.
4. Train a model both with and without attribute information
While additional attribute information can help enhance the detection performance, we cannot always obtain such information. Therefore, the system must work well both when attribute information is available and when it is not.
5. Train a model with additional clean machine data or noise-only data (optional)
Although the primary training data consists of machine sounds recorded under noisy conditions, in some situations it may be possible to collect clean machine data when the factory is idle or gather noise recordings when the machine itself is not running. Participants are free to incorporate these additional data sources to enhance the accuracy of their models.
The last, optional requirement is newly introduced in DCASE 2025 Task 2.
Definition
We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."
Dataset
This dataset consists of eight machine types. For each machine type, one section is provided, and the section is a complete set of test data. A set of training data corresponding to this test data is provided on a separate Zenodo page as the "additional training dataset" for the DCASE 2025 Challenge Task 2 (DCASE 2025 Challenge Task 2 Additional Training Dataset). For each section, this dataset provides 200 clips of test data.
File names and attribute csv files
File names and attribute csv files provide reference labels for each clip. The reference labels for each training/test clip include the machine type, section index, normal/anomaly information, and attributes describing conditions other than normal/anomaly. The machine type is given by the directory name, and the section index is given by the respective file name. For datasets other than the evaluation dataset, the normal/anomaly information and the attributes are also given in the file names. Note that for machine types whose attribute information is hidden, the attribute information in the file names is labeled only as "noAttributes". Attribute csv files allow easy access to the attributes that cause domain shifts. In these files, the file names, the names of the parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv) are listed. Each row takes the following format:
[filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
For machine types that have their attribute information hidden, all columns except the filename column are left blank for each row.
Recording procedure
Normal/anomalous operating sounds of machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings from a fixed microphone. We mixed the target machine sound with environmental noise, and only the noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset explaining the details of the recording procedure by the submission deadline.
Directory structure
- /eval_data
  - /raw
    - /AutoTrash
      - /test
        - /section_00_0001.wav
        - ...
        - /section_00_0200.wav
    - /HomeCamera
    - /ToyPet
    - /ToyRCCar
    - /BandSealer
    - /Polisher
    - /ScrewFeeder
    - /CoffeeGrinder
Baseline system
The baseline system is available on the GitHub repository: https://github.com/nttcslab/dcase2023_task2_baseline_ae. The baseline system provides a simple entry-level approach that gives reasonable performance on the Task 2 dataset. It is a good starting point, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
Condition of use
This dataset was created jointly by Hitachi, Ltd., NTT Corporation, and STMicroelectronics and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Citation
Contact
If there is any problem, please contact us:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Introduction
The ComplexVAD dataset consists of 104 training and 113 testing video sequences taken from a static camera looking at a scene of a two-lane street with sidewalks on either side of the street and another sidewalk going across the street at a crosswalk. The videos were collected over a period of a few months on the campus of the University of South Florida using a camcorder with 1920 x 1080 pixel resolution. Videos were collected at various times during the day and on each day of the week. Videos vary in duration with most being about 12 minutes long. The total duration of all training and testing videos is a little over 34 hours. The scene includes cars, buses and golf carts driving in two directions on the street, pedestrians walking and jogging on the sidewalks and crossing the street, people on scooters, skateboards and bicycles on the street and sidewalks, and cars moving in the parking lot in the background. Branches of a tree also move at the top of many frames.
The 113 testing videos have a total of 118 anomalous events consisting of 40 different anomaly types.
Ground truth annotations are provided for each testing video in the form of bounding boxes around each anomalous event in each frame. Each bounding box is also labeled with a track number, meaning each anomalous event is labeled as a track of bounding boxes. A single frame can have more than one anomaly labeled.
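The annotation scheme (per-frame bounding boxes grouped into tracks) can be represented with a small record type. The field names below are hypothetical; the exact file format is documented in the dataset's README.md.

```python
from dataclasses import dataclass

@dataclass
class AnomalyBox:
    """One ground-truth box: an anomalous region in one frame, grouped by track id."""
    track_id: int
    frame: int
    x: int          # top-left corner, pixels
    y: int
    width: int
    height: int

# A hypothetical track: the same anomalous event followed across ten frames.
track_7 = [AnomalyBox(7, f, 410, 220, 80, 120) for f in range(1500, 1510)]
print(len(track_7), "boxes in track", track_7[0].track_id)
```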
At a Glance
The size of the unzipped dataset is ~39GB
The dataset consists of Train sequences (containing only videos with normal activity), Test sequences (containing some anomalous activity), a ground truth annotation file for each Test sequence, and a README.md file describing the data organization and ground truth annotation format.
The zip files contain a Train directory, a Test directory, an annotations directory, and a README.md file.
License
The ComplexVAD dataset is released under CC-BY-SA-4.0 license.
All data:
Created by Mitsubishi Electric Research Laboratories (MERL), 2024
SPDX-License-Identifier: CC-BY-SA-4.0
A fleet is a group of systems (e.g., cars, aircraft) that are designed and manufactured the same way and are intended to be used the same way. For example, a fleet of delivery trucks may consist of one hundred instances of a particular model of truck, each of which is intended for the same type of service: almost the same amount of time and distance driven every day, approximately the same total weight carried, etc. For this reason, one may imagine that data mining for fleet monitoring may merely involve collecting operating data from the multiple systems in the fleet and developing some sort of model, such as a model of normal operation that can be used for anomaly detection. However, one then may realize that each member of the fleet will be unique in some ways: there will be minor variations in manufacturing, quality of parts, and usage. For this reason, the typical machine learning and statistics algorithm's assumption that all the data are independent and identically distributed is not correct. One may realize that data from each system in the fleet must be treated as unique so that one can notice significant changes in the operation of that system.
https://www.datainsightsmarket.com/privacy-policy
The anomaly detection market is experiencing robust growth, fueled by the increasing volume and complexity of data generated across various industries. A compound annual growth rate (CAGR) of 16.22% from 2019 to 2024 suggests a significant market expansion, driven by the imperative for businesses to enhance cybersecurity, improve operational efficiency, and gain valuable insights from their data. Key drivers include the rising adoption of cloud computing, the proliferation of IoT devices generating massive datasets, and the growing need for real-time fraud detection and prevention, particularly within the BFSI (Banking, Financial Services, and Insurance) sector. The market is segmented by solution type (software, services), end-user industry (BFSI, manufacturing, healthcare, IT and telecommunications, others), and deployment (on-premise, cloud). The cloud deployment segment is anticipated to witness faster growth due to its scalability, cost-effectiveness, and ease of implementation. The increasing sophistication of cyberattacks and the need for proactive security measures are further bolstering demand for advanced anomaly detection solutions. While data privacy concerns and the complexity of integrating these solutions into existing IT infrastructure represent potential restraints, the overall market trajectory indicates a sustained period of expansion. Companies like SAS Institute, IBM, and Microsoft are actively shaping this market with their comprehensive offerings.
The significant growth trajectory is expected to continue through 2033. The substantial investments in research and development by major players and the growing adoption across diverse sectors, including healthcare for predictive maintenance and anomaly detection in medical imaging, will continue to fuel the expansion. The competitive landscape is characterized by both established players offering comprehensive solutions and emerging niche players focusing on specific industry needs. This competitive dynamism fosters innovation and drives the development of more efficient and sophisticated anomaly detection technologies. While regional variations exist, North America and Europe currently hold a significant market share, with Asia-Pacific poised for rapid expansion due to increasing digitalization and investment in advanced technologies.
This report provides a detailed analysis of the global anomaly detection market, projecting robust growth from $XXX million in 2025 to $YYY million by 2033. The study covers the historical period (2019-2024), base year (2025), and forecast period (2025-2033), offering invaluable insights for businesses navigating this rapidly evolving landscape.
Keywords: Anomaly detection, machine learning, AI, cybersecurity, fraud detection, predictive analytics, data mining, big data analytics, real-time analytics.
Recent developments include:
June 2023: Wipro has launched a new suite of banking financial services built on Microsoft Cloud; the partnership will combine Microsoft Cloud capabilities with Wipro FullStride Cloud, leverage Wipro's and Capco's deep domain expertise in financial services, and develop new solutions to help financial services clients accelerate growth and deepen client relationships.
June 2023: Cisco has announced delivering on its promise of the AI-driven Cisco Security Cloud to simplify cybersecurity and empower people to do their best work from anywhere, regardless of the increasingly sophisticated threat landscape. Cisco invests in cutting-edge artificial intelligence and machine learning innovations that will empower security teams by simplifying operations and increasing efficacy.
Key drivers for this market are: Increasing Number of Cyber Crimes, Increasing Adoption of Anomaly Detection Solutions in Software Testing.
Potential restraints include: Open Source Alternatives Pose as a Threat.
Notable trends are: BFSI is Expected to Hold a Significant Part of the Market Share.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please refer each dataset website for further information
https://www.archivemarketresearch.com/privacy-policy
The market for Anomaly Detection Technology is projected to reach a valuation of 7,290 million by 2033, expanding at a steady 4.8% CAGR over the forecast period. This growth is primarily driven by the increasing adoption of advanced technologies such as Big Data Analytics, Machine Learning, and Artificial Intelligence in various industries. Anomaly Detection Technology enables organizations to identify and analyze deviations from normal patterns within their data, providing valuable insights for fraud detection, security threat monitoring, and operational efficiency.
The market is segmented based on Type, Application, and Region. In terms of Type, Machine Learning and Artificial Intelligence account for the largest share, followed by Big Data Analytics. By Application, BFSI (Banking, Financial Services, and Insurance) holds the dominant position due to the critical importance of fraud detection and security. However, the Healthcare and IT & Telecom sectors are expected to experience significant growth in the coming years. Geographically, North America is the largest market, followed by Asia Pacific. The increasing adoption of cloud-based Anomaly Detection solutions and the growing awareness of cybersecurity threats are contributing to the overall market growth.
This report offers a comprehensive analysis of the global anomaly detection technology market, providing insights into its current state and future prospects. The report covers market segmentation by type, application, and region, along with detailed analysis of industry trends, drivers, challenges, and growth catalysts. Key market players are profiled, and significant developments in the sector are highlighted.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Anomaly detection is the process of identifying items, events or observations that do not conform to an expected pattern in a dataset or time series. Current and future missions and our research communities challenge us to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we propose to develop an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of ocean science datasets. A parallel analytics engine will be developed as the key computational and data-mining core of OceanXtremes' backend processing. This analytic engine will demonstrate three new technology ideas to provide rapid turnaround on climatology computation and anomaly detection:
1. An adaptation of the Hadoop/MapReduce framework for parallel data mining of science datasets, typically large 3- or 4-dimensional arrays packaged in NetCDF and HDF.
2. An algorithm profiling service to efficiently and cost-effectively scale up hybrid Cloud computing resources based on the needs of scheduled jobs (CPU, memory, network, and bursting from a private Cloud computing cluster to a public cloud provider like Amazon Cloud services).
3. An extension to industry-standard search solutions (OpenSearch and faceted search) to provide support for shared discovery and exploration of ocean phenomena and anomalies, along with unexpected correlations between key measured variables.
We will use a hybrid Cloud compute cluster (private Eucalyptus on-premise at JPL with bursting to Amazon Web Services) as the operational backend. The key idea is that the parallel data-mining operations will be run 'near' the ocean data archives (a local 'network' hop) so that we can efficiently access the thousands of (say, daily) files making up a three-decade time series, and then cache key variables and pre-computed climatologies in a high-performance parallel database. OceanXtremes will be equipped with both web portal and web service interfaces for users and applications/systems to register and retrieve oceanographic anomaly data. By leveraging technology such as Datacasting (Bingham et al., 2007), users can also subscribe to anomaly or 'event' types of their interest and have newly computed anomaly metrics and other information delivered to them by metadata feeds packaged in standard Rich Site Summary (RSS) format. Upon receiving new feed entries, users can examine the metrics and download relevant variables, by simply clicking on a link, to begin further analyzing the event. The OceanXtremes web portal will allow users to define their own anomaly or feature types, where continuous backend processing will be scheduled to populate the new user-defined anomaly type by executing the chosen data mining algorithm (i.e. differences from climatology or gradients above a specified threshold). Metadata on the identified anomalies will be cataloged, including temporal and geospatial profiles, key physical metrics, related observational artifacts and other relevant metadata to facilitate discovery, extraction, and visualization. Products created by the anomaly detection algorithm will be made explorable and subsettable using Webification (Huang et al., 2014) and OPeNDAP (http://opendap.org) technologies.
Using this platform scientists can efficiently search for anomalies or ocean phenomena, compute data metrics for events or over time-series of ocean variables, and efficiently find and access all of the data relevant to their study (and then download only that data).
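The simplest data-mining algorithm mentioned above, differences from a climatology with a threshold, can be sketched in a few lines. The array shapes, variable, and threshold are illustrative assumptions, not OceanXtremes code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a daily sea-surface-temperature archive: (years, days, lat, lon)
sst = rng.normal(loc=20.0, scale=1.0, size=(30, 365, 18, 36))
sst[29, 200:210, 5, 10] += 6.0                 # inject a warm event in the last year

climatology = sst.mean(axis=0)                 # per-day-of-year mean over all years
anomaly = sst - climatology                    # difference from climatology
flags = np.abs(anomaly) > 3 * sst.std(axis=0)  # threshold in units of local std-dev

print("flagged cells in final year:", int(flags[29].sum()))
```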
https://www.verifiedindustryinsights.com/privacy-policy
The market size of the Anomaly Detection Service Industry is categorized based on Deployment Type (Cloud-based, On-premises) and Application (Fraud Detection, Network Security, IT Operations, Manufacturing, Healthcare) and Industry Vertical (BFSI, Retail, Telecommunications, Government, Healthcare, Manufacturing) and Technology (Machine Learning, Statistical Analysis, Data Mining, Artificial Intelligence, Deep Learning) and geographical regions (North America, Europe, Asia-Pacific, South America, and Middle-East and Africa).
https://www.mordorintelligence.com/privacy-policy
The Anomaly Detection Market report segments the industry into By Type (Solutions, Service), By End-user Industry (BFSI, Manufacturing, Healthcare, IT and Telecommunications, Other End-user Industries), By Deployment (On-premise, Cloud), and By Geography (North America, Europe, Asia, Latin America, Middle East and Africa). Get five years of historical data and five-year forecasts.
https://www.archivemarketresearch.com/privacy-policy
Anomaly detection solutions offer various advanced features to enhance their effectiveness, including:
Real-time Monitoring: Continuous analysis of activity and data to detect anomalies immediately.
Automated Threat Detection: ML and AI algorithms automatically identify suspicious patterns and alert security teams.
Historical Analysis: Analysis of historical data to establish baselines and improve anomaly detection accuracy.
Actionable Insights: Provides detailed reports and recommendations to guide response and mitigation strategies.
Report Coverage & Deliverables
Market Segmentations:
Type: Network Behavior Anomaly Detection, User Behavior Anomaly Detection
Application: Banking, Financial Services, and Insurance (BFSI), Retail, Manufacturing, IT and Telecom, Others
Type
Network Behavior Anomaly Detection: Monitors network traffic patterns to detect anomalies indicating malicious or suspicious activity.
User Behavior Anomaly Detection: Analyzes user activity and behavior patterns to identify suspicious deviations that may indicate compromise or insider threats.
Application
Banking, Financial Services, and Insurance (BFSI): Critical for detecting fraud, money laundering, and other financial crimes.
Retail: Identifying abnormal purchase patterns, suspicious returns, and insider threats.
Manufacturing: Monitoring industrial control systems and operational technology (OT) networks for anomalies.
IT and Telecom: Detecting cyber attacks, data breaches, and malware on IT infrastructure and telecommunications networks.
Anomaly Detection Solution Regional Insights
Regional Trends:
North America: Early adopter of anomaly detection solutions due to stringent regulations and high awareness of cyber threats.
Europe: Strong market growth driven by GDPR and other compliance requirements.
Asia-Pacific: Rapidly expanding market with increasing investment in digital infrastructure and cybersecurity.
Middle East and Africa: Emerging market with growing demand for anomaly detection solutions to protect critical infrastructure and financial institutions.
Anomaly Detection Solution Trends
Increased Adoption of Cloud-Based Solutions: Cloud-based anomaly detection solutions offer flexibility, scalability, and reduced infrastructure costs.
Integration with SIEM and SOAR: Anomaly detection solutions integrate with SIEM and security orchestration, automation, and response (SOAR) platforms to enhance threat response.
Focus on Predictive Analytics: ML and AI are used to predict future anomalies, enabling proactive threat prevention.
Advanced Threat Intelligence Sharing: Collaboration between businesses and security vendors to share threat intelligence and improve detection capabilities.
Driving Forces: What's Propelling the Anomaly Detection Solution?
Rising frequency and sophistication of cyber attacks
Stricter data privacy and security regulations
Increasing adoption of cloud computing
Growing awareness of insider threats
Technological advancements in ML and AI
Challenges and Restraints in Anomaly Detection Solution
False Positives: Anomaly detection solutions can generate false positive alerts, leading to unnecessary investigations and resource drain.
Data Volume and Complexity: Increasing amounts of data from various sources make anomaly detection more challenging.
Lack of Skilled Professionals: Finding qualified professionals with expertise in anomaly detection and cybersecurity can be difficult.
Cost Considerations: Implementing and maintaining anomaly detection solutions can involve significant costs.
Emerging Trends in Anomaly Detection Solution
Behavioral Biometrics: Using ML to analyze user behavior patterns for anomaly detection.
Context-Aware Anomaly Detection: Considering context and environmental factors to improve detection accuracy.
Explainable AI: Providing explanations for anomaly detection results to improve trust and understanding.
Automated Response: Using ML and AI to automate threat response based on detected anomalies.
Growth Catalysts in Anomaly Detection Solution Industry
Government Funding and Incentives: Governments are investing in cybersecurity research and development, including anomaly detection technologies.
Strategic Partnerships: Partnerships between technology vendors and security service providers accelerate adoption.
Increased Cyber Threat Awareness: Organizations are becoming more aware of the importance of anomaly detection to protect their assets.
Leading Players in the Anomaly Detection Solution
Cisco Systems, Inc.
Dell Technologies, Inc.
Hewlett Packard Enterprise Company
Guardian Analytics
Anodot, Ltd.
Happiest Minds
Gurucul
Niara, Inc.
Flowmon Networks
Wipro Limited
SAS Institute Inc.
Symantec Corporation
Trustwave Holdings, Inc.
International Business Machines Corporation
Logrhythm, Inc.
Splunk, Inc.
Trend Micro, Inc.
Greycortex S.R.O.
Securonix, Inc.
Significant Developments in Anomaly Detection Solution Sector
Partnerships between leading vendors to integrate anomaly detection solutions with wider security platforms.
Investment in research and development of advanced ML and AI algorithms.
Acquisition of smaller companies by established vendors to expand their anomaly detection capabilities.