There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded by different commercial airlines. These different types of datasets need to be analyzed for outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task, not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location would waste a great deal of bandwidth. To solve this problem, in this paper we present a novel algorithm that can identify outliers in the entire data without moving all the data to a single location. The method we propose centralizes only a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization at only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the Commercial Modular Aero-Propulsion System Simulation (CMAPSS).
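The sampling idea above can be sketched in a few lines. This is a hypothetical illustration rather than the paper's algorithm: each site ships only a small random sample to a central node, which fits a simple global model (here, a mean and standard deviation under a Gaussian assumption) that the sites then apply locally. The site data, sample sizes, and z-score threshold are all invented for the example.

```python
import random
import statistics

def central_model(samples):
    """Fit a simple global model (mean, std) on the pooled site samples."""
    pooled = [x for sample in samples for x in sample]
    return statistics.fmean(pooled), statistics.pstdev(pooled)

def flag_outliers(site_data, mu, sigma, z=5.0):
    """Each site flags its own points against the shared global model."""
    return [x for x in site_data if abs(x - mu) > z * sigma]

# Two sites hold disjoint parts of the data; only 200 points per site travel.
random.seed(0)
site_a = [random.gauss(0, 1) for _ in range(10_000)] + [9.0]   # injected outlier
site_b = [random.gauss(0, 1) for _ in range(10_000)] + [-8.5]  # injected outlier
samples = [random.sample(site_a, 200), random.sample(site_b, 200)]

mu, sigma = central_model(samples)
outliers = flag_outliers(site_a, mu, sigma) + flag_outliers(site_b, mu, sigma)
```

Only 400 of the roughly 20,000 points are ever centralized, which is the communication saving the abstract refers to; the paper's actual method and guarantees are more sophisticated than this Gaussian toy.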
This paper provides a review of three advanced machine learning algorithms for anomaly detection in continuous data streams from a ground-test firing of a subscale Solid Rocket Motor (SRM). The study compares Orca, one-class support vector machines, and the Inductive Monitoring System (IMS) for anomaly detection on these data streams. We measure the performance of the algorithms with respect to the detection horizon for situations where fault information is available. These algorithms have also been studied by the present authors (and other co-authors) as applied to liquid propulsion systems. The trade space between these algorithms will be explored for both types of propulsion systems.
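Of the three methods, IMS is the simplest to sketch: it clusters nominal training data into per-dimension bounding boxes and scores new points by their distance to the nearest box. The following is a minimal simplified sketch of that idea, not NASA's actual implementation; the greedy clustering rule and the `max_dist` parameter are assumptions for illustration.

```python
def box_distance(point, box):
    """Euclidean distance from a point to a hyperbox; 0 if inside."""
    sq = 0.0
    for x, (lo, hi) in zip(point, box):
        if x < lo:
            sq += (lo - x) ** 2
        elif x > hi:
            sq += (x - hi) ** 2
    return sq ** 0.5

def train_ims(nominal, max_dist=1.0):
    """Greedy IMS-style training: absorb each nominal point into the nearest
    cluster (expanding its per-dimension bounds) or start a new cluster."""
    clusters = []
    for point in nominal:
        best, best_d = None, float("inf")
        for c in clusters:
            d = box_distance(point, c)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= max_dist:
            for i, x in enumerate(point):
                lo, hi = best[i]
                best[i] = (min(lo, x), max(hi, x))
        else:
            clusters.append([(x, x) for x in point])
    return clusters

def anomaly_score(point, clusters):
    """Distance to the nearest nominal cluster; 0 means 'seen before'."""
    return min(box_distance(point, c) for c in clusters)

# Invented 2-D "sensor" readings standing in for nominal motor telemetry.
nominal = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0)]
clusters = train_ims(nominal)
```

A point inside a learned box scores 0, while a point far from all boxes gets a large score; Orca (distance-based) and one-class SVMs produce analogous scores from very different theoretical bases, which is exactly the trade space the paper explores.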
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.
The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:
Multivariate (17 variables), including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system, including:
4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands from an operator to turn some equipment ON/OFF.
3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, wind affecting the orientation of a large ground antenna.
10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, position, temperature, pressure, voltage, current, humidity, velocity, or acceleration.
5 million timestamps, with sensor readings at a 1 Hz sampling frequency.
1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour.
4 million observations that include both nominal and anomalous segments. This is suitable to evaluate both semi-supervised approaches (novelty detection) as well as unsupervised approaches (outlier detection).
200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.
Different types of anomalies to understand what anomaly types can be detected by different approaches. The categories are available in the dataset and in the metadata.
Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end times of the anomalous behaviour are known very precisely. In contrast to real-world datasets, there is no risk that the ground truth contains mislabelled segments, which is often the case for real data.
Suitable for root cause analysis. In addition to the anomaly category, the time series channel in which the anomaly first developed is recorded and made available as part of the metadata. This can be useful to evaluate the ability of an algorithm to trace anomalies back to the right root cause channel.
Affected channels. In addition to the root cause channel in which the anomaly first developed, we provide information on channels possibly affected by the anomaly. This can also be useful to evaluate the explainability of anomaly detection systems, which may point to the anomalous channels (root cause and affected).
Obvious anomalies. The simulated anomalies have been designed to be "easy" for human eyes to detect (i.e., there are very large spikes or oscillations), and hence detectable by most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that cannot detect these obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable for regular benchmark studies as well.
Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example is a light and switch pair: the light being either on or off is nominal, and the same goes for the switch, but having the switch on and the light off should be considered anomalous. In the CATS dataset, users can choose whether to use the available context and external stimuli, to test the usefulness of context for detecting anomalies in this simulation.
Pure signal, ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage, since users can add any type and amplitude of noise on top of the provided series. This makes the dataset well suited to testing how sensitive and robust detection algorithms are to various levels of noise.
No missing data. Users can drop whatever data they want to assess the impact of missing values on a detector with respect to a clean baseline.
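The properties above translate directly into a simple evaluation protocol: train on the nominal head, evaluate on the mixed tail, and optionally inject noise. The sketch below illustrates this with a small stand-in series; in practice one would load the provided CSV or parquet file (e.g. with pandas), and the file names and column layout are not specified here.

```python
import random

N_TRAIN = 1_000_000  # per the dataset card, the first 1M points are nominal

def split_cats(series, n_train=N_TRAIN):
    """Nominal head for (semi-supervised) training, mixed tail for evaluation."""
    return series[:n_train], series[n_train:]

def add_noise(series, sigma, seed=0):
    """The signals are noise-free by design; inject Gaussian noise of a
    chosen amplitude to study robustness."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]

# Tiny stand-in channel; a real run would use one of the 17 CATS variables.
series = [float(i % 10) for i in range(100)]
train, evaluate = split_cats(series, n_train=60)
noisy_train = add_noise(train, sigma=0.05)
```

Sweeping `sigma` over several values and re-running a detector on the noisy copies is one way to produce the robustness-to-noise curves the dataset card envisions.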
Change Log
Version 2
Metadata: we include a metadata.csv with information about:
Anomaly categories
Root cause channel (signal in which the anomaly is first visible)
Affected channel (signal into which the anomaly might propagate through coupled system dynamics)
Removal of anomaly overlaps: version 1 contained anomalies which overlapped with each other resulting in only 190 distinct anomalous segments. Now, there are no more anomaly overlaps.
Two data files: CSV and parquet for convenience.
[1] Example benchmark of anomaly detection in time series: Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779-1797, 2022. doi:10.14778/3538598.3538602
About Solenix
Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions that complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.
The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters, including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs to the aircraft. These parameters may be continuous, binary, or categorical measurements recorded at one-second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. Here, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large databases of discrete and continuous data from worldwide operations of commercial fleets. We pose a general anomaly detection problem that includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that an atypical sequence of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain and novel algorithms, and briefly present results on synthetic and real-world data sets. Our algorithm uncovers operationally significant events in high-dimensional data streams from the aviation industry that are not detectable using state-of-the-art methods.
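The core idea of multiple kernel learning here, combining a kernel over discrete event sequences with a kernel over continuous measurements, can be sketched as follows. This is a toy illustration, not the authors' algorithm: the position-match sequence kernel and the fixed mixing weight `eta` are stand-ins (in practice the weights are learned and a richer sequence kernel is used), and the flight data are invented.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Similarity of two continuous feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def seq_kernel(s, t):
    """Toy similarity of two discrete event sequences: fraction of positions
    that agree (a real system would use an LCS- or spectrum-style kernel)."""
    matches = sum(1 for a, b in zip(s, t) if a == b)
    return matches / max(len(s), len(t))

def combined_kernel(a, b, eta=0.5):
    """Convex combination of the two kernels; in multiple kernel learning
    the weight eta would be learned from data."""
    (seq_a, x_a), (seq_b, x_b) = a, b
    return eta * seq_kernel(seq_a, seq_b) + (1 - eta) * rbf_kernel(x_a, x_b)

# Hypothetical flights: (switch-event sequence, continuous summary features).
flight_1 = ("AABC", (0.10, 0.20))
flight_2 = ("AABC", (0.12, 0.19))   # nominal: similar in both views
flight_3 = ("CCCC", (3.00, -2.00))  # off-nominal in both views
```

A kernel method (e.g. a one-class SVM) built on `combined_kernel` then sees both the discrete pilot-action view and the continuous sensor view at once, which is what lets atypical discrete sequences explain off-nominal continuous behaviour.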
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This publication contains information on congenital anomalies in babies delivered in England in 2020. It includes this report showing key findings, spreadsheet tables with more detailed estimates and a methodology document.
Anomaly Detection Market Size 2025-2029
The anomaly detection market size is forecast to increase by USD 4.44 billion at a CAGR of 14.4% between 2024 and 2029.
The market is experiencing significant growth, particularly in the BFSI sector, as organizations increasingly prioritize identifying and addressing unusual patterns or deviations from normal business operations. The rising incidence of internal threats and cyber frauds necessitates the implementation of advanced anomaly detection tools to mitigate potential risks and maintain security. However, implementing these solutions comes with challenges, primarily infrastructural requirements. Ensuring compatibility with existing systems, integrating new technologies, and training staff to effectively utilize these tools pose significant hurdles for organizations.
Despite these challenges, the potential benefits of anomaly detection, such as improved risk management, enhanced operational efficiency, and increased security, make it an essential investment for businesses seeking to stay competitive and agile in today's complex and evolving threat landscape. Companies looking to capitalize on this market opportunity must carefully consider these challenges and develop strategies to address them effectively. Cloud computing is a key trend in the market, as cloud-based solutions offer quick deployment, flexibility, and scalability.
What will be the Size of the Anomaly Detection Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
In the dynamic and evolving market, advanced technologies such as resource allocation, linear regression, pattern recognition, and support vector machines are increasingly being adopted for automated decision making. Businesses are leveraging these techniques to enhance customer experience through behavioral analytics, object detection, and sentiment analysis. Machine learning algorithms, including random forests, naive Bayes, decision trees, clustering algorithms, and k-nearest neighbors, are essential tools for risk management and compliance monitoring. AI-powered analytics, time series forecasting, and predictive modeling are revolutionizing business intelligence, while process optimization is achieved through the application of decision support systems, natural language processing, and predictive analytics.
Computer vision, image recognition, and logistic regression are key areas where principal component analysis and artificial neural networks contribute significantly. Speech recognition also benefits from these advanced technologies, enabling businesses to streamline processes and improve operational efficiency.
How is this Anomaly Detection Industry segmented?
The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment
Cloud
On-premises
Component
Solution
Services
End-user
BFSI
IT and telecom
Retail and e-commerce
Manufacturing
Others
Technology
Big data analytics
AI and ML
Data mining and business intelligence
Geography
North America
US
Canada
Mexico
Europe
France
Germany
Spain
UK
APAC
China
India
Japan
Rest of World (ROW)
By Deployment Insights
The cloud segment is estimated to witness significant growth during the forecast period. The market is witnessing significant growth due to the increasing adoption of advanced technologies such as machine learning models, statistical methods, and real-time monitoring. These technologies enable the identification of anomalous behavior in real-time, thereby enhancing network security and data privacy. Anomaly detection algorithms, including unsupervised learning, reinforcement learning, and deep learning networks, are used to identify outliers and intrusions in large datasets. Data security is a major concern, leading to the adoption of data masking, data pseudonymization, data de-identification, and differential privacy.
Data leakage prevention and incident response are critical components of an effective anomaly detection system. False positive and false negative rates are essential metrics for evaluating the performance of these systems. Time series analysis and concept drift handling are important techniques in anomaly detection. Data obfuscation, data suppression, and data aggregation are other strategies employed to maintain data privacy. Companies such as Anodot, Cisco Systems Inc, IBM Corp, and SAS Institute Inc offer both cloud-based and on-premises anomaly detection solutions.
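The false positive and false negative rates mentioned above are straightforward to compute from per-timestamp anomaly labels; a minimal sketch (the boolean label format is an assumption for illustration):

```python
def detection_rates(predicted, truth):
    """False positive rate and false negative rate over per-timestamp
    boolean anomaly labels."""
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if t and not p)
    negatives = sum(1 for t in truth if not t)
    positives = sum(1 for t in truth if t)
    return fp / negatives, fn / positives

# Tiny invented example: two true anomalous timestamps, one caught.
truth     = [False, False, True, True, False]
predicted = [True,  False, True, False, False]
fpr, fnr = detection_rates(predicted, truth)
```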
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identifying change points and/or anomalies in dynamic network structures has become increasingly popular across various domains, from neuroscience to telecommunication to finance. One particular objective of anomaly detection from a neuroscience perspective is the reconstruction of the dynamic manner of brain region interactions. However, most statistical methods for detecting anomalies have the following unrealistic limitation for brain studies and beyond: network snapshots at different time points are assumed to be independent. To circumvent this limitation, we propose a distribution-free framework for anomaly detection in dynamic networks. First, we present each network snapshot of the data as a linear object and find its respective univariate characterization via local and global network topological summaries. Second, we adopt a change point detection method for (weakly) dependent time series based on efficient scores, and enhance the finite sample properties of the change point method by approximating the asymptotic distribution of the test statistic using the sieve bootstrap. We apply our method to simulated and real data, in particular two functional magnetic resonance imaging (fMRI) datasets and the Enron communication graph. We find that our new method delivers impressively accurate and realistic results in terms of identifying the locations of true change points compared to the results reported by competing approaches. The new method promises to offer a deeper insight into the large-scale characterizations and functional dynamics of the brain and, more generally, into the intrinsic structure of complex dynamic networks. Supplemental materials for this article are available online.
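The second step, scanning a univariate network summary for a change point, can be illustrated with a simple mean-shift scan. This sketch maximizes a scaled between-segment contrast over candidate split points; it deliberately omits the paper's efficient-score statistic and sieve-bootstrap calibration, which are the actual contributions, and the summary series is synthetic.

```python
def mean_shift_changepoint(series):
    """Return (k, stat): the split index maximizing the scaled contrast
    between the mean before and after the candidate change point."""
    n = len(series)
    total = sum(series)
    best_k, best_stat, prefix = None, -1.0, 0.0
    for k in range(1, n):
        prefix += series[k - 1]
        mean_left = prefix / k
        mean_right = (total - prefix) / (n - k)
        stat = (k * (n - k) / n) ** 0.5 * abs(mean_left - mean_right)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

# A synthetic "network summary" series with a level shift at t = 50,
# standing in for e.g. a clustering coefficient computed per snapshot.
summary = [0.0] * 50 + [1.0] * 50
k, stat = mean_shift_changepoint(summary)
```

With dependent snapshots, the naive null distribution of such a statistic is wrong, which is precisely why the paper resorts to the sieve bootstrap for calibration.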
We present a set of novel algorithms, which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences arising from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach uses unsupervised clustering of sequences with the normalized length of the longest common subsequence (nLCS) as a similarity measure, followed by a detailed analysis of outliers to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from a cluster. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithm provides a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. The final section of the paper demonstrates the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard hidden Markov models, and show that our methods are superior.
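The nLCS similarity at the heart of sequenceMiner can be sketched directly. One common normalization divides the LCS length by the geometric mean of the two sequence lengths; whether this matches the paper's exact definition is an assumption here.

```python
def lcs_length(s, t):
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(t) + 1)
    for a in s:
        cur = [0]
        for j, b in enumerate(t, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def nlcs(s, t):
    """Normalized LCS similarity in [0, 1], dividing by the geometric
    mean of the two sequence lengths."""
    if not s or not t:
        return 0.0
    return lcs_length(s, t) / (len(s) * len(t)) ** 0.5
```

Because a subsequence need not be contiguous, nLCS tolerates inserted or reordered switch events, which is what makes it a natural similarity for cockpit switch sequences of varying lengths.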
http://opendatacommons.org/licenses/dbcl/1.0/
The dataset is a subset of Task 2 of the DCASE 2020 Challenge. The challenge is to identify anomalies in a machine using audio data. The dataset has three parts, namely training, validation and testing, which have been combined into a single dataset.
Training- https://zenodo.org/record/3678171
Validation- https://zenodo.org/record/3727685
Testing- https://zenodo.org/record/3841772
Several different unsupervised anomaly detection algorithms have been applied to Space Shuttle Main Engine (SSME) data to serve the purpose of developing a comprehensive suite of Integrated Systems Health Management (ISHM) tools. As the theoretical bases for these methods vary considerably, it is reasonable to conjecture that the anomalies they detect may differ quite significantly as well. As such, it would be useful to apply a common metric with which to compare the results. However, for such a quantitative analysis to be statistically significant, a sufficient number of examples of both nominally categorized and anomalous data must be available. Due to the lack of sufficient examples of anomalous data, use of any statistics that rely upon a statistically significant sample of anomalous data is infeasible. Therefore, the main focus of this paper is to compare actual examples of anomalies detected by the algorithms via the sensors in which they appear, as well as the times at which they appear. We find that there is enough overlap in the anomalies detected by the different algorithms tested for them to corroborate the severity of these anomalies. In certain cases, the severity of these anomalies is supported by their categorization as failures by experts, with realistic physical explanations. For those anomalies that cannot be corroborated by at least one other method, this overlap says less about the severity of the anomaly and more about the methods' technical nuances, which will also be discussed.
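Corroboration across detectors, matching anomalies by sensor and overlapping time window, can be sketched as follows; the detection format `(sensor, start, end)`, the algorithm names, and the sensor names are all invented for illustration.

```python
def overlaps(a, b):
    """True if two (start, end) time intervals intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def corroborated(detections):
    """detections maps algorithm name -> list of (sensor, start, end).
    Keep only anomalies confirmed by at least one other algorithm on the
    same sensor with an overlapping time window."""
    confirmed = []
    for alg, found in detections.items():
        for sensor, start, end in found:
            for other, other_found in detections.items():
                if other == alg:
                    continue
                if any(s == sensor and overlaps((start, end), (t0, t1))
                       for s, t0, t1 in other_found):
                    confirmed.append((alg, sensor, start, end))
                    break
    return confirmed

# Hypothetical detections from two algorithms on SSME-style sensor data.
detections = {
    "orca": [("pressure_1", 10, 20), ("temp_4", 100, 110)],
    "ocsvm": [("pressure_1", 15, 25)],
}
```

Anomalies that survive this filter are the "corroborated" ones the paper treats as more likely severe; the uncorroborated `temp_4` detection would instead be examined for algorithm-specific quirks.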
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The automatic detection of anomalies captured by surveillance settings is essential for speeding up the otherwise laborious manual approach. To date, UCF-Crime is the largest available dataset for automatic visual analysis of anomalies and consists of real-world crime scenes of various categories. In this paper, we introduce HR-Crime, a subset of the UCF-Crime dataset suitable for human-related anomaly detection tasks. We rely on state-of-the-art techniques to build the feature extraction pipeline for human-related anomaly detection. Furthermore, we present a baseline anomaly detection analysis on HR-Crime. HR-Crime, as well as the developed feature extraction pipeline and the extracted features, will be publicly available for further research in the field.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, there has been growing interest in identifying anomalous structure within multivariate data sequences. We consider the problem of detecting collective anomalies, corresponding to intervals where one or more of the data sequences behaves anomalously. We first develop a test for a single collective anomaly that has power to simultaneously detect anomalies that are either rare, that is, affecting few data sequences, or common. We then show how to detect multiple anomalies in a way that is computationally efficient but avoids the approximations inherent in binary segmentation-like approaches. This approach is shown to consistently estimate the number and location of the collective anomalies, a property that has not previously been shown for competing methods. Our approach can be made robust to point anomalies and can allow for the anomalies to be imperfectly aligned. We show the practical usefulness of allowing for imperfect alignment through a resulting increase in power to detect regions of copy number variation. Supplemental files for this article are available online.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains the data collected on the DAVIDE HPC system (CINECA & E4 & University of Bologna, Bologna, Italy) in the period March-May 2018.
The data set has been used to train an autoencoder-based model to automatically detect anomalies in a semi-supervised fashion on a real HPC system.
This work is described in:
1) "Anomaly Detection using Autoencoders in High Performance Computing Systems", Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini, IAAI19 (proceedings in process) -- https://arxiv.org/abs/1902.08447
2) "Online Anomaly Detection in HPC Systems", Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini, AICAS19 (proceedings in process) -- https://arxiv.org/abs/1811.05269
See the git repository for usage examples & details --> https://github.com/AndreaBorghesi/anomaly_detection_HPC
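The reconstruction-error principle behind the cited autoencoder approach can be illustrated with its simplest linear special case: projecting onto the leading principal component and scoring points by how badly they reconstruct. The papers use deep autoencoders; this sketch only shows the semi-supervised recipe of fitting on nominal data and thresholding reconstruction error, and the two-feature "HPC telemetry" is invented.

```python
def fit_linear_autoencoder(data):
    """Linear 'autoencoder' = projection onto the leading principal
    direction, found by power iteration on the covariance matrix."""
    n, dim = len(data), len(data[0])
    mean = [sum(x[i] for x in data) / n for i in range(dim)]
    centered = [[x[i] - mean[i] for i in range(dim)] for x in data]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim
    for _ in range(100):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return mean, v

def reconstruction_error(x, mean, v):
    """Distance between a point and its 1-D reconstruction; a large error
    suggests the point does not follow the nominal pattern."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    proj = sum(ci * vi for ci, vi in zip(centered, v))
    return sum((c - proj * vi) ** 2 for c, vi in zip(centered, v)) ** 0.5

# Semi-supervised recipe: fit on nominal data only, then score new points.
nominal = [(float(i), float(i)) for i in range(10)]   # e.g. load vs. power
mean, v = fit_linear_autoencoder(nominal)
```

A point consistent with the nominal load/power relationship reconstructs almost perfectly, while a point off that relationship gets a large error, which is the anomaly signal the papers threshold.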
The ROAD dataset is made up of observations from the Low Frequency Array (LOFAR) telescope. LOFAR comprises 52 stations across Europe, where each station is an array of 96 dual-polarisation low-band antennas (LBA) in the 10–90 MHz range and 48 or 96 dual-polarisation high-band antennas (HBA) in the 110–250 MHz range. The data are four-dimensional, with the dimensions corresponding to time, frequency, polarisation, and station. The observing setup dictates the array configuration (i.e. the number of stations used), the number of frequency channels (Nf), the time sampling, and the overall integration time (Nt) of the observing session. Furthermore, the dual polarisation of the antennas results in a correlation product (Npol) of size 4. The ROAD dataset contains ten classes that describe various system-wide phenomena and anomalies from data obtained by the LOFAR telescope. These classes are categorised into four groups: data processing system failures, electronic anomalies, environmental effects, and unwanted astronomical events, as shown in the table below.
Category | Description | Band | Polarisation | Occurrence rate | Num Samples |
---|---|---|---|---|---|
Normal | All non-characterised effects | Both | All | - | 4687 |
Data processing | | | | | |
First order data loss | Data loss from consecutive time and/or frequency channels | Both | All | 0.02 | 146 |
Second order data loss | Data loss from single frequency and/or single time channels | Both | All | 0.04 | 283 |
Electronic systems | | | | | |
High noise element | High power disturbances caused by miscellaneous events | Both | All | 0.01 | 88 |
Oscillating tile | Amplifier going into oscillation | High | All | 0.01 | 56 |
Astronomical events | | | | | |
Source in side-lobes | A-team source passing through side-lobes | High | All | 0.06 | 446 |
Galactic plane | Galactic plane passing through the main lobe of the antenna | Both | Cross | 0.08 | 550 |
Solar storm | Strong emissions from the sun | Low | All | 0.02 | 147 |
Environmental effects | | | | | |
Lightning | Lightning storm | Both | All | 0.06 | 389 |
Ionospheric RFI reflections | RFI reflected from the ionosphere | Low | All | 0.04 | 261 |
Electric fence | RFI emitted from electric fences | Low | Cross | - | 64 |
The First Temporal Benchmark Designed to Evaluate Real-time Anomaly Detectors
The growth of the Internet of Things has created an abundance of streaming data. Finding anomalies in this data can provide valuable insights into opportunities or failures. Yet this is difficult to achieve, due to the need to process data in real time while continuously learning and making predictions. How do we evaluate and compare various real-time anomaly detection techniques?
The Numenta Anomaly Benchmark (NAB) provides a standard, open source framework for evaluating real-time anomaly detection algorithms on streaming data. Through a controlled, repeatable environment of open-source tools, NAB rewards detectors that find anomalies as soon as possible, trigger no false alarms, and automatically adapt to any changing statistics.
NAB comprises two main components: a scoring system designed for streaming data and a dataset with labeled, real-world time-series data.
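NAB's key idea, rewarding detections that come early inside a labelled anomaly window while penalizing false alarms, can be sketched as a toy scoring function. The sigmoid shape and the penalty constant below are simplifications invented for illustration; the real NAB scoring uses application profiles and per-record weighting.

```python
import math

def nab_like_score(window, detection_time, false_alarms, fa_penalty=0.11):
    """Toy NAB-flavoured score: a detection at the start of the anomaly
    window earns 1.0, decaying sigmoidally towards the window's end;
    each false alarm outside any window costs a fixed penalty."""
    reward = 0.0
    if detection_time is not None:
        start, end = window
        pos = (detection_time - start) / (end - start)  # 0 = earliest
        reward = 2.0 / (1.0 + math.exp(5.0 * pos))
    return reward - fa_penalty * false_alarms
```

Under such a rule, a detector that fires at the very start of the window dominates one that fires near its end, and a detector that never fires but raises false alarms scores negatively, matching the incentives NAB is built around.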
In 2024, precipitation in the United States stood 1.66 inches above the annual average recorded across the previous century (1901 to 2000). Except for 2022 and 2023, the past 10 years have all seen annual precipitation above the average, with the highest anomaly of the displayed period recorded in 2019, at nearly five inches above average.
https://www.verifiedmarketresearch.com/privacy-policy/
Global Anomaly Detection Solution Market size was valued at USD 6.18 Billion in 2024 and is projected to reach USD 19.99 Billion by 2032, growing at a CAGR of 15.80% from 2026 to 2032.
Global Anomaly Detection Solution Market Dynamics
The key market dynamics shaping the global Anomaly Detection Solution Market include:
Key Market Drivers:
Increasing Cybersecurity Threats: The surge in sophisticated cyberattacks and data breaches is a key driver of the Anomaly Detection Solution Market. Cybercriminals are increasingly targeting organizations with innovative tactics for breaching security systems. Anomaly detection solutions are critical for detecting unexpected patterns or behaviors that could indicate a threat, such as unauthorized access or insider threats.
Growing Volume of Data: The exponential rise of data generated by businesses, fueled by digital transformation and IoT devices, demands robust anomaly detection.
A novel general framework for distributed anomaly detection with theoretical performance guarantees is proposed. Our algorithmic approach combines existing anomaly detection procedures with a novel method for computing global statistics using local sufficient statistics. Under a Gaussian assumption, our distributed algorithm is guaranteed to perform as well as its centralized counterpart, a condition we call 'zero information loss'. We further report experimental results on synthetic as well as real-world data to demonstrate the viability of our approach.
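The "local sufficient statistics" idea is concrete in the Gaussian case: each site sends only (count, sum, sum of squares), from which the global mean and variance are recovered exactly, so the centralized and distributed fits coincide. A minimal sketch, with invented site data:

```python
def local_stats(data):
    """Sufficient statistics a site sends: (count, sum, sum of squares)."""
    return len(data), sum(data), sum(x * x for x in data)

def global_gaussian(all_stats):
    """Exact global mean and variance from local sufficient statistics;
    no raw observations leave any site (the 'zero information loss' idea)."""
    n = sum(s[0] for s in all_stats)
    sx = sum(s[1] for s in all_stats)
    sxx = sum(s[2] for s in all_stats)
    mean = sx / n
    return mean, sxx / n - mean * mean

site_1 = [1.0, 2.0, 3.0]
site_2 = [4.0, 5.0]
mean, var = global_gaussian([local_stats(site_1), local_stats(site_2)])
```

Because the Gaussian family's sufficient statistics are additive across sites, the result equals what a centralized fit on the pooled data would give, which is the sense in which no information is lost.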
The worldwide civilian aviation system is one of the most complex dynamical systems ever created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1 Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis, which fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.