There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These datasets need to be analyzed to find outliers. Extracting information from such rich data sources using advanced data mining methods is challenging, not only because of the massive volume of data, but also because these datasets are physically stored at different geographical locations, with only a subset of the features available at any one location. Moving these petabytes of data to a single location would waste a large amount of bandwidth. To address this problem, in this paper we present a novel algorithm that can identify outliers in the entire dataset without moving all the data to a single location. The proposed method centralizes only a very small sample from the data subsets at the different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization at only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).
The ROAD dataset is made up of observations from the Low Frequency Array (LOFAR) telescope. LOFAR comprises 52 stations across Europe, where each station is an array of 96 dual-polarisation low-band antennas (LBA) in the 10–90 MHz range and 48 or 96 dual-polarisation high-band antennas (HBA) in the 110–250 MHz range. The data are four-dimensional, with the dimensions corresponding to time, frequency, polarisation, and station. The settings of each observation dictate the array configuration (i.e. the number of stations used), the number of frequency channels (Nf), the time sampling, and the overall integration time (Nt) of the observing session. Furthermore, the dual polarisation of the antennas results in a correlation product (Npol) of size 4. The ROAD dataset contains ten classes that describe various system-wide phenomena and anomalies in data obtained by the LOFAR telescope. These classes are grouped into four categories: data processing system failures, electronic anomalies, environmental effects, and unwanted astronomical events, as shown in the table below.
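To make the polarisation axis concrete, here is a minimal Python sketch, assuming the two orthogonal feeds of each dual-polarisation antenna are labelled X and Y (these labels are an assumption, not stated above). Correlating every feed with every feed gives the four products behind Npol = 4; the two products that pair different feeds are the cross-polarisation products. The axis order follows the description above, and the helper `observation_shape` is hypothetical.

```python
from itertools import product

# Assumed labels for the two orthogonal feeds of a dual-polarisation antenna.
FEEDS = ("X", "Y")

# Every ordered pair of feeds is one correlation product: XX, XY, YX, YY.
POL_PRODUCTS = ["".join(pair) for pair in product(FEEDS, repeat=2)]

# Cross-polarisation products correlate two different feeds.
CROSS_PRODUCTS = [p for p in POL_PRODUCTS if p[0] != p[1]]


def observation_shape(n_time, n_freq, n_station):
    """Hypothetical helper: array shape ordered (time, frequency, polarisation, station)."""
    return (n_time, n_freq, len(POL_PRODUCTS), n_station)


if __name__ == "__main__":
    print(POL_PRODUCTS)    # ['XX', 'XY', 'YX', 'YY']
    print(CROSS_PRODUCTS)  # ['XY', 'YX']
    # 256 time steps and 64 channels are illustrative sizes, not ROAD settings.
    print(observation_shape(256, 64, 52))  # (256, 64, 4, 52)
```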
Category | Description | Band | Polarisation | Occurrence rate | Num Samples |
---|---|---|---|---|---|
Normal | All non-characterised effects | Both | All | - | 4687 |
Data processing | |||||
First order data loss | Data loss from consecutive time and/or frequency channels | Both | All | 0.02 | 146 |
Second order data loss | Data loss from single frequency and/or single time channels | Both | All | 0.04 | 283 |
Electronic systems | |||||
High noise element | High power disturbances caused by miscellaneous events | Both | All | 0.01 | 88 |
Oscillating tile | Amplifier going into oscillation | High | All | 0.01 | 56 |
Astronomical events | |||||
Source in side-lobes | A-team source passing through side-lobes | High | All | 0.06 | 446 |
Galactic plane | Galactic plane passing through the main lobe of the antenna | Both | Cross | 0.08 | 550 |
Solar storm | Strong emissions from the sun | Low | All | 0.02 | 147 |
Environmental effects | |||||
Lightning | Lightning storm | Both | All | 0.06 | 389 |
Ionospheric RFI reflections | RFI reflected from the ionosphere | Low | All | 0.04 | 261 |
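The classes in the table are heavily imbalanced, which matters when choosing evaluation metrics or a train/test split. A minimal Python sketch that uses only the sample counts listed in the table to summarise the class distribution:

```python
# Sample counts copied from the "Num Samples" column of the ROAD table.
SAMPLES = {
    "Normal": 4687,
    "First order data loss": 146,
    "Second order data loss": 283,
    "High noise element": 88,
    "Oscillating tile": 56,
    "Source in side-lobes": 446,
    "Galactic plane": 550,
    "Solar storm": 147,
    "Lightning": 389,
    "Ionospheric RFI reflections": 261,
}

total = sum(SAMPLES.values())
anomalous = total - SAMPLES["Normal"]
# Class with the fewest samples, and how much larger the normal class is.
rarest = min(SAMPLES, key=SAMPLES.get)
imbalance = SAMPLES["Normal"] / SAMPLES[rarest]

if __name__ == "__main__":
    print(total)                        # 7053
    print(anomalous)                    # 2366
    print(rarest)                       # Oscillating tile
    print(round(anomalous / total, 3))  # 0.335
    print(round(imbalance, 1))          # 83.7
```

With roughly 84 normal samples for every oscillating-tile sample, per-class metrics are likely more informative than overall accuracy.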
Anomaly Detection Market Size 2024-2028
The anomaly detection market size is forecast to increase by USD 3.71 billion at a CAGR of 13.63% between 2023 and 2028. Anomaly detection is a critical aspect of cybersecurity, particularly in sectors like healthcare where abnormal patient conditions or unusual network activity can have significant consequences. The market for anomaly detection solutions is experiencing significant growth due to several factors. Firstly, the increasing incidence of internal threats and cyber frauds has led organizations to invest in advanced tools for detecting and responding to anomalous behavior. Secondly, the infrastructural requirements for implementing these solutions are becoming more accessible, making them a viable option for businesses of all sizes. Data science and machine learning algorithms play a crucial role in anomaly detection, enabling accurate identification of anomalies and minimizing the risk of incorrect or misleading conclusions.
However, data quality is a significant challenge in this field, as poor quality data can lead to false positives or false negatives, undermining the effectiveness of the solution. Overall, the market for anomaly detection solutions is expected to grow steadily in the coming years, driven by the need for enhanced cybersecurity and the increasing availability of advanced technologies.
What will be the Anomaly Detection Market Size During the Forecast Period?
Anomaly detection, also known as outlier detection, is a critical data analysis technique used to identify observations or events that deviate significantly from the normal behavior or expected patterns in data. These deviations, referred to as anomalies or outliers, can indicate infrastructure failures, breaking changes, manufacturing defects, equipment malfunctions, or unusual network activity. In many industries, including manufacturing, cybersecurity, healthcare, and data science, anomaly detection helps prevent incorrect or misleading conclusions. Commonly used methods include statistical tests (such as the Grubbs test and the Kolmogorov-Smirnov test) as well as artificial intelligence and machine learning techniques such as decision trees, isolation forest, naive Bayes, autoencoders, local outlier factor, and k-means clustering.
Furthermore, these techniques identify anomalies by analyzing data points and their statistical properties, using charts and other visualizations as well as ML models. For instance, in manufacturing, anomaly detection can help identify defective products, while in cybersecurity, it can detect unusual network activity. In healthcare, it can be used to identify abnormal patient conditions. By applying anomaly detection techniques, organizations can proactively address potential issues and mitigate risks, ensuring optimal performance and security.
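As a concrete instance of the statistical tests mentioned above, the Grubbs test statistic needs only the standard library. This minimal Python sketch computes G = max|x_i - mean| / s and leaves the critical value to the caller, who would look it up in a Grubbs table or derive it from the t-distribution for the chosen n and significance level; the function names are illustrative. The test assumes roughly normal data and examines one outlier candidate at a time.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, index) for the two-sided Grubbs test.

    G = max_i |x_i - mean| / s, with s the sample standard deviation.
    `index` points at the most extreme observation (the outlier candidate).
    """
    if len(values) < 3:
        raise ValueError("the Grubbs test needs at least 3 observations")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample (n - 1) standard deviation
    if s == 0:
        raise ValueError("all values are identical")
    deviations = [abs(x - mean) for x in values]
    index = max(range(len(values)), key=deviations.__getitem__)
    return deviations[index] / s, index


def is_outlier(values, critical_value):
    """Flag the most extreme point if G exceeds a caller-supplied critical value."""
    g, index = grubbs_statistic(values)
    return g > critical_value, index


if __name__ == "__main__":
    g, idx = grubbs_statistic([1.0, 2.0, 3.0, 4.0, 100.0])
    print(idx, round(g, 3))  # 4 1.788
```

In practice the test is repeated after removing a flagged point, recomputing the critical value for the smaller sample each time.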
Market Segmentation
The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Deployment
Cloud
On-premise
Geography
North America
US
Europe
Germany
UK
APAC
China
Japan
South America
Middle East and Africa
By Deployment Insights
The cloud segment is estimated to witness significant growth during the forecast period. The market is shifting toward cloud-based solutions because of their advantages over traditional on-premises systems: quicker deployment, greater flexibility and scalability, real-time data visibility, and customization capabilities. Service providers offer these features under flexible payment models such as monthly subscriptions and pay-as-you-go pricing, which makes cloud-based software a cost-effective choice. Anodot Ltd, Cisco Systems Inc, IBM Corp, and SAS Institute Inc are among the prominent companies offering cloud-based anomaly detection solutions in addition to on-premises alternatives. Across applications such as security threat detection, architectural optimization, marketing, finance, fraud detection, and manufacturing (for example, spotting defects and equipment malfunctions), cloud-based anomaly detection is becoming increasingly popular because it provides real-time insights and enables a swift response to anomalies.
The cloud segment accounted for USD 1.59 billion in 2018 and showed a gradual increase during the forecast period.
Regional Insights
When it comes to anomaly detection market growth, North America is estimated to contribute 37% to the global market during the forecast period. Technavio's analysts have explained in detail the regional trends and drivers that shape the market during the forecast period.
This paper reviews three advanced machine learning algorithms for anomaly detection in continuous data streams from a ground-test firing of a subscale Solid Rocket Motor (SRM). The study compares Orca, one-class support vector machines, and the Inductive Monitoring System (IMS) for anomaly detection on these data streams. We measure the performance of the algorithms with respect to the detection horizon for situations where fault information is available. The present authors (and other co-authors) have also studied these algorithms as applied to liquid propulsion systems. The trade space between these algorithms will be explored for both types of propulsion systems.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.
The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:
[1] Example Benchmark of Anomaly Detection in Time Series: “Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602”
About Solenix
Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
OPSSAT-AD - anomaly detection dataset for satellite telemetry
This is the AI-ready benchmark dataset (OPSSAT-AD) containing telemetry data acquired on board OPS-SAT, a CubeSat mission operated by the European Space Agency (ESA).
It is accompanied by a paper with baseline results obtained using 30 supervised and unsupervised classic and deep machine learning algorithms for anomaly detection. They were trained and validated using the training-test dataset split introduced in this work, and we present a suggested set of quality metrics that should always be calculated when comparing new anomaly detection algorithms on OPSSAT-AD. We believe that this work may become an important step toward building a fair, reproducible, and objective validation procedure that can be used to quantify the capabilities of emerging anomaly detection techniques in an unbiased and fully transparent way.
segments.csv with the telemetry signals acquired on board the ESA OPS-SAT spacecraft,
dataset.csv with the extracted synthetic features computed for each manually split and labeled telemetry segment,
code files for data processing and example modeling (dataset_generator.ipynb for data processing, modeling_examples.ipynb with simple examples, requirements.txt with details of the Python configuration, and the LICENSE file).
Citation: Ruszczak, B. (2024). OPSSAT-AD - anomaly detection dataset for satellite telemetry [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15108715
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
finance
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation whose k-nearest neighbor distance with the maximum gap is significantly different from what we would expect if the distribution of k-nearest neighbor distances with the maximum gap were in the maximum domain of attraction of the Gumbel distribution. An approach based on extreme value theory is used to calculate the anomaly threshold. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our algorithm, which we call the stray algorithm. We also demonstrate how this algorithm can assist in detecting anomalies present in other data structures using feature engineering. We show the situations where the stray algorithm outperforms the HDoutliers algorithm in both accuracy and computational time. This framework is implemented in the open source R package stray. Supplementary materials for this article are available online.
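The anomaly score in this definition can be sketched in a few lines of Python. This is a brute-force illustration assuming Euclidean distance and a user-chosen k (which must be smaller than the number of points); it omits the extreme-value threshold step entirely and is not the R package stray's implementation. The function name `stray_scores` is illustrative.

```python
import math


def stray_scores(points, k=2):
    """kNN distance at the largest gap, per observation (threshold step omitted).

    For each point: sort the distances to its k nearest neighbours, prepend 0,
    find the largest jump between consecutive distances, and use the distance
    at the end of that jump as the anomaly score.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        padded = [0.0] + dists
        gaps = [padded[m + 1] - padded[m] for m in range(k)]
        largest = max(range(k), key=gaps.__getitem__)
        scores.append(padded[largest + 1])
    return scores


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print([round(s, 2) for s in stray_scores(points, k=2)])  # [1.0, 1.0, 1.0, 1.0, 12.73]
```

Taking the distance at the largest gap, rather than simply the k-th distance, keeps small groups of mutually close outliers from masking each other.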
Anomaly Detection Solution Market size was valued at USD 6.18 Billion in 2024 and is projected to reach USD 19.99 Billion by 2032, growing at a CAGR of 15.80% from 2026 to 2032.
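The stated rate can be checked with the usual CAGR formula, (end / start)^(1 / years) - 1. In this minimal Python sketch (function names are illustrative), the USD 6.18 billion and USD 19.99 billion figures give about 15.8% per year when compounded over the eight years from 2024 to 2032, which matches the quoted 15.80%; compounding over only 2026-2032 would require roughly 21.6% per year.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value after compounding start_value at `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # USD 6.18 billion (2024) -> USD 19.99 billion (2032): 8 compounding years.
    print(f"{cagr(6.18, 19.99, 2032 - 2024):.1%}")  # 15.8%
    # The same endpoints over 2026-2032 (6 years) imply a much higher rate.
    print(f"{cagr(6.18, 19.99, 2032 - 2026):.1%}")  # 21.6%
```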
Global Anomaly Detection Solution Market Dynamics
The key market dynamics that are shaping the global Anomaly Detection Solution Market include:
Key Market Drivers: Increasing Cybersecurity Threats: The surge in sophisticated cyberattacks and data breaches is a key driver of the Anomaly Detection Solution Market. Cybercriminals are increasingly targeting organizations with innovative tactics for breaching security systems. Anomaly detection solutions are critical for detecting unexpected patterns or behaviors that could indicate a threat, such as unauthorized access or insider threats. Growing Volume of Data: The exponential rise in data generated by businesses, fueled by digital transformation and IoT devices, requires effective anomaly detection.
The anomaly detection market is experiencing robust growth, fueled by the increasing volume and complexity of data generated across various industries. A compound annual growth rate (CAGR) of 16.22% from 2019 to 2024 suggests a significant market expansion, driven by the imperative for businesses to enhance cybersecurity, improve operational efficiency, and gain valuable insights from their data. Key drivers include the rising adoption of cloud computing, the proliferation of IoT devices generating massive datasets, and the growing need for real-time fraud detection and prevention, particularly within the BFSI (Banking, Financial Services, and Insurance) sector. The market is segmented by solution type (software, services), end-user industry (BFSI, manufacturing, healthcare, IT and telecommunications, others), and deployment (on-premise, cloud). The cloud deployment segment is anticipated to witness faster growth due to its scalability, cost-effectiveness, and ease of implementation. The increasing sophistication of cyberattacks and the need for proactive security measures are further bolstering demand for advanced anomaly detection solutions. While data privacy concerns and the complexity of integrating these solutions into existing IT infrastructure represent potential restraints, the overall market trajectory indicates a sustained period of expansion. Companies like SAS Institute, IBM, and Microsoft are actively shaping this market with their comprehensive offerings. The significant growth trajectory is expected to continue through 2033. The substantial investments in research and development by major players and the growing adoption across diverse sectors, including healthcare for predictive maintenance and anomaly detection in medical imaging, will continue to fuel the expansion. The competitive landscape is characterized by both established players offering comprehensive solutions and emerging niche players focusing on specific industry needs. 
This competitive dynamism fosters innovation and drives the development of more efficient and sophisticated anomaly detection technologies. While regional variations exist, North America and Europe currently hold a significant market share, with Asia-Pacific poised for rapid expansion due to increasing digitalization and investment in advanced technologies. This report provides a detailed analysis of the global anomaly detection market, projecting robust growth from $XXX million in 2025 to $YYY million by 2033. The study covers the historical period (2019-2024), base year (2025), and forecast period (2025-2033), offering insights for businesses navigating this rapidly evolving landscape. Keywords: anomaly detection, machine learning, AI, cybersecurity, fraud detection, predictive analytics, data mining, big data analytics, real-time analytics. Recent developments include: June 2023: Wipro launched a new suite of banking and financial services built on Microsoft Cloud; the partnership combines Microsoft Cloud capabilities with Wipro FullStride Cloud and leverages Wipro's and Capco's deep domain expertise in financial services to develop new solutions that help financial services clients accelerate growth and deepen client relationships. June 2023: Cisco announced that it is delivering on its promise of the AI-driven Cisco Security Cloud to simplify cybersecurity and empower people to do their best work from anywhere, despite an increasingly sophisticated threat landscape. Cisco invests in cutting-edge artificial intelligence and machine learning innovations that will empower security teams by simplifying operations and increasing efficacy. Key drivers for this market are the increasing number of cyber crimes and the increasing adoption of anomaly detection solutions in software testing. Potential restraints include the threat posed by open-source alternatives. Notable trends: BFSI is expected to hold a significant share of the market.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This data set contains the data collected on the DAVIDE HPC system (CINECA & E4 & University of Bologna, Bologna, Italy) in the period March-May 2018.
The data set has been used to train an autoencoder-based model that automatically detects anomalies in a semi-supervised fashion on a real HPC system.
This work is described in:
1) "Anomaly Detection using Autoencoders in High Performance Computing Systems", Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini, IAAI19 (proceedings in process) -- https://arxiv.org/abs/1902.08447
2) "Online Anomaly Detection in HPC Systems", Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini, AICAS19 (proceedings in process) -- https://arxiv.org/abs/1811.05269
See the Git repository for usage examples and details: https://github.com/AndreaBorghesi/anomaly_detection_HPC
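In semi-supervised anomaly detection with autoencoders, a model is commonly trained on normal data only, and inputs it reconstructs poorly are flagged as anomalous. The minimal Python sketch below covers only that scoring and thresholding step. It assumes you already have reconstructions from some trained model, and the 95th-percentile threshold on normal-data errors is an illustrative choice, not necessarily the one used in the cited papers or repository. Function names are hypothetical.

```python
import statistics


def reconstruction_errors(inputs, reconstructions):
    """Mean squared error between each input vector and its reconstruction."""
    return [
        sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        for x, x_hat in zip(inputs, reconstructions)
    ]


def fit_threshold(normal_errors, percentile=95):
    """Threshold at a percentile (1-99) of errors measured on normal-only data."""
    # "inclusive" interpolates within the observed range instead of extrapolating.
    cut_points = statistics.quantiles(normal_errors, n=100, method="inclusive")
    return cut_points[percentile - 1]


def flag_anomalies(errors, threshold):
    """True where the reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


if __name__ == "__main__":
    normal = [0.1 * i for i in range(1, 11)]  # errors on held-out normal samples
    threshold = fit_threshold(normal)
    print(flag_anomalies([0.05, 5.0], threshold))  # [False, True]
```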
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
Introduction
The ComplexVAD dataset consists of 104 training and 113 testing video sequences taken from a static camera looking at a scene of a two-lane street with sidewalks on either side of the street and another sidewalk going across the street at a crosswalk. The videos were collected over a period of a few months on the campus of the University of South Florida using a camcorder with 1920 x 1080 pixel resolution. Videos were collected at various times during the day and on each day of the week. Videos vary in duration with most being about 12 minutes long. The total duration of all training and testing videos is a little over 34 hours. The scene includes cars, buses and golf carts driving in two directions on the street, pedestrians walking and jogging on the sidewalks and crossing the street, people on scooters, skateboards and bicycles on the street and sidewalks, and cars moving in the parking lot in the background. Branches of a tree also move at the top of many frames.
The 113 testing videos have a total of 118 anomalous events consisting of 40 different anomaly types.
Ground truth annotations are provided for each testing video in the form of bounding boxes around each anomalous event in each frame. Each bounding box is also labeled with a track number, meaning each anomalous event is labeled as a track of bounding boxes. A single frame can have more than one anomaly labeled.
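The track-based labeling described above maps naturally onto a grouping step: boxes sharing a track number form one anomalous event. The minimal Python sketch below assumes a hypothetical in-memory record layout (`Box` with frame, track, and box coordinates); the actual ComplexVAD annotation file format is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One annotated bounding box; this field layout is hypothetical."""
    frame: int
    track: int
    x: int
    y: int
    width: int
    height: int


def group_tracks(boxes):
    """Group boxes by track number; each track is one anomalous event."""
    tracks = defaultdict(list)
    for box in boxes:
        tracks[box.track].append(box)
    # Order each event's boxes by frame so the track reads forward in time.
    return {track: sorted(items, key=lambda b: b.frame) for track, items in tracks.items()}


def event_span(track_boxes):
    """First and last annotated frame of one event (boxes sorted by frame)."""
    return track_boxes[0].frame, track_boxes[-1].frame


def boxes_per_frame(boxes):
    """Count labeled boxes in each frame; a frame may contain several anomalies."""
    counts = defaultdict(int)
    for box in boxes:
        counts[box.frame] += 1
    return dict(counts)


if __name__ == "__main__":
    boxes = [Box(10, 1, 5, 5, 20, 40), Box(11, 1, 6, 5, 20, 40), Box(11, 2, 300, 80, 50, 50)]
    tracks = group_tracks(boxes)
    print(len(tracks))             # 2
    print(event_span(tracks[1]))   # (10, 11)
    print(boxes_per_frame(boxes))  # {10: 1, 11: 2}
```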
License
The ComplexVAD dataset is released under CC-BY-SA-4.0 license.
All data:
Created by Mitsubishi Electric Research Laboratories (MERL), 2024
SPDX-License-Identifier: CC-BY-SA-4.0
Title: Unsupervised Anomaly Detection for Liquid-Fueled Rocket Propulsion Health Monitoring. Abstract: This article describes the results of applying four unsupervised anomaly detection algorithms to data from two rocket propulsion testbeds. The first testbed uses historical data from the Space Shuttle Main Engine. The second testbed uses data from an experimental rocket engine test stand located at NASA Stennis Space Center. The article describes nine anomalies detected by the four algorithms. The four algorithms use four different definitions of anomalousness. Orca uses a nearest-neighbor approach, defining a point to be an anomaly if its nearest neighbors in the data space are far away from it. The Inductive Monitoring System clusters the training data, and then uses the distance to the nearest cluster as its measure of anomalousness. GritBot learns rules from the training data, and then classifies points as anomalous if they violate these rules. One-class support vector machines map the data into a high-dimensional space in which most of the normal points are on one side of a hyperplane, and then classify points on the other side of the hyperplane as anomalous. Because of these different definitions of anomalousness, different algorithms detect different anomalies. We therefore conclude that it is useful to use multiple algorithms.
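The Inductive Monitoring System's idea, as summarised above (cluster the training data, then score a new point by its distance to the nearest cluster), can be sketched as follows. This is a simplified, hypothetical rendering rather than NASA's IMS code: clusters are axis-aligned boxes grown greedily under an assumed `tolerance` parameter, and no feature normalization is applied.

```python
import math


def _distance_to_box(point, box):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    lo, hi = box
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(point, lo, hi)))


def build_clusters(training, tolerance):
    """Grow box-shaped clusters from nominal training vectors (IMS-style sketch)."""
    boxes = []
    for point in training:
        if boxes:
            nearest = min(range(len(boxes)), key=lambda i: _distance_to_box(point, boxes[i]))
            if _distance_to_box(point, boxes[nearest]) <= tolerance:
                # Close enough: stretch the nearest box to include this point.
                lo, hi = boxes[nearest]
                boxes[nearest] = (
                    [min(l, x) for l, x in zip(lo, point)],
                    [max(h, x) for h, x in zip(hi, point)],
                )
                continue
        boxes.append((list(point), list(point)))  # start a new cluster
    return boxes


def anomaly_score(point, boxes):
    """Distance to the nearest cluster; 0 means the point lies inside one."""
    return min(_distance_to_box(point, box) for box in boxes)


if __name__ == "__main__":
    boxes = build_clusters([(0, 0), (1, 0), (0, 1), (1, 1)], tolerance=1.5)
    print(anomaly_score((0.5, 0.5), boxes), anomaly_score((4, 1), boxes))  # 0.0 3.0
```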
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Time Series for Anomaly Detection
The file is a Matlab data file. It contains 3 time series, representing the packet rate of 3 different traffic traces, related to inbound traffic of the UNINA Network. The traces were collected in year 2004. The packet rate was sampled with a period of 2 seconds and each trace lasts 2 hours. These data have been used for studies on volume-based anomaly detection and are related to time intervals during which no anomalies were observed on the UNINA network by the NOC operators. In other words, they can be considered anomaly-free.
When referring to our Anomaly Detection Dataset, please cite the following reference:
A. Dainotti, A. Pescapè, G. Ventre, "A cascade architecture for DoS attacks detection based on the wavelet transform", Journal of Computer Security, Volume 17, Number 6/2009, Pages 945-968.
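At one sample every 2 seconds for 2 hours, each trace holds 3600 packet-rate samples. Because the traces are anomaly-free, one natural use is as a baseline for a volume-based detector. The minimal Python sketch below assumes the three series have already been read from the Matlab file into Python lists (for example with `scipy.io.loadmat`; the variable names inside the file are not documented here). The mean + k·std rule and the names `fit_baseline` and `volume_alarms` are illustrative; this is not the wavelet-based method of the cited paper.

```python
import statistics

SAMPLE_PERIOD_S = 2                                      # packet rate sampled every 2 seconds
TRACE_DURATION_S = 2 * 3600                              # each trace lasts 2 hours
SAMPLES_PER_TRACE = TRACE_DURATION_S // SAMPLE_PERIOD_S  # 3600 samples


def fit_baseline(anomaly_free_rates):
    """Mean and sample standard deviation of an anomaly-free packet-rate trace."""
    return statistics.mean(anomaly_free_rates), statistics.stdev(anomaly_free_rates)


def volume_alarms(rates, baseline, k=3.0):
    """Indices where the packet rate exceeds mean + k * std (a simple volume rule)."""
    mean, std = baseline
    limit = mean + k * std
    return [i for i, r in enumerate(rates) if r > limit]


if __name__ == "__main__":
    print(SAMPLES_PER_TRACE)  # 3600
    baseline = fit_baseline([100, 102, 98, 101, 99])
    print(volume_alarms([100, 103, 250, 99], baseline))  # [2]
```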
A fleet is a group of systems (e.g., cars, aircraft) that are designed and manufactured the same way and are intended to be used the same way. For example, a fleet of delivery trucks may consist of one hundred instances of a particular model of truck, each of which is intended for the same type of service: almost the same amount of time and distance driven every day, approximately the same total weight carried, etc. For this reason, one may imagine that data mining for fleet monitoring merely involves collecting operating data from the multiple systems in the fleet and developing some sort of model, such as a model of normal operation that can be used for anomaly detection. However, one then realizes that each member of the fleet will be unique in some ways: there will be minor variations in manufacturing, quality of parts, and usage. For this reason, the assumption made by typical machine learning and statistics algorithms, that all the data are independent and identically distributed, does not hold. Instead, data from each system in the fleet must be treated as unique so that significant changes in the operation of that system can be noticed.
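The difference between pooling the fleet and treating each unit separately can be shown with z-scores. In this minimal Python sketch, the truck names and readings are hypothetical: a reading that is ordinary for the fleet as a whole is extreme for the unit that produced it, so only the per-unit comparison notices the change.

```python
import statistics


def z_score(value, history):
    """Standard score of `value` relative to a list of past readings."""
    return (value - statistics.mean(history)) / statistics.stdev(history)


def per_unit_z(value, unit, fleet_history):
    """Compare a reading only against the same unit's own history."""
    return z_score(value, fleet_history[unit])


def pooled_z(value, fleet_history):
    """Compare a reading against all units lumped together (the i.i.d. view)."""
    pooled = [x for readings in fleet_history.values() for x in readings]
    return z_score(value, pooled)


if __name__ == "__main__":
    # Hypothetical readings from two trucks that normally operate at different levels.
    history = {"truck_a": [9.8, 10.0, 10.2, 10.0], "truck_b": [19.8, 20.0, 20.2, 20.0]}
    print(round(per_unit_z(20.0, "truck_a", history), 1))  # 61.2
    print(round(pooled_z(20.0, history), 1))               # 0.9
```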
The anomaly detection service market size is poised for substantial growth, with its valuation estimated at USD 4.5 billion in 2023 and projected to reach USD 12.8 billion by 2032, reflecting a robust CAGR of 12.4% during the forecast period. The exponential growth trajectory of this market is underpinned by several critical factors, including the increasing reliance on data-driven decision-making across industries, the rising sophistication of cyber threats, and the need for real-time monitoring and analysis. The growing integration of advanced technologies such as artificial intelligence and machine learning in anomaly detection solutions is further catalyzing market expansion by enhancing accuracy and reducing false positives.
One of the primary growth drivers of the anomaly detection service market is the escalating volume of data generated across diverse sectors. With the proliferation of IoT devices, mobile applications, and digital platforms, industries are inundated with massive datasets that require real-time analysis to derive actionable insights. Anomaly detection services provide the capability to sift through vast amounts of data to identify irregular patterns and potential threats, enabling organizations to act swiftly and mitigate risks. Additionally, the increasing focus on enhanced customer experiences and operational efficiency is propelling businesses to invest in robust anomaly detection solutions that ensure seamless operations and prevent disruptions.
The mounting frequency and complexity of cyberattacks have significantly contributed to the demand for advanced anomaly detection services. As cybercriminals employ more sophisticated methods to breach security systems, traditional security measures are often inadequate. Anomaly detection services, leveraging machine learning and artificial intelligence, can detect unusual patterns and deviations from normal behavior, thus providing an additional layer of security against cyber threats. Furthermore, regulatory requirements mandating data protection and privacy have compelled organizations to adopt anomaly detection solutions to comply with standards and safeguard sensitive information, driving further market growth.
Technological advancements and innovations in the field of artificial intelligence and big data analytics are playing a pivotal role in shaping the anomaly detection service market. These technologies enable the development of more refined and accurate detection models that can process and analyze data in real time. The integration of AI and ML algorithms not only increases the precision of anomaly detection but also helps in predicting future anomalies, thereby allowing organizations to take pre-emptive measures. The ability to customize and scale solutions according to specific organizational needs is another factor that is attracting enterprises towards investing in anomaly detection services.
The regional outlook for the anomaly detection service market is characterized by significant variations in growth rates and adoption patterns across different geographies. North America remains a dominant region due to the early adoption of cutting-edge technologies, a strong emphasis on cybersecurity, and substantial investments in IT infrastructure. Europe is also witnessing steady growth, driven by stringent regulatory norms and the increasing focus on safeguarding digital assets. Meanwhile, the Asia Pacific region is anticipated to exhibit the highest CAGR over the forecast period, fueled by rapid digital transformation, expanding IT and telecommunications sectors, and increasing awareness about the importance of cybersecurity in emerging economies.
In the anomaly detection service market, the component segmentation into software and services encapsulates a dynamic aspect of market growth. The software segment is witnessing a significant surge in demand as organizations increasingly seek sophisticated tools capable of real-time anomaly detection. These software solutions, often powered by AI and ML algorithms, facilitate the seamless integration of data from various sources, enhancing overall system efficiency. The burgeoning need for customizable and scalable solutions that can be tailored to specific industry requirements positions the software segment as a pivotal growth driver in the anomaly detection landscape.
On the other hand, the services segment is equally pivotal,
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Identifying change points and/or anomalies in dynamic network structures has become increasingly popular across various domains, from neuroscience to telecommunication to finance. One particular objective of anomaly detection from a neuroscience perspective is the reconstruction of the dynamic manner of brain region interactions. However, most statistical methods for detecting anomalies share a limitation that is unrealistic for brain studies and beyond: network snapshots at different time points are assumed to be independent. To circumvent this limitation, we propose a distribution-free framework for anomaly detection in dynamic networks. First, we represent each network snapshot of the data as a linear object and find its respective univariate characterization via local and global network topological summaries. Second, we adopt a change point detection method for (weakly) dependent time series based on efficient scores, and enhance the finite sample properties of the change point method by approximating the asymptotic distribution of the test statistic using the sieve bootstrap. We apply our method to simulated and real data, in particular two functional magnetic resonance imaging (fMRI) datasets and the Enron communication graph. We find that our new method delivers impressively accurate and realistic results in terms of identifying locations of true change points compared to the results reported by competing approaches. The new method promises to offer a deeper insight into the large-scale characterizations and functional dynamics of the brain and, more generally, into the intrinsic structure of complex dynamic networks. Supplemental materials for this article are available online.
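The two-step structure (reduce each snapshot to a univariate summary, then look for a change point in the resulting series) can be sketched in Python. This sketch swaps in simpler techniques: edge density as the global topological summary and a plain single-change CUSUM split, with no efficient-score statistic, no sieve bootstrap, and no significance test. The function names are illustrative.

```python
def edge_density(n_nodes, edges):
    """Global topological summary: fraction of possible undirected edges present."""
    unique_edges = {frozenset(e) for e in edges}  # (a, b) and (b, a) count once
    return 2 * len(unique_edges) / (n_nodes * (n_nodes - 1))


def cusum_change_point(series):
    """Split index maximising |S_t - (t / n) * S_n|, a basic single change-point CUSUM."""
    n = len(series)
    total = sum(series)
    best_t, best_stat, running = None, -1.0, 0.0
    for t in range(1, n):
        running += series[t - 1]
        stat = abs(running - t / n * total)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t  # index of the first snapshot after the change


if __name__ == "__main__":
    sparse = [(0, 1)]                                  # 1 of 10 possible edges
    dense = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # 5 of 10 possible edges
    snapshots = [sparse] * 5 + [dense] * 5
    densities = [edge_density(5, edges) for edges in snapshots]
    print(cusum_change_point(densities))  # 5
```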
The worldwide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters, including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous, binary, or categorical measurements recorded at one-second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. Here, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large databases of discrete and continuous data from worldwide operations of commercial fleets. We pose a general anomaly detection problem that includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain and novel algorithms, and briefly discuss results on synthetic and real-world data sets. Our algorithm uncovers operationally significant events in high-dimensional data streams in the aviation industry that are not detectable using state-of-the-art methods.
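The central idea of multiple kernel learning here is that the similarity between two flights can mix a kernel over discrete event sequences with a kernel over continuous measurements. The minimal Python sketch below is hedged accordingly: the normalized longest-common-subsequence kernel for discrete sequences, the RBF kernel for continuous features, the fixed 0.5 weight, and the flight representation `(event_sequence, feature_vector)` are illustrative assumptions rather than the exact design of the described method. In practice a kernel matrix built this way would feed a kernel-based detector such as a one-class SVM.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming, two rows)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs_kernel(a, b):
    """Normalized LCS similarity between two non-empty discrete event sequences."""
    return lcs_length(a, b) / math.sqrt(len(a) * len(b))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two continuous feature vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, y)))


def combined_kernel(flight_1, flight_2, weight=0.5):
    """Weighted sum of the discrete and continuous kernels for two flights."""
    (seq_1, vec_1), (seq_2, vec_2) = flight_1, flight_2
    return weight * nlcs_kernel(seq_1, seq_2) + (1 - weight) * rbf_kernel(vec_1, vec_2)


if __name__ == "__main__":
    # Hypothetical flights: (switch events, summary of continuous sensors).
    a = (["gear_down", "flaps_30", "spoilers"], [0.2, 1.5])
    b = (["flaps_30", "gear_down", "spoilers"], [0.3, 1.4])
    print(combined_kernel(a, a))  # 1.0
    print(combined_kernel(a, b))
```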
http://opendatacommons.org/licenses/dbcl/1.0/
The dataset is a subset of Task 2 of the DCASE 2020 Challenge, whose goal is to identify anomalies in machines from their audio recordings. The dataset has three parts, namely training, validation, and testing, which have been combined into a single dataset.
Training: https://zenodo.org/record/3678171
Validation: https://zenodo.org/record/3727685
Testing: https://zenodo.org/record/3841772
SAIVT-Campus Dataset
Overview
The SAIVT-Campus Database is an abnormal event detection database captured on a university campus, where the abnormal events are caused by the onset of a storm. Contact Dr Simon Denman or Dr Jingxin Xu for more information.
Licensing
The SAIVT-Campus database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
Attribution
To attribute this database, please include the following citation: Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. Available at eprints.
Acknowledging the Database in your Publications
In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications: We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-Campus database for our research.
Installing the SAIVT-Campus database
After downloading and unpacking the archive, you should have the following structure:
SAIVT-Campus
+-- LICENCE.txt
+-- README.txt
+-- test_dataset.avi
+-- training_dataset.avi
+-- Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf
Notes
The SAIVT-Campus dataset is captured at the Queensland University of Technology, Australia.
It contains two video files from real-world surveillance footage without any actors:
training_dataset.avi (the training dataset)
test_dataset.avi (the test dataset).
This dataset contains a mixture of crowd densities, and it has been used for abnormal event detection in the following paper:
Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. Available at eprints.
This paper is also included with the database (Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf). Both video files are one hour in duration.
The normal activities include pedestrians entering or exiting the building, entering or exiting a lecture theatre (yellow door), and going to the counter at the bottom right. The abnormal events are caused by heavy rain outside and include people running in from the rain, people walking towards the door to exit and then turning back, people wearing raincoats, people loitering and standing near the door, and overcrowded scenes. The rain occurs only in the latter part of the test dataset.
As a result, we assume that the training dataset contains only normal activities. We manually annotated the data as follows:
the training dataset contains no abnormal scenes;
the test dataset is divided into two parts: only normal activities occur from 00:00:00 to 00:47:16, and abnormalities are present from 00:47:17 to 01:00:00. We annotate 00:47:17 as the start time of the abnormal events because from this time on we begin to observe people stopping, or turning back while walking towards the door to exit, which indicates that the rain outside the building has influenced the activities inside it. Should you have any questions, please do not hesitate to contact Dr Jingxin Xu.
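To use the annotation above as frame-level ground truth for the test video, the abnormal-onset timestamp has to be mapped to a frame index. The notes do not state the frame rate, so `fps` below is an assumed parameter (in practice it and the frame count could be read from the video file, for example with OpenCV's `CAP_PROP_FPS` and `CAP_PROP_FRAME_COUNT` properties); the function names are hypothetical.

```python
import math


def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' timestamp to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def frame_labels(n_frames, fps, abnormal_start="00:47:17"):
    """Per-frame labels for the test video: 0 = normal, 1 = abnormal.

    A frame f (timestamp f / fps seconds) is abnormal if it falls at or after
    the annotated onset; ceil() handles non-integer frame rates such as 29.97.
    """
    onset_frame = math.ceil(hms_to_seconds(abnormal_start) * fps)
    return [1 if f >= onset_frame else 0 for f in range(n_frames)]


if __name__ == "__main__":
    labels = frame_labels(n_frames=60 * 60 * 25, fps=25)  # assumed 25 fps, one hour
    print(sum(labels), "abnormal frames out of", len(labels))
```

Under the assumed 25 fps, a one-hour (90,000-frame) test video would have frames 0 to 70,924 labelled normal and the remaining 19,075 frames labelled abnormal.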
There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors, and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These datasets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is challenging not only because of the massive volume of data, but also because these datasets are physically stored at different geographical locations, with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper we present a novel algorithm that can identify outliers in the entire data without moving all the data to a single location. The method we propose centralizes only a very small sample from the data subsets at the different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization, at only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).