This dataset contains expert-labeled telemetry anomaly data from the Soil Moisture Active Passive (SMAP) satellite and the Mars Science Laboratory (MSL) rover, Curiosity.
Indications of telemetry anomalies can be found within Incident Surprise, Anomaly (ISA) reports. All telemetry channels discussed in an individual ISA were reviewed to ensure that the anomaly was evident in the associated telemetry data, and specific anomalous time ranges were manually labeled for each channel. If multiple anomalous sequences and channels closely resembled each other, only one was kept for the experiment in order to create a diverse and balanced set. Anomalies were classified into two categories, point and contextual. Point anomalies would likely be identified by properly set alarms or by distance-based methods that ignore temporal information. Contextual anomalies require more complex methodologies, such as LSTMs or Hierarchical Temporal Memory (HTM) approaches, to detect.
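The point/contextual distinction can be illustrated on a synthetic signal. The sketch below is only an illustration under stated assumptions: the signal, the alarm limits, and the period are invented, and the seasonal-difference check is a deliberately simple stand-in for temporal models such as LSTMs or HTM, not the method used by the dataset authors.

```python
import numpy as np

def threshold_alarm(values, low, high):
    """Flag samples outside fixed alarm limits (ignores temporal order)."""
    values = np.asarray(values, dtype=float)
    return (values < low) | (values > high)

def periodic_residual_flags(values, period, tol):
    """Flag samples that differ from the sample one period earlier by more than tol."""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.shape, dtype=bool)
    flags[period:] = np.abs(values[period:] - values[:-period]) > tol
    return flags

# Hypothetical telemetry channel: a sine wave with period 20.
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)
signal[50] = 5.0        # point anomaly: far outside the normal value range
signal[120:130] = 0.0   # contextual anomaly: in range, but flat where a swing is expected

point_flags = threshold_alarm(signal, low=-1.5, high=1.5)
context_flags = periodic_residual_flags(signal, period=20, tol=0.5)

print("alarm catches the spike:", bool(point_flags[50]))
print("alarm catches the flat segment:", bool(point_flags[120:130].any()))
print("temporal check catches the flat segment:", bool(context_flags[120:130].any()))
```

The fixed limits see only the spike, because every value in the flat segment is individually plausible. Only a check that uses temporal context notices that the expected oscillation is missing.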
TM channels: 27
Total TM values: 66,709
Total anomalies: 36
Data in .npy files
All credit goes to the original authors of the dataset, with many thanks to them for making the data publicly available:
- Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, Tom Soderstrom. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding, 2018, NASA Jet Propulsion Laboratory
- Read more of NASA's anomaly detection work: https://github.com/khundman/telemanom
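Since the data ship as .npy files and the anomalies are labeled as time ranges per channel, a loading step might look like the following. This is a minimal sketch under assumptions: the file name, the array shape (time steps by features, with the telemetry value in the first column), and the inclusive `[start, end]` range format are all hypothetical here and should be checked against the actual files. A temporary stand-in file is written first so that the example runs on its own.

```python
import os
import tempfile
import numpy as np

def ranges_to_mask(n_steps, ranges):
    """Turn inclusive [start, end] index pairs into a boolean per-timestep mask."""
    mask = np.zeros(n_steps, dtype=bool)
    for start, end in ranges:
        mask[start:end + 1] = True
    return mask

# Stand-in for one channel file (hypothetical name and shape).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "channel_example.npy")
    np.save(path, np.random.default_rng(0).normal(size=(1000, 3)))
    channel = np.load(path)  # fully read into memory before the directory is removed

telemetry = channel[:, 0]  # assumption: first column holds the telemetry value
labels = ranges_to_mask(len(telemetry), [[100, 149], [700, 709]])  # hypothetical ranges

print(channel.shape, int(labels.sum()))
```

A boolean mask of this kind makes it straightforward to score any detector per time step against the expert labels.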
UCF Crime Dataset in the most suitable structure. Contains 1900 videos from 13 different categories. To ensure the quality of this dataset, ten annotators (with different levels of computer vision expertise) were trained to collect it. Videos were found on YouTube and LiveLeak using text search queries (with slight variations, e.g. “car crash”, “road accident”) for each anomaly.
In performance maintenance in large, complex systems, sensor information from sub-components tends to be readily available, and can be used to make predictions about the system's health and diagnose possible anomalies. However, existing methods can only use predictions of individual component anomalies to guess at systemic problems; they can neither accurately estimate the magnitude of the problem nor prescribe good solutions. Since physical complex systems usually have well-defined semantics of operation, we here propose using anomaly detection techniques drawn from data mining in conjunction with an automated theorem prover working on a domain-specific knowledge base to perform systemic anomaly detection on complex systems. For clarity of presentation, the remaining content of this submission is presented compactly in Fig 1.
A fleet is a group of systems (e.g., cars, aircraft) that are designed and manufactured the same way and are intended to be used the same way. For example, a fleet of delivery trucks may consist of one hundred instances of a particular model of truck, each of which is intended for the same type of service: almost the same amount of time and distance driven every day, approximately the same total weight carried, etc. For this reason, one may imagine that data mining for fleet monitoring merely involves collecting operating data from the multiple systems in the fleet and developing some sort of model, such as a model of normal operation that can be used for anomaly detection. However, each member of the fleet will be unique in some ways: there will be minor variations in manufacturing, quality of parts, and usage. As a result, the assumption made by typical machine learning and statistics algorithms, that all the data are independent and identically distributed, does not hold. Data from each system in the fleet must be treated as unique so that one can notice significant changes in the operation of that system.
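The argument against pooling fleet data can be made concrete with a small sketch. The unit names, readings, and the choice of a z-score baseline are all assumptions made for illustration; the description above does not prescribe a particular model.

```python
import numpy as np

def zscore(value, baseline):
    """Standard score of one reading relative to a baseline sample."""
    baseline = np.asarray(baseline, dtype=float)
    return (value - baseline.mean()) / baseline.std()

# Hypothetical history of one sensor for two trucks of the same model.
history = {
    "truck_a": [10.0, 10.2, 9.8, 10.1, 9.9],
    "truck_b": [14.0, 14.1, 13.9, 14.2, 13.8],
}
new_reading = {"truck_a": 12.0, "truck_b": 14.0}

# Pooled baseline: treats all fleet data as one identically distributed sample.
pooled = np.concatenate([np.asarray(v, dtype=float) for v in history.values()])

scores = {}
for unit, value in new_reading.items():
    scores[unit] = {
        "pooled": zscore(value, pooled),        # fleet-wide model
        "per_unit": zscore(value, history[unit]),  # model of this unit only
    }
    print(unit, {k: round(float(v), 2) for k, v in scores[unit].items()})
```

In this toy data, a reading of 12.0 on `truck_a` sits at the fleet-wide mean, so the pooled model sees nothing unusual. Relative to that truck's own history, it is a large shift.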
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository makes the UCR Time Series Anomaly Detection datasets [1] publicly available on this webpage. It was created to pin the version of the dataset used in the TimeVQVAE-AD paper [2].
References
[1] Keogh, Eamonn, et al. "Multi-dataset time-series anomaly detection competition." ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2021. https://compete.hexagonml.com/practice/competition/39
[2] Lee, Daesoo, Sara Malacarne, and Erlend Aune. "Explainable time series anomaly detection using masked latent generative modeling." Pattern Recognition (2024): 110826.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project is for anomaly detection.
Many existing complex space systems have a significant number of historical maintenance and problem databases that are stored in unstructured text form. The problem that we address in this paper is the discovery of recurring anomalies and relationships between problem reports that may indicate larger systemic problems. We will illustrate our techniques on data from discrepancy reports regarding software anomalies in the Space Shuttle. These free-text reports are written by a number of different people, so the emphasis and wording vary considerably. With Mehran Sahami from Stanford University, I'm putting together a book on text mining called "Text Mining: Theory and Applications" to be published by Taylor and Francis.
We present a set of novel algorithms, which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining system-wide health of a fleet of aircraft. The approach uses unsupervised clustering of sequences with the normalized length of the longest common subsequence (nLCS) as a similarity measure, followed by a detailed analysis of outliers to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from a cluster. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithm provides a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. The final section of the paper demonstrates the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard Hidden Markov Models, and show that our methods are superior.
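The nLCS similarity measure can be sketched as follows. The abstract does not state the exact normalization, so this sketch assumes one common choice: dividing the LCS length by the geometric mean of the two sequence lengths. The switch symbols in the example are invented for illustration.

```python
from math import sqrt

def lcs_length(a, b):
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def nlcs(a, b):
    """Normalized LCS similarity in [0, 1] (normalization is an assumption, see lead-in)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))

# Hypothetical cockpit switch sequences.
typical = ["gear_down", "flaps_10", "flaps_20", "spoilers_arm", "autobrake_on"]
unusual = ["flaps_10", "gear_down", "spoilers_arm"]

print(nlcs(typical, typical))  # identical sequences score 1.0
print(nlcs(typical, unusual))
```

A distance such as `1 - nlcs(a, b)` could then feed a clustering step, with sequences far from every cluster treated as outlier candidates.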
https://creativecommons.org/publicdomain/zero/1.0/
The SmartSys-CTI dataset is a synthetically generated yet realistic dataset created for research and development in anomaly detection and cyber threat intelligence (CTI) within smart system environments. It simulates activity logs and network behavior from smart devices commonly found in IoT-enabled infrastructures such as smart homes, industrial IoT, smart grids, and healthcare systems.
It includes both normal operational data and anomalous activity patterns such as Denial-of-Service (DoS), spoofing, and data injection, making it ideal for training and evaluating intelligent intrusion detection systems (IDS).
⭐ Key Features
🔐 Cyber Threat Scenarios: Includes labeled data for multiple cyberattacks: DoS, spoofing, injection.
📊 Rich Feature Set: Covers CPU/memory usage, network traffic, packet rate, encryption status, location variance, and more.
🧠 Deep Learning Ready: Designed for Capsule Networks (CapsNet), Extreme Learning Machines (ELM), and other hybrid deep models.
⏱️ Time-Series Support: Timestamped logs simulate real-time operations for sequential models (e.g., RNNs, LSTMs).
🧪 Multi-Class Labels: Provides a labeled target column for normal vs specific attack types, aiding multiclass classification.
⚡ Scalable and Lightweight: Efficient format suitable for real-time detection system prototyping and testing.
This dataset provides a practical foundation for developing scalable, accurate, and adaptive cybersecurity solutions in modern smart environments. Researchers and practitioners can use it to evaluate model performance, test feature extraction techniques, or simulate real-time defense systems.
There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task, not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Steel Anomaly Detection is a dataset for object detection tasks - it contains Anomly Present Anomaly Absent annotations for 2,761 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
gentlemenOfsaclay/anomaly-detection-image dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Introduction
The ComplexVAD dataset consists of 104 training and 113 testing video sequences taken from a static camera looking at a scene of a two-lane street with sidewalks on either side of the street and another sidewalk going across the street at a crosswalk. The videos were collected over a period of a few months on the campus of the University of South Florida using a camcorder with 1920 x 1080 pixel resolution. Videos were collected at various times during the day and on each day of the week. Videos vary in duration with most being about 12 minutes long. The total duration of all training and testing videos is a little over 34 hours. The scene includes cars, buses and golf carts driving in two directions on the street, pedestrians walking and jogging on the sidewalks and crossing the street, people on scooters, skateboards and bicycles on the street and sidewalks, and cars moving in the parking lot in the background. Branches of a tree also move at the top of many frames.
The 113 testing videos have a total of 118 anomalous events consisting of 40 different anomaly types.
Ground truth annotations are provided for each testing video in the form of bounding boxes around each anomalous event in each frame. Each bounding box is also labeled with a track number, meaning each anomalous event is labeled as a track of bounding boxes. A single frame can have more than one anomaly labeled.
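Track-labeled bounding boxes of this kind can be organized per event with a few lines of code. The record layout used here, `(frame, track_id, x1, y1, x2, y2)`, is purely hypothetical. The dataset's actual annotation file format is not described above and should be checked before adapting this sketch.

```python
from collections import defaultdict

def group_by_track(annotations):
    """Group per-frame boxes into tracks, one track per anomalous event."""
    tracks = defaultdict(list)
    for frame, track_id, *box in annotations:
        tracks[track_id].append((frame, tuple(box)))
    for boxes in tracks.values():
        boxes.sort()  # order each track by frame number
    return dict(tracks)

def anomalous_frames(annotations):
    """Sorted frame numbers that contain at least one labeled anomaly."""
    return sorted({frame for frame, *_ in annotations})

# Hypothetical records: frame 11 contains two different anomalous events.
sample = [
    (10, 1, 0, 0, 5, 5),
    (11, 1, 1, 0, 6, 5),
    (11, 2, 20, 20, 30, 30),
]

tracks = group_by_track(sample)
print(len(tracks), anomalous_frames(sample))
```

Keeping the track number with every box allows evaluation both per frame (is any anomaly present?) and per event (was each track detected at least once?).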
License
The ComplexVAD dataset is released under the CC-BY-SA-4.0 license.
All data:
Created by Mitsubishi Electric Research Laboratories (MERL), 2024
SPDX-License-Identifier: CC-BY-SA-4.0
hangyeol522/anomaly-detection-model dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset for this study was compiled from various sources to ensure a diverse and representative collection of road anomalies. Primary data were gathered by recording videos and capturing images using mobile cameras and surveillance systems in different cities across Pakistan. Additional footage was obtained through collaboration with friends, who provided videos from social media platforms and other online sources, including contributions from different countries. This approach produced a dataset that covers a wide range of scenarios and conditions relevant to the study's focus on road anomalies.
Largest visual anomaly detection dataset, containing objects from 12 classes in 3 domains across 10,821 images (9,621 normal and 1,200 anomalous). Both image-level and pixel-level annotations are provided.
The Synthetic Anomaly Detection dataset is a time series classification dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Prodeman Anomaly Detection is a dataset for object detection tasks - it contains Palis Piedra Maiz And More annotations for 394 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).