A fleet is a group of systems (e.g., cars, aircraft) that are designed and manufactured the same way and are intended to be used the same way. For example, a fleet of delivery trucks may consist of one hundred instances of a particular model of truck, each of which is intended for the same type of service—almost the same amount of time and distance driven every day, approximately the same total weight carried, etc. For this reason, one may imagine that data mining for fleet monitoring may merely involve collecting operating data from the multiple systems in the fleet and developing some sort of model, such as a model of normal operation that can be used for anomaly detection. However, one may then realize that each member of the fleet will be unique in some ways—there will be minor variations in manufacturing, quality of parts, and usage. For this reason, the typical machine learning and statistics algorithm’s assumption that all the data are independent and identically distributed is not correct. One may realize that data from each system in the fleet must be treated as unique so that one can notice significant changes in the operation of that system.
Our primary goal is to automatically analyze textual reports from the Aviation Safety Reporting System (ASRS) database to detect/discover the anomaly categories reported by the pilots, and to assign each report to the appropriate category or categories. We have applied two state-of-the-art models for text analysis to a subset of all ASRS reports: (i) the mixture of von Mises-Fisher (movMF) distributions, and (ii) latent Dirichlet allocation (LDA). The models achieve reasonably high performance in discovering anomaly categories and clustering reports. Each category is represented by its most representative words, i.e., those with the highest probability in that category. In addition, since the inference algorithm for LDA was somewhat slow, we have developed a new fast LDA algorithm which is 5-10 times more efficient than the original one, and therefore more practical to use. Further, we have developed a simple visualization tool based on non-linear manifold embedding (ISOMAP) to generate a 2-d visual representation of each report based on its content/topics, which gives a direct view of the structure of the whole dataset as well as of the outliers.
In performance maintenance of large, complex systems, sensor information from sub-components tends to be readily available and can be used to make predictions about the system's health and to diagnose possible anomalies. However, existing methods can only use predictions of individual component anomalies to guess at systemic problems; they can neither accurately estimate the magnitude of the problem nor prescribe good solutions. Since physical complex systems usually have well-defined semantics of operation, we here propose using anomaly detection techniques drawn from data mining in conjunction with an automated theorem prover working on a domain-specific knowledge base to perform systemic anomaly detection on complex systems. For clarity of presentation, the remaining content of this submission is presented compactly in Fig. 1.
This package contains the data and code necessary to run the active learning experiments for anomaly detection. The dataset used for this study is high-spatiotemporal-resolution time-series data from a long-term ecological experiment ("NUtrients, DREissena mussels, and Macrophytes - NUDREM").
This paper provides a review of three different advanced machine learning algorithms for anomaly detection in continuous data streams from a ground-test firing of a subscale Solid Rocket Motor (SRM). This study compares Orca, one-class support vector machines, and the Inductive Monitoring System (IMS) for anomaly detection on the data streams. We measure the performance of the algorithms with respect to the detection horizon for situations where fault information is available. These algorithms have also been studied by the present authors (and other co-authors) as applied to liquid propulsion systems. The trade space between these algorithms will be explored for both types of propulsion systems.
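To illustrate one of the three compared methods, here is a hedged sketch of a one-class SVM used in this setting: fit on nominal data only, then flag off-nominal points. The synthetic readings stand in for the SRM data streams, and all parameters are illustrative.

```python
# Illustrative sketch: a one-class SVM trained on nominal sensor
# readings, then used to flag grossly off-nominal points.
# The synthetic data below is a stand-in, not real test-firing data.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
nominal = rng.normal(loc=0.0, scale=1.0, size=(500, 3))  # nominal training data
faulty = np.full((5, 3), 8.0)                            # far off-nominal points

# nu bounds the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(nominal)

# predict() returns +1 for inliers and -1 for outliers.
print(model.predict(faulty))
```

The detection-horizon comparison in the paper amounts to measuring how early each such detector first returns an outlier flag relative to the known fault onset.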
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is part of the MIMIC database and specifically uses the data corresponding to two patients with IDs 221 and 230.
SAIVT-Campus Dataset
Overview
The SAIVT-Campus Database is an abnormal event detection database captured on a university campus, where the abnormal events are caused by the onset of a storm. Contact Dr Simon Denman or Dr Jingxin Xu for more information.
Licensing
The SAIVT-Campus database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
Attribution
To attribute this database, please include the following citation: Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. Available at QUT ePrints.
Acknowledging the Database in your Publications
In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications: We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-Campus database for our research.
Installing the SAIVT-Campus database
After downloading and unpacking the archive, you should have the following structure:
SAIVT-Campus
+-- LICENCE.txt
+-- README.txt
+-- test_dataset.avi
+-- training_dataset.avi
+-- Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf
Notes
The SAIVT-Campus dataset is captured at the Queensland University of Technology, Australia.
It contains two video files from real-world surveillance footage without any actors:
training_dataset.avi (the training dataset)
test_dataset.avi (the test dataset).
This dataset contains a mixture of crowd densities and it has been used in the following paper for abnormal event detection:
Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. Available at QUT ePrints.
This paper is also included with the database (Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf). Both video files are one hour in duration.
The normal activities include pedestrians entering or exiting the building, entering or exiting a lecture theatre (yellow door), and going to the counter at the bottom right. The abnormal events are caused by heavy rain outside, and include people running in from the rain, people walking towards the door to exit and then turning back, people wearing raincoats, loitering and standing near the door, and overcrowded scenes. The rain occurs only in the later part of the test dataset.
As a result, we assume that the training dataset only contains the normal activities. We have manually made an annotation as below:
the training dataset does not have abnormal scenes
the test dataset separates into two parts: only normal activities occur from 00:00:00 to 00:47:16, and abnormalities are present from 00:47:17 to 01:00:00. We annotate 00:47:17 as the start time for the abnormal events because, from this time on, we observe people stopping or turning back from walking towards the door to exit, which indicates that the rain outside the building has influenced the activities inside. Should you have any questions, please do not hesitate to contact Dr Jingxin Xu.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Anomaly Detection Market Size 2025-2029
The anomaly detection market size is forecast to increase by USD 4.44 billion at a CAGR of 14.4% between 2024 and 2029.
The market is experiencing significant growth, particularly in the BFSI sector, as organizations increasingly prioritize identifying and addressing unusual patterns or deviations from normal business operations. The rising incidence of internal threats and cyber frauds necessitates the implementation of advanced anomaly detection tools to mitigate potential risks and maintain security. However, implementing these solutions comes with challenges, primarily infrastructural requirements. Ensuring compatibility with existing systems, integrating new technologies, and training staff to effectively utilize these tools pose significant hurdles for organizations.
Despite these challenges, the potential benefits of anomaly detection, such as improved risk management, enhanced operational efficiency, and increased security, make it an essential investment for businesses seeking to stay competitive and agile in today's complex and evolving threat landscape. Companies looking to capitalize on this market opportunity must carefully consider these challenges and develop strategies to address them effectively. Cloud computing is a key trend in the market, as cloud-based solutions offer quick deployment, flexibility, and scalability.
What will be the Size of the Anomaly Detection Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
In the dynamic and evolving market, advanced technologies such as resource allocation, linear regression, pattern recognition, and support vector machines are increasingly being adopted for automated decision making. Businesses are leveraging these techniques to enhance customer experience through behavioral analytics, object detection, and sentiment analysis. Machine learning algorithms, including random forests, naive Bayes, decision trees, clustering algorithms, and k-nearest neighbors, are essential tools for risk management and compliance monitoring. AI-powered analytics, time series forecasting, and predictive modeling are revolutionizing business intelligence, while process optimization is achieved through the application of decision support systems, natural language processing, and predictive analytics.
Computer vision, image recognition, and logistic regression are key areas where principal component analysis and artificial neural networks contribute significantly to operational efficiency. Speech recognition is also benefiting from these advanced technologies, enabling businesses to streamline processes and improve overall performance.
How is this Anomaly Detection Industry segmented?
The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment
Cloud
On-premises
Component
Solution
Services
End-user
BFSI
IT and telecom
Retail and e-commerce
Manufacturing
Others
Technology
Big data analytics
AI and ML
Data mining and business intelligence
Geography
North America
US
Canada
Mexico
Europe
France
Germany
Spain
UK
APAC
China
India
Japan
Rest of World (ROW)
By Deployment Insights
The cloud segment is estimated to witness significant growth during the forecast period. The market is witnessing significant growth due to the increasing adoption of advanced technologies such as machine learning models, statistical methods, and real-time monitoring. These technologies enable the identification of anomalous behavior in real-time, thereby enhancing network security and data privacy. Anomaly detection algorithms, including unsupervised learning, reinforcement learning, and deep learning networks, are used to identify outliers and intrusions in large datasets. Data security is a major concern, leading to the adoption of data masking, data pseudonymization, data de-identification, and differential privacy.
Data leakage prevention and incident response are critical components of an effective anomaly detection system. False positive and false negative rates are essential metrics for evaluating the performance of these systems. Time series analysis and concept-drift handling are important techniques in anomaly detection. Data obfuscation, data suppression, and data aggregation are other strategies employed to maintain data privacy. Companies such as Anodot, Cisco Systems Inc, IBM Corp, and SAS Institute Inc offer both cloud-based and on-premises anomaly detection solutions.
There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors, and different climate models. Similarly, huge amounts of flight operational data are downloaded from different commercial airlines. These different types of datasets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire dataset without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).
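The sampling idea can be illustrated with a toy sketch: each site forwards only a small random sample plus its strongest local outlier candidates, and a coordinator re-scores the candidates against the pooled sample. This is an illustration of the general idea, not the paper's actual algorithm; all names and numbers are invented.

```python
# Toy sketch of sample-based distributed outlier detection.
import numpy as np

def knn_score(points, ref, k=5):
    """Distance to the k-th nearest reference point (a simple outlier score)."""
    d = np.linalg.norm(points[:, None, :] - ref[None, :, :], axis=2)
    d.sort(axis=1)
    return d[:, k]

rng = np.random.default_rng(1)
sites = [rng.normal(size=(200, 2)) for _ in range(3)]  # data at 3 locations
sites[0][:2] += 10.0                                   # plant two outliers at site 0

# Each site ships a small random sample to the coordinator ...
sample = np.vstack([s[rng.choice(len(s), 20, replace=False)] for s in sites])
# ... plus its five locally highest-scoring candidate points.
candidates = np.vstack([s[np.argsort(knn_score(s, s))[-5:]] for s in sites])

# The coordinator re-scores all candidates against the pooled sample,
# touching only 60 sampled points instead of all 600.
global_scores = knn_score(candidates, sample)
top = candidates[np.argmax(global_scores)]
print(top)  # expected to lie near one of the planted outliers
```

Only 60 sampled points plus 15 candidates cross the network here, versus 600 points for full centralization, which is the communication saving the abstract describes.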
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains sensor data collected from an oil and gas chemical plant, designed for anomaly detection. It includes time-series data, with key operational parameters such as temperature, pressure, flow rate, vibration levels, valve position, motor speed, and chemical concentration. The data is labeled with anomaly indicators, where 0 represents normal operational conditions and 1 represents an anomaly or abnormal event.
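As a hedged example of how such labeled data might be used, the sketch below trains a classifier on the 0/1 anomaly label. The column names and synthetic values are assumptions for illustration; the real dataset's schema may differ.

```python
# Minimal sketch: supervised anomaly classification on labeled plant data.
# The DataFrame below is an invented stand-in for the real sensor data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "temperature": [80, 82, 81, 79, 83, 150, 80, 148],
    "pressure":    [30, 31, 29, 30, 32,  70, 31,  68],
    "flow_rate":   [12, 13, 12, 11, 13,   2, 12,   3],
    "anomaly":     [ 0,  0,  0,  0,  0,   1,  0,   1],  # 0 = normal, 1 = anomaly
})

X, y = df.drop(columns="anomaly"), df["anomaly"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # accuracy on the held-out rows
```

With real data of this kind, the heavy class imbalance between normal and abnormal rows usually matters more than the choice of classifier, so metrics such as precision and recall on the anomaly class are more informative than accuracy.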
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
We present a large-scale anomaly detection dataset collected from IBM Cloud's Console over approximately 4.5 months. This high-dimensional dataset captures telemetry data from multiple data centers, specifically designed to aid researchers in developing and benchmarking anomaly detection methods in large-scale cloud environments. It contains 39,365 entries, each representing a 5-minute interval, with 117,448 features/attributes, as interval_start is used as the index. The dataset includes detailed information on request counts, HTTP response codes, and various aggregated statistics. The dataset also includes labeled anomaly events identified through IBM's internal monitoring tools, providing a comprehensive resource for real-world anomaly detection research and evaluation.
File Descriptions
location_downtime.csv - Details planned and unplanned downtimes for IBM Cloud data centers, including start and end times in ISO 8601 format.
unpivoted_data.parquet - Contains raw telemetry data with 413 million+ rows, covering details like location, HTTP status codes, request types, and aggregated statistics (min, max, median response times).
anomaly_windows.csv - Ground truth for anomalies, listing start and end times of recorded anomalies, categorized by source (Issue Tracker, Instant Messenger, Test Log).
pivoted_data_all.parquet - Pivoted version of the telemetry dataset with 39,365 rows and 117,449 columns, including aggregated statistics across multiple metrics and intervals.
demo/demo.[ipynb|html] - Provides examples of how to access data in the Parquet files, available in Jupyter Notebook (.ipynb) and HTML (.html) formats.
Further details of the dataset can be found in Appendix B: Dataset Characteristics of the paper titled "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset." Sample code for training anomaly detectors using this data is provided in this package.
When using the dataset, please cite it as follows:
@misc{islam2024anomaly,
title={Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset},
author={Mohammad Saiful Islam and Mohamed Sami Rakha and William Pourmajidi and Janakan Sivaloganathan and John Steinbacher and Andriy Miranskyy},
year={2024},
eprint={2411.09047},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2411.09047}
}
This package contains the data and code necessary to run the experiments for our paper "The Value of Human Data Annotation for Machine Learning-based Anomaly Detection in Environmental Systems".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This data set contains the data collected on the DAVIDE HPC system (CINECA & E4 & University of Bologna, Bologna, Italy) in the period March-May 2018.
The data set has been used to train an autoencoder-based model to automatically detect anomalies in a semi-supervised fashion on a real HPC system.
This work is described in:
1) "Anomaly Detection using Autoencoders in High Performance Computing Systems", Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini, IAAI19 (proceedings in process) -- https://arxiv.org/abs/1902.08447
2) "Online Anomaly Detection in HPC Systems", Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini, AICAS19 (proceedings in process) -- https://arxiv.org/abs/1811.05269
See the git repository for usage examples & details --> https://github.com/AndreaBorghesi/anomaly_detection_HPC
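As a rough illustration of the semi-supervised scheme (train on normal-operation data only, then flag points with large reconstruction error), here is a hedged sketch using scikit-learn's MLPRegressor as a small autoencoder. The synthetic data, architecture, and threshold are illustrative, not the DAVIDE telemetry or the papers' exact models.

```python
# Sketch: autoencoder-style anomaly detection via reconstruction error.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(400, 8))     # stand-in for healthy-node telemetry
anomalous = rng.normal(6.0, 1.0, size=(10, 8))   # stand-in for off-nominal telemetry

# Autoencoder: the network reconstructs its own input through a bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
ae.fit(normal, normal)

def recon_error(X):
    """Mean squared reconstruction error per sample."""
    return np.mean((ae.predict(X) - X) ** 2, axis=1)

# Threshold at a high quantile of the error on normal data;
# anything above it is flagged as anomalous.
tau = np.quantile(recon_error(normal), 0.99)
flags = recon_error(anomalous) > tau
print(flags.sum(), "of", len(flags), "anomalous points flagged")
```

The semi-supervised aspect is that only presumed-normal data is needed for training; the threshold is then calibrated on that same normal data, exactly because labeled anomalies are scarce on production HPC systems.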
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is the AI-ready benchmark dataset (OPSSAT-AD) containing telemetry data acquired on board OPS-SAT, a CubeSat mission operated by the European Space Agency.
It is accompanied by a paper reporting baseline results obtained using 30 supervised and unsupervised classic and deep machine learning algorithms for anomaly detection. They were trained and validated using the training-test dataset split introduced in this work, and we present a suggested set of quality metrics that should always be calculated when new anomaly detection algorithms are benchmarked on OPSSAT-AD. We believe that this work may become an important step toward building a fair, reproducible, and objective validation procedure that can be used to quantify the capabilities of emerging anomaly detection techniques in an unbiased and fully transparent way.
The two included files are:
segments.csv - the acquired telemetry signals from the ESA OPS-SAT spacecraft,
dataset.csv - the extracted synthetic features computed for each manually split and labeled telemetry segment.
Please also have a look at our two papers commenting on this dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
This dataset provides a detailed look into transactional behavior and financial activity patterns, ideal for exploring fraud detection and anomaly identification. It contains 2,512 samples of transaction data, covering various transaction attributes, customer demographics, and usage patterns. Each entry offers comprehensive insights into transaction behavior, enabling analysis for financial security and fraud detection applications.
Key Features:
This dataset is ideal for data scientists, financial analysts, and researchers looking to analyze transactional patterns, detect fraud, and build predictive models for financial security applications. The dataset was designed for machine learning and pattern analysis tasks and is not intended as a primary data source for academic publications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The zip file contains 12,338 datasets for outlier detection investigated in the following papers:
(1) Instance space analysis for unsupervised outlier detection. Authors: Sevvandi Kandanaarachchi, Mario A. Munoz, Kate Smith-Miles.
(2) On normalization and algorithm selection for unsupervised outlier detection. Authors: Sevvandi Kandanaarachchi, Mario A. Munoz, Rob J. Hyndman, Kate Smith-Miles.
Some of these datasets were originally discussed in the paper: On the evaluation of unsupervised outlier detection: measures, datasets and an empirical study. Authors: G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenkova, E. Schubert, I. Assent, M. E. Houle.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
## Overview
Data Streams Anomaly Detection is a dataset for object detection tasks - it contains Anomaly annotations for 319 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance level, under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation where its k-nearest neighbor distance with the maximum gap is significantly different from what we would expect if the distribution of k-nearest neighbors with the maximum gap is in the maximum domain of attraction of the Gumbel distribution. An approach based on extreme value theory is used for the anomalous threshold calculation. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our algorithm, which we call the stray algorithm. We also demonstrate how this algorithm can assist in detecting anomalies present in other data structures using feature engineering. We show the situations where the stray algorithm outperforms the HDoutliers algorithm both in accuracy and computational time. This framework is implemented in the open source R package stray. Supplementary materials for this article are available online.
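A simplified sketch of the scoring idea (k-nearest-neighbour distance plus a gap-based cutoff in the upper tail) is below. Note that the real stray algorithm derives its threshold from extreme value theory (the Gumbel domain of attraction), whereas this largest-gap heuristic only illustrates the shape of the computation; the data is synthetic.

```python
# Simplified illustration of stray-style scoring: k-NN distances,
# then a cutoff at the largest gap in the top half of sorted scores.
import numpy as np

def knn_distance(X, k=5):
    """Distance from each point to its k-th nearest neighbour."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d.sort(axis=1)
    return d[:, k]  # column 0 is the zero self-distance

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)),
               np.array([[9.0, 9.0], [10.0, 10.0]])])  # two planted anomalies

scores = knn_distance(X)
order = np.argsort(scores)
gaps = np.diff(scores[order])

# Largest gap in the upper half of the sorted scores marks the cutoff;
# everything above it is declared anomalous.
half = len(scores) // 2
cut = half + np.argmax(gaps[half:]) + 1
outliers = order[cut:]
print(sorted(int(i) for i in outliers))  # indices of the planted points
```

The stray algorithm replaces this heuristic cutoff with an EVT-based threshold, which is what lets it control the false-alarm rate in high dimensions.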
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Introduction
The ComplexVAD dataset consists of 104 training and 113 testing video sequences taken from a static camera looking at a scene of a two-lane street with sidewalks on either side of the street and another sidewalk going across the street at a crosswalk. The videos were collected over a period of a few months on the campus of the University of South Florida using a camcorder with 1920 x 1080 pixel resolution. Videos were collected at various times during the day and on each day of the week. Videos vary in duration with most being about 12 minutes long. The total duration of all training and testing videos is a little over 34 hours. The scene includes cars, buses and golf carts driving in two directions on the street, pedestrians walking and jogging on the sidewalks and crossing the street, people on scooters, skateboards and bicycles on the street and sidewalks, and cars moving in the parking lot in the background. Branches of a tree also move at the top of many frames.
The 113 testing videos have a total of 118 anomalous events consisting of 40 different anomaly types.
Ground truth annotations are provided for each testing video in the form of bounding boxes around each anomalous event in each frame. Each bounding box is also labeled with a track number, meaning each anomalous event is labeled as a track of bounding boxes. A single frame can have more than one anomaly labeled.
At a Glance
License
The ComplexVAD dataset is released under the CC-BY-SA-4.0 license.
All data:
Created by Mitsubishi Electric Research Laboratories (MERL), 2024
SPDX-License-Identifier: CC-BY-SA-4.0