UCF Crime Dataset in the most suitable structure. It contains 1900 videos from 13 different categories. To ensure the quality of the dataset, ten annotators (with different levels of computer vision expertise) were trained to collect it. Videos were gathered from YouTube and LiveLeak using text search queries (with slight variations, e.g. “car crash”, “road accident”) for each anomaly.
http://opendatacommons.org/licenses/dbcl/1.0/
The dataset is a subset of Task 2 of the DCASE 2020 Challenge. The challenge is to identify anomalous machine behavior from audio data. The original data comes in three parts (training, validation, and testing), which have been combined into a single dataset.
Training: https://zenodo.org/record/3678171
Validation: https://zenodo.org/record/3727685
Testing: https://zenodo.org/record/3841772
The UCF-Crime dataset is a large-scale dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies including Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies are selected because they have a significant impact on public safety.
This dataset can be used for two tasks. The first is general anomaly detection, with all anomalies in one group and all normal activities in another. The second is recognizing each of the 13 anomalous activities.
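The two tasks differ only in how a video's category is turned into a label. A minimal sketch of that mapping follows; the category names come from the description above, while the "Normal" label and the function names are assumptions made for illustration and are not part of the dataset itself.

```python
# The 13 anomaly categories as listed in the dataset description.
ANOMALY_CLASSES = [
    "Abuse", "Arrest", "Arson", "Assault", "Road Accident", "Burglary",
    "Explosion", "Fighting", "Robbery", "Shooting", "Stealing",
    "Shoplifting", "Vandalism",
]
# Assumed label for videos without anomalies (hypothetical name).
NORMAL = "Normal"


def binary_label(category: str) -> int:
    """Task 1: return 1 for any anomaly category and 0 for normal activity."""
    if category == NORMAL:
        return 0
    if category in ANOMALY_CLASSES:
        return 1
    raise ValueError(f"unknown category: {category!r}")


def multiclass_label(category: str) -> int:
    """Task 2: return the index (0..12) of the anomaly category."""
    return ANOMALY_CLASSES.index(category)


if __name__ == "__main__":
    for name in ("Arson", NORMAL):
        print(name, binary_label(name))
    print("Vandalism", multiclass_label("Vandalism"))
```

Keeping both mappings next to each other makes it easy to train a binary detector and a 13-way recognizer from the same file listing.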
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset, titled "Network Anomaly Dataset," is designed for the development and evaluation of machine learning models focused on network anomaly detection. The dataset is available in two versions: a labeled version where each instance is marked as "Anomaly" or "Normal," and an unlabeled version that can be used for unsupervised learning techniques.
Dataset Features:
- Throughput: The amount of data successfully transmitted over a network in a given period.
- Congestion: The degree of network traffic load, potentially leading to delays or packet loss.
- Packet Loss: The percentage of packets that fail to reach their destination, indicative of network issues.
- Latency: The time taken for data to travel from the source to the destination, crucial for time-sensitive applications.
- Jitter: The variation in packet arrival times, affecting the quality of real-time communications.
Applications:
- Supervised Learning: Use the labeled dataset to train and evaluate models such as Random Forest, SVM, and Logistic Regression for anomaly detection.
- Unsupervised Learning: Apply techniques like clustering and change point detection on the unlabeled dataset to discover hidden patterns and anomalies.
This dataset is ideal for practitioners and researchers aiming to explore network security, develop robust anomaly detection models, or conduct comparative analysis between supervised and unsupervised learning methods.
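For the unlabeled version, even a very simple statistical baseline can be useful before moving on to clustering or change point detection. The sketch below flags values whose z-score exceeds a threshold. It is a stand-in technique, not one named in the dataset description, and the sample latency values, the threshold of 2.0, and the function name are illustrative assumptions.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Flag values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


# Made-up latency readings in milliseconds (not taken from the dataset).
latency_ms = [20.0, 22.0, 21.0, 19.0, 20.0, 21.0, 95.0, 20.0]
flags = zscore_anomalies(latency_ms)

if __name__ == "__main__":
    for value, is_anomaly in zip(latency_ms, flags):
        print(value, "Anomaly" if is_anomaly else "Normal")
```

With the labeled version, the same flags can be compared against the "Anomaly"/"Normal" column to see how far such a baseline gets before a trained model is needed.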
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 5 million synthetically generated financial transactions designed to simulate real-world behavior for fraud detection research and machine learning applications. Each transaction record includes fields such as:
Transaction Details: ID, timestamp, sender/receiver accounts, amount, type (deposit, transfer, etc.)
Behavioral Features: time since last transaction, spending deviation score, velocity score, geo-anomaly score
Metadata: location, device used, payment channel, IP address, device hash
Fraud Indicators: binary fraud label (is_fraud) and type of fraud (e.g., money laundering, account takeover)
The dataset follows realistic fraud patterns and behavioral anomalies, making it suitable for:
Binary and multiclass classification models
Fraud detection systems
Time-series anomaly detection
Feature engineering and model explainability
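To make the behavioral features more concrete, here is a minimal sketch of how a "time since last transaction" value could be derived per sender account. The field names (`sender`, `timestamp`, `time_since_last`) and the sample rows are assumptions for illustration; the actual column names in the dataset may differ.

```python
from datetime import datetime


def add_time_since_last(transactions):
    """Return copies of the rows, in time order, with the number of seconds
    since the same sender's previous transaction (None for a first one)."""
    last_seen = {}
    enriched = []
    for tx in sorted(transactions, key=lambda t: t["timestamp"]):
        previous = last_seen.get(tx["sender"])
        gap = None
        if previous is not None:
            gap = (tx["timestamp"] - previous).total_seconds()
        enriched.append({**tx, "time_since_last": gap})
        last_seen[tx["sender"]] = tx["timestamp"]
    return enriched


# Made-up example rows (hypothetical accounts and times).
rows = [
    {"id": 1, "sender": "A", "timestamp": datetime(2024, 1, 1, 12, 0, 0)},
    {"id": 2, "sender": "B", "timestamp": datetime(2024, 1, 1, 12, 0, 30)},
    {"id": 3, "sender": "A", "timestamp": datetime(2024, 1, 1, 12, 5, 0)},
]
result = add_time_since_last(rows)

if __name__ == "__main__":
    for row in result:
        print(row["id"], row["sender"], row["time_since_last"])
```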
This dataset was created by Muhammad Ahmad
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains synthetic HTTP log data designed for cybersecurity analysis, particularly for anomaly detection tasks.
Dataset Features
- Timestamp: Simulated time for each log entry.
- IP_Address: Randomized IP addresses to simulate network traffic.
- Request_Type: Common HTTP methods (GET, POST, PUT, DELETE).
- Status_Code: HTTP response status codes (e.g., 200, 404, 403, 500).
- Anomaly_Flag: Binary flag indicating anomalies (1 = anomaly, 0 = normal).
- User_Agent: Simulated user agents for device and browser identification.
- Session_ID: Random session IDs to simulate user activity.
- Location: Geographic locations of requests.
Applications
This dataset can be used for:
- Anomaly Detection: Identify suspicious network activity or attacks.
- Machine Learning: Train models for classification tasks (e.g., detect anomalies).
- Cybersecurity Analysis: Analyze HTTP traffic patterns and identify threats.
Example Challenge
Build a machine learning model to predict the Anomaly_Flag based on the features provided.
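Before training a model for the example challenge, it can help to check how strongly a single feature relates to Anomaly_Flag. The sketch below computes the anomaly rate per value of one field. The field names follow the feature list above, but the sample records and the function name are made up for illustration.

```python
from collections import defaultdict


def anomaly_rate_by(records, field):
    """Return the fraction of records flagged as anomalous for each value of `field`."""
    totals = defaultdict(int)
    anomalies = defaultdict(int)
    for record in records:
        totals[record[field]] += 1
        anomalies[record[field]] += record["Anomaly_Flag"]
    return {value: anomalies[value] / totals[value] for value in totals}


# Made-up log entries (not taken from the dataset).
logs = [
    {"Request_Type": "GET", "Status_Code": 200, "Anomaly_Flag": 0},
    {"Request_Type": "GET", "Status_Code": 200, "Anomaly_Flag": 0},
    {"Request_Type": "POST", "Status_Code": 403, "Anomaly_Flag": 1},
    {"Request_Type": "DELETE", "Status_Code": 403, "Anomaly_Flag": 0},
    {"Request_Type": "PUT", "Status_Code": 500, "Anomaly_Flag": 1},
]
rates = anomaly_rate_by(logs, "Status_Code")

if __name__ == "__main__":
    for status, rate in sorted(rates.items()):
        print(status, rate)
```

Because the data is synthetic, such rates may turn out to be nearly uniform across values; that in itself is worth knowing before investing in a more complex classifier.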
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PIADE dataset contains data from five industrial packaging machines:
Machine s_1: from 2020-01-01 14:00:00 to 2021-12-31 13:00:00
Machine s_2: from 2020-06-17 08:00:00 to 2021-12-31 07:00:00
Machine s_3: from 2020-10-07 12:00:00 to 2022-01-01 23:00:00
Machine s_4: from 2020-01-01 01:00:00 to 2022-01-01 23:00:00
Machine s_5: from 2020-01-20 08:00:00 to 2022-01-01 12:00:00
Each row represents a production interval, with the following schema:
interval_start: start of the production interval
equipment_ID: equipment identifier
alarm: alarm code of the active stop reason, if it occurred
type: idle, production, downtime, performance_loss or scheduled_downtime
start: start of the production interval
end: end of the production interval
elapsed: duration of the production interval
pi: input packages
po: output packages
speed: speed (packages per hour)
There are 133 different types of alerts, and 429394 rows.
For each piece of equipment, we define sequences of length = 1 hour and we aggregate raw interval data as follows:
'equipment_ID': machine identifier
'#changes': changes in machine state
'%downtime': time spent in 'downtime' state
'%idle': time spent in 'idle' state
'%performance_loss': time spent in 'performance loss' state
'%production': time spent in production
'%scheduled_downtime': time spent in scheduled downtime
'count_sum': sum of all alarm occurrences
'A_
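The hourly aggregation described above can be sketched roughly as follows. This is an illustration under assumptions, not the authors' code: it assumes the intervals passed in belong to one machine, are sorted by start time, and have already been clipped to a single one-hour window, and it reports the '%...' columns as fractions between 0 and 1. Alarm counts are left out.

```python
from datetime import datetime

# State names taken from the 'type' field of the raw schema.
STATES = ["downtime", "idle", "performance_loss", "production", "scheduled_downtime"]


def aggregate_hour(intervals):
    """Aggregate one hour of intervals into state-time shares and a state-change count."""
    seconds = {state: 0.0 for state in STATES}
    changes = 0
    previous_state = None
    for interval in intervals:
        duration = (interval["end"] - interval["start"]).total_seconds()
        seconds[interval["type"]] += duration
        if previous_state is not None and interval["type"] != previous_state:
            changes += 1
        previous_state = interval["type"]
    total = sum(seconds.values())
    row = {"#changes": changes}
    for state in STATES:
        # Share of the observed time spent in this state (assumed to be a fraction).
        row["%" + state] = seconds[state] / total if total else 0.0
    return row


# Made-up intervals for a single hour (not taken from the dataset).
hour = [
    {"type": "production", "start": datetime(2020, 1, 1, 14, 0), "end": datetime(2020, 1, 1, 14, 30)},
    {"type": "downtime", "start": datetime(2020, 1, 1, 14, 30), "end": datetime(2020, 1, 1, 14, 45)},
    {"type": "production", "start": datetime(2020, 1, 1, 14, 45), "end": datetime(2020, 1, 1, 15, 0)},
]
row = aggregate_hour(hour)

if __name__ == "__main__":
    print(row)
```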
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The ESA Anomaly Dataset is the first large-scale, real-life satellite telemetry dataset with curated anomaly annotations originating from three ESA missions. We hope that this unique dataset will allow researchers and scientists from academia, research institutes, national and international space agencies, and industry to benchmark models and approaches on a common baseline, as well as to research and develop novel, computationally efficient approaches for anomaly detection in satellite telemetry data.
The dataset results from an 18-month project carried out by an industry consortium composed of Airbus Defence and Space, KP Labs and the European Space Agency’s European Space Operations Centre. The project, funded by the European Space Agency (ESA), is part of the Artificial Intelligence for Automation (A²I) Roadmap (De Canio et al., 2023), a large endeavour started in 2021 to automate space operations by leveraging artificial intelligence.
Further details can be found on arXiv and GitHub.
References
De Canio, G. et al. (2023) Development of an actionable AI roadmap for automating mission operations. In: 2023 SpaceOps Conference. American Institute of Aeronautics and Astronautics, Dubai, United Arab Emirates.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Acknowledgment to supporters: "Thank you to everyone who supported the UGRansome dataset; it has received a Bronze medal on Kaggle!"
The UGRansome dataset is a versatile cybersecurity resource designed for the analysis of ransomware and zero-day cyber-attacks, particularly those exhibiting cyclostationary behavior. This dataset features various essential components, including timestamps for attack time tracking, flags for categorizing attack types, protocol data for understanding attack vectors, network flow details to observe data transfer patterns, and ransomware family classifications.
It also provides insight into the associated malware, numeric clustering for pattern recognition, and quantifies financial damage in both USD and bitcoins (BTC). The dataset employs machine learning to generate attack signatures and offers synthetic signatures for testing and simulating cybersecurity defenses.
Additionally, it can be used to identify and document anomalies, contributing to anomaly detection research and enhancing cybersecurity understanding and preparedness. This dataset offers valuable information for researchers and practitioners interested in analytical and investigatory work such as the detection and classification of ransomware and zero-day threats. The dataset required deduplication and transformation.
The UGRansome dataset has been previously utilized in studies by Tokmak (2022); Alhashmi et al. (2024); Chaudhary & Adhikari (2024); Sokhonn, Park, & Lee (2024); P. Yan et al. (2024), Sharath Kumar et al. (2024), and Mohamed, A.A., Al-Saleh, A., Sharma, S.K. et al. (2025).
It has been utilized and cited in several master's dissertations and reports, demonstrating its relevance in the field of anomaly intrusion detection. Notable examples include:
S. R. Zahra, 2022. "UGRansome: Optimal Approach for Anomaly Intrusion Detection and Zero-day Threats using Cloud Environment." Master's Research in Cloud Computing, School of Computing, National College of Ireland. https://www.researchgate.net/publication/365172610_UGRansome_Optimal_Approach_for_Anomaly_Intrusion_Detection_and_Zero-day_Threats_using_Cloud_Environment_MSc_Research_Project_Cloud_Computing/citations
B. Torky, 2023. "Ensemble Methods for Anomaly Detection in Enterprise Systems." Thesis, Rochester Institute of Technology, Dubai. Advisor: Sanjay Modak. https://repository.rit.edu/theses/11497/
A. Igugu, 2024. "Evaluating the Effectiveness of AI and Machine Learning Techniques for Zero-Day Attacks Detection in Cloud Environments" Master of Science in Information Security, Luleå University of Technology, Sweden. Department of Computer Science, Electrical and Space Engineering. Supervisor: Dr. Saguna. Examiner: Prof. Christer Ahlund. https://www.diva-portal.org/smash/get/diva2:1890285/FULLTEXT02
Duran, M., duSoft Yazılım, A.Ş. and Kilinc, H., 2024. D2. 1–Academic and Technology SoTA Report. Sierra (Panel), 1, pp.26-11. Edited by: Hakan Kilinc (Orion, Türkiye), Eva Catarina Gomes Maia (ISEP, Portugal), Orhan Yildirim (Beam Teknoloji, Türkiye), Gabriela Sousa (VisionWare, Portugal), Özgü Özkan, Melike Çolak, Nesil Bor (Bites, Türkiye), Daniel Esteban Villamil Sierra (Panel, Spain). https://itea4.org/project/vesta.html
Kaliberda A. A. Development of an anti-virus solution based on neural networks: master's thesis; Ural Federal University, Institute of Radio Electronics and Information Technologies-RTF, Department of Information Technologies and Control Systems. Russia — Yekaterinburg, 2024. — 52 p. http://elar.urfu.ru/handle/10995/140331
These citations underline the impact of UGRansome in advancing research on intrusion detection and cybersecurity:
• Mohamed, A.A., Al-Saleh, A., Sharma, S.K. et al. Zero-day exploits detection with adaptive WavePCA-Autoencoder (AWPA) adaptive hybrid exploit detection network (AHEDNet). Sci Rep 15, 4036 (2025). https://doi.org/10.1038/s41598-025-87615-2
• P. Yan, T. T. Khoei, R. S. Hyder and R. S. Hyder, "A Dual-Stage Ensemble Approach to Detect and Classify Ransomware Attacks," 2024 IEEE 15th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), Yorktown Heights, NY, USA, 2024, pp. 781-786, doi: 10.1109/UEMCON62879.2024.10754695.
• Por, L.Y., Dai, Z., Leem, S.J., Chen, Y., Yang, J., Binbeshr, F., Phan, K.Y. and Ku, C.S., 2024. A Systematic Literature Review on the Methods and Challenges in Detecting Zero-Day Attacks: Insights from the Recent CrowdStrike Incident. IEEE Access.
• Torky, B., Karamitsos, I., Najar, T. (2024). Anomaly Detection in Enterprise Payment Systems: An Ensemble Machine Learning Approach. In: Emrouznejad, A., Zervopoulos, P.D., Ozturk, I., Jamali, D., Rice, J. (eds) Business Analytics and Decision Making in Practice. ICBAP 2024. Lecture Notes in Operations Research. Springer, Cham. https://doi.org/10.1007/978-3-...
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Le Hung
Released under Apache 2.0
This dataset was created by k20 1702 Bilal Mamji
This dataset was created by Arun
It contains the following files:
This dataset was created by Shweta Dalal
This is a dataset of pictures taken from the MVTec Anomaly Detection dataset: https://www.mvtec.com/
There are three files in this dataset: train, test, and ground_truth.
The training set has 320 pictures (1024x1024) of screws with no anomalies. The test set contains the categories good, manipulated_front, scratch_head, scratch_neck, thread_side, and thread_top, for a total of 160 pictures with the matching ground_truth.
Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger, "A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection", IEEE Conference on Computer Vision and Pattern Recognition, 2019
This dataset was created by Đạt Savu
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The UCF-Crime dataset is a large-scale collection of real-world surveillance videos, featuring a diverse range of crime and normal activities. This dataset is ideal for training and evaluating advanced AI models for anomaly detection and video understanding tasks.
- Real-world Data: The videos are sourced from real-world surveillance cameras, ensuring a realistic and challenging environment.
- Diverse Anomalies: The dataset covers a wide range of crime categories, including both common and rare events.
- Long, Untrimmed Videos: The videos are long and untrimmed, providing a more realistic and challenging scenario for anomaly detection.
- Detailed Annotations: The videos are meticulously annotated with bounding boxes, timestamps, and labels for each anomaly, enabling precise model training and evaluation.
Real-world Anomaly Detection in Surveillance Videos Link to download the data
This dataset was created by yuanheqiuye
This dataset was created by Hiep Le
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OPSSAT-AD - anomaly detection dataset for satellite telemetry
This is the AI-ready benchmark dataset (OPSSAT-AD) containing the telemetry data acquired on board OPS-SAT, a CubeSat mission that has been operated by the European Space Agency.
It is accompanied by a paper with baseline results obtained using 30 supervised and unsupervised classic and deep machine learning algorithms for anomaly detection. They were trained and validated using the training-test dataset split introduced in this work, and we present a suggested set of quality metrics that should always be calculated when comparing new anomaly detection algorithms on OPSSAT-AD. We believe that this work may become an important step toward building a fair, reproducible, and objective validation procedure that can be used to quantify the capabilities of emerging anomaly detection techniques in an unbiased and fully transparent way.
The dataset contains the following files:
segments.csv with the telemetry signals acquired from the ESA OPS-SAT satellite,
dataset.csv with the extracted, synthetic features computed for each manually split and labeled telemetry segment,
code files for data processing and example modeling (dataset_generator.ipynb for data processing, modeling_examples.ipynb with simple examples, requirements.txt with details on the Python configuration, and the LICENSE file).
Citation: Bogdan, R. (2024). OPSSAT-AD - anomaly detection dataset for satellite telemetry [Data set]. Ruszczak. https://doi.org/10.5281/zenodo.15108715