100+ datasets found
  1. d

    Data from: Anomaly Detection in a Fleet of Systems

    • catalog.data.gov
    • datasets.ai
    • +5more
    Updated Apr 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Anomaly Detection in a Fleet of Systems [Dataset]. https://catalog.data.gov/dataset/anomaly-detection-in-a-fleet-of-systems
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    A fleet is a group of systems (e.g., cars, aircraft) that are designed and manufactured the same way and are intended to be used the same way. For example, a fleet of delivery trucks may consist of one hundred instances of a particular model of truck, each of which is intended for the same type of service—almost the same amount of time and distance driven every day, approximately the same total weight carried, etc. For this reason, one may imagine that data mining for fleet monitoring may merely involve collecting operating data from the multiple systems in the fleet and developing some sort of model, such as a model of normal operation that can be used for anomaly detection. However, one then may realize that each member of the fleet will be unique in some ways—there will be minor variations in manufacturing, quality of parts, and usage. For this reason, the typical machine learning and statis- tics algorithm’s assumption that all the data are independent and identically distributed is not correct. One may realize that data from each system in the fleet must be treated as unique so that one can notice significant changes in the operation of that system.

  2. v

    Anomaly Detection from ASRS Databases of Textual Reports

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • s.cnmilf.com
    • +2more
    Updated Apr 10, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Anomaly Detection from ASRS Databases of Textual Reports [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/anomaly-detection-from-asrs-databases-of-textual-reports
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Our primary goal is to automatically analyze textual reports from the Aviation Safety Reporting System (ASRS) database to detect/discover the anomaly categories reported by the pilots, and to assign each report to the appropriate category/categories. We have used two state-of-the-art models for text analysis: (i) mixture of von Mises Fisher (movMF) distributions, and (ii) latent Dirichlet allocation (LDA) on a subset of all ASRS reports. The models achieve a reasonably high performance in discovering anomaly categories and clustering reports. Each category is represented by the most representative words with the highest probability in this category. In addition, since the inference algorithm for LDA was somewhat slow, we have developed a new fast LDA algorithm which is 5-10 times more efficient than the original one, therefore more applicable for the practical use. Further, we have developed a simple visualization tool based on non-linear manifold embedding (ISOMAP) to generate a 2-d visual representation of each report based on its content/topics, which gives a direct view of the structure of the whole dataset as well as the outliers.

  3. d

    Anomaly Detection for Complex Systems

    • catalog.data.gov
    • s.cnmilf.com
    • +3more
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Anomaly Detection for Complex Systems [Dataset]. https://catalog.data.gov/dataset/anomaly-detection-for-complex-systems
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    In performance maintenance in large, complex systems, sensor information from sub-components tends to be readily available, and can be used to make predictions about the system's health and diagnose possible anomalies. However, existing methods can only use predictions of individual component anomalies to guess at systemic problems, not accurately estimate the magnitude of the problem, nor prescribe good solutions. Since physical complex systems usually have well-defined semantics of operation, we here propose using anomaly detection techniques drawn from data mining in conjunction with an automated theorem prover working on a domain-specific knowledge base to perform systemic anomalydetection on complex systems. For clarity of presentation, the remaining content of this submission is presented compactly in Fig 1.

  4. e

    Data for: Active Learning for Anomaly Detection in Environmental data -...

    • opendata.eawag.ch
    Updated Jan 15, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). Data for: Active Learning for Anomaly Detection in Environmental data - Package - ERIC [Dataset]. https://opendata.eawag.ch/dataset/active_learning
    Explore at:
    Dataset updated
    Jan 15, 2021
    Description

    This package contains the data and code necessary to run the Active Learning experiments for Anomaly detection. The dataset used for this study is a timeseries data in high spatiotemporal resolution from a long term ecological experiment ("NUtrients, DREissena mussels, and Macrophytes - NUDREM")

  5. c

    Comparative Analysis of Data-Driven Anomaly Detection Methods

    • s.cnmilf.com
    • data.nasa.gov
    • +2more
    Updated Apr 11, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Comparative Analysis of Data-Driven Anomaly Detection Methods [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/comparative-analysis-of-data-driven-anomaly-detection-methods
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    This paper provides a review of three different advanced machine learning algorithms for anomaly detection in continuous data streams from a ground-test firing of a subscale Solid Rocket Motor (SRM). This study compares Orca, one-class support vector machines, and the Inductive Monitoring System (IMS) for anomaly detection on the data streams. We measure the performance of the algorithm with respect to the detection horizon for situations where fault information is available. These algorithms have been also studied by the present authors (and other co-authors) as applied to liquid propulsion systems. The trade space will be explored between these algorithms for both types of propulsion systems.

  6. i

    MIMIC dataset for Anomaly detection

    • ieee-dataport.org
    Updated Jan 31, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Prarthi Jain (2021). MIMIC dataset for Anomaly detection [Dataset]. https://ieee-dataport.org/documents/mimic-dataset-anomaly-detection
    Explore at:
    Dataset updated
    Jan 31, 2021
    Authors
    Prarthi Jain
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is part of the MIMIC database and specifically utilise the data corresponding to two patients with ids 221 and 230.

  7. q

    SAIVT-Campus Dataset

    • researchdatafinder.qut.edu.au
    Updated Jun 30, 2016
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dr Simon Denman (2016). SAIVT-Campus Dataset [Dataset]. https://researchdatafinder.qut.edu.au/individual/n2531
    Explore at:
    Dataset updated
    Jun 30, 2016
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Dr Simon Denman
    Description

    SAIVT-Campus Dataset

    Overview

    The SAIVT-Campus Database is an abnormal event detection database captured on a university campus, where the abnormal events are caused by the onset of a storm. Contact Dr Simon Denman or Dr Jingxin Xu for more information.

    Licensing

    The SAIVT-Campus database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.

    Attribution

    To attribute this database, please include the following citation: Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. available at eprints.

    Acknowledging the Database in your Publications

    In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications: We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-Campus database for our research.

    Installing the SAIVT-Campus database

    After downloading and unpacking the archive, you should have the following structure:

    SAIVT-Campus +-- LICENCE.txt +-- README.txt +-- test_dataset.avi +-- training_dataset.avi +-- Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf

    Notes

    The SAIVT-Campus dataset is captured at the Queensland University of Technology, Australia.

    It contains two video files from real-world surveillance footage without any actors:

    training_dataset.avi (the training dataset)
    test_dataset.avi (the test dataset).
    

    This dataset contains a mixture of crowd densities and it has been used in the following paper for abnormal event detection:

    Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. Available at eprints. 
    This paper is also included with the database (Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf) Both video files are one hour in duration.
    

    The normal activities include pedestrians entering or exiting the building, entering or exiting a lecture theatre (yellow door), and going to the counter at the bottom right. The abnormal events are caused by a heavy rain outside, and include people running in from the rain, people walking towards the door to exit and turning back, wearing raincoats, loitering and standing near the door and overcrowded scenes. The rain happens only in the later part of the test dataset.

    As a result, we assume that the training dataset only contains the normal activities. We have manually made an annotation as below:

    the training dataset does not have abnormal scenes
    the test dataset separates into two parts: only normal activities occur from 00:00:00 to 00:47:16 abnormalities are present from 00:47:17 to 01:00:00. We annotate the time 00:47:17 as the start time for the abnormal events, as from this time on we have begun to observe people stop walking or turn back from walking towards the door to exit, which indicates that the rain outside the building has influenced the activities inside the building. Should you have any questions, please do not hesitate to contact Dr Jingxin Xu.
    
  8. i

    Unified Spacecraft Anomaly Detection Benchmark Dataset

    • ieee-dataport.org
    Updated Mar 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ankit Srivastava (2024). Unified Spacecraft Anomaly Detection Benchmark Dataset [Dataset]. https://ieee-dataport.org/documents/unified-spacecraft-anomaly-detection-benchmark-dataset
    Explore at:
    Dataset updated
    Mar 30, 2024
    Authors
    Ankit Srivastava
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    finance

  9. Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    pdf
    Updated Jun 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Spain, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/anomaly-detection-market-industry-analysis
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2025 - 2029
    Area covered
    Canada, United States, Germany, United Kingdom
    Description

    Snapshot img

    Anomaly Detection Market Size 2025-2029

    The anomaly detection market size is forecast to increase by USD 4.44 billion at a CAGR of 14.4% between 2024 and 2029.

    The market is experiencing significant growth, particularly in the BFSI sector, as organizations increasingly prioritize identifying and addressing unusual patterns or deviations from normal business operations. The rising incidence of internal threats and cyber frauds necessitates the implementation of advanced anomaly detection tools to mitigate potential risks and maintain security. However, implementing these solutions comes with challenges, primarily infrastructural requirements. Ensuring compatibility with existing systems, integrating new technologies, and training staff to effectively utilize these tools pose significant hurdles for organizations.
    Despite these challenges, the potential benefits of anomaly detection, such as improved risk management, enhanced operational efficiency, and increased security, make it an essential investment for businesses seeking to stay competitive and agile in today's complex and evolving threat landscape. Companies looking to capitalize on this market opportunity must carefully consider these challenges and develop strategies to address them effectively. Cloud computing is a key trend in the market, as cloud-based solutions offer quick deployment, flexibility, and scalability.
    

    What will be the Size of the Anomaly Detection Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
    Request Free Sample

    In the dynamic and evolving market, advanced technologies such as resource allocation, linear regression, pattern recognition, and support vector machines are increasingly being adopted for automated decision making. Businesses are leveraging these techniques to enhance customer experience through behavioral analytics, object detection, and sentiment analysis. Machine learning algorithms, including random forests, naive Bayes, decision trees, clustering algorithms, and k-nearest neighbors, are essential tools for risk management and compliance monitoring. AI-powered analytics, time series forecasting, and predictive modeling are revolutionizing business intelligence, while process optimization is achieved through the application of decision support systems, natural language processing, and predictive analytics.
    Computer vision, image recognition, logistic regression, and operational efficiency are key areas where principal component analysis and artificial technoogyneural networks contribute significantly. Speech recognition and operational efficiency are also benefiting from these advanced technologies, enabling businesses to streamline processes and improve overall performance.
    

    How is this Anomaly Detection Industry segmented?

    The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
    
      Cloud
      On-premises
    
    
    Component
    
      Solution
      Services
    
    
    End-user
    
      BFSI
      IT and telecom
      Retail and e-commerce
      Manufacturing
      Others
    
    
    Technology
    
      Big data analytics
      AI and ML
      Data mining and business intelligence
    
    
    Geography
    
      North America
    
        US
        Canada
        Mexico
    
    
      Europe
    
        France
        Germany
        Spain
        UK
    
    
      APAC
    
        China
        India
        Japan
    
    
      Rest of World (ROW)
    

    By Deployment Insights

    The cloud segment is estimated to witness significant growth during the forecast period. The market is witnessing significant growth due to the increasing adoption of advanced technologies such as machine learning models, statistical methods, and real-time monitoring. These technologies enable the identification of anomalous behavior in real-time, thereby enhancing network security and data privacy. Anomaly detection algorithms, including unsupervised learning, reinforcement learning, and deep learning networks, are used to identify outliers and intrusions in large datasets. Data security is a major concern, leading to the adoption of data masking, data pseudonymization, data de-identification, and differential privacy.

    Data leakage prevention and incident response are critical components of an effective anomaly detection system. False positive and false negative rates are essential metrics to evaluate the performance of these systems. Time series analysis and concept drift are important techniques used in anomaly detection. Data obfuscation, data suppression, and data aggregation are other strategies employed to maintain data privacy. Companies such as Anodot, Cisco Systems Inc, IBM Corp, and SAS Institute Inc offer both cloud-based and on-premises anomaly detection solutions. These soluti

  10. d

    Data from: Distributed Anomaly Detection using 1-class SVM for Vertically...

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data [Dataset]. https://catalog.data.gov/dataset/distributed-anomaly-detection-using-1-class-svm-for-vertically-partitioned-data
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amount of flight operational data is downloaded for different commercial airlines. These different types of datasets need to be analyzed for finding outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).

  11. Anomaly Detection in Oil and Gas Chemical Plants

    • kaggle.com
    Updated May 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Python Developer (2025). Anomaly Detection in Oil and Gas Chemical Plants [Dataset]. https://www.kaggle.com/datasets/programmer3/anomaly-detection-in-oil-and-gas-chemical-plants
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 5, 2025
    Dataset provided by
    Kaggle
    Authors
    Python Developer
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains sensor data collected from an oil and gas chemical plant, designed for anomaly detection. It includes time-series data, with key operational parameters such as temperature, pressure, flow rate, vibration levels, valve position, motor speed, and chemical concentration. The data is labeled with anomaly indicators, where 0 represents normal operational conditions and 1 represents an anomaly or abnormal event.

  12. Dataset for the paper "Anomaly Detection in Large-Scale Cloud Systems: An...

    • zenodo.org
    bin, csv, html
    Updated Feb 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mohammad Saiful Islam; Mohamed Sami Rakha; William Pourmajidi; Janakan Sivaloganathan; John Steinbacher; Andriy Miranskyy; Mohammad Saiful Islam; Mohamed Sami Rakha; William Pourmajidi; Janakan Sivaloganathan; John Steinbacher; Andriy Miranskyy (2025). Dataset for the paper "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset" [Dataset]. http://doi.org/10.5281/zenodo.14062900
    Explore at:
    bin, html, csvAvailable download formats
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Mohammad Saiful Islam; Mohamed Sami Rakha; William Pourmajidi; Janakan Sivaloganathan; John Steinbacher; Andriy Miranskyy; Mohammad Saiful Islam; Mohamed Sami Rakha; William Pourmajidi; Janakan Sivaloganathan; John Steinbacher; Andriy Miranskyy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a large-scale anomaly detection dataset collected from IBM Cloud's Console over approximately 4.5 months. This high-dimensional dataset captures telemetry data from multiple data centers, specifically designed to aid researchers in developing and benchmarking anomaly detection methods in large-scale cloud environments. It contains 39,365 entries, each representing a 5-minute interval, with 117,448 features/attributes, as interval_start is used as the index. The dataset includes detailed information on request counts, HTTP response codes, and various aggregated statistics. The dataset also includes labeled anomaly events identified through IBM's internal monitoring tools, providing a comprehensive resource for real-world anomaly detection research and evaluation.

    File Descriptions

    • location_downtime.csv - Details planned and unplanned downtimes for IBM Cloud data centers, including start and end times in ISO 8601 format.
    • unpivoted_data.parquet - Contains raw telemetry data with 413 million+ rows, covering details like location, HTTP status codes, request types, and aggregated statistics (min, max, median response times).
    • anomaly_windows.csv - Ground truth for anomalies, listing start and end times of recorded anomalies, categorized by source (Issue Tracker, Instant Messenger, Test Log).
    • pivoted_data_all.parquet - Pivoted version of the telemetry dataset with 39,365 rows and 117,449 columns, including aggregated statistics across multiple metrics and intervals.
    • demo/demo.[ipynb|html]: This demo file provides examples of how to access data in the Parquet files, available in Jupyter Notebook (.ipynb) and HTML (.html) formats, respectively.

    Further details of the dataset can be found in Appendix B: Dataset Characteristics of the paper titled "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset." Sample code for training anomaly detectors using this data is provided in this package.

    When using the dataset, please cite it as follows:

    @misc{islam2024anomaly,
    title={Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset},
    author={Mohammad Saiful Islam and Mohamed Sami Rakha and William Pourmajidi and Janakan Sivaloganathan and John Steinbacher and Andriy Miranskyy},
    year={2024},
    eprint={2411.09047},
    archivePrefix={arXiv},
    url={https://arxiv.org/abs/2411.09047}
    }

  13. e

    Data for: The Value of Human Data Annotation for Machine Learning based...

    • opendata.eawag.ch
    Updated Apr 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). Data for: The Value of Human Data Annotation for Machine Learning based Anomaly Detection in Environmental Systems - Package - ERIC [Dataset]. https://opendata.eawag.ch/dataset/data-for-the-value-of-human-expert-labelling-for-anomaly-detection-inenvironmental-systems
    Explore at:
    Dataset updated
    Apr 1, 2022
    Description

    This package contains the data and code necessary to run the experiments for our paper "The Value of Human Data Annotation for Machine Learning1based Anomaly Detection in Environmental Systems".

  14. Z

    Data set for anomaly detection on a HPC system

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andrea Borghesi (2023). Data set for anomaly detection on a HPC system [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3251872
    Explore at:
    Dataset updated
    Apr 19, 2023
    Dataset provided by
    Andrea Borghesi
    Francesco Beneventi
    Andrea Bartolini
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains the data collected on the DAVIDE HPC system (CINECA & E4 & University of Bologna, Bologna, Italy) in the period March-May 2018.

    The data set has been used to train a autoencoder-based model to automatically detect anomalies in a semi-supervised fashion, on a real HPC system.

    This work is described in:

    1) "Anomaly Detection using Autoencoders in High Performance Computing Systems", Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini, IAAI19 (proceedings in process) -- https://arxiv.org/abs/1902.08447

    2) "Online Anomaly Detection in HPC Systems", Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini, AICAS19 (proceedings in process) -- https://arxiv.org/abs/1811.05269

    See the git repository for usage examples & details --> https://github.com/AndreaBorghesi/anomaly_detection_HPC

  15. z

    OPSSAT-AD - anomaly detection dataset for satellite telemetry

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ruszczak Bogdan; Ruszczak Bogdan (2024). OPSSAT-AD - anomaly detection dataset for satellite telemetry [Dataset]. http://doi.org/10.5281/zenodo.12588359
    Explore at:
    csvAvailable download formats
    Dataset updated
    Jul 9, 2024
    Dataset provided by
    Ruszczak
    Authors
    Ruszczak Bogdan; Ruszczak Bogdan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the AI-ready benchmark dataset (OPSSAT-AD) containing the telemetry data acquired on board OPS-SAT---a CubeSat mission that has been operated by the European Space Agency.

    It is accompanied by the paper with baseline results obtained using 30 supervised and unsupervised classic and deep machine learning algorithms for anomaly detection. They were trained and validated using the training-test dataset split introduced in this work, and we present a suggested set of quality metrics that should always be calculated to confront the new algorithms for anomaly detection while exploiting OPSSAT-AD. We believe that this work may become an important step toward building a fair, reproducible, and objective validation procedure that can be used to quantify the capabilities of the emerging anomaly detection techniques in an unbiased and fully transparent way.

    The two included files are:

    • segments.csv with the acquired telemetry signals from ESA OPS-SAT aircraft,
    • dataset.csv with the extracted, synthetic features are computed for each manually split and labeled telemetry segment.

    Please have a look at our two papers commenting on this dataset:

    • The benchmark paper with results of 30 supervised and unsupervised anomaly detection models for this collection:
      Ruszczak, B., Kotowski. K., Nalepa, J., Evans, D.: The OPS-SAT benchmark for detecting anomalies in satellite telemetry, 2024, preprint arxiv: 2407.04730,
    • the conference paper in which we presented some preliminary results for this dataset:
      Ruszczak, B., Kotowski. K., Andrzejewski, J., et al.: (2023). Machine Learning Detects Anomalies in OPS-SAT Telemetry. Computational Science – ICCS 2023. LNCS, vol 14073. Springer, Cham, DOI:10.1007/978-3-031-35995-8_21.
  16. Bank Transaction Dataset for Fraud Detection

    • kaggle.com
    Updated Nov 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    vala khorasani (2024). Bank Transaction Dataset for Fraud Detection [Dataset]. https://www.kaggle.com/datasets/valakhorasani/bank-transaction-dataset-for-fraud-detection
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    vala khorasani
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides a detailed look into transactional behavior and financial activity patterns, ideal for exploring fraud detection and anomaly identification. It contains 2,512 samples of transaction data, covering various transaction attributes, customer demographics, and usage patterns. Each entry offers comprehensive insights into transaction behavior, enabling analysis for financial security and fraud detection applications.

    Key Features:

    • TransactionID: Unique alphanumeric identifier for each transaction.
    • AccountID: Unique identifier for each account, with multiple transactions per account.
    • TransactionAmount: Monetary value of each transaction, ranging from small everyday expenses to larger purchases.
    • TransactionDate: Timestamp of each transaction, capturing date and time.
    • TransactionType: Categorical field indicating 'Credit' or 'Debit' transactions.
    • Location: Geographic location of the transaction, represented by U.S. city names.
    • DeviceID: Alphanumeric identifier for devices used to perform the transaction.
    • IP Address: IPv4 address associated with the transaction, with occasional changes for some accounts.
    • MerchantID: Unique identifier for merchants, showing preferred and outlier merchants for each account.
    • AccountBalance: Balance in the account post-transaction, with logical correlations based on transaction type and amount.
    • PreviousTransactionDate: Timestamp of the last transaction for the account, aiding in calculating transaction frequency.
    • Channel: Channel through which the transaction was performed (e.g., Online, ATM, Branch).
    • CustomerAge: Age of the account holder, with logical groupings based on occupation.
    • CustomerOccupation: Occupation of the account holder (e.g., Doctor, Engineer, Student, Retired), reflecting income patterns.
    • TransactionDuration: Duration of the transaction in seconds, varying by transaction type.
    • LoginAttempts: Number of login attempts before the transaction, with higher values indicating potential anomalies.

    This dataset is ideal for data scientists, financial analysts, and researchers looking to analyze transactional patterns, detect fraud, and build predictive models for financial security applications. The dataset was designed for machine learning and pattern analysis tasks and is not intended as a primary data source for academic publications.

  17. r

    KMASH Data Repository for outlier detection

    • research-repository.rmit.edu.au
    • researchdata.edu.au
    • +1more
    zip
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sevvandi Kandanaarachchi; Mario Andres Munoz Acosta; Kate Smith-Miles; Rob J Hyndman (2023). KMASH Data Repository for outlier detection [Dataset]. http://doi.org/10.26180/5c6253c0b3323
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    RMIT University
    Authors
    Sevvandi Kandanaarachchi; Mario Andres Munoz Acosta; Kate Smith-Miles; Rob J Hyndman
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The zip files contains 12338 datasets for outlier detection investigated in the following papers:(1) Instance space analysis for unsupervised outlier detection Authors : Sevvandi Kandanaarachchi, Mario A. Munoz, Kate Smith-Miles (2) On normalization and algorithm selection for unsupervised outlier detection Authors : Sevvandi Kandanaarachchi, Mario A. Munoz, Rob J. Hyndman, Kate Smith-MilesSome of these datasets were originally discussed in the paper: On the evaluation of unsupervised outlier detection:measures, datasets and an empirical studyAuthors : G. O. Campos, A, Zimek, J. Sander, R. J.G.B. Campello, B. Micenkova, E. Schubert, I. Assent, M.E. Houle.

  18. R

    Data Streams Anomaly Detection Dataset

    • universe.roboflow.com
    zip
    Updated Sep 14, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    kris (2024). Data Streams Anomaly Detection Dataset [Dataset]. https://universe.roboflow.com/kris-cv99j/data-streams-anomaly-detection/dataset/1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 14, 2024
    Dataset authored and provided by
    kris
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Anomaly Bounding Boxes
    Description

    Data Streams Anomaly Detection

    ## Overview
    
    Data Streams Anomaly Detection is a dataset for object detection tasks - it contains Anomaly annotations for 319 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
    
  19. f

    Anomaly Detection in High-Dimensional Data

    • tandf.figshare.com
    txt
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Priyanga Dilini Talagala; Rob J. Hyndman; Kate Smith-Miles (2023). Anomaly Detection in High-Dimensional Data [Dataset]. http://doi.org/10.6084/m9.figshare.12844508.v2
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Priyanga Dilini Talagala; Rob J. Hyndman; Kate Smith-Miles
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance level, under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation where its k-nearest neighbor distance with the maximum gap is significantly different from what we would expect if the distribution of k-nearest neighbors with the maximum gap is in the maximum domain of attraction of the Gumbel distribution. An approach based on extreme value theory is used for the anomalous threshold calculation. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our algorithm, which we call the stray algorithm. We also demonstrate how this algorithm can assist in detecting anomalies present in other data structures using feature engineering. We show the situations where the stray algorithm outperforms the HDoutliers algorithm both in accuracy and computational time. This framework is implemented in the open source R package stray. Supplementary materials for this article are available online.

  20. ComplexVAD Video Anomaly Detection Dataset

    • zenodo.org
    zip
    Updated Jun 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Furkan Mumcu; Furkan Mumcu; Mike Jones; Mike Jones; Anoop Cherian; Anoop Cherian; Yasin Yilmaz; Yasin Yilmaz (2024). ComplexVAD Video Anomaly Detection Dataset [Dataset]. http://doi.org/10.5281/zenodo.11475281
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Furkan Mumcu; Furkan Mumcu; Mike Jones; Mike Jones; Anoop Cherian; Anoop Cherian; Yasin Yilmaz; Yasin Yilmaz
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Introduction

    The ComplexVAD dataset consists of 104 training and 113 testing video sequences taken from a static camera looking at a scene of a two-lane street with sidewalks on either side of the street and another sidewalk going across the street at a crosswalk. The videos were collected over a period of a few months on the campus of the University of South Florida using a camcorder with 1920 x 1080 pixel resolution. Videos were collected at various times during the day and on each day of the week. Videos vary in duration with most being about 12 minutes long. The total duration of all training and testing videos is a little over 34 hours. The scene includes cars, buses and golf carts driving in two directions on the street, pedestrians walking and jogging on the sidewalks and crossing the street, people on scooters, skateboards and bicycles on the street and sidewalks, and cars moving in the parking lot in the background. Branches of a tree also move at the top of many frames.

    The 113 testing videos have a total of 118 anomalous events consisting of 40 different anomaly types.

    Ground truth annotations are provided for each testing video in the form of bounding boxes around each anomalous event in each frame. Each bounding box is also labeled with a track number, meaning each anomalous event is labeled as a track of bounding boxes. A single frame can have more than one anomaly labeled.

    At a Glance

    • The size of the unzipped dataset is ~39GB
    • The dataset consists of Train sequences (containing only videos with normal activity), Test sequences (containing some anomalous activity), a ground truth annotation file for each Test sequence, and a README.md file describing the data organization and ground truth annotation format.
    • The zip files contain a Train directory, a Test directory, an annotations directory, and a README.md file.

    License

    The ComplexVAD dataset is released under CC-BY-SA-4.0 license.

    All data:

    Created by Mitsubishi Electric Research Laboratories (MERL), 2024
    
    SPDX-License-Identifier: CC-BY-SA-4.0
Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Dashlink (2025). Anomaly Detection in a Fleet of Systems [Dataset]. https://catalog.data.gov/dataset/anomaly-detection-in-a-fleet-of-systems

Data from: Anomaly Detection in a Fleet of Systems

Related Article
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description

A fleet is a group of systems (e.g., cars, aircraft) that are designed and manufactured the same way and are intended to be used the same way. For example, a fleet of delivery trucks may consist of one hundred instances of a particular model of truck, each of which is intended for the same type of service—almost the same amount of time and distance driven every day, approximately the same total weight carried, etc. For this reason, one may imagine that data mining for fleet monitoring may merely involve collecting operating data from the multiple systems in the fleet and developing some sort of model, such as a model of normal operation that can be used for anomaly detection. However, one then may realize that each member of the fleet will be unique in some ways—there will be minor variations in manufacturing, quality of parts, and usage. For this reason, the typical machine learning and statis- tics algorithm’s assumption that all the data are independent and identically distributed is not correct. One may realize that data from each system in the fleet must be treated as unique so that one can notice significant changes in the operation of that system.

Search
Clear search
Close search
Google apps
Main menu