100+ datasets found
  1. Comparison of Unsupervised Anomaly Detection Methods

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Comparison of Unsupervised Anomaly Detection Methods [Dataset]. https://catalog.data.gov/dataset/comparison-of-unsupervised-anomaly-detection-methods
    Explore at:
    18 scholarly articles cite this dataset (Google Scholar)
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Several different unsupervised anomaly detection algorithms have been applied to Space Shuttle Main Engine (SSME) data to support the development of a comprehensive suite of Integrated Systems Health Management (ISHM) tools. As the theoretical bases for these methods vary considerably, it is reasonable to conjecture that the anomalies they detect may differ quite significantly as well. As such, it would be useful to apply a common metric with which to compare the results. However, for such a quantitative analysis to be statistically significant, a sufficient number of examples of both nominally categorized and anomalous data must be available. Due to the lack of sufficient examples of anomalous data, any statistics that rely upon a statistically significant sample of anomalous data are infeasible. Therefore, the main focus of this paper is to compare actual examples of anomalies detected by the algorithms via the sensors in which they appear, as well as the times at which they appear. We find that there is enough overlap in the anomalies detected by the different algorithms for them to corroborate the severity of these anomalies. In certain cases, the severity of these anomalies is supported by their categorization as failures by experts, with realistic physical explanations. For those anomalies that cannot be corroborated by at least one other method, this overlap says less about the severity of the anomaly and more about the algorithms' technical nuances, which will also be discussed.

  2. Comparison of Unsupervised Anomaly Detection Methods - Dataset - NASA Open...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Comparison of Unsupervised Anomaly Detection Methods - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/comparison-of-unsupervised-anomaly-detection-methods
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Several different unsupervised anomaly detection algorithms have been applied to Space Shuttle Main Engine (SSME) data to support the development of a comprehensive suite of Integrated Systems Health Management (ISHM) tools. As the theoretical bases for these methods vary considerably, it is reasonable to conjecture that the anomalies they detect may differ quite significantly as well. As such, it would be useful to apply a common metric with which to compare the results. However, for such a quantitative analysis to be statistically significant, a sufficient number of examples of both nominally categorized and anomalous data must be available. Due to the lack of sufficient examples of anomalous data, any statistics that rely upon a statistically significant sample of anomalous data are infeasible. Therefore, the main focus of this paper is to compare actual examples of anomalies detected by the algorithms via the sensors in which they appear, as well as the times at which they appear. We find that there is enough overlap in the anomalies detected by the different algorithms for them to corroborate the severity of these anomalies. In certain cases, the severity of these anomalies is supported by their categorization as failures by experts, with realistic physical explanations. For those anomalies that cannot be corroborated by at least one other method, this overlap says less about the severity of the anomaly and more about the algorithms' technical nuances, which will also be discussed.

  3. Data from: Distributed Anomaly Detection using 1-class SVM for Vertically...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data [Dataset]. https://catalog.data.gov/dataset/distributed-anomaly-detection-using-1-class-svm-for-vertically-partitioned-data
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors, and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task, not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the 'Commercial Modular Aero-Propulsion System Simulation' (CMAPSS).
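    To make the idea concrete, here is a toy sketch of the general pattern, not the paper's published algorithm: each site scores its own feature subset locally, only a small set of candidate rows is centralized with full features, and a global one-class SVM makes the final call. All names and parameters here are illustrative, using scikit-learn.

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)

    # Toy stand-ins for data held at two sites (vertical partition: the same
    # records, but each site only sees a subset of the features).
    n = 2_000
    site_a = rng.normal(size=(n, 3))   # features 0-2, held at site A
    site_b = rng.normal(size=(n, 2))   # features 3-4, held at site B

    def local_candidates(X, frac=0.02):
        """Each site nominates the records that look most anomalous locally."""
        scores = OneClassSVM(nu=0.05, gamma="scale").fit(X).decision_function(X)
        k = int(frac * len(X))
        return np.argsort(scores)[:k]  # indices of the lowest-scoring records

    candidates = np.union1d(local_candidates(site_a), local_candidates(site_b))

    # Only the candidate rows (plus a small nominal sample) are centralized,
    # then a global one-class SVM makes the final call on the full features.
    pooled = np.hstack([site_a, site_b])
    reference = rng.choice(n, size=200, replace=False)
    global_model = OneClassSVM(nu=0.05, gamma="scale").fit(pooled[reference])
    flags = global_model.predict(pooled[candidates])  # -1 = outlier
    print(f"{(flags == -1).sum()} of {len(candidates)} candidates flagged")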

  4. Controlled Anomalies Time Series (CATS) Dataset

    • data.niaid.nih.gov
    Updated Jul 11, 2024
    + more versions
    Cite
    Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7646896
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Solenix Engineering GmbH
    Authors
    Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

    The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

    Multivariate (17 variables), including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system, including:

    4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.

    3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.

    10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.

    5 million timestamps. Sensor readings are at a 1 Hz sampling frequency.

    1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour (a loading sketch follows this list).

    4 million observations that include both nominal and anomalous segments. This is suitable to evaluate both semi-supervised approaches (novelty detection) as well as unsupervised approaches (outlier detection).

    200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.

    Different types of anomalies to understand what anomaly types can be detected by different approaches. The categories are available in the dataset and in the metadata.

    Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end times of the anomalous behaviour are known very precisely. In contrast to real-world datasets, there is no risk that the ground truth contains mislabelled segments, which is often the case for real data.

    Suitable for root cause analysis. In addition to the anomaly category, the time series channel in which the anomaly first developed itself is recorded and made available as part of the metadata. This can be useful to evaluate the performance of algorithms in tracing anomalies back to the right root cause channel.

    Affected channels. In addition to the knowledge of the root cause channel in which the anomaly first developed itself, we provide information on channels possibly affected by the anomaly. This can also be useful to evaluate the explainability of anomaly detection systems which may point to the anomalous channels (root cause and affected).

    Obvious anomalies. The simulated anomalies have been designed to be "easy" to detect for human eyes (i.e., there are very large spikes or oscillations), hence also detectable for most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting those obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable also for regular benchmark studies.

    Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example consists of a light and switch pair. The light being either on or off is nominal, the same goes for the switch, but having the switch on and the light off shall be considered anomalous. In the CATS dataset, users can choose (or not) to use the available context, and external stimuli, to test the usefulness of the context for detecting anomalies in this simulation.

    Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage since users of the dataset can decide to add on top of the provided series any type of noise and choose an amplitude. This makes it well suited to test how sensitive and robust detection algorithms are against various levels of noise.

    No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
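    A minimal loading sketch based on the properties above, assuming pandas and a parquet file named cats.parquet (the actual file name and schema should be taken from the dataset and its metadata.csv):

    import pandas as pd

    # Hypothetical file name; the dataset ships as both CSV and parquet.
    df = pd.read_parquet("cats.parquet")

    # Per the description: the first 1 million timestamps are purely nominal,
    # the remaining 4 million mix nominal and anomalous segments.
    train = df.iloc[:1_000_000]    # learn "normal" behaviour here
    test = df.iloc[1_000_000:]     # evaluate detection here
    print(train.shape, test.shape)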

    Change Log

    Version 2

    Metadata: we include a metadata.csv with information about:

    Anomaly categories

    Root cause channel (signal in which the anomaly is first visible)

    Affected channel (signal into which the anomaly might propagate through coupled system dynamics)

    Removal of anomaly overlaps: version 1 contained anomalies which overlapped with each other resulting in only 190 distinct anomalous segments. Now, there are no more anomaly overlaps.

    Two data files: CSV and parquet for convenience.

    [1] Example Benchmark of Anomaly Detection in Time Series: “Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602”

    About Solenix

    Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.

  5. Synthetic Cybersecurity Logs for Anomaly Detection

    • kaggle.com
    zip
    Updated Dec 16, 2024
    Cite
    fcWebDev (2024). Synthetic Cybersecurity Logs for Anomaly Detection [Dataset]. https://www.kaggle.com/datasets/fcwebdev/synthetic-cybersecurity-logs-for-anomaly-detection
    Explore at:
    zip (160070 bytes)
    Dataset updated
    Dec 16, 2024
    Authors
    fcWebDev
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains synthetic HTTP log data designed for cybersecurity analysis, particularly for anomaly detection tasks.

    Dataset Features

    Timestamp: Simulated time for each log entry.

    IP_Address: Randomized IP addresses to simulate network traffic.

    Request_Type: Common HTTP methods (GET, POST, PUT, DELETE).

    Status_Code: HTTP response status codes (e.g., 200, 404, 403, 500).

    Anomaly_Flag: Binary flag indicating anomalies (1 = anomaly, 0 = normal).

    User_Agent: Simulated user agents for device and browser identification.

    Session_ID: Random session IDs to simulate user activity.

    Location: Geographic locations of requests.

    Applications

    This dataset can be used for:

    Anomaly Detection: Identify suspicious network activity or attacks.

    Machine Learning: Train models for classification tasks (e.g., detect anomalies).

    Cybersecurity Analysis: Analyze HTTP traffic patterns and identify threats.

    Example Challenge

    Build a machine learning model to predict the Anomaly_Flag based on the features provided.
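    A hedged starting point for this challenge, using the column names listed above (the file name is an assumption):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("synthetic_cybersecurity_logs.csv")  # assumed file name

    # One-hot encode a few of the categorical log fields listed above
    # (high-cardinality fields such as IP_Address are left out here).
    cols = ["Request_Type", "Status_Code", "Location"]
    X = pd.get_dummies(df[cols], columns=cols)
    y = df["Anomaly_Flag"]

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print(f"holdout accuracy: {clf.score(X_te, y_te):.3f}")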

  6. Financial Anomaly Data

    • kaggle.com
    zip
    Updated Dec 17, 2023
    Cite
    Mukesh Kumar Choudhary (2023). Financial Anomaly Data [Dataset]. https://www.kaggle.com/datasets/devondev/financial-anomaly-data
    Explore at:
    zip (3063503 bytes)
    Dataset updated
    Dec 17, 2023
    Authors
    Mukesh Kumar Choudhary
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    With this dataset, you could perform various analyses, such as:

    Detecting anomalies in transaction amounts (e.g., unusually high transactions).

    Identifying irregular transaction types for specific accounts.

    Recognizing unusual patterns based on transaction timestamps or locations.

    Tracking spending behaviors based on merchants.

    A minimal example follows the column list below.

    This dataset contains the following columns:

    Timestamp: This column records the date and time when the transaction occurred. It helps in understanding the temporal aspect of transactions, such as patterns over time, frequency, and clustering of activities.

    TransactionID: An identification number assigned to each transaction. It serves as a unique identifier for referencing or tracking specific transactions.

    AccountID: This field represents the unique identifier associated with the bank account involved in the transaction. It links multiple transactions to a specific account, enabling analysis on a per-account basis.

    Amount: The monetary value involved in the transaction. This column provides information about the financial magnitude of each transaction, which is crucial for anomaly detection since unusually high or low values might signify irregularities.

    Merchant: Specifies the entity or business involved in the transaction. This information helps in categorizing transactions (e.g., retail, online, restaurant) and identifying patterns related to specific merchants.

    TransactionType: Describes the nature or category of the transaction, whether it's a withdrawal, deposit, transfer, payment, etc. This column helps in understanding the purpose or direction of the transaction.

    Location: Indicates the place where the transaction occurred. It could be a physical location (e.g., city, country) or an identifier (e.g., store code, online portal), aiding in analyzing geographical spending patterns or detecting anomalies based on unusual transaction locations.
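    As promised above, a minimal unsupervised sketch for the first analysis (flagging unusually large transactions); the file name is an assumption and the columns follow the list above:

    import pandas as pd
    from sklearn.ensemble import IsolationForest

    df = pd.read_csv("financial_anomaly_data.csv")  # assumed file name
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])
    df["hour"] = df["Timestamp"].dt.hour            # simple temporal feature

    # Unsupervised outlier flags from amount plus time of day.
    iso = IsolationForest(contamination=0.01, random_state=0)
    df["outlier"] = iso.fit_predict(df[["Amount", "hour"]])  # -1 = anomaly
    print(df.loc[df["outlier"] == -1, ["TransactionID", "Amount", "Merchant"]].head())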

  7. Data from: Anomaly Detection in a Fleet of Systems

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Anomaly Detection in a Fleet of Systems [Dataset]. https://catalog.data.gov/dataset/anomaly-detection-in-a-fleet-of-systems
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    A fleet is a group of systems (e.g., cars, aircraft) that are designed and manufactured the same way and are intended to be used the same way. For example, a fleet of delivery trucks may consist of one hundred instances of a particular model of truck, each of which is intended for the same type of service: almost the same amount of time and distance driven every day, approximately the same total weight carried, etc. For this reason, one may imagine that data mining for fleet monitoring merely involves collecting operating data from the multiple systems in the fleet and developing some sort of model, such as a model of normal operation that can be used for anomaly detection. However, one then may realize that each member of the fleet will be unique in some ways: there will be minor variations in manufacturing, quality of parts, and usage. For this reason, the typical machine learning and statistics algorithm's assumption that all the data are independent and identically distributed is not correct. Data from each system in the fleet must be treated as unique so that one can notice significant changes in the operation of that system.

  8. Satellite telemetry data anomaly prediction

    • kaggle.com
    zip
    Updated Apr 17, 2025
    + more versions
    Cite
    Orvile (2025). Satellite telemetry data anomaly prediction [Dataset]. https://www.kaggle.com/datasets/orvile/satellite-telemetry-data-anomaly-prediction
    Explore at:
    zip (2084669 bytes)
    Dataset updated
    Apr 17, 2025
    Authors
    Orvile
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OPSSAT-AD - anomaly detection dataset for satellite telemetry

    This is the AI-ready benchmark dataset (OPSSAT-AD) containing telemetry data acquired on board OPS-SAT, a CubeSat mission operated by the European Space Agency.

    It is accompanied by a paper with baseline results obtained using 30 supervised and unsupervised classic and deep machine learning algorithms for anomaly detection. They were trained and validated using the training-test dataset split introduced in this work, and we present a suggested set of quality metrics that should always be calculated when benchmarking new anomaly detection algorithms on OPSSAT-AD. We believe that this work may become an important step toward building a fair, reproducible, and objective validation procedure that can be used to quantify the capabilities of emerging anomaly detection techniques in an unbiased and fully transparent way.

    The included files are:

    segments.csv with the acquired telemetry signals from the ESA OPS-SAT spacecraft,
    dataset.csv with the extracted synthetic features computed for each manually split and labeled telemetry segment,
    code files for data processing and example modelling (dataset_generator.ipynb for data processing, modeling_examples.ipynb with simple examples, requirements.txt with details on the Python configuration, and the LICENSE file).
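    A minimal sketch for loading the two data files with pandas (the file names come from the list above; everything else, including any column names, should be checked against dataset_generator.ipynb):

    import pandas as pd

    segments = pd.read_csv("segments.csv")  # raw telemetry signals
    features = pd.read_csv("dataset.csv")   # per-segment extracted features
    print(segments.shape, features.shape)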
    

    Citation: Ruszczak, B. (2024). OPSSAT-AD - anomaly detection dataset for satellite telemetry [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15108715

  9. Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/distributed-anomaly-detection-using-1-class-svm-for-vertically-partitioned-data
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors, and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task, not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the 'Commercial Modular Aero-Propulsion System Simulation' (CMAPSS).

  10. Labelled evaluation datasets of AIS Trajectories from Danish Waters for...

    • data.dtu.dk
    bin
    Updated Jul 12, 2023
    + more versions
    Cite
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen (2023). Labelled evaluation datasets of AIS Trajectories from Danish Waters for Abnormal Behavior Detection [Dataset]. http://doi.org/10.11583/DTU.21511815.v1
    Explore at:
    bin
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"

    DOI: https://doi.org/10.11583/DTU.c.6287841

    Using Deep Learning for detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour. The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.
    We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others: colliding vessels, vessels engaged in Search-and-Rescue activities, law enforcement, and commercial maritime traffic forced to deviate from the normal course.

    These datasets consist of labelled trajectories for the purpose of evaluating unsupervised models for detection of abnormal maritime behavior. For unlabelled datasets for training, please refer to the collection (link in Related publications).

    The dataset is an example of a SAR event and cannot be considered representative of the large population of all SAR events.

    The dataset consists of a total of 521 trajectories, of which 25 are labelled as abnormal. The data is captured on a single day in a specific region. The remaining normal traffic is representative of the traffic during the winter season. The normal traffic in the ROI has a fairly high seasonality related to fishing and leisure sailing traffic.

    The data is saved using the pickle format for Python. Each dataset is split into 2 files with the naming convention:

    datasetInfo_XXX
    data_XXX

    Files named "data_XXX" contain the extracted trajectories, serialized sequentially one at a time, and must be read as such; please refer to the provided utility functions for examples (a minimal sketch follows). Files named "datasetInfo_XXX" contain metadata related to the dataset and the indices at which trajectories begin in the "data_XXX" files.
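    A minimal reading sketch under that convention, assuming standard Python pickling (the dataset's own utility functions remain the authoritative reference):

    import pickle

    def read_trajectories(path):
        # Trajectories are pickled one after another into the same file,
        # so they must be loaded one pickle.load() call at a time.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    # Substitute a real file name following the naming syntax below.
    for trajectory in read_trajectories("data_XXX.pkl"):
        print(type(trajectory))
        break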

    The data are sequences of maritime trajectories defined by their timestamp, latitude/longitude position, speed, course, and unique ship identifier (MMSI). In addition, the dataset contains metadata related to creation parameters. The dataset has been limited to a specific time period, ship types, and moving AIS navigational statuses, and filtered within a region of interest (ROI). Trajectories were split if they exceeded an upper limit, and short trajectories were discarded. All values are given as metadata in the dataset and used in the naming syntax.

    Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl

    See the datasheet for more detailed information, and refer to the provided utility functions for examples of how to read and plot the data.

  11. Anomaly Detection in a Fleet of Systems - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Anomaly Detection in a Fleet of Systems - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/anomaly-detection-in-a-fleet-of-systems
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    A fleet is a group of systems (e.g., cars, aircraft) that are designed and manufactured the same way and are intended to be used the same way. For example, a fleet of delivery trucks may consist of one hundred instances of a particular model of truck, each of which is intended for the same type of service: almost the same amount of time and distance driven every day, approximately the same total weight carried, etc. For this reason, one may imagine that data mining for fleet monitoring merely involves collecting operating data from the multiple systems in the fleet and developing some sort of model, such as a model of normal operation that can be used for anomaly detection. However, one then may realize that each member of the fleet will be unique in some ways: there will be minor variations in manufacturing, quality of parts, and usage. For this reason, the typical machine learning and statistics algorithm's assumption that all the data are independent and identically distributed is not correct. Data from each system in the fleet must be treated as unique so that one can notice significant changes in the operation of that system.

  12. Three Annotated Anomaly Detection Datasets for Line-Scan Algorithms

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 29, 2024
    Cite
    Garske, Samuel; Mao, Yiwei (2024). Three Annotated Anomaly Detection Datasets for Line-Scan Algorithms [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13370799
    Explore at:
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    University of Sydney
    Authors
    Garske, Samuel; Mao, Yiwei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset contains two hyperspectral images and one multispectral image for anomaly detection, together with their corresponding binary pixel masks. They were initially used for real-time anomaly detection in line-scanning, but they can be used for any anomaly detection task.

    They are in .npy file format (will add tiff or geotiff variants in the future), with the image datasets being in the order of (height, width, channels). The SNP dataset was collected using sentinelhub, and the Synthetic dataset was collected from AVIRIS. The Python code used to analyse these datasets can be found at: https://github.com/WiseGamgee/HyperAD

    How to Get Started

    All that is needed to load these datasets is Python (preferably 3.8+) and the NumPy package. Example code for loading the Beach Dataset, if you put it in a folder called "data" alongside the Python script, is:

    import numpy as np

    # Load image file
    hsi_array = np.load("data/beach_hsi.npy")
    n_pixels, n_lines, n_bands = hsi_array.shape
    print(f"This dataset has {n_pixels} pixels, {n_lines} lines, and {n_bands} bands.")

    # Load image mask
    mask_array = np.load("data/beach_mask.npy")
    m_pixels, m_lines = mask_array.shape
    print(f"The corresponding anomaly mask is {m_pixels} pixels by {m_lines} lines.")

    Citing the Datasets

    If you use any of these datasets, please cite the following paper:

    @article{garske2024erx,
      title={ERX - a Fast Real-Time Anomaly Detection Algorithm for Hyperspectral Line-Scanning},
      author={Garske, Samuel and Evans, Bradley and Artlett, Christopher and Wong, KC},
      journal={arXiv preprint arXiv:2408.14947},
      year={2024}
    }

    If you use the beach dataset please cite the following paper as well (original source):

    @article{mao2022openhsi,
      title={OpenHSI: A complete open-source hyperspectral imaging solution for everyone},
      author={Mao, Yiwei and Betters, Christopher H and Evans, Bradley and Artlett, Christopher P and Leon-Saval, Sergio G and Garske, Samuel and Cairns, Iver H and Cocks, Terry and Winter, Robert and Dell, Timothy},
      journal={Remote Sensing},
      volume={14},
      number={9},
      pages={2244},
      year={2022},
      publisher={MDPI}
    }

  13. pyhydroqc Sensor Data QC: Single Site Example

    • search.dataone.org
    • hydroshare.org
    • +1 more
    Updated Dec 30, 2023
    Cite
    Amber Spackman Jones (2023). pyhydroqc Sensor Data QC: Single Site Example [Dataset]. http://doi.org/10.4211/hs.92f393cbd06b47c398bdd2bbb86887ac
    Explore at:
    Dataset updated
    Dec 30, 2023
    Dataset provided by
    Hydroshare
    Authors
    Amber Spackman Jones
    Time period covered
    Jan 1, 2017 - Dec 31, 2017
    Description

    This resource contains an example script for using the software package pyhydroqc. pyhydroqc was developed to identify and correct anomalous values in time series data collected by in situ aquatic sensors. For more information, see the code repository: https://github.com/AmberSJones/pyhydroqc and the documentation: https://ambersjones.github.io/pyhydroqc/. The package may be installed from the Python Package Index.

    This script applies the functions to data from a single site in the Logan River Observatory, which is included in the repository. The data collected in the Logan River Observatory are sourced at http://lrodata.usu.edu/tsa/ or on HydroShare: https://www.hydroshare.org/search/?q=logan%20river%20observatory.

    Anomaly detection methods include ARIMA (AutoRegressive Integrated Moving Average) and LSTM (Long Short Term Memory). These are time series regression methods that detect anomalies by comparing model estimates to sensor observations and labeling points as anomalous when they exceed a threshold. There are multiple possible approaches for applying LSTM for anomaly detection/correction:

    Vanilla LSTM: uses past values of a single variable to estimate the next value of that variable.

    Multivariate Vanilla LSTM: uses past values of multiple variables to estimate the next value for all variables.

    Bidirectional LSTM: uses past and future values of a single variable to estimate a value for that variable at the time step of interest.

    Multivariate Bidirectional LSTM: uses past and future values of multiple variables to estimate a value for all variables at the time step of interest.
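    The ARIMA side of this estimate-and-threshold pattern can be sketched generically with statsmodels; this is an illustration of the approach, not pyhydroqc's own API (see the package documentation for that):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Placeholder sensor series with one injected spike.
    rng = np.random.default_rng(1)
    obs = pd.Series(np.sin(np.linspace(0, 60, 2000)) + rng.normal(0, 0.1, 2000))
    obs.iloc[700] += 2.0

    fit = ARIMA(obs, order=(2, 0, 2)).fit()
    residuals = obs - fit.predict()      # observation minus model estimate

    threshold = 4 * residuals.std()      # threshold choice is illustrative
    anomalies = residuals.abs() > threshold
    print(anomalies[anomalies].index.tolist())  # expect the injected point at 700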

    The correction approach uses piecewise ARIMA models. Each group of consecutive anomalous points is considered as a unit to be corrected. Separate ARIMA models are developed for valid points preceding and following the anomalous group. Model estimates are blended to achieve a correction.

    The anomaly detection and correction workflow involves the following steps:

    1. Retrieving data
    2. Applying rules-based detection to screen data and apply initial corrections
    3. Identifying and correcting sensor drift and calibration (if applicable)
    4. Developing a model (i.e., ARIMA or LSTM)
    5. Applying the model to make time series predictions
    6. Determining a threshold and detecting anomalies by comparing sensor observations to modeled results
    7. Widening the window over which an anomaly is identified
    8. Aggregating detections resulting from multiple models
    9. Making corrections for anomalous events

    Instructions to run the notebook through the CUAHSI JupyterHub:

    1. Click "Open with..." at the top of the resource and select the CUAHSI JupyterHub. You may need to sign into the CUAHSI JupyterHub using your HydroShare credentials.
    2. Select 'Python 3.8 - Scientific' as the server and click Start.
    3. From your JupyterHub directory, click on the ExampleNotebook.ipynb file.
    4. Execute each cell in the code by clicking the Run button.

  14. MKAD (Open Sourced Code)

    • catalog.data.gov
    • s.cnmilf.com
    Updated Apr 11, 2025
    + more versions
    Cite
    Dashlink (2025). MKAD (Open Sourced Code) [Dataset]. https://catalog.data.gov/dataset/mkad-open-sourced-code
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Area covered
    MKAD
    Description

    The Multiple Kernel Anomaly Detection (MKAD) algorithm is designed for anomaly detection over a set of files. It combines multiple kernels into a single optimization function using the One-Class Support Vector Machine (OCSVM) framework. Any kernel function can be combined in the algorithm as long as it meets the Mercer conditions; however, for the purposes of this code, the data preformatting and kernel type are specific to Flight Operations Quality Assurance (FOQA) data and have been integrated into the coding steps. For this domain, discrete binary switch sequences are used in the discrete kernel, and discretized continuous parameter features are used to form the continuous kernel. The OCSVM uses a training set of nominal examples (in this case flights) and evaluates test examples to determine whether they are anomalous or not. After completing this analysis, the algorithm reports the anomalous examples and determines whether there is a contribution from the continuous elements, the discrete elements, or both.
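    The multiple-kernel construction can be sketched as a weighted sum of per-modality Gram matrices passed to a one-class SVM with a precomputed kernel. This is a simplified illustration of the idea, not the released MKAD code; the kernel choices and the equal weighting are assumptions.

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_cont = rng.normal(size=(300, 8))           # stand-in continuous parameters
    X_disc = rng.integers(0, 2, size=(300, 20))  # stand-in binary switch states

    # Per-modality kernels: RBF for continuous features and a match-fraction
    # kernel (a valid Mercer kernel) for the binary switch states.
    K_cont = rbf_kernel(X_cont)
    K_disc = 1.0 - np.abs(X_disc[:, None, :] - X_disc[None, :, :]).mean(axis=2)

    eta = 0.5                                    # illustrative equal weighting
    K = eta * K_cont + (1 - eta) * K_disc        # combined kernel stays Mercer

    ocsvm = OneClassSVM(kernel="precomputed", nu=0.05).fit(K)
    print((ocsvm.predict(K) == -1).sum(), "examples flagged as anomalous")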

  15. Smartphone Dataset for Anomaly Detection in Crowds

    • kaggle.com
    zip
    Updated Apr 24, 2024
    Cite
    Rabie El Kharoua (2024). Smartphone Dataset for Anomaly Detection in Crowds [Dataset]. https://www.kaggle.com/datasets/rabieelkharoua/smartphone-dataset-for-anomaly-detection-in-crowds
    Explore at:
    zip (271703 bytes)
    Dataset updated
    Apr 24, 2024
    Authors
    Rabie El Kharoua
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was collected from smartphone sensors and can be used to analyse the behaviour of a crowd, for example, to detect an anomaly.

    Dataset Characteristics: Time-Series

    Subject Area: Computer Science

    Associated Tasks: Classification

    Instances: 14221

    Dataset Information

    For what purpose was the dataset created?

    The key purpose of donating this dataset is to provide an opportunity to the research community to use it for further research purposes.

    Who funded the creation of the dataset? Muhammad Irfan

    What do the instances in this dataset represent? One instance represents a movement pattern for a group-based activity.

    Are there recommended data splits? No.

    Has Missing Values? No

    Introductory Paper

    Title: Anomaly Detection in Crowds using Multi Sensory Information

    Authors: M. Irfan, L. Marcenaro, L. Tokarchuk, and C. Regazzoni, 2018

    Published in: 5th IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), Auckland, New Zealand

    Link: https://ieeexplore.ieee.org/document/8639151

    Abstract of Introductory Paper

    This paper presents a system capable of detecting unusual activities in crowds from real-world data captured by multiple sensors. The detection is achieved by classifying the distinct movements of people in crowds; those patterns can differ and can be classified as normal and abnormal activities. Statistical features are extracted from the collected dataset by applying sliding time window operations. A model for classifying movements is trained using the Random Forest technique. The system was tested using two datasets collected from mobile phones during social event gatherings. Results show that mobile data can be used to detect anomalies in crowds as an alternative to video sensors, with significant performance. Our approach is the first to detect unusual behaviour in crowds with non-visual data, and it is simple to train and easy to deploy. We also present our dataset for public research, as there is no such dataset available for performing experiments on crowds to detect unusual behaviours.
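    The pipeline described in the abstract (statistical features over sliding time windows, then a Random Forest classifier) can be sketched as follows; the sensor stream, window length, and features are stand-ins, not the paper's exact configuration:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Toy stand-in for a labelled smartphone accelerometer stream.
    rng = np.random.default_rng(0)
    stream = pd.DataFrame({"a": rng.normal(size=5000),
                           "label": rng.integers(0, 2, size=5000)})

    # Statistical features over sliding time windows (window length assumed).
    win = 50
    feats = pd.DataFrame({
        "mean": stream["a"].rolling(win).mean(),
        "std": stream["a"].rolling(win).std(),
        "max": stream["a"].rolling(win).max(),
        "label": stream["label"],
    }).dropna()

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(feats[["mean", "std", "max"]], feats["label"])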

    Cite

    Citation: Irfan, Muhammad. (2021). Smartphone Dataset for Anomaly Detection in Crowds. UCI Machine Learning Repository. https://doi.org/10.24432/C5Q90H.

    BibTeX:

    @misc{misc_smartphone_dataset_for_anomaly_detection_in_crowds_613,
      author = {Irfan, Muhammad},
      title = {{Smartphone Dataset for Anomaly Detection in Crowds}},
      year = {2021},
      howpublished = {UCI Machine Learning Repository},
      note = {{DOI}: https://doi.org/10.24432/C5Q90H}
    }

  16. Data set for anomaly detection on a HPC system

    • zenodo.org
    bin
    Updated Apr 19, 2023
    Cite
    Andrea Borghesi; Andrea Bartolini; Francesco Beneventi (2023). Data set for anomaly detection on a HPC system [Dataset]. http://doi.org/10.5281/zenodo.3251873
    Explore at:
    bin
    Dataset updated
    Apr 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrea Borghesi; Andrea Bartolini; Francesco Beneventi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains the data collected on the DAVIDE HPC system (CINECA & E4 & University of Bologna, Bologna, Italy) in the period March-May 2018.

    The data set has been used to train an autoencoder-based model to automatically detect anomalies in a semi-supervised fashion on a real HPC system.
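    The semi-supervised scheme, training an autoencoder on healthy operation and flagging samples with a high reconstruction error, can be sketched as below; the architecture, data, and threshold are illustrative, not the paper's exact model:

    import numpy as np
    from tensorflow import keras

    rng = np.random.default_rng(0)
    X_normal = rng.normal(size=(5000, 16)).astype("float32")  # stand-in telemetry

    # Small dense autoencoder trained only on normal-operation data.
    ae = keras.Sequential([
        keras.layers.Input(shape=(16,)),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(4, activation="relu"),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(16),
    ])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(X_normal, X_normal, epochs=10, batch_size=64, verbose=0)

    # Anomaly score = reconstruction error; threshold set on normal data only.
    err = np.mean((X_normal - ae.predict(X_normal, verbose=0)) ** 2, axis=1)
    threshold = np.percentile(err, 99)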

    This work is described in:

    1) "Anomaly Detection using Autoencoders in High Performance Computing Systems", Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini, IAAI19 (proceedings in process) -- https://arxiv.org/abs/1902.08447

    2) "Online Anomaly Detection in HPC Systems", Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini, AICAS19 (proceedings in process) -- https://arxiv.org/abs/1811.05269

    See the git repository for usage examples and details: https://github.com/AndreaBorghesi/anomaly_detection_HPC

  17. Updated gravity stations and anomaly compilation over Lake Superior

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 27, 2025
    Cite
    U.S. Geological Survey (2025). Updated gravity stations and anomaly compilation over Lake Superior [Dataset]. https://catalog.data.gov/dataset/updated-gravity-stations-and-anomaly-compilation-over-lake-superior
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Lake Superior
    Description

    A new gravity compilation has been assembled for the Lake Superior region. It includes survey stations available from Natural Resources Canada, the National Centers for Environmental Information (formerly the National Geophysical Data Center), the Minnesota Geological Survey, and the U.S. Geological Survey. Individual databases were combined and duplicates removed, resulting in a database of 63,880 gravity stations. The gravity station data were reprocessed from observed gravity to simple Bouguer anomaly following standard methods that depend on the station type (for example, land, lake surface, or lake bottom observation), using a reduction density of 2,670 kg/m3. The compilation provides a consistent dataset appropriate for gravity modeling that extends across the Lake Superior shores.
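    For orientation, the standard simple Bouguer reduction for a land station combines a free-air correction and a Bouguer slab correction; here is a sketch using the commonly quoted constants (the survey's actual reduction differs by station type, e.g., for lake-surface and lake-bottom observations):

    def simple_bouguer_anomaly(g_obs_mgal, g_normal_mgal, elev_m, density=2670.0):
        """Simple Bouguer anomaly (mGal) for a land station.

        Free-air correction: 0.3086 mGal per metre of elevation.
        Bouguer slab: 2*pi*G*rho*h, i.e. about 0.04193 * rho[g/cm^3] * h[m] mGal.
        """
        rho = density / 1000.0                 # kg/m^3 -> g/cm^3
        free_air = 0.3086 * elev_m
        bouguer_slab = 0.04193 * rho * elev_m
        return g_obs_mgal - g_normal_mgal + free_air - bouguer_slab

    # A station 200 m above the reference surface:
    print(simple_bouguer_anomaly(981000.0, 981050.0, 200.0))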

  18. Data from: Detection of Emergent Anomalous Structure in Functional Data

    • tandf.figshare.com
    zip
    Updated May 14, 2024
    Cite
    Edward Austin; Idris A. Eckley; Lawrence Bardwell (2024). Detection of Emergent Anomalous Structure in Functional Data [Dataset]. http://doi.org/10.6084/m9.figshare.25618940.v1
    Explore at:
    zip
    Dataset updated
    May 14, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Edward Austin; Idris A. Eckley; Lawrence Bardwell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Motivated by an example arising from digital networks, we propose a novel approach for detecting the emergence of anomalies in functional data. In contrast to classical functional data approaches, which detect anomalies in completely observed curves, the proposed approach seeks to identify anomalies sequentially as each point on the curve is received. The new method, the Functional Anomaly Sequential Test (FAST), captures the common profile of the curves using Principal Differential Analysis and uses a form of CUSUM test to monitor a new functional observation as it emerges. Various theoretical properties of the procedure are derived. The performance of FAST is then assessed on both simulated and telecommunications data.
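    The CUSUM ingredient can be illustrated generically: accumulate the deviations of each newly received point from the common profile and raise an alarm when the statistic crosses a control limit. A textbook one-sided sketch, not the authors' implementation:

    import numpy as np

    def cusum(residuals, drift=0.5, limit=5.0):
        """One-sided CUSUM over residuals from a reference profile."""
        s, alarms = 0.0, []
        for t, r in enumerate(residuals):
            s = max(0.0, s + r - drift)   # accumulate positive deviations
            if s > limit:
                alarms.append(t)
                s = 0.0                   # restart after an alarm
        return alarms

    rng = np.random.default_rng(0)
    resid = rng.normal(0, 1, 300)
    resid[200:] += 2.0                    # the curve starts deviating at t = 200
    print(cusum(resid))                   # alarms shortly after t = 200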

  19. Deep one-class learning: a deep learning approach to anomaly detection

    • resodate.org
    Updated Oct 8, 2021
    Cite
    Lukas Ruff (2021). Deep one-class learning: a deep learning approach to anomaly detection [Dataset]. http://doi.org/10.14279/depositonce-12250
    Explore at:
    Dataset updated
    Oct 8, 2021
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Lukas Ruff
    Description

    Anomaly detection is the problem of identifying unusual patterns in data. This problem is relevant for a wide variety of applications in various domains such as fault and damage detection in manufacturing, fraud detection in finance and insurance, intrusion detection in cybersecurity, disease detection in medical diagnosis, or scientific discovery. Many of these applications involve increasingly complex data at large scale, for instance, large collections of images or text. The lack of effective solutions in such settings has sparked an interest in developing anomaly detection methods based on deep learning, which has enabled breakthroughs in other machine learning problems that involve large amounts of complex data. This thesis proposes Deep One-Class Learning, a deep learning approach to anomaly detection that is based on the one-class classification paradigm. One-class classification views anomaly detection from a classification perspective, aiming to learn a discriminative decision boundary that separates the normal from the anomalous data. In contrast to previous methods that rely on fixed (usually manually engineered) features, deep one-class learning expands the one-class classification approach with methods that learn (or transfer) data representations via suitable one-class learning objectives. The key idea underlying deep one-class learning is to learn a transformation (e.g., a deep neural network) in such a way that the normal data points are concentrated in feature space, causing anomalies to deviate from the concentrated region, thereby making them detectable. We introduce several deep one-class learning methods in this thesis that follow the above idea while integrating different assumptions about the data or a specific domain. These include semi-supervised variants that can incorporate labeled anomalies, for example, or specific methods for images and text that enable model interpretability and an explanation of anomalies. Moreover, we present a unifying view of anomaly detection methods that, in addition to one-class classification, also covers reconstruction methods as well as methods based on density estimation and generative modeling. For each of these main approaches, we identify connections between respective deep and "shallow" methods based on common underlying principles. Through multiple experiments and analyses, we demonstrate that deep one-class learning is useful for anomaly detection, especially on semantic detection tasks. Finally, we conclude this thesis by discussing limits of the proposed approach and outlining specific paths for future research.
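    The central objective can be sketched in a few lines of PyTorch: learn a network that concentrates normal points around a fixed centre in feature space and score test points by their distance to it. This is a simplified one-class objective in the spirit described above, not the thesis's exact method:

    import torch
    from torch import nn

    torch.manual_seed(0)
    X = torch.randn(2000, 16)                 # stand-in "normal" training data

    # Feature map phi; the final layer has no bias to discourage collapse.
    phi = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8, bias=False))
    c = phi(X).mean(dim=0).detach()           # fix the centre from an initial pass

    opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = ((phi(X) - c) ** 2).sum(dim=1).mean()  # concentrate normals at c
        loss.backward()
        opt.step()

    scores = ((phi(X) - c) ** 2).sum(dim=1)   # anomaly score = squared distance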

  20. MKAD (Open Sourced Code) - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). MKAD (Open Sourced Code) - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/mkad-open-sourced-code
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    MKAD
    Description

    The Multiple Kernel Anomaly Detection (MKAD) algorithm is designed for anomaly detection over a set of files. It combines multiple kernels into a single optimization function using the One-Class Support Vector Machine (OCSVM) framework. Any kernel function can be combined in the algorithm as long as it meets the Mercer conditions; however, for the purposes of this code, the data preformatting and kernel type are specific to Flight Operations Quality Assurance (FOQA) data and have been integrated into the coding steps. For this domain, discrete binary switch sequences are used in the discrete kernel, and discretized continuous parameter features are used to form the continuous kernel. The OCSVM uses a training set of nominal examples (in this case flights) and evaluates test examples to determine whether they are anomalous or not. After completing this analysis, the algorithm reports the anomalous examples and determines whether there is a contribution from the continuous elements, the discrete elements, or both.
