27 datasets found
  1. ESA Anomaly Dataset

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Jun 28, 2024
    + more versions
    Cite
    Gabriele De Canio; Gabriele De Canio; Krzysztof Kotowski; Christoph Haskamp; Krzysztof Kotowski; Christoph Haskamp (2024). ESA Anomaly Dataset [Dataset]. http://doi.org/10.5281/zenodo.12528696
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    European Space Agency (http://www.esa.int/)
    Authors
    Gabriele De Canio; Gabriele De Canio; Krzysztof Kotowski; Christoph Haskamp; Krzysztof Kotowski; Christoph Haskamp
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Jun 25, 2024
    Description

    ESA Anomaly Dataset is the first large-scale, real-life satellite telemetry dataset with curated anomaly annotations, originating from three ESA missions. We hope that this unique dataset will allow researchers and scientists from academia, research institutes, national and international space agencies, and industry to benchmark models and approaches on a common baseline, as well as to research and develop novel, computationally efficient approaches for anomaly detection in satellite telemetry data.

    The dataset results from an 18-month project carried out by an industry consortium composed of Airbus Defence and Space, KP Labs, and the European Space Agency’s European Space Operations Centre. The project, funded by the European Space Agency (ESA), is part of the Artificial Intelligence for Automation (A²I) Roadmap (De Canio et al., 2023), a large endeavour started in 2021 to automate space operations by leveraging artificial intelligence.

    Further details can be found on arXiv and GitHub.

    References
    De Canio, G. et al. (2023) Development of an actionable AI roadmap for automating mission operations. In, 2023 SpaceOps Conference. American Institute of Aeronautics and Astronautics, Dubai, United Arab Emirates.

  2. Numenta Anomaly Benchmark (NAB)

    • kaggle.com
    Updated Aug 19, 2016
    Cite
    BoltzmannBrain (2016). Numenta Anomaly Benchmark (NAB) [Dataset]. https://www.kaggle.com/datasets/boltzmannbrain/nab/discussion?sortBy=hot&group=upvoted
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 19, 2016
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BoltzmannBrain
    Description

    The Numenta Anomaly Benchmark (NAB) is a novel benchmark for evaluating algorithms for anomaly detection in streaming, online applications. It comprises over 50 labeled real-world and artificial timeseries data files plus a novel scoring mechanism designed for real-time applications. All of the data and code are fully open-source, with extensive documentation and a scoreboard of anomaly detection algorithms: github.com/numenta/NAB. The full dataset is included here, but please go to the repo for details on how to evaluate anomaly detection algorithms on NAB.

    NAB Data Corpus

    The NAB corpus of 58 timeseries data files is designed to provide data for research in streaming anomaly detection. It comprises both real-world and artificial timeseries data containing labeled anomalous periods of behavior. Data are ordered, timestamped, single-valued metrics. All data files contain anomalies, unless otherwise noted.

    The majority of the data is real-world, from a variety of sources such as AWS server metrics, Twitter volume, advertisement clicking metrics, traffic data, and more. All data is included in the repository, with more details in the data readme. We are in the process of adding more data and actively searching for additional sources. Please contact us at nab@numenta.org if you have similar data (ideally with known anomalies) that you would like to see incorporated into NAB.

    The NAB version will be updated whenever new data (and corresponding labels) is added to the corpus; NAB is currently in v1.0.
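
    Since every file in the corpus is a plain CSV of timestamped values, a few lines of pandas are enough to get started. The sketch below assumes the usual NAB column layout (timestamp, value) and a path relative to the repository's data folder:

    import pandas as pd

    # Load one NAB data file; the path is relative to the NAB data directory,
    # and the two-column layout (timestamp, value) is assumed.
    df = pd.read_csv(
        "realKnownCause/nyc_taxi.csv",
        parse_dates=["timestamp"],
        index_col="timestamp",
    )

    # Each file is an ordered, timestamped, single-valued metric.
    print(df["value"].describe())
    print(f"{len(df)} rows from {df.index.min()} to {df.index.max()}")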

    Real data

    • realAWSCloudwatch/

      AWS server metrics as collected by the AmazonCloudwatch service. Example metrics include CPU Utilization, Network Bytes In, and Disk Read Bytes.

    • realAdExchange/

      Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM). One of the files is normal, without anomalies.

    • realKnownCause/

      This is data for which we know the anomaly causes; no hand labeling.

      • ambient_temperature_system_failure.csv: The ambient temperature in an office setting.
      • cpu_utilization_asg_misconfiguration.csv: From Amazon Web Services (AWS) monitoring of CPU usage, i.e., average CPU usage across a given cluster. When usage is high, AWS spins up a new machine, and when usage is low it uses fewer machines.
      • ec2_request_latency_system_failure.csv: CPU usage data from a server in Amazon's East Coast datacenter. The dataset ends with complete system failure resulting from a documented failure of AWS API servers. There's an interesting story behind this data in the Numenta blog: http://numenta.com/blog/anomaly-of-the-week.html
      • machine_temperature_system_failure.csv: Temperature sensor data of an internal component of a large, industrial machine. The first anomaly is a planned shutdown of the machine. The second anomaly is difficult to detect and directly led to the third anomaly, a catastrophic failure of the machine.
      • nyc_taxi.csv: Number of NYC taxi passengers, where the five anomalies occur during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snow storm. The raw data is from the NYC Taxi and Limousine Commission. The data file included here aggregates the total number of taxi passengers into 30-minute buckets.
      • rogue_agent_key_hold.csv: Timing the key holds for several users of a computer, where the anomalies represent a change in the user.
      • rogue_agent_key_updown.csv: Timing the key strokes for several users of a computer, where the anomalies represent a change in the user.
    • realTraffic/

      Real time traffic data from the Twin Cities Metro area in Minnesota, collected by the Minnesota Department of Transportation. Included metrics include occupancy, speed, and travel time from specific sensors.

    • realTweets/

      A collection of Twitter mentions of large publicly-traded companies such as Google and IBM. The metric value represents the number of mentions for a given ticker symbol every 5 minutes.

    Artificial data

    • artificialNoAnomaly/

      Artificially-generated data without any anomalies.

    • artificialWithAnomaly/

      Artificially-generated data with varying types of anomalies.

    Acknowledgments

    We encourage you to publish your results on running NAB, and share them with us at nab@numenta.org. Please cite the following publication when referring to NAB:

    Lavin, Alexander and Ahmad, Subutai. "Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark", Fourteenth International Conference on Machine Learning and Applications, December 2015. [PDF]

  3. ToyADMOS dataset

    • zenodo.org
    • opendatalab.com
    • +1more
    bin, pdf
    Updated Jul 22, 2024
    + more versions
    Cite
    Yuma Koizumi; Yuma Koizumi; Shoichiro Saito; Noboru Harada; Hisashi Uematsu; Keisuke Imoto; Keisuke Imoto; Shoichiro Saito; Noboru Harada; Hisashi Uematsu (2024). ToyADMOS dataset [Dataset]. http://doi.org/10.5281/zenodo.3351307
    Explore at:
    Available download formats: bin, pdf
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yuma Koizumi; Yuma Koizumi; Shoichiro Saito; Noboru Harada; Hisashi Uematsu; Keisuke Imoto; Keisuke Imoto; Shoichiro Saito; Noboru Harada; Hisashi Uematsu
    Description

    The ToyADMOS dataset is a machine operating sounds dataset of approximately 540 hours of normal machine operating sounds and over 12,000 samples of anomalous sounds, collected with four microphones at a 48 kHz sampling rate and prepared by Yuma Koizumi and members of NTT Media Intelligence Laboratories. The dataset consists of three sub-datasets: "toy car" for a product inspection task, "toy conveyor" for fault diagnosis of a fixed machine, and "toy train" for fault diagnosis of a moving machine.

    Since the total size of the ToyADMOS dataset is over 440 GB, each sub-dataset is split into 7-9 files with 7-zip (7z format). The total size of the compressed dataset is approximately 180 GB, and that of each sub-dataset is approximately 60 GB. Download the archive files corresponding to the sub-datasets of interest and use your favorite compression tool to extract these split archives.

    Details of the dataset are described in [1] and on GitHub: https://github.com/YumaKoizumi/ToyADMOS-dataset

    License: see the file named LICENSE.pdf

    [1] Yuma Koizumi, Shoichiro Saito, Noboru Harada, Hisashi Uematsu and Keisuke Imoto, "ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection," in Proc of Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019.

  4. Data from: Multi-Source Distributed System Data for AI-powered Analytics

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 10, 2022
    Cite
    Sasho Nedelkoski; Jasmin Bogatinovski; Ajay Kumar Mandapati; Soeren Becker; Jorge Cardoso; Odej Kao; Sasho Nedelkoski; Jasmin Bogatinovski; Ajay Kumar Mandapati; Soeren Becker; Jorge Cardoso; Odej Kao (2022). Multi-Source Distributed System Data for AI-powered Analytics [Dataset]. http://doi.org/10.5281/zenodo.3549604
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 10, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sasho Nedelkoski; Jasmin Bogatinovski; Ajay Kumar Mandapati; Soeren Becker; Jorge Cardoso; Odej Kao; Sasho Nedelkoski; Jasmin Bogatinovski; Ajay Kumar Mandapati; Soeren Becker; Jorge Cardoso; Odej Kao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    In recent years there has been an increased interest in Artificial Intelligence for IT Operations (AIOps). This field combines monitoring data from IT systems and big data platforms with machine learning to automate various operations and maintenance (O&M) tasks for distributed systems. The major contributions have materialized in the form of novel algorithms. Typically, researchers took on the challenge of exploring one specific type of observability data source, such as application logs, metrics, or distributed traces, to create new algorithms. Nonetheless, due to the low signal-to-noise ratio of monitoring data, there is a consensus that only the analysis of multi-source monitoring data will enable the development of useful algorithms with better performance. Unfortunately, existing datasets usually contain only a single source of data, often logs or metrics, which limits the possibilities for greater advances in AIOps research. Thus, we generated high-quality multi-source data composed of distributed traces, application logs, and metrics from a complex distributed system. This paper provides detailed descriptions of the experiment and statistics of the data, and identifies how such data can be analyzed to support O&M tasks such as anomaly detection, root cause analysis, and remediation.

    General Information:

    This repository contains simple scripts for data statistics and a link to the multi-source distributed system dataset.

    You may find details of this dataset in the original paper:

    Sasho Nedelkoski, Jasmin Bogatinovski, Ajay Kumar Mandapati, Soeren Becker, Jorge Cardoso, Odej Kao, "Multi-Source Distributed System Data for AI-powered Analytics".

    If you use the data, implementation, or any details of the paper, please cite!

    BIBTEX:


    @inproceedings{nedelkoski2020multi,
     title={Multi-source Distributed System Data for AI-Powered Analytics},
     author={Nedelkoski, Sasho and Bogatinovski, Jasmin and Mandapati, Ajay Kumar and Becker, Soeren and Cardoso, Jorge and Kao, Odej},
     booktitle={European Conference on Service-Oriented and Cloud Computing},
     pages={161--176},
     year={2020},
     organization={Springer}
    }
    


    The multi-source/multimodal dataset is composed of distributed traces, application logs, and metrics produced by running a complex distributed system (OpenStack). In addition, we also provide the workload and fault scripts together with the Rally report, which can serve as ground truth. We provide two datasets, which differ in how the workload is executed. The sequential_data is generated by executing a workload of sequential user requests. The concurrent_data is generated by executing a workload of concurrent user requests.

    The raw logs in both datasets contain the same files. Users who want the logs filtered by time with respect to the two datasets should refer to the timestamps in the metrics (they provide the time window). In addition, we suggest using the provided aggregated, time-ranged logs for both datasets in CSV format.

    Important: The logs and the metrics are synchronized with respect to time, and both are recorded in CEST (Central European Summer Time). The traces are in UTC (Coordinated Universal Time, i.e., CEST minus 2 hours). They should be synchronized if the user develops multimodal methods. Please read the IMPORTANT_experiment_start_end.txt file before working with the data.
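
    Because the logs and metrics are in CEST while the traces are in UTC, trace timestamps need to be shifted before the modalities can be aligned. A minimal pandas sketch is given below; the file name and timestamp column name are placeholders, not the dataset's actual schema:

    import pandas as pd

    # Placeholder file/column names: load the trace timestamps (recorded in UTC).
    traces = pd.read_csv("traces.csv", parse_dates=["timestamp"])

    # Logs and metrics are recorded in CEST (UTC+2), so localize the naive UTC
    # timestamps and convert them to the Central European zone before joining
    # traces with the logs and metrics.
    traces["timestamp"] = (
        traces["timestamp"]
        .dt.tz_localize("UTC")
        .dt.tz_convert("Europe/Berlin")
    )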

    Our GitHub repository with the code for the workloads and scripts for basic analysis can be found at: https://github.com/SashoNedelkoski/multi-source-observability-dataset/

  5. The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial...

    • data.niaid.nih.gov
    • autovi.utc.fr
    • +1more
    Updated Jun 5, 2024
    + more versions
    Cite
    Grandvalet, Yves (2024). The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8099579
    Explore at:
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Durupt, Alexandre
    Lafou, Meriem
    Leblanc, Antoine
    Carvalho, Philippe
    Grandvalet, Yves
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    See the official website: https://autovi.utc.fr

    Modern industrial production lines must be equipped with robust defect inspection modules that are able to withstand high product variability. This means that in the context of industrial production, new defects that are not yet known may appear and must therefore be identified.

    On industrial production lines, the typology of potential defects is vast (texture, part failure, logical defects, etc.). Inspection systems must therefore be able to detect non-listed defects, i.e., defects not yet observed at the time the inspection system was developed. To solve this problem, research and development of unsupervised AI algorithms on real-world data is required.

    Renault Group and the Université de technologie de Compiègne (Roberval and Heudiasyc Laboratories) have jointly developed the Automotive Visual Inspection Dataset (AutoVI), the purpose of which is to be used as a scientific benchmark to compare and develop advanced unsupervised anomaly detection algorithms under real production conditions. The images were acquired on Renault Group's automotive production lines, in a genuine industrial production line environment, with variations in brightness and lighting on constantly moving components. This dataset is representative of actual data acquisition conditions on automotive production lines.

    The dataset contains 3950 images, split into 1530 training images and 2420 testing images.

    The evaluation code can be found at https://github.com/phcarval/autovi_evaluation_code.

    Disclaimer: All defects shown were intentionally created on Renault Group's production lines for the purpose of producing this dataset. The images were examined and labeled by Renault Group experts, and all defects were corrected after shooting.

    License: Copyright © 2023-2024 Renault Group

    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/.

    For using the data in a way that falls under the commercial use clause of the license, please contact us.

    Attribution: Please use the following for citing the dataset in scientific work:

    Carvalho, P., Lafou, M., Durupt, A., Leblanc, A., & Grandvalet, Y. (2024). The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection [Dataset]. https://doi.org/10.5281/zenodo.10459003

    Contact: If you have any questions or remarks about this dataset, please contact us at philippe.carvalho@utc.fr, meriem.lafou@renault.com, alexandre.durupt@utc.fr, antoine.leblanc@renault.com, yves.grandvalet@utc.fr.

    Changelog

    v1.0.0

    Cropped engine_wiring, pipe_clip and pipe_staple images

    Reduced tank_screw, underbody_pipes and underbody_screw image sizes

    v0.1.1

    Added ground truth segmentation maps

    Fixed categorization of some images

    Added new defect categories

    Removed tube_fastening and kitting_cart

    Removed duplicates in pipe_clip

  6. Squirrel-Cage Induction Motor Fault Diagnosis Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 3, 2023
    Cite
    Mateusz Piechocki (2023). Squirrel-Cage Induction Motor Fault Diagnosis Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8203069
    Explore at:
    Dataset updated
    Aug 3, 2023
    Dataset authored and provided by
    Mateusz Piechocki
    Description

    The Squirrel-Cage Induction Motor Fault Diagnosis Dataset is a multi-sensor data collection gathered to expand research on anomaly detection, fault diagnosis, and predictive maintenance, mainly using non-invasive methods such as thermal observation or vibration measurement. The measurements were gathered in an advanced Wrocław University of Science and Technology laboratory designed to simulate and study motor defects. The collected dataset is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Available data:

    thermal images

    An example of dataset utilization is presented in the GitHub repository: motor-fault-diagnosis

    Related publications:

    Unraveling Induction Motor State through Thermal Imaging and Edge Processing: A Step towards Explainable Fault Diagnosis

  7. Data from: MIMII Dataset: Sound Dataset for Malfunctioning Industrial...

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Feb 29, 2020
    + more versions
    Cite
    Harsh Purohit; Ryo Tanabe; Kenji Ichige; Takashi Endo; Yuki Nikaido; Kaori Suefusa; Kaori Suefusa; Yohei Kawaguchi; Yohei Kawaguchi; Harsh Purohit; Ryo Tanabe; Kenji Ichige; Takashi Endo; Yuki Nikaido (2020). MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection [Dataset]. http://doi.org/10.5281/zenodo.3384388
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 29, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Harsh Purohit; Ryo Tanabe; Kenji Ichige; Takashi Endo; Yuki Nikaido; Kaori Suefusa; Kaori Suefusa; Yohei Kawaguchi; Yohei Kawaguchi; Harsh Purohit; Ryo Tanabe; Kenji Ichige; Takashi Endo; Yuki Nikaido
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset is a sound dataset for malfunctioning industrial machine investigation and inspection (MIMII dataset). It contains the sounds generated by four types of industrial machines, i.e. valves, pumps, fans, and slide rails. Each type of machine includes seven individual product models*1, and the data for each model contain normal sounds (from 5000 seconds to 10000 seconds) and anomalous sounds (about 1000 seconds). To resemble a real-life scenario, various anomalous sounds were recorded (e.g., contamination, leakage, rotating unbalance, and rail damage). Also, background noise recorded in multiple real factories was mixed with the machine sounds. The sounds were recorded by an eight-channel microphone array at a 16 kHz sampling rate and 16 bits per sample. The MIMII dataset provides a benchmark for sound-based machine fault diagnosis. Users can test performance on specific tasks, e.g., unsupervised anomaly detection, transfer learning, noise robustness, etc. The details of the dataset are described in [1][2].
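
    Each recording is a standard WAV file, so it can be inspected with an ordinary audio library. The sketch below uses the soundfile package; the file path is illustrative, not the dataset's exact layout:

    import soundfile as sf

    # Load one MIMII recording (illustrative path); the recordings were captured
    # with an eight-channel microphone array at 16 kHz, 16 bits per sample.
    audio, sample_rate = sf.read("pump/id_00/normal/00000000.wav")
    print(audio.shape, sample_rate)  # (n_frames, n_channels) for multi-channel files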

    This dataset is made available by Hitachi, Ltd. under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

    A baseline sample code for anomaly detection is available on GitHub: https://github.com/MIMII-hitachi/mimii_baseline/

    *1: This version "public 1.0" contains four models (model ID 00, 02, 04, and 06). The remaining three models will be released in a future edition.

    [1] Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi, “MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection,” arXiv preprint arXiv:1909.09347, 2019.

    [2] Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi, “MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection,” in Proc. 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.

  8. pyhydroqc Sensor Data QC: Single Site Example

    • search.dataone.org
    • hydroshare.org
    • +1more
    Updated Dec 30, 2023
    Cite
    Amber Spackman Jones (2023). pyhydroqc Sensor Data QC: Single Site Example [Dataset]. http://doi.org/10.4211/hs.92f393cbd06b47c398bdd2bbb86887ac
    Explore at:
    Dataset updated
    Dec 30, 2023
    Dataset provided by
    Hydroshare
    Authors
    Amber Spackman Jones
    Time period covered
    Jan 1, 2017 - Dec 31, 2017
    Description

    This resource contains an example script for using the software package pyhydroqc. pyhydroqc was developed to identify and correct anomalous values in time series data collected by in situ aquatic sensors. For more information, see the code repository: https://github.com/AmberSJones/pyhydroqc and the documentation: https://ambersjones.github.io/pyhydroqc/. The package may be installed from the Python Package Index.

    This script applies the functions to data from a single site in the Logan River Observatory, which is included in the repository. The data collected in the Logan River Observatory are sourced at http://lrodata.usu.edu/tsa/ or on HydroShare: https://www.hydroshare.org/search/?q=logan%20river%20observatory.

    Anomaly detection methods include ARIMA (AutoRegressive Integrated Moving Average) and LSTM (Long Short Term Memory). These are time series regression methods that detect anomalies by comparing model estimates to sensor observations and labeling points as anomalous when they exceed a threshold. There are multiple possible approaches for applying LSTM for anomaly detection/correction:

      - Vanilla LSTM: uses past values of a single variable to estimate the next value of that variable.
      - Multivariate Vanilla LSTM: uses past values of multiple variables to estimate the next value for all variables.
      - Bidirectional LSTM: uses past and future values of a single variable to estimate a value for that variable at the time step of interest.
      - Multivariate Bidirectional LSTM: uses past and future values of multiple variables to estimate a value for all variables at the time step of interest.

    The correction approach uses piecewise ARIMA models. Each group of consecutive anomalous points is considered as a unit to be corrected. Separate ARIMA models are developed for valid points preceding and following the anomalous group. Model estimates are blended to achieve a correction.

    The anomaly detection and correction workflow involves the following steps:

      1. Retrieving data
      2. Applying rules-based detection to screen data and apply initial corrections
      3. Identifying and correcting sensor drift and calibration (if applicable)
      4. Developing a model (i.e., ARIMA or LSTM)
      5. Applying the model to make time series predictions
      6. Determining a threshold and detecting anomalies by comparing sensor observations to modeled results
      7. Widening the window over which an anomaly is identified
      8. Aggregating detections resulting from multiple models
      9. Making corrections for anomalous events
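
    As a rough illustration of steps 4-6 (not the pyhydroqc API itself), the sketch below fits an ARIMA model with statsmodels and flags points whose residuals exceed a fixed threshold; the file name, column name, model order, and threshold are placeholders:

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Placeholder input: a single-site sensor series with a datetime index.
    series = pd.read_csv("sensor.csv", parse_dates=["timestamp"],
                         index_col="timestamp")["value"]

    # Steps 4-5: develop a model and produce in-sample predictions.
    model = ARIMA(series, order=(1, 1, 1)).fit()
    predictions = model.fittedvalues

    # Step 6: compare observations to model estimates and apply a threshold.
    residuals = (series - predictions).abs()
    threshold = 4 * residuals.std()  # illustrative threshold choice
    anomalies = series[residuals > threshold]
    print(f"Flagged {len(anomalies)} of {len(series)} points as anomalous")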

    Instructions to run the notebook through the CUAHSI JupyterHub:

      1. Click "Open with..." at the top of the resource and select the CUAHSI JupyterHub. You may need to sign into CUAHSI JupyterHub using your HydroShare credentials.
      2. Select 'Python 3.8 - Scientific' as the server and click Start.
      3. From your JupyterHub directory, click on the ExampleNotebook.ipynb file.
      4. Execute each cell in the code by clicking the Run button.

  9. Mudestreda Multimodal Device State Recognition Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 11, 2024
    + more versions
    Cite
    Truchan, Hubert (2024). Mudestreda Multimodal Device State Recognition Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8238652
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Truchan, Hubert
    Admadi, Zahra
    License

    GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html

    Description

    Mudestreda Multimodal Device State Recognition Dataset, obtained from a real industrial milling device, with time series and image data for classification, regression, anomaly detection, remaining useful life (RUL) estimation, signal drift measurement, zero-shot flank tool wear, and feature engineering purposes.

    This is the official dataset used in the paper "Multimodal Isotropic Neural Architecture with Patch Embedding" (ICONIP 2023).

    Official repository: https://github.com/hubtru/Minape

    Conference paper: https://link.springer.com/chapter/10.1007/978-981-99-8079-6_14

    Mudestreda (MD): 512 samples (instances/observations), 4 modalities, 3 classes.

    Future research: Regression, Remaining Useful Life (RUL) estimation, Signal Drift detection, Anomaly Detection, Multivariate Time Series Prediction, and Feature Engineering.

    Notice: Tables and images do not render properly.

    Recommended: README.md includes the Mudestreda description and images Mudestreda.png and Mudestreda_Stage.png.

    Data Overview

    Task: Uni/Multi-Modal Classification

    Domain: Industrial Flank Tool Wear of the Milling Machine

    Input (sample): 4 Images: 1 Tool Image, 3 Spectrograms (X, Y, Z axis)

    Output: Machine state classes: Sharp, Used, Dulled

    Evaluation: accuracy, precision, recall, F1-score, ROC curve

    Each tool's wear is categorized sequentially: Sharp → Used → Dulled.

    The dataset includes measurements from ten tools: T1 to T10.

    Data splitting options include random or chronological distribution, without shuffling.

    Options:

    Original data or Augmented data

    Random distribution or Tool Distribution (see Dataset Splitting)

  10. MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation...

    • zenodo.org
    zip
    Updated May 11, 2022
    Cite
    Kota Dohi; Kota Dohi; Tomoya Nishida; Harsh Purohit; Ryo Tanabe; Takashi Endo; Masaaki Yamamoto; Yuki Nikaido; Yohei Kawaguchi; Yohei Kawaguchi; Tomoya Nishida; Harsh Purohit; Ryo Tanabe; Takashi Endo; Masaaki Yamamoto; Yuki Nikaido (2022). MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation for Domain Generalization Task [Dataset]. http://doi.org/10.5281/zenodo.6529888
    Explore at:
    Available download formats: zip
    Dataset updated
    May 11, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kota Dohi; Kota Dohi; Tomoya Nishida; Harsh Purohit; Ryo Tanabe; Takashi Endo; Masaaki Yamamoto; Yuki Nikaido; Yohei Kawaguchi; Yohei Kawaguchi; Tomoya Nishida; Harsh Purohit; Ryo Tanabe; Takashi Endo; Masaaki Yamamoto; Yuki Nikaido
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a sound dataset for malfunctioning industrial machine investigation and inspection for the domain generalization task (MIMII DG). The dataset consists of normal and abnormal operating sounds of five different types of industrial machines, i.e., fans, gearboxes, bearings, slide rails, and valves. The data for each machine type include three subsets called "sections", and each section roughly corresponds to a type of domain shift. This dataset is a subset of the dataset for DCASE 2022 Challenge Task 2, so it is entirely the same as the data included in the development dataset. For more information, please see the pages of the development dataset and the task description for DCASE 2022 Challenge Task 2.

    Baseline system

    Two simple baseline systems are available on the GitHub repositories: an autoencoder-based baseline and a MobileNetV2-based baseline. The baseline systems provide a simple entry-level approach that gives reasonable performance on the dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.

    Conditions of use

    This dataset was made by Hitachi, Ltd. and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

    Citation

    We will publish a paper on the dataset and announce the citation information, so please make sure to cite it if you use this dataset.

    Feedback

    If there is any problem, please contact us.

  11. Bitcoin Transaction Network Metadata (2011-2013)

    • ieee-dataport.org
    Updated Nov 24, 2019
    + more versions
    Cite
    Omer Shafiq (2019). Bitcoin Transaction Network Metadata (2011-2013) [Dataset]. http://doi.org/10.21227/d6dx-m651
    Explore at:
    Dataset updated
    Nov 24, 2019
    Dataset provided by
    IEEE Dataport
    Authors
    Omer Shafiq
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information: This dataset was created for research on blockchain anomaly and fraud detection and donated to the IEEE DataPort online community. https://github.com/epicprojects/blockchain-anomaly-detection

  12. ToyADMOS2 dataset: Another dataset of miniature-machine operating sounds for...

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Niizumi, Daisuke (2024). ToyADMOS2 dataset: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4580269
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Harada, Noboru
    Niizumi, Daisuke
    Ohishi, Yasunori
    Saito, Shoichiro
    Yasuda, Masahiro
    Takeuchi, Daiki
    Description

    ToyADMOS2 dataset is a large-scale dataset for anomaly detection in machine operating sounds (ADMOS), designed for evaluating systems under domain-shift conditions. It consists of two sub-datasets for machine-condition inspection: fault diagnosis of machines with geometrically fixed tasks ("toy car") and fault diagnosis of machines with moving tasks ("toy train"). Domain shifts are represented by introducing several differences in operating conditions, such as the use of the same machine type but with different machine models and part configurations, different operating speeds, microphone arrangements, etc. Each sub-dataset contains over 27 k samples of normal machine-operating sounds and over 8 k samples of anomalous sounds recorded at a 48-kHz sampling rate. A subset of the ToyADMOS2 dataset was used in the DCASE 2021 challenge task 2: Unsupervised anomalous sound detection for machine condition monitoring under domain shifted conditions.

    What makes this dataset different from others is that it is not used as is, but in conjunction with the tool provided on GitHub. The mixer tool lets you create datasets with any combination of recordings by describing the amount you need in a recipe file.

    The samples are compressed as MPEG-4 ALS (MPEG-4 Audio Lossless Coding) with a '.mp4' suffix, which you can load using the audioread or librosa Python modules.
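
    For instance, a single sample can be loaded with librosa as sketched below; the file name is illustrative, and sr=None preserves the original 48 kHz sampling rate (decoding the MPEG-4 ALS files relies on librosa's audioread fallback):

    import librosa

    # Load one ToyADMOS2 sample (illustrative file name); sr=None keeps the
    # original 48 kHz sampling rate instead of resampling to librosa's default.
    waveform, sample_rate = librosa.load("toy_car_normal_0001.mp4", sr=None, mono=True)
    print(f"{len(waveform) / sample_rate:.1f} s of audio at {sample_rate} Hz")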

    The total size of files under a folder ToyADMOS2 is 149 GB, and the total size of example benchmark datasets that are created from the ToyADMOS2 dataset is 13.2 GB.

    Details of the dataset are described in [1] and on GitHub: https://github.com/nttcslab/ToyADMOS2-dataset

    License: see LICENSE.pdf for the details of the license.

    [1] Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito, "ToyADMOS2: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions," 2021. https://arxiv.org/abs/2106.02369

  13. Supporting data and tools for "Toward automating post processing of aquatic...

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Mar 7, 2022
    + more versions
    Cite
    Amber Spackman Jones; Tanner Jones; Jeffery S. Horsburgh (2022). Supporting data and tools for "Toward automating post processing of aquatic sensor data" [Dataset]. http://doi.org/10.4211/hs.a6ea89ae20354e39b3c9f1228997e27a
    Explore at:
    Available download formats: zip (1.7 GB)
    Dataset updated
    Mar 7, 2022
    Dataset provided by
    HydroShare
    Authors
    Amber Spackman Jones; Tanner Jones; Jeffery S. Horsburgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2013 - Dec 31, 2019
    Area covered
    Description

    This resource contains the supporting data and code files for the analyses presented in "Toward automating post processing of aquatic sensor data," an article published in the journal Environmental Modelling and Software. This paper describes pyhydroqc, a Python package developed to identify and correct anomalous values in time series data collected by in situ aquatic sensors. For more information on pyhydroqc, see the code repository (https://github.com/AmberSJones/pyhydroqc) and the documentation (https://ambersjones.github.io/pyhydroqc/). The package may be installed from the Python Package Index (more info: https://packaging.python.org/tutorials/installing-packages/).

    Included in this resource are input data, Python scripts to run the package on the input data (anomaly detection and correction), results from running the algorithm, and Python scripts for generating the figures in the manuscript. The organization and structure of the files are described in detail in the readme file. The input data were collected as part of the Logan River Observatory (LRO). The data in this resource represent a subset of data available for the LRO and were compiled by querying the LRO’s operational database. All available data for the LRO can be sourced at http://lrodata.usu.edu/tsa/ or on HydroShare: https://www.hydroshare.org/search/?q=logan%20river%20observatory.

    There are two sets of scripts in this resource: 1.) Scripts that reproduce plots for the paper using saved results, and 2.) Code used to generate the complete results for the series in the case study. While all figures can be reproduced, there are challenges to running the code for the complete results (it is computationally intensive, different results will be generated due to the stochastic nature of the models, and the code was developed with an early version of the package), which is why the saved results are included in this resource. For a simple example of running pyhydroqc functions for anomaly detection and correction on a subset of data, see this resource: https://www.hydroshare.org/resource/92f393cbd06b47c398bdd2bbb86887ac/.

  14. DCASE 2020 Challenge Task 2 Evaluation Dataset

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated May 24, 2022
    + more versions
    Cite
    Toshiki Nakamura (2022). DCASE 2020 Challenge Task 2 Evaluation Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3841771
    Explore at:
    Dataset updated
    May 24, 2022
    Dataset provided by
    Noboru Harada
    Ryo Tanabe
    Yuma Koizumi
    Yohei Kawaguchi
    Keisuke Imoto
    Toshiki Nakamura
    Kaori Suefusa
    Yuki Nikaido
    Masahito Yasuda
    Harsh Purohit
    Takashi Endo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is the "evaluation dataset" for the DCASE 2020 Challenge Task 2 "Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring" [task description].

    In the task, three datasets have been released: "development dataset", "additional training dataset", and "evaluation dataset". This evaluation dataset was the last of the three to be released. It includes around 400 samples for each Machine Type and Machine ID used in the evaluation, none of which have a condition label (i.e., normal or anomaly).

    The recording procedure and data format are the same as the development dataset and additional training dataset. The Machine IDs in this dataset are the same as those in the additional training dataset. For more information, please see the pages of the development dataset and the task description.

    After the DCASE 2020 Challenge, we released the ground truth for this evaluation dataset.

    Directory structure

    Once you unzip the downloaded files from Zenodo, you can see the following directory structure. Machine Type information is given by directory name, and Machine ID and condition information are given by file name, as:

    /eval_data
      /ToyCar
        /test (Normal and anomaly data for all Machine IDs are included, but they do not have a condition label.)
          /id_05_00000000.wav
          ...
          /id_05_00000514.wav
          /id_06_00000000.wav
          ...
          /id_07_00000514.wav
      /ToyConveyor (The other Machine Types have the same directory structure as ToyCar.)
      /fan
      /pump
      /slider
      /valve

    The paths of the audio files follow the pattern "/eval_data/<Machine Type>/test/id_<Machine ID>_<serial number>.wav".

    For example, the Machine Type and Machine ID of "/ToyCar/test/id_05_00000000.wav" are "ToyCar" and "05", respectively. Unlike the development dataset and additional training dataset, its condition label is hidden.
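
    Following that convention, the Machine Type and Machine ID can be recovered directly from a file path, as in this small sketch:

    from pathlib import Path

    def parse_eval_path(path):
        """Extract the Machine Type and Machine ID from an evaluation-dataset path."""
        p = Path(path)
        machine_type = p.parts[-3]         # e.g. "ToyCar"
        machine_id = p.stem.split("_")[1]  # "id_05_00000000" -> "05"
        return machine_type, machine_id

    print(parse_eval_path("/eval_data/ToyCar/test/id_05_00000000.wav"))
    # ('ToyCar', '05')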

    Baseline system

    A simple baseline system is available on the GitHub repository [URL]. The baseline system provides a simple entry-level approach that gives reasonable performance on the dataset of Task 2. It is a good starting point, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.

    Conditions of use

    This dataset was created jointly by NTT Corporation and Hitachi, Ltd. and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

    Publication

    If you use this dataset, please cite all the following three papers:

    Yuma Koizumi, Shoichiro Saito, Noboru Harada, Hisashi Uematsu, and Keisuke Imoto, "ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection," in Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019. [pdf]

    Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi, “MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection,” in Proc. 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019. [pdf]

    Yuma Koizumi, Yohei Kawaguchi, Keisuke Imoto, Toshiki Nakamura, Yuki Nikaido, Ryo Tanabe, Harsh Purohit, Kaori Suefusa, Takashi Endo, Masahiro Yasuda, and Noboru Harada, "Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring," in Proc. 5th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020. [pdf]

    Feedback

    If there is any problem, please contact us:

    Yuma Koizumi, koizumi.yuma@ieee.org

    Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com

    Keisuke Imoto, keisuke.imoto@ieee.org

  15. Data set for anomaly detection on a HPC system

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Apr 19, 2023
    Cite
    Andrea Borghesi; Andrea Bartolini; Francesco Beneventi; Andrea Borghesi; Andrea Bartolini; Francesco Beneventi (2023). Data set for anomaly detection on a HPC system [Dataset]. http://doi.org/10.5281/zenodo.3251873
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrea Borghesi; Andrea Bartolini; Francesco Beneventi; Andrea Borghesi; Andrea Bartolini; Francesco Beneventi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains the data collected on the DAVIDE HPC system (CINECA & E4 & University of Bologna, Bologna, Italy) in the period March-May 2018.

    The data set has been used to train an autoencoder-based model to automatically detect anomalies in a semi-supervised fashion on a real HPC system.

    This work is described in:

    1) "Anomaly Detection using Autoencoders in High Performance Computing Systems", Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini, IAAI19 (proceedings in process) -- https://arxiv.org/abs/1902.08447

    2) "Online Anomaly Detection in HPC Systems", Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini, AICAS19 (proceedings in process) -- https://arxiv.org/abs/1811.05269

    See the git repository for usage examples & details --> https://github.com/AndreaBorghesi/anomaly_detection_HPC
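
    As a rough sketch of the semi-supervised setup described above (train an autoencoder on normal data only and flag samples with high reconstruction error), and not a reproduction of the paper's architecture, the following uses PyTorch with illustrative layer sizes and an illustrative threshold:

    import torch
    from torch import nn

    n_features = 16  # number of monitoring metrics per sample (placeholder)

    # Small fully connected autoencoder; the sizes are illustrative only.
    autoencoder = nn.Sequential(
        nn.Linear(n_features, 8), nn.ReLU(),
        nn.Linear(8, 4), nn.ReLU(),
        nn.Linear(4, 8), nn.ReLU(),
        nn.Linear(8, n_features),
    )
    optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    normal_data = torch.randn(1024, n_features)  # stand-in for normal-operation samples
    for _ in range(50):  # train on normal data only (semi-supervised)
        optimizer.zero_grad()
        loss = loss_fn(autoencoder(normal_data), normal_data)
        loss.backward()
        optimizer.step()

    # Flag test samples whose reconstruction error exceeds a threshold
    # derived from the errors observed on normal data.
    with torch.no_grad():
        errors = ((autoencoder(normal_data) - normal_data) ** 2).mean(dim=1)
        threshold = errors.mean() + 3 * errors.std()
        test_data = torch.randn(8, n_features)  # stand-in for unseen samples
        test_errors = ((autoencoder(test_data) - test_data) ** 2).mean(dim=1)
        print((test_errors > threshold).tolist())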

  16. ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...

    • data.niaid.nih.gov
    • elki-project.github.io
    • +1more
    Updated May 2, 2024
    Cite
    Schubert, Erich (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6355683
    Explore at:
    Dataset updated
    May 2, 2024
    Dataset provided by
    Schubert, Erich
    Zimek, Arthur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data sets were originally created for the following publications:

    M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

    H.-P. Kriegel, E. Schubert, A. Zimek Evaluation of Multiple Clustering Solutions In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.

    The outlier data set versions were introduced in:

    E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel On Evaluation of Outlier Rankings and Outlier Scores In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

    They are derived from the original image data available at https://aloi.science.uva.nl/

    The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, The Amsterdam library of object images, Int. J. Comput. Vision, 61(1), 103-112, January, 2005

    Additional information is available at: https://elki-project.github.io/datasets/multi_view

    The following views are currently available:

    • Object number: sparse 1000-dimensional vectors that give the true object assignment. Files: objs.arff.gz
    • RGB color histograms: standard RGB color histograms (uniform binning). Files: aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz
    • HSV color histograms: standard HSV/HSB color histograms in various binnings. Files: aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz
    • Color similarity: average similarity to 77 reference colors (not histograms); 18 colors x 2 sat x 2 bri + 5 grey values (incl. white, black). Files: aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
    • Haralick features: first 13 Haralick features (radius 1 pixel). Files: aloi-haralick-1.csv.gz
    • Front to back: vectors representing front faces vs. back faces of individual objects. Files: front.arff.gz
    • Basic light: vectors indicating basic light situations. Files: light.arff.gz
    • Manual annotations: manually annotated object groups of semantically related objects such as cups. Files: manual1.arff.gz

    Outlier Detection Versions

    Additionally, we generated a number of subsets of the RGB histograms for outlier detection:

    • Downsampled to 100000 objects (553 outliers). Files: aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
    • Downsampled to 75000 objects (717 outliers). Files: aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
    • Downsampled to 50000 objects (1508 outliers). Files: aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz

  17. Bitcoin Hacked Transactions 2010-2013 - Dataset - CryptoData Hub

    • cryptodata.center
    Updated Dec 5, 2024
    + more versions
    Cite
    cryptodata.center (2024). Bitcoin Hacked Transactions 2010-2013 - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/bitcoin-hacked-transactions-2010-2013
    Explore at:
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    CryptoDATA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was created for research on blockchain anomaly and fraud detection and donated to the IEEE DataPort online community. https://github.com/epicprojects/blockchain-anomaly-detection

    Files:

    • bitcoin_hacks_2010_2013.csv: Known hashes of bitcoin theft/malicious transactions from 2010-2013.
    • malicious_tx_in.csv: Hashes of input transactions flowing into malicious transactions.
    • malicious_tx_out.csv: Hashes of output transactions flowing out of malicious transactions.
    • anomalies_theft_tx.csv: Known bitcoin theft transaction hashes.
    • anomalies_loss_tx.csv: Known bitcoin loss transaction hashes.
    • anomalies_misc_tx.csv: Known bitcoin hack transaction hashes.
    • anomalies_seizure1_tx.csv: Known bitcoin transaction hashes involved in the 1st FBI Silk Road seizure (https://en.wikipedia.org/wiki/Silk_Road_(marketplace)).
    • anomalies_seizure2_tx.csv: Known bitcoin transaction hashes involved in the 2nd FBI Silk Road seizure (https://en.wikipedia.org/wiki/Silk_Road_(marketplace)).

  18. Three Annotated Anomaly Detection Datasets for Line-Scan Algorithms

    • zenodo.org
    • data.niaid.nih.gov
    bin, png
    Updated Aug 29, 2024
    Cite
    Samuel Garske; Samuel Garske; Yiwei Mao; Yiwei Mao (2024). Three Annotated Anomaly Detection Datasets for Line-Scan Algorithms [Dataset]. http://doi.org/10.5281/zenodo.13370800
    Explore at:
    Available download formats: bin, png
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Samuel Garske; Samuel Garske; Yiwei Mao; Yiwei Mao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset contains two hyperspectral anomaly detection images and one multispectral anomaly detection image, along with their corresponding binary pixel masks. They were initially used for real-time anomaly detection in line-scanning, but they can be used for any anomaly detection task.

    They are in .npy file format (tiff or geotiff variants will be added in the future), with the image datasets stored in (height, width, channels) order. The SNP dataset was collected using sentinelhub, and the Synthetic dataset was collected from AVIRIS. The Python code used to analyse these datasets can be found at: https://github.com/WiseGamgee/HyperAD

    How to Get Started

    All that is needed to load these datasets is Python (preferably 3.8+) and the NumPy package. Example code for loading the beach dataset, assuming it is placed in a folder called "data" alongside the Python script:

    import numpy as np
    
    # Load image file
    hsi_array = np.load("data/beach_hsi.npy")
    n_pixels, n_lines, n_bands = hsi_array.shape
    print(f"This dataset has {n_pixels} pixels, {n_lines} lines, and {n_bands}.")
    
    # Load image mask
    mask_array = np.load("data/beach_mask.npy")
    m_pixels, m_lines = mask_array.shape
    print(f"The corresponding anomaly mask is {m_pixels} pixels by {m_lines} lines.")

    Citing the Datasets

    If you use any of these datasets, please cite the following paper:

    @article{garske2024erx,
    title={ERX - a Fast Real-Time Anomaly Detection Algorithm for Hyperspectral Line-Scanning},
    author={Garske, Samuel and Evans, Bradley and Artlett, Christopher and Wong, KC},
    journal={arXiv preprint arXiv:2408.14947},
    year={2024},
    }

    If you use the beach dataset, please cite the following paper as well (original source):

    @article{mao2022openhsi,
     title={OpenHSI: A complete open-source hyperspectral imaging solution for everyone},
     author={Mao, Yiwei and Betters, Christopher H and Evans, Bradley and Artlett, Christopher P and Leon-Saval, Sergio G and Garske, Samuel and Cairns, Iver H and Cocks, Terry and Winter, Robert and Dell, Timothy},
     journal={Remote Sensing},
     volume={14},
     number={9},
     pages={2244},
     year={2022},
     publisher={MDPI}
    }

  19. Dataset Artifact for Prodigy: Towards Unsupervised Anomaly Detection in...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 12, 2023
    Cite
    Schwaller, Benjamin (2023). Dataset Artifact for Prodigy: Towards Unsupervised Anomaly Detection in Production HPC Systems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8079387
    Explore at:
    Dataset updated
    Nov 12, 2023
    Dataset provided by
    Egele, Manuel
    Aksar, Burak
    Kulis, Brian
    Leung, Vitus
    Coskun, Ayse
    Sencan, Efe
    Aaziz, Omar
    Brandt, Jim
    Schwaller, Benjamin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains a small set of application runs from the Eclipse supercomputer. The applications were run with and without synthetic HPC performance anomalies. More detailed information regarding the synthetic anomalies can be found at: https://github.com/peaclab/HPAS.

    We chose four applications, namely LAMMPS, sw4, sw4Lite, and ExaMiniMD, to encompass both real and proxy applications. We executed each application five times on four compute nodes without introducing any anomalies. To showcase our experiment, we specifically selected the "memleak" anomaly, as it is one of the most commonly occurring types. Additionally, we also executed each application five times with the chosen anomaly. The collected dataset consists of a total of 160 samples, with 80 samples labeled as anomalous and 80 samples labeled as healthy. For details of the applications, please refer to the paper.

    The applications were run on Eclipse, which is situated at Sandia National Laboratories. Eclipse comprises 1488 compute nodes, each equipped with 128GB of memory and two sockets. Each socket contains 18 E5-2695 v4 CPU cores with 2-way hyperthreading, providing substantial computational power for scientific and engineering applications.

  20. DCASE2021 UAD-S UMAP Data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 23, 2021
    Cite
    Plumbley, Mark D. (2021). DCASE2021 UAD-S UMAP Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5123023
    Explore at:
    Dataset updated
    Aug 23, 2021
    Dataset provided by
    Fernandez Rodriguez, Andres
    Plumbley, Mark D.
    License

    Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
    License information was derived automatically

    Description

    Supporting data for our paper:

    USING UMAP TO INSPECT AUDIO DATA FOR UNSUPERVISED ANOMALY DETECTION UNDER DOMAIN-SHIFT CONDITIONS

    ArXiv preprint can be found here. Code for the experiment software pipeline described in the paper can be found here. The pipeline requires and generates different forms of data. Here we provide the following:

    AudioSet_wav_fragments.zip: This is a custom selection of 39437 wav files (32 kHz, mono, 10 seconds) randomly extracted from AudioSet (originally released under CC-BY). In addition to this custom subset, the paper also uses the following datasets, which can be downloaded from their respective websites:

    DCASE2021 Task 2 Development Dataset

    DCASE2021 Task 2 Additional Training Dataset

    Fraunhofer's IDMT-ISA-ELECTRIC-ENGINE Dataset

    dcase2021_uads_umaps.zip: To compute the UMAPs, first the log-STFT, log-mel and L3 representations must be extracted, and then the UMAPs must be computed. This can take a substantial amount of time and resources. For convenience, we provide here the 72 UMAPs discussed in the paper.

    dcase2021_uads_umap_plots.zip: Also for convenience, we provide here the 198 high-resolution scatter plots rendered from the UMAPs.

    For a comprehensive visual inspection of the computed representations, it is sufficient to download the plots only. Users interested in exploring the plots interactively will need to download all the audio datasets and compute the log-STFT, log-mel and L3 representations as well as the UMAPs themselves (code provided in the GitHub repository). UMAPs for further representations can also be computed and plotted.
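
    For readers who prefer to recompute projections rather than download the provided UMAPs, a minimal umap-learn sketch over a generic feature matrix is shown below; the input file name is a placeholder, and the feature extraction (log-STFT, log-mel or L3) is assumed to have been done beforehand:

    import numpy as np
    import umap

    # Placeholder input: one row per audio clip, columns are a flattened
    # log-STFT / log-mel / L3 representation computed beforehand.
    features = np.load("logmel_features.npy")

    # 2-D projection comparable to the provided scatter plots; the
    # hyperparameters are illustrative defaults, not the paper's settings.
    reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=0)
    embedding = reducer.fit_transform(features)
    print(embedding.shape)  # (n_clips, 2)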
