Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
ESA Anomaly Dataset is the first large-scale, real-life satellite telemetry dataset with curated anomaly annotations, originating from three ESA missions. We hope that this unique dataset will allow researchers and scientists from academia, research institutes, national and international space agencies, and industry to benchmark models and approaches on a common baseline, as well as to research and develop novel, computationally efficient approaches for anomaly detection in satellite telemetry data.
The dataset results from the work of an 18-month project carried out by an industry consortium composed of Airbus Defence and Space, KP Labs, and the European Space Agency's European Space Operations Centre. The project, funded by the European Space Agency (ESA), is part of the Artificial Intelligence for Automation (A²I) Roadmap (De Canio et al., 2023), a large endeavour started in 2021 to automate space operations by leveraging artificial intelligence.
Further details can be found on arXiv and GitHub.
References
De Canio, G. et al. (2023) Development of an actionable AI roadmap for automating mission operations. In, 2023 SpaceOps Conference. American Institute of Aeronautics and Astronautics, Dubai, United Arab Emirates.
The Numenta Anomaly Benchmark (NAB) is a novel benchmark for evaluating algorithms for anomaly detection in streaming, online applications. It comprises over 50 labeled real-world and artificial timeseries data files plus a novel scoring mechanism designed for real-time applications. All of the data and code are fully open-source, with extensive documentation and a scoreboard of anomaly detection algorithms: github.com/numenta/NAB. The full dataset is included here, but please go to the repo for details on how to evaluate anomaly detection algorithms on NAB.
The NAB corpus of 58 timeseries data files is designed to provide data for research in streaming anomaly detection. It comprises both real-world and artificial timeseries data containing labeled anomalous periods of behavior. Data are ordered, timestamped, single-valued metrics. All data files contain anomalies, unless otherwise noted.
The majority of the data is real-world, from a variety of sources such as AWS server metrics, Twitter volume, advertisement clicking metrics, traffic data, and more. All data is included in the repository, with more details in the data readme. We are in the process of adding more data and actively searching for additional sources. Please contact us at nab@numenta.org if you have similar data (ideally with known anomalies) that you would like to see incorporated into NAB.
The NAB version will be updated whenever new data (and corresponding labels) is added to the corpus; NAB is currently in v1.0.
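Since NAB files are plain CSVs of ordered, timestamped, single-valued metrics, a minimal reader needs only the standard library. A sketch (the sample rows below are illustrative, not taken from a real NAB file):

```python
import csv
import io

# NAB data files are CSVs with a header row of "timestamp,value".
# The inline sample here stands in for reading an actual file from disk.
sample = io.StringIO(
    "timestamp,value\n"
    "2014-04-01 00:00:00,10844\n"
    "2014-04-01 00:30:00,8127\n"
)
rows = [(r["timestamp"], float(r["value"])) for r in csv.DictReader(sample)]
print(rows[0])  # ('2014-04-01 00:00:00', 10844.0)
```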
realAWSCloudwatch/
AWS server metrics as collected by the AmazonCloudwatch service. Example metrics include CPU Utilization, Network Bytes In, and Disk Read Bytes.
realAdExchange/
Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM). One of the files is normal, without anomalies.
realKnownCause/
This is data for which we know the anomaly causes; no hand labeling.
ambient_temperature_system_failure.csv: The ambient temperature in an office setting.
cpu_utilization_asg_misconfiguration.csv: From Amazon Web Services (AWS) monitoring of CPU usage, i.e. average CPU usage across a given cluster. When usage is high, AWS spins up a new machine, and uses fewer machines when usage is low.
ec2_request_latency_system_failure.csv: CPU usage data from a server in Amazon's East Coast datacenter. The dataset ends with complete system failure resulting from a documented failure of AWS API servers. There's an interesting story behind this data on the Numenta blog: http://numenta.com/blog/anomaly-of-the-week.html
machine_temperature_system_failure.csv: Temperature sensor data from an internal component of a large industrial machine. The first anomaly is a planned shutdown of the machine. The second anomaly is difficult to detect and directly led to the third anomaly, a catastrophic failure of the machine.
nyc_taxi.csv: The number of NYC taxi passengers, where the five anomalies occur during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snowstorm. The raw data is from the NYC Taxi and Limousine Commission. The data file included here aggregates the total number of taxi passengers into 30-minute buckets.
rogue_agent_key_hold.csv: Timing of key holds for several users of a computer, where the anomalies represent a change in user.
rogue_agent_key_updown.csv: Timing of key strokes for several users of a computer, where the anomalies represent a change in user.
realTraffic/
Real-time traffic data from the Twin Cities Metro area in Minnesota, collected by the Minnesota Department of Transportation. Metrics include occupancy, speed, and travel time from specific sensors.
realTweets/
A collection of Twitter mentions of large publicly-traded companies such as Google and IBM. The metric value represents the number of mentions for a given ticker symbol every 5 minutes.
artificialNoAnomaly/
Artificially-generated data without any anomalies.
artificialWithAnomaly/
Artificially-generated data with varying types of anomalies.
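As an illustration of the 30-minute bucketing used for nyc_taxi.csv, here is a minimal pure-Python sketch (the ride counts below are made up, not NAB data):

```python
from collections import defaultdict
from datetime import datetime

def bucket_counts(records, minutes=30):
    """Aggregate (timestamp, count) records into fixed-width time buckets.
    `minutes` must divide 60 evenly (e.g. 5, 15, 30)."""
    buckets = defaultdict(int)
    for ts, n in records:
        # Snap each timestamp down to the start of its bucket.
        start = ts.replace(minute=ts.minute - ts.minute % minutes,
                           second=0, microsecond=0)
        buckets[start] += n
    return dict(sorted(buckets.items()))

# Hypothetical per-ride counts aggregated into 30-minute buckets.
rides = [(datetime(2014, 7, 1, 0, 5), 10),
         (datetime(2014, 7, 1, 0, 20), 7),
         (datetime(2014, 7, 1, 0, 40), 4)]
print(bucket_counts(rides))  # two buckets: 00:00 -> 17, 00:30 -> 4
```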
We encourage you to publish your results on running NAB, and share them with us at nab@numenta.org. Please cite the following publication when referring to NAB:
Lavin, Alexander and Ahmad, Subutai. "Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark", Fourteenth International Conference on Machine Learning and Applications, December 2015. [PDF]
ToyADMOS dataset is a machine operating sounds dataset of approximately 540 hours of normal machine operating sounds and over 12,000 samples of anomalous sounds, collected with four microphones at a 48 kHz sampling rate, prepared by Yuma Koizumi and members of NTT Media Intelligence Laboratories. The dataset consists of three sub-datasets: "toy car" for a product inspection task, "toy conveyor" for fault diagnosis of a fixed machine, and "toy train" for fault diagnosis of a moving machine.
Since the total size of the ToyADMOS dataset is over 440 GB, each sub-dataset is split into 7-9 files with 7-zip (7z format). The total size of the compressed dataset is approximately 180 GB, and that of each sub-dataset is approximately 60 GB. Download the files corresponding to the sub-datasets of interest and use your favorite compression tool to extract these split archives.
The detail of the dataset is described in [1] and GitHub: https://github.com/YumaKoizumi/ToyADMOS-dataset
License: see the file named LICENSE.pdf
[1] Yuma Koizumi, Shoichiro Saito, Noboru Harada, Hisashi Uematsu and Keisuke Imoto, "ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection," in Proc of Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
In recent years there has been an increased interest in Artificial Intelligence for IT Operations (AIOps). This field utilizes monitoring data from IT systems, big data platforms, and machine learning to automate various operations and maintenance (O&M) tasks for distributed systems.
The major contributions have been materialized in the form of novel algorithms.
Typically, researchers took on the challenge of exploring one specific type of observability data source, such as application logs, metrics, or distributed traces, to create new algorithms.
Nonetheless, due to the low signal-to-noise ratio of monitoring data, there is a consensus that only the analysis of multi-source monitoring data will enable the development of useful algorithms that have better performance.
Unfortunately, existing datasets usually contain only a single source of data, often logs or metrics. This limits the possibilities for greater advances in AIOps research.
Thus, we generated high-quality multi-source data composed of distributed traces, application logs, and metrics from a complex distributed system. This paper provides detailed descriptions of the experiment, statistics of the data, and identifies how such data can be analyzed to support O&M tasks such as anomaly detection, root cause analysis, and remediation.
General Information:
This repository contains the simple scripts for data statistics, and link to the multi-source distributed system dataset.
You may find details of this dataset from the original paper:
Sasho Nedelkoski, Jasmin Bogatinovski, Ajay Kumar Mandapati, Soeren Becker, Jorge Cardoso, Odej Kao, "Multi-Source Distributed System Data for AI-powered Analytics".
If you use the data, implementation, or any details of the paper, please cite!
BIBTEX:
@inproceedings{nedelkoski2020multi, title={Multi-source Distributed System Data for AI-Powered Analytics}, author={Nedelkoski, Sasho and Bogatinovski, Jasmin and Mandapati, Ajay Kumar and Becker, Soeren and Cardoso, Jorge and Kao, Odej}, booktitle={European Conference on Service-Oriented and Cloud Computing}, pages={161--176}, year={2020}, organization={Springer} }
The multi-source/multimodal dataset is composed of distributed traces, application logs, and metrics produced by running a complex distributed system (OpenStack). In addition, we also provide the workload and fault scripts together with the Rally report, which can serve as ground truth. We provide two datasets, which differ in how the workload is executed. The sequential_data is generated by executing a workload of sequential user requests. The concurrent_data is generated by executing a workload of concurrent user requests.
The raw logs in both datasets contain the same files. Users who want the logs filtered by time with respect to the two datasets should refer to the timestamps in the metrics (they provide the time window). In addition, we suggest using the provided aggregated, time-ranged logs for both datasets in CSV format.
Important: The logs and the metrics are synchronized with respect to time, and they are both recorded in CEST (Central European Summer Time). The traces are in UTC (Coordinated Universal Time, i.e. two hours behind CEST). They should be synchronized if the user develops multimodal methods. Please read the IMPORTANT_experiment_start_end.txt file before working with the data.
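A minimal sketch of aligning trace timestamps (UTC) with the logs and metrics (CEST), assuming ISO-like timestamp strings; the exact timestamp formats in the dataset may differ:

```python
from datetime import datetime, timedelta, timezone

# Logs/metrics are recorded in CEST (UTC+2); traces are in UTC.
# Converting trace timestamps to CEST puts all three sources on one clock.
CEST = timezone(timedelta(hours=2))

def trace_to_cest(ts_utc: str) -> str:
    """Convert a trace timestamp string from UTC to CEST."""
    dt = datetime.fromisoformat(ts_utc).replace(tzinfo=timezone.utc)
    return dt.astimezone(CEST).isoformat()

print(trace_to_cest("2019-11-25 10:00:00"))  # 2019-11-25T12:00:00+02:00
```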
Our GitHub repository with the code for the workloads and scripts for basic analysis can be found at: https://github.com/SashoNedelkoski/multi-source-observability-dataset/
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
See the official website: https://autovi.utc.fr
Modern industrial production lines must be set up with robust defect inspection modules that are able to withstand high product variability. This means that in a context of industrial production, new defects that are not yet known may appear, and must therefore be identified.
On industrial production lines, the typology of potential defects is vast (texture, part failure, logical defects, etc.). Inspection systems must therefore be able to detect non-listed defects, i.e. defects not yet observed at the time the inspection system was developed. Solving this problem requires research and development of unsupervised AI algorithms on real-world data.
Renault Group and the Université de technologie de Compiègne (Roberval and Heudiasyc Laboratories) have jointly developed the Automotive Visual Inspection Dataset (AutoVI), the purpose of which is to be used as a scientific benchmark to compare and develop advanced unsupervised anomaly detection algorithms under real production conditions. The images were acquired on Renault Group's automotive production lines, in a genuine industrial production line environment, with variations in brightness and lighting on constantly moving components. This dataset is representative of actual data acquisition conditions on automotive production lines.
The dataset contains 3950 images, split into 1530 training images and 2420 testing images.
The evaluation code can be found at https://github.com/phcarval/autovi_evaluation_code.
Disclaimer
All defects shown were intentionally created on Renault Group's production lines for the purpose of producing this dataset. The images were examined and labeled by Renault Group experts, and all defects were corrected after shooting.
License
Copyright © 2023-2024 Renault Group
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/.
For using the data in a way that falls under the commercial use clause of the license, please contact us.
Attribution
Please use the following for citing the dataset in scientific work:
Carvalho, P., Lafou, M., Durupt, A., Leblanc, A., & Grandvalet, Y. (2024). The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection [Dataset]. https://doi.org/10.5281/zenodo.10459003
Contact
If you have any questions or remarks about this dataset, please contact us at philippe.carvalho@utc.fr, meriem.lafou@renault.com, alexandre.durupt@utc.fr, antoine.leblanc@renault.com, yves.grandvalet@utc.fr.
Changelog
v1.0.0
Cropped engine_wiring, pipe_clip and pipe_staple images
Reduced tank_screw, underbody_pipes and underbody_screw image sizes
v0.1.1
Added ground truth segmentation maps
Fixed categorization of some images
Added new defect categories
Removed tube_fastening and kitting_cart
Removed duplicates in pipe_clip
The Squirrel Cage Induction Motor Fault Diagnosis Dataset is a multi-sensor data collection gathered to expand research on anomaly detection, fault diagnosis, and predictive maintenance, mainly using non-invasive methods such as thermal observation or vibration measurement. The measurements were gathered using an advanced Wrocław University of Science and Technology laboratory designed to simulate and study motor defects. The collected dataset is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Available data:
thermal images
An example of dataset utilization is presented in the GitHub repository: motor-fault-diagnosis
Related publications:
Unraveling Induction Motor State through Thermal Imaging and Edge Processing: A Step towards Explainable Fault Diagnosis
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset is a sound dataset for malfunctioning industrial machine investigation and inspection (MIMII dataset). It contains the sounds generated from four types of industrial machines, i.e. valves, pumps, fans, and slide rails. Each type of machine includes seven individual product models*1, and the data for each model contains normal sounds (from 5000 seconds to 10000 seconds) and anomalous sounds (about 1000 seconds). To resemble a real-life scenario, various anomalous sounds were recorded (e.g., contamination, leakage, rotating unbalance, and rail damage). Also, the background noise recorded in multiple real factories was mixed with the machine sounds. The sounds were recorded by an eight-channel microphone array at a 16 kHz sampling rate and 16 bits per sample. The MIMII dataset supports benchmarking of sound-based machine fault diagnosis. Users can test performance for specific functions, e.g., unsupervised anomaly detection, transfer learning, noise robustness, etc. The detail of the dataset is described in [1][2].
This dataset is made available by Hitachi, Ltd. under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
A baseline sample code for anomaly detection is available on GitHub: https://github.com/MIMII-hitachi/mimii_baseline/
*1: This version "public 1.0" contains four models (model IDs 00, 02, 04, and 06). The remaining three models will be released in a future edition.
[1] Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi, “MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection,” arXiv preprint arXiv:1909.09347, 2019.
[2] Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi, “MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection,” in Proc. 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.
This resource contains an example script for using the software package pyhydroqc. pyhydroqc was developed to identify and correct anomalous values in time series data collected by in situ aquatic sensors. For more information, see the code repository: https://github.com/AmberSJones/pyhydroqc and the documentation: https://ambersjones.github.io/pyhydroqc/. The package may be installed from the Python Package Index.
This script applies the functions to data from a single site in the Logan River Observatory, which is included in the repository. The data collected in the Logan River Observatory are sourced at http://lrodata.usu.edu/tsa/ or on HydroShare: https://www.hydroshare.org/search/?q=logan%20river%20observatory.
Anomaly detection methods include ARIMA (AutoRegressive Integrated Moving Average) and LSTM (Long Short Term Memory). These are time series regression methods that detect anomalies by comparing model estimates to sensor observations and labeling points as anomalous when they exceed a threshold. There are multiple possible approaches for applying LSTM for anomaly detection/correction:
- Vanilla LSTM: uses past values of a single variable to estimate the next value of that variable.
- Multivariate Vanilla LSTM: uses past values of multiple variables to estimate the next value for all variables.
- Bidirectional LSTM: uses past and future values of a single variable to estimate a value for that variable at the time step of interest.
- Multivariate Bidirectional LSTM: uses past and future values of multiple variables to estimate a value for all variables at the time step of interest.
The correction approach uses piecewise ARIMA models. Each group of consecutive anomalous points is considered as a unit to be corrected. Separate ARIMA models are developed for valid points preceding and following the anomalous group. Model estimates are blended to achieve a correction.
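The exact weighting pyhydroqc uses may differ, but blending forward and backward piecewise estimates across an anomalous gap can be sketched as a linear crossfade (function name is ours):

```python
def blend_corrections(forecast, backcast):
    """Linearly blend estimates across an anomalous gap.

    forecast: estimates from the ARIMA model fit on valid points before
              the gap, projected forward over the gap.
    backcast: estimates from the model fit on valid points after the gap,
              projected backward over the gap.
    Weight shifts from the forecast at the start of the gap to the
    backcast at the end."""
    n = len(forecast)
    if n == 1:
        return [0.5 * (forecast[0] + backcast[0])]
    return [((n - 1 - i) * forecast[i] + i * backcast[i]) / (n - 1)
            for i in range(n)]

print(blend_corrections([1.0, 1.0, 1.0], [3.0, 3.0, 3.0]))  # [1.0, 2.0, 3.0]
```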
The anomaly detection and correction workflow involves the following steps:
1. Retrieving data
2. Applying rules-based detection to screen data and apply initial corrections
3. Identifying and correcting sensor drift and calibration (if applicable)
4. Developing a model (i.e., ARIMA or LSTM)
5. Applying the model to make time series predictions
6. Determining a threshold and detecting anomalies by comparing sensor observations to modeled results
7. Widening the window over which an anomaly is identified
8. Aggregating detections resulting from multiple models
9. Making corrections for anomalous events
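The threshold-based detection step of the workflow (comparing sensor observations to model estimates) can be sketched as follows; the function name and sample values are illustrative, not pyhydroqc's API:

```python
def detect_anomalies(observed, predicted, threshold):
    """Flag points where the absolute residual between a sensor
    observation and a model estimate exceeds a threshold."""
    return [abs(o - p) > threshold for o, p in zip(observed, predicted)]

# Hypothetical sensor readings vs. model (e.g. ARIMA/LSTM) estimates.
obs = [10.0, 10.2, 14.9, 10.1]
pred = [10.0, 10.1, 10.2, 10.1]
print(detect_anomalies(obs, pred, threshold=1.0))  # third point flagged
```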
Instructions to run the notebook through the CUAHSI JupyterHub:
1. Click "Open with..." at the top of the resource and select the CUAHSI JupyterHub. You may need to sign into CUAHSI JupyterHub using your HydroShare credentials.
2. Select 'Python 3.8 - Scientific' as the server and click Start.
3. From your JupyterHub directory, click on the ExampleNotebook.ipynb file.
4. Execute each cell in the code by clicking the Run button.
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html
Mudestreda Multimodal Device State Recognition Dataset
obtained from a real industrial milling device, with Time Series and Image Data for Classification, Regression, Anomaly Detection, Remaining Useful Life (RUL) estimation, Signal Drift measurement, Zero-Shot Flank Tool Wear, and Feature Engineering purposes.
The official dataset used in the paper "Multimodal Isotropic Neural Architecture with Patch Embedding" ICONIP23.
Official repository: https://github.com/hubtru/Minape
Conference paper: https://link.springer.com/chapter/10.1007/978-981-99-8079-6_14
Mudestreda (MD) | Size: 512 samples (instances, observations) | Modalities: 4 | Classes: 3
Future research: Regression, Remaining Useful Life (RUL) estimation, Signal Drift detection, Anomaly Detection, Multivariate Time Series Prediction, and Feature Engineering.
Notice: Tables and images do not render properly.
Recommended: README.md, which includes the Mudestreda description and the images Mudestreda.png and Mudestreda_Stage.png.
Data Overview
Task: Uni/Multi-Modal Classification
Domain: Industrial Flank Tool Wear of the Milling Machine
Input (sample): 4 Images: 1 Tool Image, 3 Spectrograms (X, Y, Z axis)
Output: Machine state classes: Sharp, Used, Dulled
Evaluation: Accuracy, Precision, Recall, F1-score, ROC curve
Each tool's wear is categorized sequentially: Sharp → Used → Dulled.
The dataset includes measurements from ten tools: T1 to T10.
Data splitting options include random or chronological distribution, without shuffling.
Options:
Original data or Augmented data
Random distribution or Tool Distribution (see Dataset Splitting)
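The chronological (no shuffling) versus random splitting options above can be sketched as follows; the helper and the 8/2 split size are illustrative, not the official Mudestreda splits:

```python
import random

tools = [f"T{i}" for i in range(1, 11)]  # T1 .. T10

def split_tools(tools, n_train, chronological=True, seed=0):
    """Split tools into train/test either chronologically (preserving
    the Sharp -> Used -> Dulled wear order, no shuffling) or randomly."""
    order = list(tools)
    if not chronological:
        random.Random(seed).shuffle(order)
    return order[:n_train], order[n_train:]

train, test = split_tools(tools, n_train=8)
print(train, test)  # chronological: T1..T8 for training, T9-T10 for testing
```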
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is a sound dataset for malfunctioning industrial machine investigation and inspection for the domain generalization task (MIMII DG). The dataset consists of normal and abnormal operating sounds of five different types of industrial machines, i.e., fans, gearboxes, bearings, slide rails, and valves. The data for each machine type includes three subsets called "sections", and each section roughly corresponds to a type of domain shift. This dataset is a subset of the dataset for DCASE 2022 Challenge Task 2, so it is entirely the same as the data included in the development dataset. For more information, please see the pages of the development dataset and the task description for DCASE 2022 Challenge Task 2.
Baseline system
Two simple baseline systems are available on the GitHub repositories: an autoencoder-based baseline and a MobileNetV2-based baseline. The baseline systems provide a simple entry-level approach that gives reasonable performance on the dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
Conditions of use
This dataset was made by Hitachi, Ltd. and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Citation
We will publish a paper on the dataset and announce the citation information, so please make sure to cite it if you use this dataset.
Feedback
If there is any problem, please contact us.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information: This dataset was created for research on blockchain anomaly and fraud detection and donated to the IEEE DataPort online community. https://github.com/epicprojects/blockchain-anomaly-detection
ToyADMOS2 dataset is a large-scale dataset for anomaly detection in machine operating sounds (ADMOS), designed for evaluating systems under domain-shift conditions. It consists of two sub-datasets for machine-condition inspection: fault diagnosis of machines with geometrically fixed tasks ("toy car") and fault diagnosis of machines with moving tasks ("toy train"). Domain shifts are represented by introducing several differences in operating conditions, such as the use of the same machine type but with different machine models and part configurations, different operating speeds, microphone arrangements, etc. Each sub-dataset contains over 27 k samples of normal machine-operating sounds and over 8 k samples of anomalous sounds recorded at a 48-kHz sampling rate. A subset of the ToyADMOS2 dataset was used in the DCASE 2021 challenge task 2: Unsupervised anomalous sound detection for machine condition monitoring under domain shifted conditions.
What makes this dataset different from others is that it is not used as is, but in conjunction with the tool provided on GitHub. The mixer tool lets you create datasets with any combination of recordings by describing the amount you need in a recipe file.
The samples are compressed as MPEG-4 ALS (MPEG-4 Audio Lossless Coding) with a suffix of '.mp4', which you can load using the audioread or librosa Python modules.
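A minimal loading sketch; the file path below is hypothetical, and librosa must be installed with an MP4-capable backend (e.g. ffmpeg via audioread), so the call is guarded by a file-existence check:

```python
import os

def load_admos_clip(path):
    """Load a ToyADMOS2 '.mp4' (MPEG-4 ALS) clip as a waveform.
    Requires librosa with an audioread/ffmpeg backend for MP4 input."""
    import librosa
    # sr=None keeps the native 48 kHz sampling rate instead of resampling.
    y, sr = librosa.load(path, sr=None)
    return y, sr

clip = "ToyADMOS2/toy_car/normal_0001.mp4"  # hypothetical file name
if os.path.exists(clip):
    y, sr = load_admos_clip(clip)
    print(len(y), sr)
```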
The total size of files under a folder ToyADMOS2 is 149 GB, and the total size of example benchmark datasets that are created from the ToyADMOS2 dataset is 13.2 GB.
The detail of the dataset is described in [1] and GitHub: https://github.com/nttcslab/ToyADMOS2-dataset
License: see LICENSE.pdf for the detail of the license.
[1] Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito, "ToyADMOS2: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions," 2021. https://arxiv.org/abs/2106.02369
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains the supporting data and code files for the analyses presented in "Toward automating post processing of aquatic sensor data," an article published in the journal Environmental Modelling and Software. This paper describes pyhydroqc, a Python package developed to identify and correct anomalous values in time series data collected by in situ aquatic sensors. For more information on pyhydroqc, see the code repository (https://github.com/AmberSJones/pyhydroqc) and the documentation (https://ambersjones.github.io/pyhydroqc/). The package may be installed from the Python Package Index (more info: https://packaging.python.org/tutorials/installing-packages/).
Included in this resource are input data, Python scripts to run the package on the input data (anomaly detection and correction), results from running the algorithm, and Python scripts for generating the figures in the manuscript. The organization and structure of the files are described in detail in the readme file. The input data were collected as part of the Logan River Observatory (LRO). The data in this resource represent a subset of data available for the LRO and were compiled by querying the LRO’s operational database. All available data for the LRO can be sourced at http://lrodata.usu.edu/tsa/ or on HydroShare: https://www.hydroshare.org/search/?q=logan%20river%20observatory.
There are two sets of scripts in this resource: 1.) Scripts that reproduce plots for the paper using saved results, and 2.) Code used to generate the complete results for the series in the case study. While all figures can be reproduced, there are challenges to running the code for the complete results (it is computationally intensive, different results will be generated due to the stochastic nature of the models, and the code was developed with an early version of the package), which is why the saved results are included in this resource. For a simple example of running pyhydroqc functions for anomaly detection and correction on a subset of data, see this resource: https://www.hydroshare.org/resource/92f393cbd06b47c398bdd2bbb86887ac/.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is the "evaluation dataset" for the DCASE 2020 Challenge Task 2 "Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring" [task description].
In the task, three datasets have been released: "development dataset", "additional training dataset", and "evaluation dataset". This evaluation dataset was the last of the three released. This dataset includes around 400 samples for each Machine Type and Machine ID used in the evaluation dataset, none of which have a condition label (i.e., normal or anomaly).
The recording procedure and data format are the same as the development dataset and additional training dataset. The Machine IDs in this dataset are the same as those in the additional training dataset. For more information, please see the pages of the development dataset and the task description.
After the DCASE 2020 Challenge, we released the ground truth for this evaluation dataset.
Directory structure
Once you unzip the downloaded files from Zenodo, you can see the following directory structure. Machine Type information is given by directory name, and Machine ID and condition information are given by file name, as:
/eval_data
/ToyCar
/test (Normal and anomaly data for all Machine IDs are included, but they do not have a condition label.)
/id_05_00000000.wav
...
/id_05_00000514.wav
/id_06_00000000.wav
...
/id_07_00000514.wav
/ToyConveyor (The other Machine Types have the same directory structure as ToyCar.)
/fan
/pump
/slider
/valve
The paths of audio files are:
"/eval_data/
For example, the Machine Type and Machine ID of "/ToyCar/test/id_05_00000000.wav" are "ToyCar" and "05", respectively. Unlike the development dataset and additional training dataset, its condition label is hidden.
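Extracting the Machine Type and Machine ID from a file path, following the naming convention above (the helper name is ours, not part of the dataset's tooling):

```python
def parse_eval_path(path):
    """Extract Machine Type and Machine ID from an evaluation-set path,
    e.g. '/eval_data/ToyCar/test/id_05_00000000.wav' -> ('ToyCar', '05')."""
    parts = path.strip("/").split("/")
    machine_type = parts[1]        # directory after the 'eval_data' root
    file_name = parts[-1]          # e.g. 'id_05_00000000.wav'
    machine_id = file_name.split("_")[1]
    return machine_type, machine_id

print(parse_eval_path("/eval_data/ToyCar/test/id_05_00000000.wav"))
```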
Baseline system
A simple baseline system is available on the Github repository [URL]. The baseline system provides a simple entry-level approach that gives a reasonable performance in the dataset of Task 2. It is a good starting point, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
Conditions of use
This dataset was created jointly by NTT Corporation and Hitachi, Ltd. and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Publication
If you use this dataset, please cite all the following three papers:
Yuma Koizumi, Shoichiro Saito, Noboru Harada, Hisashi Uematsu, and Keisuke Imoto, "ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection," in Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019. [pdf]
Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi, “MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection,” in Proc. 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019. [pdf]
Yuma Koizumi, Yohei Kawaguchi, Keisuke Imoto, Toshiki Nakamura, Yuki Nikaido, Ryo Tanabe, Harsh Purohit, Kaori Suefusa, Takashi Endo, Masahiro Yasuda, and Noboru Harada, "Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring," in Proc. 5th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020. [pdf]
Feedback
If there is any problem, please contact us:
Yuma Koizumi, koizumi.yuma@ieee.org
Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com
Keisuke Imoto, keisuke.imoto@ieee.org
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains the data collected on the DAVIDE HPC system (CINECA & E4 & University of Bologna, Bologna, Italy) in the period March-May 2018.
The data set has been used to train an autoencoder-based model to automatically detect anomalies in a semi-supervised fashion on a real HPC system.
This work is described in:
1) "Anomaly Detection using Autoencoders in High Performance Computing Systems", Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini, IAAI19 (proceedings in press) -- https://arxiv.org/abs/1902.08447
2) "Online Anomaly Detection in HPC Systems", Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini, AICAS19 (proceedings in press) -- https://arxiv.org/abs/1811.05269
See the Git repository for usage examples and details: https://github.com/AndreaBorghesi/anomaly_detection_HPC
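The semi-supervised recipe described in the papers (train an autoencoder on healthy telemetry, then flag samples whose reconstruction error exceeds a threshold) can be sketched as follows. This is a minimal illustration using a linear autoencoder (PCA-style) in plain NumPy with synthetic data; the deep architecture, feature set, and threshold choice of the actual work differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "healthy" telemetry: 20 correlated metrics per sample
normal = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 20))

# Linear autoencoder via SVD: encode to k components, decode back
mu = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mu, full_matrices=False)
k = 5
enc = vt[:k].T          # 20 -> 5 encoder
dec = vt[:k]            # 5 -> 20 decoder

def recon_error(x):
    """Per-sample mean squared reconstruction error."""
    z = (x - mu) @ enc
    return ((x - mu - z @ dec) ** 2).mean(axis=1)

# Threshold chosen from errors on healthy data (e.g. 99th percentile)
threshold = np.percentile(recon_error(normal), 99)

# An off-model sample reconstructs poorly and is flagged as anomalous
anomaly = rng.normal(size=(1, 20)) * 10
print(recon_error(anomaly)[0] > threshold)  # True
```

The key design point is that the threshold is computed from healthy runs only, so no anomalous labels are needed at training time.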
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data sets were originally created for the following publications:
M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek: Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.
H.-P. Kriegel, E. Schubert, A. Zimek: Evaluation of Multiple Clustering Solutions. In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings, held in conjunction with ECML PKDD 2011, Athens, Greece, 2011.
The outlier data set versions were introduced in:
E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel: On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.
They are derived from the original image data available at https://aloi.science.uva.nl/
The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders: The Amsterdam Library of Object Images. Int. J. Comput. Vision, 61(1), 103-112, January 2005.
Additional information is available at: https://elki-project.github.io/datasets/multi_view
The following views are currently available (feature type, description, files):
Object number: sparse 1000-dimensional vectors that give the true object assignment. Files: objs.arff.gz
RGB color histograms: standard RGB color histograms (uniform binning). Files: aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz
HSV color histograms: standard HSV/HSB color histograms in various binnings. Files: aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz
Color similarity: average similarity to 77 reference colors (not histograms): 18 colors x 2 saturations x 2 brightnesses + 5 grey values (incl. white and black). Files: aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
Haralick features: first 13 Haralick features (radius 1 pixel). Files: aloi-haralick-1.csv.gz
Front to back: vectors representing front faces vs. back faces of individual objects. Files: front.arff.gz
Basic light: vectors indicating basic light situations. Files: light.arff.gz
Manual annotations: manually annotated groups of semantically related objects such as cups. Files: manual1.arff.gz
Outlier Detection Versions
Additionally, we generated a number of subsets for outlier detection:
RGB histograms, downsampled to 100000 objects (553 outliers). Files: aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
RGB histograms, downsampled to 75000 objects (717 outliers). Files: aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
RGB histograms, downsampled to 50000 objects (1508 outliers). Files: aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz
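The RGB color histogram views above use uniform binning per channel (e.g. 3 bins per channel gives the 27-dimensional variant). A rough reproduction in NumPy is shown below; the exact binning and normalization of the original ELKI extraction may differ, so treat this as an illustrative sketch.

```python
import numpy as np

def rgb_histogram(image, bins_per_channel=3):
    """Uniformly binned RGB color histogram, flattened to
    bins_per_channel**3 dimensions and normalized to sum to 1."""
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels,
                             bins=(bins_per_channel,) * 3,
                             range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

# A random 8-bit RGB image stands in for an ALOI object image
rng = np.random.default_rng(42)
img = rng.integers(0, 256, size=(96, 128, 3), dtype=np.uint8)

feat = rgb_histogram(img, bins_per_channel=3)   # 27-d, like aloi-27d
print(feat.shape, round(float(feat.sum()), 6))  # (27,) 1.0
```

Changing `bins_per_channel` to 2 through 10 yields the 8-d through 1000-d variants listed above.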
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created for research on blockchain anomaly and fraud detection, and was donated to the IEEE DataPort online community. https://github.com/epicprojects/blockchain-anomaly-detection
Files:
bitcoin_hacks_2010_2013.csv: known hashes of Bitcoin theft/malicious transactions from 2010-2013.
malicious_tx_in.csv: hashes of input transactions flowing into malicious transactions.
malicious_tx_out.csv: hashes of output transactions flowing out of malicious transactions.
anomalies_theft_tx.csv: known Bitcoin theft transaction hashes.
anomalies_loss_tx.csv: known Bitcoin loss transaction hashes.
anomalies_misc_tx.csv: known Bitcoin hack transaction hashes.
anomalies_seizure1_tx.csv: known Bitcoin transaction hashes involved in the first FBI Silk Road seizure (https://en.wikipedia.org/wiki/Silk_Road_(marketplace)).
anomalies_seizure2_tx.csv: known Bitcoin transaction hashes involved in the second FBI Silk Road seizure (https://en.wikipedia.org/wiki/Silk_Road_(marketplace)).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains two hyperspectral and one multispectral anomaly detection images, together with their corresponding binary pixel masks. They were initially used for real-time anomaly detection in line-scanning, but they can be used for any anomaly detection task.
They are in .npy file format (tiff or geotiff variants will be added in the future), with the image arrays ordered as (height, width, channels). The SNP dataset was collected using sentinelhub, and the Synthetic dataset was collected from AVIRIS. The Python code used to analyse these datasets can be found at: https://github.com/WiseGamgee/HyperAD
All that is needed to load these datasets is Python (preferably 3.8+) and the NumPy package. Example code for loading the Beach dataset, assuming it is placed in a folder called "data" alongside the Python script:
import numpy as np
# Load image file
hsi_array = np.load("data/beach_hsi.npy")
n_pixels, n_lines, n_bands = hsi_array.shape
print(f"This dataset has {n_pixels} pixels, {n_lines} lines, and {n_bands} bands.")
# Load image mask
mask_array = np.load("data/beach_mask.npy")
m_pixels, m_lines = mask_array.shape
print(f"The corresponding anomaly mask is {m_pixels} pixels by {m_lines} lines.")
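As a toy illustration of the kind of hyperspectral anomaly detection these images support, here is a global RX (Reed-Xiaoli) detector in plain NumPy: each pixel is scored by its Mahalanobis distance from the image-wide background statistics. This is a generic baseline, not the ERX algorithm from the paper below, and the synthetic cube stands in for the real .npy files.

```python
import numpy as np

def rx_scores(hsi):
    """Global RX: Mahalanobis distance of every pixel spectrum
    from the image mean, using the image-wide covariance."""
    h, w, b = hsi.shape
    flat = hsi.reshape(-1, b).astype(float)
    mu = flat.mean(axis=0)
    cov = np.cov(flat, rowvar=False)
    inv = np.linalg.inv(cov + 1e-6 * np.eye(b))  # regularized inverse
    d = flat - mu
    scores = np.einsum("ij,jk,ik->i", d, inv, d)
    return scores.reshape(h, w)

# Synthetic cube with one injected anomalous pixel
rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 10))
cube[5, 7] += 8.0  # anomaly: bright across all bands

scores = rx_scores(cube)
peak = tuple(int(i) for i in np.unravel_index(scores.argmax(), scores.shape))
print(peak)  # (5, 7)
```

Thresholding the score map then yields a binary detection mask that can be compared against the provided ground-truth masks.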
If you use any of these datasets, please cite the following paper:
@article{garske2024erx,
title={ERX - a Fast Real-Time Anomaly Detection Algorithm for Hyperspectral Line-Scanning},
author={Garske, Samuel and Evans, Bradley and Artlett, Christopher and Wong, KC},
journal={arXiv preprint arXiv:2408.14947},
year={2024},
}
If you use the beach dataset please cite the following paper as well (original source):
@article{mao2022openhsi,
title={OpenHSI: A complete open-source hyperspectral imaging solution for everyone},
author={Mao, Yiwei and Betters, Christopher H and Evans, Bradley and Artlett, Christopher P and Leon-Saval, Sergio G and Garske, Samuel and Cairns, Iver H and Cocks, Terry and Winter, Robert and Dell, Timothy},
journal={Remote Sensing},
volume={14},
number={9},
pages={2244},
year={2022},
publisher={MDPI}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains a small set of application runs from the Eclipse supercomputer. The applications were run with and without synthetic HPC performance anomalies. More detailed information about the synthetic anomalies can be found at: https://github.com/peaclab/HPAS.
We chose four applications, namely LAMMPS, sw4, sw4Lite, and ExaMiniMD, to encompass both real and proxy applications. We executed each application five times on four compute nodes without introducing any anomalies. For the anomalous runs, we selected the "memleak" anomaly, as it is one of the most commonly occurring types, and again executed each application five times. The collected dataset consists of 160 samples in total: 80 labeled as anomalous and 80 labeled as healthy. For details of the applications, please refer to the paper.
The applications were run on Eclipse, which is situated at Sandia National Laboratories. Eclipse comprises 1488 compute nodes, each equipped with 128GB of memory and two sockets. Each socket contains 18 E5-2695 v4 CPU cores with 2-way hyperthreading, providing substantial computational power for scientific and engineering applications.
Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
Support data for our paper:
Using UMAP to Inspect Audio Data for Unsupervised Anomaly Detection under Domain-Shift Conditions
ArXiv preprint can be found here. Code for the experiment software pipeline described in the paper can be found here. The pipeline requires and generates different forms of data. Here we provide the following:
AudioSet_wav_fragments.zip: This is a custom selection of 39437 wav files (32kHz, mono, 10 seconds) randomly extracted from AudioSet (originally released under CC-BY). In addition to this custom subset, the paper also uses the following ones, which can be downloaded at their respective websites:
DCASE2021 Task 2 Development Dataset
DCASE2021 Task 2 Additional Training Dataset
Fraunhofer's IDMT-ISA-ELECTRIC-ENGINE Dataset
dcase2021_uads_umaps.zip: To compute the UMAPs, first the log-STFT, log-mel and L3 representations must be extracted, and then the UMAPs must be computed. This can take a substantial amount of time and resources. For convenience, we provide here the 72 UMAPs discussed in the paper.
dcase2021_uads_umap_plots.zip: Also for convenience, we provide here the 198 high-resolution scatter plots rendered from the UMAPs.
For a comprehensive visual inspection of the computed representations, it is sufficient to download the plots only. Users interested in exploring the plots interactively will need to download all the audio datasets and compute the log-STFT, log-mel and L3 representations as well as the UMAPs themselves (code provided in the GitHub repository). UMAPs for further representations can also be computed and plotted.
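The first step of the pipeline, extracting a log-STFT representation from a wav fragment, can be sketched in plain NumPy as follows. The frame length, hop size, window, and log floor here are illustrative choices, not necessarily those used in the paper's pipeline, and the synthetic tone stands in for a real 32 kHz AudioSet fragment.

```python
import numpy as np

def log_stft(signal, frame_len=1024, hop=512, eps=1e-10):
    """Log-magnitude STFT: Hann-windowed frames -> rFFT -> log."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(spec + eps)   # shape: (n_frames, frame_len // 2 + 1)

# One second of a synthetic 32 kHz mono tone
sr = 32000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

feats = log_stft(audio)
print(feats.shape)  # (61, 513)
```

Each file's (frames x bins) matrix is then flattened or pooled into a feature vector before the UMAP embedding is computed over the whole collection.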