86 datasets found
  1. Comparison of Unsupervised Anomaly Detection Methods

    • catalog.data.gov
    • +2 more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Comparison of Unsupervised Anomaly Detection Methods [Dataset]. https://catalog.data.gov/dataset/comparison-of-unsupervised-anomaly-detection-methods
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Several different unsupervised anomaly detection algorithms have been applied to Space Shuttle Main Engine (SSME) data to serve the purpose of developing a comprehensive suite of Integrated Systems Health Management (ISHM) tools. As the theoretical bases for these methods vary considerably, it is reasonable to conjecture that the resulting anomalies detected by them may differ quite significantly as well. As such, it would be useful to apply a common metric with which to compare the results. However, for such a quantitative analysis to be statistically significant, a sufficient number of examples of both nominally categorized and anomalous data must be available. Due to the lack of sufficient examples of anomalous data, use of any statistics that rely upon a statistically significant sample of anomalous data is infeasible. Therefore, the main focus of this paper will be to compare actual examples of anomalies detected by the algorithms via the sensors in which they appear, as well as the times at which they appear. We find that there is enough overlap in detection of the anomalies among all of the different algorithms tested for them to corroborate the severity of these anomalies. In certain cases, the severity of these anomalies is supported by their categorization as failures by experts, with realistic physical explanations. For those anomalies that cannot be corroborated by at least one other method, this overlap says less about the severity of the anomaly and more about their technical nuances, which will also be discussed.
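    The comparison strategy described above (matching anomalies across algorithms by the sensors and times at which they appear) can be sketched as an overlap check on (sensor, start, end) detections. This is a hypothetical illustration with made-up method names and intervals, not the authors' code.

```python
# Hypothetical sketch: an anomaly is "corroborated" when at least one other
# method flags an overlapping interval on the same sensor.
def overlaps(a, b):
    """a, b: (sensor, start, end) detections."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def corroborated(detections_by_method):
    """detections_by_method: {method_name: [(sensor, start, end), ...]}."""
    confirmed = []
    for method, dets in detections_by_method.items():
        for d in dets:
            if any(overlaps(d, e)
                   for other, other_dets in detections_by_method.items()
                   if other != method
                   for e in other_dets):
                confirmed.append((method, d))
    return confirmed

dets = {
    "method_A": [("sensor_1", 10, 20)],
    "method_B": [("sensor_1", 15, 25)],
    "method_C": [("sensor_2", 5, 8)],   # no overlap elsewhere: uncorroborated
}
```

    Detections left out of the result are the ones whose lack of overlap, as the abstract puts it, "says less about the severity of the anomaly".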

  2. Controlled Anomalies Time Series (CATS) Dataset

    • zenodo.org
    bin
    Updated Jul 12, 2024
    + more versions
    Cite
    Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. http://doi.org/10.5281/zenodo.7646897
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Solenix Engineering GmbH
    Authors
    Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

    The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

    • Multivariate (17 variables) including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system including:
      • 4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.
      • 3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.
      • 10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.
    • 5 million timestamps. Sensor readings are at 1 Hz sampling frequency.
      • 1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour.
      • 4 million observations that include both nominal and anomalous segments. This is suitable to evaluate both semi-supervised approaches (novelty detection) as well as unsupervised approaches (outlier detection).
    • 200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.
    • Different types of anomalies to understand what anomaly types can be detected by different approaches.
    • Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end time of the anomalous behaviour is known very precisely. In contrast to real world datasets, there is no risk that the ground truth contains mislabelled segments which is often the case for real data.
    • Obvious anomalies. The simulated anomalies have been designed to be "easy" for human eyes to detect (i.e., there are very large spikes or oscillations), hence also detectable by most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting those obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable also for regular benchmark studies.
    • Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example consists of a light and switch pair. The light being either on or off is nominal, the same goes for the switch, but having the switch on and the light off shall be considered anomalous. In the CATS dataset, users can choose (or not) to use the available context, and external stimuli, to test the usefulness of the context for detecting anomalies in this simulation.
    • Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage since users of the dataset can decide to add on top of the provided series any type of noise and choose an amplitude. This makes it well suited to test how sensitive and robust detection algorithms are against various levels of noise.
    • No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
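    The nominal-prefix / mixed-suffix layout and the noise-free signals suggest a simple evaluation harness. The sketch below is a scaled-down stand-in (a synthetic frame with the dataset's 17 columns); the real file name and column layout must be taken from the Zenodo record.

```python
import numpy as np
import pandas as pd

# Scaled-down stand-in for the CATS layout: a nominal-only prefix for training
# (novelty detection) and a mixed suffix for evaluation. Real data: 1M + 4M rows.
rng = np.random.default_rng(0)
n_nominal, n_eval = 1_000, 4_000
df = pd.DataFrame(rng.normal(size=(n_nominal + n_eval, 17)),
                  columns=[f"var_{i}" for i in range(17)])

train = df.iloc[:n_nominal]       # learn "normal" behaviour here
test = df.iloc[n_nominal:]        # nominal + anomalous segments live here

# The signals ship noise-free, so robustness studies can inject noise of a
# chosen amplitude before running a detector:
sigma = 0.1
noisy_test = test + rng.normal(0.0, sigma, test.shape)
```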

    [1] Example Benchmark of Anomaly Detection in Time Series: “Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602”

    About Solenix

    Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.

  3. Comparison of Unsupervised Anomaly Detection Methods - Dataset - NASA Open...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Comparison of Unsupervised Anomaly Detection Methods - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/comparison-of-unsupervised-anomaly-detection-methods
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)

  4. Anomalous Action Detection Dataset

    • kaggle.com
    Updated Jun 18, 2024
    Cite
    sayantan roy 10121999 (2024). Anomalous Action Detection Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/8717699
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    sayantan roy 10121999
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    ### Ano-AAD Dataset: Comprehensive Anomalous Human Action Detection in Videos

    The Ano-AAD dataset is a groundbreaking resource designed to advance the field of anomaly detection in video surveillance. Compiled from an extensive array of sources, including popular social media platforms and various websites, this dataset captures a wide range of human behaviors, both normal and anomalous. By providing a rich and diverse set of video data, the Ano-AAD dataset is poised to significantly enhance the capabilities of surveillance systems and contribute to the development of more sophisticated safety protocols.

    #### Inception and Objective

    The primary objective behind the creation of the Ano-AAD dataset was to address the pressing need for a comprehensive, well-annotated collection of video footage that can be used to train and evaluate models for detecting anomalous human actions. Recognizing the limitations of existing datasets, which often lack diversity and sufficient examples of real-world scenarios, we embarked on a meticulous process to gather, annotate, and validate a diverse array of videos. Our goal was to ensure that the dataset encompasses a wide variety of environments and actions, thereby providing a robust foundation for the development of advanced anomaly detection algorithms.

    #### Data Collection Process

    The data collection process for the Ano-AAD dataset was both extensive and methodical. We identified and selected videos from various social media platforms, such as Facebook and YouTube, as well as other online sources. These videos were chosen to represent a broad spectrum of real-world scenarios, including both typical daily activities and less frequent, but critical, anomalous events. Each video was carefully reviewed to ensure it met our criteria for relevance, clarity, and authenticity.

    #### Categorization and Annotation

    A cornerstone of the Ano-AAD dataset is its detailed categorization and annotation of human actions. Each video clip was meticulously labeled to differentiate between normal activities—such as walking, sitting, and working—and anomalous behaviors, which include arrests, burglaries, explosions, fighting, fire raising, ill treatment, traffic irregularities, attacks, and other violent acts. This comprehensive annotation process was essential to creating a dataset that accurately reflects the complexities of real-world surveillance challenges. Our team of annotators underwent rigorous training to ensure consistency and reliability in the labeling process, and multiple rounds of validation were conducted to maintain high-quality annotations.

    #### Ethical Considerations

    Throughout the data collection and annotation process, we adhered to strict ethical guidelines and privacy regulations. All videos were sourced from publicly available content, and efforts were made to anonymize individuals to protect their privacy. We prioritized compliance with data protection principles, ensuring that our work not only advanced technological capabilities but also respected the rights and privacy of individuals depicted in the footage.

    #### Technical Specifications

    The Ano-AAD dataset comprises a total of 354 abnormal videos and 41 normal videos, amounting to 8.7 GB of abnormal data with a cumulative abnormal video duration of 11 hours and 25 minutes; the 41 normal videos total 41 minutes. Each video was processed to maintain a uniform format and resolution, typically standardized to MP4. This consistency in video quality ensures that the dataset can be seamlessly integrated into various machine learning models and computer vision algorithms, facilitating the development and testing of anomaly detection systems.

    #### Dataset Breakdown

    | Serial Number | Anomaly Class | Total Number of Videos | Size | Duration (HH:MM) |
    |---------------|------------------------|------------------------|----------|------------------|
    | 1 | Arrest | 49 | 1.7 GB | 2:10 |
    | 2 | Burglary | 48 | 948.7 MB | 1:26 |
    | 3 | Explosion | 49 | 773 MB | 1:01 |
    | 4 | Fighting | 50 | 2.0 GB | 2:23 |
    | 5 | Fire Raising | 49 | 999.4 MB | 1:20 |
    | 6 | Ill Treatment | 32 | 812.5 MB | 1:07 |
    | 7 | Traffic Irregularities | 13 | 79.3 MB | 0:05 |
    | 8 | Attack | 38 | 543.8 MB | 0:41 |
    | 9 | Violence | 26 | 836 MB | 1:08 |
    | Total | ... | | | |

  5. Advanced Ground Systems Maintenance Anomaly Detection

    • data.wu.ac.at
    xml
    Updated Sep 16, 2017
    Cite
    National Aeronautics and Space Administration (2017). Advanced Ground Systems Maintenance Anomaly Detection [Dataset]. https://data.wu.ac.at/schema/data_gov/N2FlMjQ4OTEtYjU2Yy00NjI0LTlhNmItZjUyNzFmNzBiYTY5
    Explore at:
    Available download formats: xml
    Dataset updated
    Sep 16, 2017
    Dataset provided by
    National Aeronautics and Space Administration
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The Inductive Monitoring System (IMS) software utilizes techniques from the fields of model-based reasoning, machine learning, and data mining to build system monitoring knowledge bases from archived or simulated sensor data. Unlike some other machine learning techniques, IMS does not require examples of anomalous (failure) behavior. IMS automatically analyzes nominal system data to form general classes of expected system sensor values. This process enables the software to inductively learn and model nominal system behavior. The generated data classes are then used to build a monitoring knowledge base. In real-time, IMS performs monitoring functions, determining and displaying the degree of deviation from nominal performance. IMS trend analyses can detect conditions that may indicate a failure or required system maintenance. The development of the IMS was motivated by the difficulty of producing detailed diagnostic models of some system components due to complexity or unavailability of design information.
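    Conceptually, the approach described above (learning classes of expected sensor values from nominal data alone, then scoring new readings by their deviation) can be sketched with a crude clustering step. IMS itself is NASA software; this stand-in makes no claim about its internals.

```python
import numpy as np

def learn_classes(nominal, k=4, iters=20, seed=0):
    """Crude k-means over nominal sensor vectors -> centroids of 'expected' classes."""
    rng = np.random.default_rng(seed)
    centers = nominal[rng.choice(len(nominal), size=k, replace=False)]
    for _ in range(iters):
        # Assign each nominal vector to its nearest centroid, then refit centroids.
        labels = np.argmin(((nominal[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = nominal[labels == j].mean(axis=0)
    return centers

def deviation(reading, centers):
    """Degree of deviation from nominal = distance to the nearest learned class."""
    return np.sqrt(((centers - reading) ** 2).sum(-1)).min()

rng = np.random.default_rng(1)
nominal = rng.normal(0.0, 1.0, size=(500, 3))   # archived nominal sensor data
centers = learn_classes(nominal)
# A reading inside the nominal envelope scores lower than a far-off reading.
```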

    This project will develop the capability to identify anomalous conditions (indications of potential impending system failure) in ground system operations before such failures occur. These indicators are not presently detectable by traditional command and control and fault detection systems. This project enables the delivery of system health advisories to ground system operators so they can take action before experiencing system failures. Anomalies detected by the Inductive Monitoring System (IMS) can be sent to a diagnostic software module for diagnosis.

    Anomaly Detection provides the 21st Century Launch Complex Program with the ability to identify/recognize systems' anomalies before they become faults in the system; it supports the resolution of such anomalies to assure system availability and mission success. This capability also allows reduction in systems' maintenance costs by dictating when maintenance is needed (Maintenance-on-Demand) versus performing maintenance on schedule.

  6. Replication Package - How Industry Tackles Anomalies during Runtime:...

    • zenodo.org
    bin, pdf
    Updated Jul 5, 2024
    Cite
    Monika Steidl; Benedikt Dornauer (2024). Replication Package - How Industry Tackles Anomalies during Runtime: Approaches and Key Monitoring Parameters [Dataset]. http://doi.org/10.5281/zenodo.12658560
    Explore at:
    Available download formats: pdf, bin
    Dataset updated
    Jul 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Monika Steidl; Benedikt Dornauer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 13, 2024
    Description

    This replication package includes general remarks on anomaly detection approaches identified via an extension of a literature study (Soldani, J., & Brogi, A. (2022). Anomaly detection and failure root cause analysis in (micro) service-based cloud applications: A survey. ACM Computing Surveys (CSUR), 55(3), 1-39.) and interviews with 15 participants from various domains. It addresses the methodology and findings for anomalies, anomaly detection approaches, and the key monitoring parameters extracted from runtime monitoring data types (logs, traces, metrics) to detect anomalies.

    Due to confidentiality, we cannot provide the video recordings or transcripts.

    This replication package contains:

    • Interview_Guidelines.pdf: includes the pilot-tested interview questions split up into introduction, use case elaboration, and parameters that explain system behavior and questions. Furthermore, we include short summaries expressing our intention via the questions
    • Procedure_Interview_Participant_Selection.pdf: explains the applied purposive sampling selection strategy, the email used to contact industry interview partners and the demographic information collected from the interviews
    • RQ1_Inductive_Coding.pdf: summarises the interview participants' statements regarding the interpretations and characteristics of an anomaly and industry examples
    • RQ1_IEEE_Definitions_over_Years.pdf: summary of identified IEEE definitions regarding anomalies
    • RQ2_Inductive_Coding.pdf: summarises the interview participants' statements regarding rule-based and AI-based approaches and the advantages/disadvantages thereof
    • RQ3_Overview_Interviews_IndustryPaper.xlxs: summarises the interview participants' statements and the anomaly detection tools of papers that evaluated their approach with industry datasets, regarding parameters suitable for detecting anomalies. Furthermore, it includes methodological information (such as inclusion/exclusion criteria, excluded papers, and the sample-based dual-blind review)
    • Overview_AllPapers.xlxs: summarises all identified literature studies (anomaly detection approaches that are evaluated via industry datasets and via benchmark datasets)
  7. MVTec LOCO AD Dataset

    • datasetninja.com
    Updated Mar 23, 2021
    Cite
    Paul Bergmann; Kilian Batzner; Michael Fauser (2021). MVTec LOCO AD Dataset [Dataset]. https://datasetninja.com/mvtec-loco-ad
    Explore at:
    Dataset updated
    Mar 23, 2021
    Dataset provided by
    Dataset Ninja
    Authors
    Paul Bergmann; Kilian Batzner; Michael Fauser
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The authors of the MVTec LOCO AD: MVTec Logical Constraints Anomaly Detection dataset introduce a benchmark corpus aimed at addressing the unsupervised detection and localization of anomalies in natural images. This challenging problem involves the identification of anomalies that can manifest in diverse ways. The authors offer pixel-precise ground truth annotations for each anomalous region and introduce an evaluation metric tailored to address localization ambiguities that can arise with logical anomalies. To create an effective benchmark, the authors assert that a dataset should encompass representative examples of various anomaly types. They observe that existing datasets tend to focus on local structural anomalies, such as scratches or dents, while overlooking anomalies involving violations of logical constraints, such as objects appearing in invalid locations. To bridge this gap, the authors contribute a novel dataset based on industrial inspection contexts. This dataset is meticulously designed to provide a balanced representation of both structural and logical anomalies. Additionally, they present a new algorithm that outperforms existing approaches in jointly detecting structural and logical anomalies. This algorithm comprises local and global network branches, each with distinct responsibilities.

  8. pyhydroqc Sensor Data QC: Single Site Example

    • search.dataone.org
    • hydroshare.org
    • +1more
    Updated Dec 30, 2023
    Cite
    Amber Spackman Jones (2023). pyhydroqc Sensor Data QC: Single Site Example [Dataset]. http://doi.org/10.4211/hs.92f393cbd06b47c398bdd2bbb86887ac
    Explore at:
    Dataset updated
    Dec 30, 2023
    Dataset provided by
    Hydroshare
    Authors
    Amber Spackman Jones
    Time period covered
    Jan 1, 2017 - Dec 31, 2017
    Description

    This resource contains an example script for using the software package pyhydroqc. pyhydroqc was developed to identify and correct anomalous values in time series data collected by in situ aquatic sensors. For more information, see the code repository: https://github.com/AmberSJones/pyhydroqc and the documentation: https://ambersjones.github.io/pyhydroqc/. The package may be installed from the Python Package Index.

    This script applies the functions to data from a single site in the Logan River Observatory, which is included in the repository. The data collected in the Logan River Observatory are sourced at http://lrodata.usu.edu/tsa/ or on HydroShare: https://www.hydroshare.org/search/?q=logan%20river%20observatory.

    Anomaly detection methods include ARIMA (AutoRegressive Integrated Moving Average) and LSTM (Long Short-Term Memory). These are time series regression methods that detect anomalies by comparing model estimates to sensor observations and labeling points as anomalous when they exceed a threshold. There are multiple possible approaches for applying LSTM for anomaly detection/correction:
    • Vanilla LSTM: uses past values of a single variable to estimate the next value of that variable.
    • Multivariate Vanilla LSTM: uses past values of multiple variables to estimate the next value for all variables.
    • Bidirectional LSTM: uses past and future values of a single variable to estimate a value for that variable at the time step of interest.
    • Multivariate Bidirectional LSTM: uses past and future values of multiple variables to estimate a value for all variables at the time step of interest.

    The correction approach uses piecewise ARIMA models. Each group of consecutive anomalous points is considered as a unit to be corrected. Separate ARIMA models are developed for valid points preceding and following the anomalous group. Model estimates are blended to achieve a correction.

    The anomaly detection and correction workflow involves the following steps:
    1. Retrieving data
    2. Applying rules-based detection to screen data and apply initial corrections
    3. Identifying and correcting sensor drift and calibration (if applicable)
    4. Developing a model (i.e., ARIMA or LSTM)
    5. Applying the model to make time series predictions
    6. Determining a threshold and detecting anomalies by comparing sensor observations to modeled results
    7. Widening the window over which an anomaly is identified
    8. Aggregating detections resulting from multiple models
    9. Making corrections for anomalous events
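    Steps 5 through 7 of this workflow (predict, threshold the residuals, widen each detection window) reduce to a few lines. The sketch below is a generic stand-in and does not reproduce the pyhydroqc API.

```python
import numpy as np

def detect(observed, estimated, threshold, widen=1):
    """Flag points whose |observation - model estimate| exceeds a threshold,
    then widen each flagged window by `widen` points on each side."""
    flags = np.abs(observed - estimated) > threshold
    widened = flags.copy()
    for shift in range(1, widen + 1):
        widened[:-shift] |= flags[shift:]   # extend detections backward in time
        widened[shift:] |= flags[:-shift]   # extend detections forward in time
    return widened

obs = np.array([1.0, 1.1, 5.0, 1.0, 0.9])
est = np.ones(5)                            # stand-in model predictions
print(detect(obs, est, threshold=1.0).tolist())  # [False, True, True, True, False]
```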

    Instructions to run the notebook through the CUAHSI JupyterHub:
    1. Click "Open with..." at the top of the resource and select the CUAHSI JupyterHub. You may need to sign into CUAHSI JupyterHub using your HydroShare credentials.
    2. Select 'Python 3.8 - Scientific' as the server and click Start.
    3. From your JupyterHub directory, click on the ExampleNotebook.ipynb file.
    4. Execute each cell in the code by clicking the Run button.

  9. anomaly fans

    • cetacean.gcoos.org
    • hub.arcgis.com
    • +3more
    Updated Feb 11, 2025
    + more versions
    Cite
    GCOOS (2025). anomaly fans [Dataset]. https://cetacean.gcoos.org/datasets/gcoos::boem-seafloor-anomalies?layer=9
    Explore at:
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    GCOOS
    Area covered
    Description

    Sand-rich turbidite fans have intermittently dominated sedimentation in portions of the deep-water Gulf of Mexico for millions of years and are the source of many of the subsurface sand reservoirs throughout the basin. There are a few large, discrete recent channel/fan complexes on the seafloor (especially in Alaminos Canyon) that have a high-positive acoustic response on seismic data and are easily recognized on amplitude maps. These recent examples are good analogues of reservoir geometries for subsurface exploration/development activities.

  10. Full Range Heat Anomalies - USA 2020

    • hrtc-oc-cerf.hub.arcgis.com
    Updated Mar 4, 2023
    + more versions
    Cite
    The Trust for Public Land (2023). Full Range Heat Anomalies - USA 2020 [Dataset]. https://hrtc-oc-cerf.hub.arcgis.com/datasets/TPL::full-range-heat-anomalies-usa-2020
    Explore at:
    Dataset updated
    Mar 4, 2023
    Dataset authored and provided by
    The Trust for Public Land
    Area covered
    Description

    Notice: this is not the latest Heat Island Anomalies image service. For 2023 data visit https://tpl.maps.arcgis.com/home/item.html?id=e89a556263e04cb9b0b4638253ca8d10.

    This layer contains the relative degrees Fahrenheit difference between any given pixel and the mean heat value for the city in which it is located, for every city in the United States. This 30-meter raster was derived from Landsat 8 imagery band 10 (ground-level thermal sensor) from the summers of 2019 and 2020.

    Federal statistics over a 30-year period show extreme heat is the leading cause of weather-related deaths in the United States. Extreme heat exacerbated by urban heat islands can lead to increased respiratory difficulties, heat exhaustion, and heat stroke. These heat impacts significantly affect the most vulnerable: children, the elderly, and those with preexisting conditions.

    The purpose of this layer is to show where certain areas of cities are hotter or cooler than the average temperature for that same city as a whole. This dataset represents a snapshot in time. It will be updated yearly, but is static between updates. It does not take into account changes in heat during a single day, for example, from building shadows moving. The thermal readings detected by the Landsat 8 sensor are surface-level, whether that surface is the ground or the top of a building. Although there is strong correlation between surface temperature and air temperature, they are not the same. We believe that this is useful at the national level, and for cities that don't have the ability to conduct their own hyper-local temperature survey. Where local data is available, it may be more accurate than this dataset.

    Dataset Summary

    This dataset was developed using proprietary Python code developed at The Trust for Public Land, running on the Descartes Labs platform through the Descartes Labs API for Python. The Descartes Labs platform allows for extremely fast retrieval and processing of imagery, which makes it possible to produce heat island data for all cities in the United States in a relatively short amount of time.

    What can you do with this layer?

    This layer has query, identify, and export image services available. Since it is served as an image service, it is not necessary to download the data; the service itself is data that can be used directly in any Esri geoprocessing tool that accepts raster data as input. In order to click on the image service and see the raw pixel values in a map viewer, you must be signed in to ArcGIS Online, then Enable Pop-Ups and Configure Pop-Ups.

    Using the Urban Heat Island (UHI) Image Services

    The data is made available as an image service. There is a processing template applied that supplies the yellow-to-red or blue-to-red color ramp, but once this processing template is removed (you can do this in ArcGIS Pro or ArcGIS Desktop, or in QGIS), the actual data values come through the service and can be used directly in a geoprocessing tool (for example, to extract an area of interest). Following are instructions for doing this in Pro.

    In ArcGIS Pro, in a Map view, in the Catalog window, click on Portal. In the Portal window, click on the far-right icon representing Living Atlas. Search on the acronyms “tpl” and “uhi”. The results returned will be the UHI image services. Right-click on a result and select “Add to current map” from the context menu. When the image service is added to the map, right-click on it in the map view, and select Properties. In the Properties window, select Processing Templates. On the drop-down menu at the top of the window, the default Processing Template is either a yellow-to-red ramp or a blue-to-red ramp. Click the drop-down, select “None”, then “OK”. Now you will have the actual pixel values displayed in the map, and available to any geoprocessing tool that takes a raster as input. A screenshot of ArcGIS Pro with a UHI image service loaded, color ramp removed, and symbology changed back to a yellow-to-red ramp illustrates this (a classified renderer can also be used).

    Other Sources of Heat Island Information

    Please see these websites for valuable information on heat islands and to learn about exciting new heat island research being led by scientists across the country: EPA's Heat Island Resource Center; Dr. Ladd Keith, University of Arizona; Dr. Ben McMahan, University of Arizona; Dr. Jeremy Hoffman, Science Museum of Virginia; Dr. Hunter Jones, NOAA; Daphne Lundi, Senior Policy Advisor, NYC Mayor's Office of Recovery and Resiliency.

    Disclaimer/Feedback

    With nearly 14,000 cities represented, checking each city's heat island raster for quality assurance would be prohibitively time-consuming, so The Trust for Public Land checked a statistically significant sample size for data quality. The sample passed all quality checks, with about 98.5% of the output cities error-free, but there could be instances where the user finds errors in the data. These errors will most likely take the form of a line of discontinuity where there is no city boundary; this type of error is caused by large temperature differences in two adjacent Landsat scenes, so the discontinuity occurs along scene boundaries. The Trust for Public Land would appreciate feedback on these errors so that version 2 of the national UHI dataset can be improved. Contact Dale.Watt@tpl.org with feedback.
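    The layer's core quantity, each pixel's temperature minus the mean for its city, is simple to restate in code. The sketch below is a conceptual stand-in using a toy raster, not The Trust for Public Land's Descartes Labs pipeline.

```python
import numpy as np

def heat_anomaly(temp, city_ids):
    """temp: 2-D surface temperatures (deg F); city_ids: same-shape int array,
    one id per city, -1 for pixels outside any city."""
    anomaly = np.full_like(temp, np.nan, dtype=float)
    for cid in np.unique(city_ids):
        if cid < 0:
            continue
        mask = city_ids == cid
        anomaly[mask] = temp[mask] - temp[mask].mean()  # deviation from city mean
    return anomaly

temp = np.array([[70.0, 74.0], [76.0, 80.0]])
cities = np.zeros((2, 2), dtype=int)        # one toy city covering all pixels
out = heat_anomaly(temp, cities)            # city mean is 75.0 deg F
```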

  11. Global Sea Level Anomalies

    • marine-analyst.org
    • marine-analyst.eu
    html
    Updated Jun 13, 2025
    Cite
    EMODnet Physics Sea Level (2025). Global Sea Level Anomalies [Dataset]. http://marine-analyst.org/dev.py?N=simple&O=723&titre_page=Global%20Sea%20Level%20Anomalies&titre_chap=&maxlat=65.0&maxlon=44.0&minlon=-16.0&minlat=30.0&visit=1181:1160:1169:1852:2097:1159
    Explore at:
    Available download formats: html
    Dataset updated
    Jun 13, 2025
    Dataset provided by
    http://www.marine-analyst.eu
    Authors
    EMODnet Physics Sea Level
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Global Sea Level Anomalies - this product uses the PSMSL relative sea level trends. It shows annual variation compared to a long-term trend (at least 30 years of recordings). Use the 'viewparams' parameter to specify: (1) syear (start year), from 1900 to 2015; (2) eyear (end year), from 1900 to 2015; (3) ayear (anomaly year), from 1900 to 2015. Note 1: syear < eyear and syear <= ayear <= eyear. Note 2: eyear - syear >= 30. Example: viewparams=syear:1900;eyear:2015;ayear:1900
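The constraints on 'viewparams' can be checked before querying the service. A minimal Python sketch (the helper name is ours, for illustration):

```python
def build_viewparams(syear, eyear, ayear):
    """Validate the documented constraints and build the viewparams value:
    1900 <= syear < eyear <= 2015, syear <= ayear <= eyear,
    and eyear - syear >= 30 (at least 30 years of recordings)."""
    if not (1900 <= syear < eyear <= 2015):
        raise ValueError("need 1900 <= syear < eyear <= 2015")
    if not (syear <= ayear <= eyear):
        raise ValueError("need syear <= ayear <= eyear")
    if eyear - syear < 30:
        raise ValueError("need eyear - syear >= 30")
    return f"syear:{syear};eyear:{eyear};ayear:{ayear}"

# The documented example:
params = build_viewparams(1900, 2015, 1900)  # "syear:1900;eyear:2015;ayear:1900"
```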

  12. Seasonal forecast anomalies on pressure levels

    • cds.climate.copernicus.eu
    grib
    Updated Sep 5, 2025
    + more versions
    Cite
    ECMWF (2025). Seasonal forecast anomalies on pressure levels [Dataset]. http://doi.org/10.24381/cds.7d481b7a
    Explore at:
    Available download formats: grib
    Dataset updated
    Sep 5, 2025
    Dataset authored and provided by
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/Additional-licence-to-use-non-European-contributions/Additional-licence-to-use-non-European-contributions_7f60a470cb29d48993fa5d9d788b33374a9ff7aae3dd4e7ba8429cc95c53f592.pdf

    Time period covered
    Jan 1, 2017 - Sep 1, 2025
    Description

    This entry covers pressure-level data post-processed for bias adjustment at a monthly time resolution. Seasonal forecasts provide a long-range outlook of changes in the Earth system over periods of a few weeks or months, as a result of predictable changes in some of the slow-varying components of the system. For example, ocean temperatures typically vary slowly, on timescales of weeks or months; as the ocean has an impact on the overlying atmosphere, the variability of its properties (e.g. temperature) can modify both local and remote atmospheric conditions. Such modifications of the 'usual' atmospheric conditions are the essence of all long-range (e.g. seasonal) forecasts. This is different from a weather forecast, which gives much more precise detail - both in time and space - of the evolution of the state of the atmosphere over a few days into the future. Beyond a few days, the chaotic nature of the atmosphere limits the possibility of predicting precise changes at local scales. This is one of the reasons long-range forecasts of atmospheric conditions have large uncertainties. To quantify such uncertainties, long-range forecasts use ensembles, and meaningful forecast products reflect a distribution of outcomes. Given the complex, non-linear interactions between the individual components of the Earth system, the best tools for long-range forecasting are climate models which include as many of the key components of the system as possible; typically, such models include representations of the atmosphere, ocean and land surface. These models are initialised with data describing the state of the system at the starting point of the forecast, and used to predict the evolution of this state in time. While uncertainties coming from imperfect knowledge of the initial conditions of the components of the Earth system can be described with the use of ensembles, uncertainty arising from approximations made in the models is very much dependent on the choice of model.
A convenient way to quantify the effect of these approximations is to combine outputs from several models, independently developed, initialised and operated. To this effect, the C3S provides a multi-system seasonal forecast service, where data produced by state-of-the-art seasonal forecast systems developed, implemented and operated at forecast centres in several European countries is collected, processed and combined to enable user-relevant applications. The composition of the C3S seasonal multi-system and the full content of the database underpinning the service are described in the documentation. The data is grouped in several catalogue entries (CDS datasets), currently defined by the type of variable (single-level or multi-level, on pressure surfaces) and the level of post-processing applied (data at original time resolution, processing on temporal aggregation and post-processing related to bias adjustment). The variables available in this data set are listed in the table below. The data includes forecasts created in real-time each month starting from the publication of this entry.
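As an illustration of the anomaly idea described above (not the C3S processing chain itself), a minimal NumPy sketch on synthetic data: subtract a hindcast climatology from each real-time ensemble member, then summarize the result as a distribution of outcomes rather than a single number:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: hindcasts over 24 start years with 25 members each,
# and a 51-member real-time ensemble, for one variable on one pressure level.
hindcast = rng.normal(loc=285.0, scale=2.0, size=(24, 25))  # (years, members)
forecast = rng.normal(loc=286.0, scale=2.0, size=51)        # (members,)

# Bias adjustment: remove the hindcast climatology
# (mean over hindcast years and members).
climatology = hindcast.mean()
anomalies = forecast - climatology

# A meaningful long-range product reflects a distribution of outcomes,
# e.g. the fraction of members predicting an above-normal value.
prob_above_normal = (anomalies > 0).mean()
```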

  13. SDG 06 - Water Anomaly (ISciences)

    • sdgstoday-sdsn.hub.arcgis.com
    Updated Jan 21, 2021
    Cite
    Sustainable Development Solutions Network (2021). SDG 06 - Water Anomaly (ISciences) [Dataset]. https://sdgstoday-sdsn.hub.arcgis.com/datasets/sdg-06-water-anomaly-isciences-1
    Explore at:
    Dataset updated
    Jan 21, 2021
    Dataset authored and provided by
    Sustainable Development Solutions Network
    Area covered
    Description

    This layer is part of SDGs Today. Please see sdgstoday.org. The ISciences Water Security Indicator Model v2 (WSIMv2) describes places where water availability during the most recent 12-month period is more or less than would be expected based on a 1950-2009 baseline period. These anomalies are expressed in terms of return period – a measure that characterizes the rarity of an anomaly. For example, a return period of 10 years indicates an anomaly that would occur, on average, once every ten years. Higher return periods indicate more extreme and, therefore, more disruptive anomalies. The composite surplus indicator is calculated as the most extreme of the runoff and flow-accumulated runoff anomalies. The composite deficit indicator is calculated as the most extreme of the soil moisture, flow-accumulated runoff, and evapotranspiration deficit (potential minus actual evapotranspiration) anomalies. Anomalies are summarized into levels corresponding to ranges of return periods – near normal: less than 3 years; abnormal: 3-5 years; moderate: 5-10 years; severe: 10-20 years; extreme: 20-40 years; exceptional: greater than 40 years. Learn more about their methodological framework here. Contact Daniel P. Baston or Thomas M. Parris for more information.
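The composite rule and the return-period levels can be sketched in a few lines of Python (the function names are ours, and the assignment of exact boundary values such as 3 or 5 years to a bin is our assumption):

```python
def composite_deficit(return_periods):
    """Composite indicator: the most extreme (largest return period) of the
    component anomalies, e.g. soil moisture, flow-accumulated runoff, and
    evapotranspiration deficit."""
    return max(return_periods)

def severity_level(return_period_years):
    """Map a return period (in years) onto the documented severity levels."""
    if return_period_years < 3:
        return "near normal"
    if return_period_years < 5:
        return "abnormal"
    if return_period_years < 10:
        return "moderate"
    if return_period_years < 20:
        return "severe"
    if return_period_years < 40:
        return "extreme"
    return "exceptional"
```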

  14. Dataset for the paper "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset"

    • zenodo.org
    bin, csv, html
    Updated Feb 12, 2025
    Cite
    Mohammad Saiful Islam; Mohamed Sami Rakha; William Pourmajidi; Janakan Sivaloganathan; John Steinbacher; Andriy Miranskyy (2025). Dataset for the paper "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset" [Dataset]. http://doi.org/10.5281/zenodo.14062900
    Explore at:
    Available download formats: bin, html, csv
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mohammad Saiful Islam; Mohamed Sami Rakha; William Pourmajidi; Janakan Sivaloganathan; John Steinbacher; Andriy Miranskyy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a large-scale anomaly detection dataset collected from IBM Cloud's Console over approximately 4.5 months. This high-dimensional dataset captures telemetry data from multiple data centers, specifically designed to aid researchers in developing and benchmarking anomaly detection methods in large-scale cloud environments. It contains 39,365 entries, each representing a 5-minute interval, with 117,448 features/attributes, as interval_start is used as the index. The dataset includes detailed information on request counts, HTTP response codes, and various aggregated statistics. The dataset also includes labeled anomaly events identified through IBM's internal monitoring tools, providing a comprehensive resource for real-world anomaly detection research and evaluation.

    File Descriptions

    • location_downtime.csv - Details planned and unplanned downtimes for IBM Cloud data centers, including start and end times in ISO 8601 format.
    • unpivoted_data.parquet - Contains raw telemetry data with 413 million+ rows, covering details like location, HTTP status codes, request types, and aggregated statistics (min, max, median response times).
    • anomaly_windows.csv - Ground truth for anomalies, listing start and end times of recorded anomalies, categorized by source (Issue Tracker, Instant Messenger, Test Log).
    • pivoted_data_all.parquet - Pivoted version of the telemetry dataset with 39,365 rows and 117,449 columns, including aggregated statistics across multiple metrics and intervals.
    • demo/demo.[ipynb|html]: This demo file provides examples of how to access data in the Parquet files; it is available in both Jupyter Notebook (.ipynb) and HTML (.html) formats.

    Further details of the dataset can be found in Appendix B: Dataset Characteristics of the paper titled "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset." Sample code for training anomaly detectors using this data is provided in this package.
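As a sketch of how the two Parquet layouts relate, pandas can pivot the long-format telemetry into the wide per-interval layout. The feature names below are assumed for illustration; the real schema is described in Appendix B and the demo notebook, and the real files are read with `pd.read_parquet(...)`:

```python
import pandas as pd

# Synthetic stand-in for a few rows of unpivoted_data.parquet
# (feature names are illustrative, not the dataset's actual schema).
unpivoted = pd.DataFrame({
    "interval_start": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 00:00",
         "2024-01-01 00:05", "2024-01-01 00:05"]),
    "feature": ["dal10|200|GET|count", "dal10|500|GET|count",
                "dal10|200|GET|count", "dal10|500|GET|count"],
    "value": [120, 3, 98, 7],
})

# Pivot to the wide layout of pivoted_data_all.parquet:
# one row per 5-minute interval, one column per telemetry feature.
pivoted = unpivoted.pivot(index="interval_start",
                          columns="feature", values="value")
```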

    When using the dataset, please cite it as follows:

    @misc{islam2024anomaly,
    title={Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset},
    author={Mohammad Saiful Islam and Mohamed Sami Rakha and William Pourmajidi and Janakan Sivaloganathan and John Steinbacher and Andriy Miranskyy},
    year={2024},
    eprint={2411.09047},
    archivePrefix={arXiv},
    url={https://arxiv.org/abs/2411.09047}
    }

  15. Data set for anomaly detection on a HPC system

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 19, 2023
    Cite
    Andrea Borghesi (2023). Data set for anomaly detection on a HPC system [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3251872
    Explore at:
    Dataset updated
    Apr 19, 2023
    Dataset provided by
    Andrea Borghesi
    Francesco Beneventi
    Andrea Bartolini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains the data collected on the DAVIDE HPC system (CINECA & E4 & University of Bologna, Bologna, Italy) in the period March-May 2018.

    The data set has been used to train an autoencoder-based model to automatically detect anomalies in a semi-supervised fashion on a real HPC system.

    This work is described in:

    1) "Anomaly Detection using Autoencoders in High Performance Computing Systems", Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini, IAAI19 (proceedings in process) -- https://arxiv.org/abs/1902.08447

    2) "Online Anomaly Detection in HPC Systems", Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini, AICAS19 (proceedings in process) -- https://arxiv.org/abs/1811.05269

    See the git repository for usage examples & details --> https://github.com/AndreaBorghesi/anomaly_detection_HPC
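The repository above contains the actual models; as a minimal stand-in for the semi-supervised reconstruction-error idea (the papers use autoencoders; here a rank-2 PCA plays that role, on synthetic data), train on normal telemetry only and flag samples that reconstruct poorly:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "healthy node" telemetry: 500 samples of 8 correlated metrics.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 8))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 8))

# Semi-supervised idea: fit a compressive model on normal data only, then
# flag samples whose reconstruction error is unusually large.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]  # top-2 principal directions

def recon_error(x):
    centered = x - mean
    return np.linalg.norm(centered - centered @ components.T @ components, axis=-1)

threshold = np.percentile(recon_error(normal), 99)

# A perturbation off the learned subspace should exceed the threshold.
anomalous = normal[0] + np.array([0, 0, 5.0, 0, 0, 0, -5.0, 0])
is_anomaly = recon_error(anomalous) > threshold
```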

  16. MVTec AD Dataset

    • datasetninja.com
    Updated Jun 20, 2019
    Cite
    Paul Bergmann; Kilian Batzner; Michael Fauser (2019). MVTec AD Dataset [Dataset]. https://datasetninja.com/mvtec-ad
    Explore at:
    Dataset updated
    Jun 20, 2019
    Dataset provided by
    Dataset Ninja
    Authors
    Paul Bergmann; Kilian Batzner; Michael Fauser
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The authors of MVTec AD (the MVTec Anomaly Detection dataset) addressed the critical task of detecting anomalous structures within natural image data, a crucial aspect of computer vision applications. To facilitate the development of methods for unsupervised anomaly detection, they introduced the MVTec AD dataset, comprising 5354 high-resolution color images across various object and texture categories. The dataset contains both normal images, intended for training, and images with anomalies, designed for testing. These anomalies manifest in over 70 distinct types of defects, including scratches, dents, contaminations, and structural alterations. The authors also provided pixel-precise ground truth annotations for all anomalies.

  17. Precipitation and Temperature Anomalies from MERRA-2 dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jun 5, 2021
    Cite
    Filipi N. Silva; Didier A. Vega-Oliveros; Xiaoran Yan; Alessandro Flammini; Filippo Menczer; Filippo Radicchi; Ben Kravitz; Santo Fortunato (2021). Precipitation and Temperature Anomalies from MERRA-2 dataset [Dataset]. http://doi.org/10.5281/zenodo.4270623
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 5, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Filipi N. Silva; Didier A. Vega-Oliveros; Xiaoran Yan; Alessandro Flammini; Filippo Menczer; Filippo Radicchi; Ben Kravitz; Santo Fortunato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Anomalies in Precipitation and Temperature for tiles around the globe.

    Data derived from the reanalysis MERRA-2 project [1]. The dataset covers the period from 1980 to 2018, between 80°N and 80°S. The spatial resolution is 1.0° x 1.0°, resulting in 45,792 tiles, with a time resolution of 7 days (obtained by averaging the values over each week). Precipitation time-series were re-scaled by applying a logarithmic function to all values.

    To discount seasonality effects, we averaged temperature and precipitation values for each of the 365 calendar days. We considered the interval from 1 January 1980 through 28 February 2018 as the climatic period for which the long-term averages were computed. The anomalies are then obtained by subtracting for each day the respective average temperature or precipitation from the climatic period (for example, the 1 January 1998 anomaly is computed as the value for that day minus the average of all January 1 values between 1980 and 2018).
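The calendar-day climatology and anomaly computation described above can be sketched with pandas (synthetic single-tile data; the real pipeline is in the repository linked below):

```python
import numpy as np
import pandas as pd

# Synthetic daily temperature series for one tile over the climatic period.
days = pd.date_range("1980-01-01", "2018-02-28", freq="D")
rng = np.random.default_rng(7)
seasonal = 10 * np.sin(2 * np.pi * days.dayofyear / 365.25)
temps = pd.Series(15 + seasonal + rng.normal(0, 2, len(days)), index=days)

# Long-term average for each calendar day (month, day), then subtract it:
# e.g. the 1 January 1998 anomaly is that day's value minus the average
# of all 1 January values between 1980 and 2018.
climatology = temps.groupby([temps.index.month, temps.index.day]).transform("mean")
anomalies = temps - climatology
```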

    To process the data, use the code available in:

    https://github.com/filipinascimento/teleconnectionsgranger/

    http://arxiv.org/abs/2012.03848

    [1] A. Molod, L. Takacs, M. Suarez, and J. Bacmeister, “Development of the geos-5 atmospheric general circulation model: evolution from merra to merra2,” Geoscientific Model Development 8, 1339–1356 (2015).

  18. Absolute Sea Level Anomalies - MultiPointTimeSeriesObservation - based on SONEL DB

    • marine-analyst.eu
    • marine-analyst.org
    html
    Updated Jun 13, 2025
    Cite
    EMODnet Physics Sea Level (2025). Absolute Sea Level Anomalies - MultiPointTimeSeriesObservation - based on SONEL DB [Dataset]. http://www.marine-analyst.eu/dev.py?N=simple&O=699
    Explore at:
    Available download formats: html
    Dataset updated
    Jun 13, 2025
    Dataset provided by
    http://www.marine-analyst.eu
    Authors
    EMODnet Physics Sea Level
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Global Sea Level Anomalies - this product uses the PSMSL relative sea level trends. It shows annual variation compared to a long-term trend (at least 30 years of recordings). Use the 'viewparams' parameter to specify: (1) syear (start year), from 1900 to 2015; (2) eyear (end year), from 1900 to 2015; (3) ayear (anomaly year), from 1900 to 2015. Note 1: syear < eyear and syear <= ayear <= eyear. Note 2: eyear - syear >= 30. Example: viewparams=syear:1900;eyear:2015;ayear:1900

  19. SDG 06 - Water Anomaly (Deficit) (ISciences)

    • sdgstoday-sdsn.hub.arcgis.com
    Updated Jan 21, 2021
    + more versions
    Cite
    Sustainable Development Solutions Network (2021). SDG 06 - Water Anomaly (Deficit) (ISciences) [Dataset]. https://sdgstoday-sdsn.hub.arcgis.com/maps/2ef325503d124571b5c322df5ddfe323
    Explore at:
    Dataset updated
    Jan 21, 2021
    Dataset authored and provided by
    Sustainable Development Solutions Network (https://www.unsdsn.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This map is part of SDGs Today. Please see sdgstoday.org. The ISciences Water Security Indicator Model v2 (WSIMv2) describes places where water availability during the most recent 12-month period is more or less than would be expected based on a 1950-2009 baseline period. These anomalies are expressed in terms of return period – a measure that characterizes the rarity of an anomaly. For example, a return period of 10 years indicates an anomaly that would occur, on average, once every ten years. Higher return periods indicate more extreme and, therefore, more disruptive anomalies. The composite surplus indicator is calculated as the most extreme of the runoff and flow-accumulated runoff anomalies. The composite deficit indicator is calculated as the most extreme of the soil moisture, flow-accumulated runoff, and evapotranspiration deficit (potential minus actual evapotranspiration) anomalies. Anomalies are summarized into levels corresponding to ranges of return periods – near normal: less than 3 years; abnormal: 3-5 years; moderate: 5-10 years; severe: 10-20 years; extreme: 20-40 years; exceptional: greater than 40 years. Learn more about their methodological framework here. Contact Daniel P. Baston (dbaston@isciences.com) or Thomas M. Parris (parris@isciences.com) for more information.

  20. DCASE 2022 Challenge Task 2 Development Dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 14, 2022
    Cite
    Kota Dohi; Keisuke Imoto; Yuma Koizumi; Noboru Harada; Daisuke Niizumi; Tomoya Nishida; Harsh Purohit; Takashi Endo; Masaaki Yamamoto; Yohei Kawaguchi (2022). DCASE 2022 Challenge Task 2 Development Dataset [Dataset]. http://doi.org/10.5281/zenodo.6355122
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 14, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kota Dohi; Keisuke Imoto; Yuma Koizumi; Noboru Harada; Daisuke Niizumi; Tomoya Nishida; Harsh Purohit; Takashi Endo; Masaaki Yamamoto; Yohei Kawaguchi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


    This dataset is the "development dataset" for the DCASE 2022 Challenge Task 2 "Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques".

    The data consists of the normal/anomalous operating sounds of seven types of real/toy machines. Each recording is a single-channel 10-second audio that includes both a machine's operating sound and environmental noise. The following seven types of real/toy machines are used in this task:

    • Fan
    • Gearbox
    • Bearing
    • Slide rail
    • ToyCar
    • ToyTrain
    • Valve

    Overview of the task

    Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial intelligence (AI)-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.

    This task is the follow-up to DCASE 2020 Task 2 and DCASE 2021 Task 2. The task this year is to detect anomalous sounds under three main conditions:

    1. Only normal sound clips are provided as training data (i.e., unsupervised learning scenario). In real-world factories, anomalies rarely occur and are highly diverse. Therefore, exhaustive patterns of anomalous sounds are impossible to create or collect and unknown anomalous sounds that were not observed in the given training data must be detected. This condition is the same as in DCASE 2020 Task 2 and DCASE 2021 Task 2.

    2. Factors other than anomalies change the acoustic characteristics between training and test data (i.e., domain shift). In real-world cases, operational conditions of machines or environmental noise often differ between the training and testing phases. For example, the operation speed of a conveyor can change due to seasonal demand, or environmental noise can fluctuate depending on the states of surrounding machines. This condition is the same as in DCASE 2021 Task 2.

    3. In test data, samples unaffected by domain shifts (source domain data) and those affected by domain shifts (target domain data) are mixed, and the source/target domain of each sample is not specified. Therefore, the model must detect anomalies regardless of the domain (i.e., domain generalization).

    Definition

    We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."

    • "Machine type" indicates the kind of machine, which in this task is one of seven: fan, gearbox, bearing, slide rail, valve, ToyCar, and ToyTrain.
    • A section is defined as a subset of the dataset for calculating performance metrics. Each section is dedicated to a specific type of domain shift.
    • The source domain is the domain under which most of the training data and part of the test data were recorded, and the target domain is a different set of domains under which a few of the training data and part of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, SNR, etc.
    • Attributes are parameters that define states of machines or types of noise.

    Dataset

    This dataset consists of three sections for each machine type (Sections 00, 01, and 02), and each section is a complete set of training and test data. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training, (ii) ten clips of normal sounds in the target domain for training, and (iii) 100 clips each of normal and anomalous sounds for the test. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.

    File names and attribute csv files

    File names and attribute csv files provide reference labels for each clip. The given reference labels for each training/test clip include machine type, section index, normal/anomaly information, and attributes regarding the condition other than normal/anomaly. The machine type is given by the directory name. The section index is given by their respective file names. For the datasets other than the evaluation dataset, the normal/anomaly information and the attributes are given by their respective file names. Attribute csv files are for easy access to attributes that cause domain shifts. In these files, the file names, name of parameters that cause domain shifts (domain shift parameter, dp), and the value or type of these parameters (domain shift value, dv) are listed. Each row takes the following format:

    [filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
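A row in this format can be parsed into a parameter dictionary; a minimal sketch (the example row values in the test are hypothetical, not taken from the dataset):

```python
def parse_attribute_row(row):
    """Parse one attribute-csv row into (filename, {param: value}).
    Row format: filename, d1p, d1v, d2p, d2v, ...  where dNp is a
    domain-shift parameter name and dNv is its value."""
    fields = [f.strip() for f in row.split(",")]
    filename, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected parameter name/value pairs after filename")
    attrs = dict(zip(rest[0::2], rest[1::2]))
    return filename, attrs
```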

    Recording procedure

    Normal/anomalous operating sounds of machines and their related equipment are recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings from a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset to explain the details of the recording procedure by the submission deadline.

    Directory structure

    - /dev_data
    - /fan
    - /train (only normal clips)
    - /section_00_source_train_normal_0000_
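Assuming the clip-name pattern shown above (section index, domain, split, normal/anomaly label, clip id, then attribute text), the reference labels can be recovered from a file name; a minimal sketch:

```python
import re

# Assumed clip-name pattern, e.g.
#   section_00_source_train_normal_0000_<attribute text>.wav
NAME = re.compile(
    r"section_(?P<section>\d+)_(?P<domain>source|target)_"
    r"(?P<split>train|test)_(?P<label>normal|anomaly)_(?P<clip>\d+)")

def parse_clip_name(filename):
    """Extract the reference labels encoded in a clip's file name."""
    m = NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized clip name: {filename}")
    return m.groupdict()
```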

    Baseline system

    Two baseline systems are available in the GitHub repositories baseline_ae and baseline_mobile_net_v2. The baseline systems provide a simple entry-level approach that gives reasonable performance on the Task 2 dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.

    Condition of use

    This dataset was created jointly by Hitachi, Ltd. and NTT Corporation and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

    Citation

    If you use this dataset, please cite all the following three papers.

    • Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi, Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques. In arXiv e-prints: 2206.05876, 2022. [URL]
    • Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi. MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task. In arXiv e-prints: 2205.13879, 2022. [URL]
    • Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito. ToyADMOS2: another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions. In

Cite
Dashlink (2025). Comparison of Unsupervised Anomaly Detection Methods [Dataset]. https://catalog.data.gov/dataset/comparison-of-unsupervised-anomaly-detection-methods

Comparison of Unsupervised Anomaly Detection Methods

Explore at:
15 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description

Several different unsupervised anomaly detection algorithms have been applied to Space Shuttle Main Engine (SSME) data to serve the purpose of developing a comprehensive suite of Integrated Systems Health Management (ISHM) tools. As the theoretical bases for these methods vary considerably, it is reasonable to conjecture that the resulting anomalies detected by them may differ quite significantly as well. As such, it would be useful to apply a common metric with which to compare the results. However, for such a quantitative analysis to be statistically significant, a sufficient number of examples of both nominally categorized and anomalous data must be available. Due to the lack of sufficient examples of anomalous data, use of any statistics that rely upon a statistically significant sample of anomalous data is infeasible. Therefore, the main focus of this paper will be to compare actual examples of anomalies detected by the algorithms via the sensors in which they appear, as well the times at which they appear. We find that there is enough overlap in detection of the anomalies among all of the different algorithms tested in order for them to corroborate the severity of these anomalies. In certain cases, the severity of these anomalies is supported by their categorization as failures by experts, with realistic physical explanations. For those anomalies that can not be corroborated by at least one other method, this overlap says less about the severity of the anomaly, and more about their technical nuances, which will also be discussed.
