The dataset contains workplace noise measurement results collected during health hazard evaluation surveys from 1997 to 2013, covering over 800 personal noise exposure assessments. The exposure data are based on OSHA and NIOSH assessment criteria and are accompanied by descriptions of the location, industry, work type, working area, the activity that generates the exposure, the use of specific personal protective equipment, and other variables.
https://www.nist.gov/open/license
This is a collection of data sets acquired for measurements of noise figure and receive system noise of wireless/radio frequency receivers and transceivers. These data include tabular data that list 1) Inputs: calibrated input signal and excess noise levels, and 2) Outputs: summary statistics for each type of user data collected for each DUT. The experiments that produced these data were meant to be used to assess noise measurands, but the data are generic and could be applied to other problems if desired. The structure of each zip archive dataset is as follows:
Root
|-- (Anonymized DUT name 1)
|---- Data file 1
|---- Data file 2
|---- ...Data file N
|---- DUT-README.txt
|-- (Anonymized DUT name 2)
|---- Data file 1
|---- Data file 2
|---- ...Data file N
|---- DUT-README.txt
| (etc.)
Data tables in each archive are provided as comma-separated values (.csv), and the descriptive text files are ASCII (.txt). Detailed discussion of the test conditions and data formatting is given by the DUT-README.txt for each DUT.
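Given that layout, a minimal Python sketch (assuming an archive has already been extracted; the root path below is a placeholder, not part of the dataset) could walk the anonymized DUT folders and count the rows of each .csv data table:

import csv
from pathlib import Path

root = Path("extracted_archive")  # hypothetical extraction directory

for dut_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    print(dut_dir.name)  # anonymized DUT name
    for data_file in sorted(dut_dir.glob("*.csv")):
        with data_file.open(newline="") as f:
            rows = list(csv.reader(f))
        print(f"  {data_file.name}: {len(rows)} rows")
    # Test conditions and column meanings are described in each DUT-README.txt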
Far-field In-Home Noise Data by Mic-Array consists of multiple product sets, each collected with a different type of microphone array. The data were recorded in real indoor home environments and can be used for in-home tasks such as voice enhancement and automatic speech recognition. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout the data collection, storage, and usage processes; our datasets comply with GDPR, CCPA, and PIPL.
The FSDnoisy18k dataset is an open dataset containing 42.5 hours of audio across 20 sound event classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data. The audio content is taken from Freesound, and the dataset was curated using the Freesound Annotator. The noisy set of FSDnoisy18k consists of 15,813 audio clips (38.8h), and the test set consists of 947 audio clips (1.4h) with correct labels. The dataset features two main types of label noise: in-vocabulary (IV) and out-of-vocabulary (OOV). IV applies when, given an observed label that is incorrect or incomplete, the true or missing label is part of the target class set. Analogously, OOV means that the true or missing label is not covered by those 20 classes.
The European Directive 2002/49/EC of 25 June 2002 on the assessment and management of environmental noise aims at a harmonised assessment of exposure to noise in the Member States. It defines noise maps as representations of data describing a noise situation according to a noise indicator, indicating exceedances of limit values and the number of persons exposed (Article 3 of the Decree of 24 March 2006 and Article 7 of the Decree of 4 April 2006). Noise maps are not prescriptive: they are information documents with no legal force. As graphic elements, however, they can supplement a Local Planning Plan (LDP). As part of an Urban Travel Plan (UDP), maps can be used to establish baselines and target areas where better traffic management is needed. To quantify the level of noise emitted by an infrastructure over an average day, two indices recommended at European level for all modes of transport are used, Lden and Ln: — Lden: representative indicator of the average sound level over all 24 hours of the day, — Ln: representative indicator of the average sound level for the night period (22h-6h).
Noise levels are assessed using digital models (computer software) incorporating the main parameters that influence noise and its propagation (traffic data, terrain topography, meteorological data, etc.). The noise maps thus produced are then cross-checked with the demographic data of the areas concerned in order to estimate the population exposed to noise pollution. The noise level shown on the noise maps is derived from a calculation method that gives approximate values, often higher than reality (worst-case), for a noise area considered critical. An in situ noise measurement can determine precisely the noise to which a building and its occupants may be exposed.
Location maps of noise threshold exceedance zones (type C map), day.
This map represents, for the year in which the maps were drawn up, the areas where the Lden limit value of 68 dB(A) is exceeded (day). The roads concerned were selected in accordance with the Prefectural Decree approving the strategic noise maps of land transport infrastructures whose annual traffic is more than 3 million vehicles in the Pyrénées-Atlantiques department (Prefectural decree of 12 October 2018, n°64-2018-10-12-001): — concessioned national motorways A63 and A64 — national road N134 — departmental roads D2 D6 D9 D33 D37 D281 D309 D501 D635 D802 D810 D811 D817 D834 D911 D912 D918 D932 D936 D938 D943 D947 — several communal roads in the municipalities of Anglet, Bayonne, Biarritz, Billère, Bizanos, Gelos, Hendaye, Idron, Jurançon, Lescar, Lons, Oloron-Sainte-Marie, Pau, Saint-Jean-de-Luz
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Immediate and delayed recall of words under white noise, speech-shaped noise, and multi-talker speech babble. Digit span, RAVLT, and TMT data are also given.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
To download the latest dataset and for more details, including collection methods and technical information, please visit:
This dataset contains 535 recordings of heart and lung sounds captured using a digital stethoscope from a clinical manikin, including both individual and mixed recordings of heart and lung sounds: 50 heart sounds, 50 lung sounds, and 145 mixed sounds. For each mixed sound, the corresponding source heart sound (145 recordings) and source lung sound (145 recordings) were also recorded. It includes recordings from different anatomical chest locations, with normal and abnormal sounds. Each recording has been filtered to highlight specific sound types, making it valuable for artificial intelligence (AI) research and applications in areas such as automated cardiopulmonary disease detection, sound classification, and deep learning algorithms related to audio signal processing.
- Recording Sources: Heart and lung sounds captured from a clinical manikin simulating real physiological conditions.
- Data Type: Audio files (.wav format)
- Sound Types: Normal Heart, Late Diastolic Murmur, Mid Systolic Murmur, Late Systolic Murmur, Atrial Fibrillation, Fourth Heart Sound, Early Systolic Murmur, Third Heart Sound, Tachycardia, Atrioventricular Block, Normal Lung, Wheezing, Fine Crackles, Rhonchi, Pleural Rub, and Coarse Crackles.
- Auscultation Landmarks: Right Upper Sternal Border, Left Upper Sternal Border, Lower Left Sternal Border, Right Costal Margin, Left Costal Margin, Apex, Right Upper Anterior, Left Upper Anterior, Right Mid Anterior, Left Mid Anterior, Right Lower Anterior, and Left Lower Anterior.
- Applications: AI-based cardiopulmonary disease detection, unsupervised sound separation techniques, deep learning for audio signal processing.
- Large Zip File Notice: Please extract Mix1.zip, Mix2.zip, and Mix3.zip, then merge the contents to create Mix.zip (see the sketch below).
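A minimal Python sketch of that last merging step, assuming the three archives sit in the current working directory (all paths are placeholders, not part of the dataset):

import shutil
import zipfile
from pathlib import Path

download_dir = Path(".")
merged_dir = download_dir / "Mix"
merged_dir.mkdir(exist_ok=True)

# Extract each part into the same folder so the contents are merged
for name in ("Mix1.zip", "Mix2.zip", "Mix3.zip"):
    with zipfile.ZipFile(download_dir / name) as zf:
        zf.extractall(merged_dir)

# Re-zip the merged folder as Mix.zip
shutil.make_archive(str(download_dir / "Mix"), "zip", root_dir=merged_dir)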
This dataset contains 210 recordings of heart and lung sounds captured using a digital stethoscope from a clinical manikin, including both individual and mixed recordings of heart and lung sounds: 50 heart sounds, 50 lung sounds, and 110 mixed sounds. It includes recordings from different anatomical chest locations, with normal and abnormal sounds. Each recording has been filtered to highlight specific sound types, making it valuable for artificial intelligence (AI) research and applications in areas such as automated cardiopulmonary disease detection, sound classification, and deep learning algorithms related to audio signal processing.
If you use this code or dataset in your research, please cite:
© 2024 by Yasaman Torabi. All rights reserved.
FSDKaggle2019 is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology. FSDKaggle2019 has been used for the DCASE Challenge 2019 Task 2, which was run as a Kaggle competition titled Freesound Audio Tagging 2019.
Citation
If you use the FSDKaggle2019 dataset or part of it, please cite our DCASE 2019 paper:
Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra. "Audio tagging with noisy labels and minimal supervision". Proceedings of the DCASE 2019 Workshop, NYC, US (2019)
You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2019.
Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017
Data curators
Eduardo Fonseca, Manoj Plakal, Xavier Favory, Jordi Pons
Contact
You are welcome to contact Eduardo Fonseca at eduardo.fonseca@upf.edu should you have any questions.
ABOUT FSDKaggle2019
Freesound Dataset Kaggle 2019 (or FSDKaggle2019 for short) is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology [1]. FSDKaggle2019 has been used for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2019. Please visit the DCASE2019 Challenge Task 2 website for more information. This task was hosted on the Kaggle platform as a competition titled Freesound Audio Tagging 2019. It was organized by researchers from the Music Technology Group (MTG) of Universitat Pompeu Fabra (UPF) and from the Sound Understanding team at Google AI Perception. The competition was intended to provide insight towards the development of broadly-applicable sound event classifiers able to cope with label noise and minimal supervision conditions.
FSDKaggle2019 employs audio clips from the following sources:
Freesound Dataset (FSD): a dataset being collected at the MTG-UPF based on Freesound content organized with the AudioSet Ontology
The soundtracks of a pool of Flickr videos taken from the Yahoo Flickr Creative Commons 100M dataset (YFCC)
The audio data is labeled using a vocabulary of 80 labels from Google’s AudioSet Ontology [1], covering diverse topics: Guitar and other Musical Instruments, Percussion, Water, Digestive, Respiratory sounds, Human voice, Human locomotion, Hands, Human group actions, Insect, Domestic animals, Glass, Liquid, Motor vehicle (road), Mechanisms, Doors, and a variety of Domestic sounds. The full list of categories can be inspected in vocabulary.csv (see Files & Download below). The goal of the task was to build a multi-label audio tagging system that can predict appropriate label(s) for each audio clip in a test set.
What follows is a summary of some of the most relevant characteristics of FSDKaggle2019. Nevertheless, it is highly recommended to read our DCASE 2019 paper for a more in-depth description of the dataset and how it was built.
Ground Truth Labels
The ground truth labels are provided at the clip-level, and express the presence of a sound category in the audio clip, hence can be considered weak labels or tags. Audio clips have variable lengths (roughly from 0.3 to 30s).
The audio content from FSD has been manually labeled by humans following a data labeling process using the Freesound Annotator platform. Most labels have inter-annotator agreement but not all of them. More details about the data labeling process and the Freesound Annotator can be found in [2].
The YFCC soundtracks were labeled using automated heuristics applied to the audio content and metadata of the original Flickr clips. Hence, a substantial amount of label noise can be expected. The label noise can vary widely in amount and type depending on the category, including in- and out-of-vocabulary noises. More information about some of the types of label noise that can be encountered is available in [3].
Specifically, FSDKaggle2019 features three types of label quality, one for each set in the dataset:
curated train set: correct (but potentially incomplete) labels
noisy train set: noisy labels
test set: correct and complete labels
Further details can be found below in the sections for each set.
Format
All audio clips are provided as uncompressed PCM 16 bit, 44.1 kHz, mono audio files.
DATA SPLIT
FSDKaggle2019 consists of two train sets and one test set. The idea is to limit the supervision provided for training (i.e., the manually-labeled, hence reliable, data), thus promoting approaches to deal with label noise.
Curated train set
The curated train set consists of manually-labeled data from FSD.
Number of clips/class: 75, except in a few cases (where there are fewer)
Total number of clips: 4970
Avg number of labels/clip: 1.2
Total duration: 10.5 hours
The duration of the audio clips ranges from 0.3 to 30s due to the diversity of the sound categories and the preferences of Freesound users when recording/uploading sounds. Labels are correct but potentially incomplete. It can happen that a few of these audio clips present additional acoustic material beyond the provided ground truth label(s).
Noisy train set
The noisy train set is a larger set of noisy web audio data from Flickr videos taken from the YFCC dataset [5].
Number of clips/class: 300
Total number of clips: 19,815
Avg number of labels/clip: 1.2
Total duration: ~80 hours
The duration of the audio clips ranges from 1s to 15s, with the vast majority lasting 15s. Labels are automatically generated and purposefully noisy. No human validation is involved. The label noise can vary widely in amount and type depending on the category, including in- and out-of-vocabulary noises.
Considering the numbers above, the per-class data distribution available for training is, for most of the classes, 300 clips from the noisy train set and 75 clips from the curated train set. This means 80% noisy / 20% curated at the clip level, while at the duration level the proportion is more extreme considering the variable-length clips.
Test set
The test set is used for system evaluation and consists of manually-labeled data from FSD.
Number of clips/class: between 50 and 150
Total number of clips: 4481
Avg number of labels/clip: 1.4
Total duration: 12.9 hours
The acoustic material present in the test set clips is labeled exhaustively using the aforementioned vocabulary of 80 classes. Most labels have inter-annotator agreement, but not all of them. Human error aside, the label(s) are correct and complete considering the target vocabulary; nonetheless, a few clips could still present additional (unlabeled) acoustic content outside the vocabulary.
During the DCASE2019 Challenge Task 2, the test set was split into two subsets, for the public and private leaderboards, and only the data corresponding to the public leaderboard was provided. In this current package you will find the full test set with all the test labels. To allow comparison with previous work, the file test_post_competition.csv includes a flag to determine the corresponding leaderboard (public or private) for each test clip (see more info in Files & Download below).
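As an illustration, a short Python sketch for reproducing the public/private split from that file (the metadata path and the name of the leaderboard flag column are assumptions; check test_post_competition.csv for the actual field):

import pandas as pd

test_meta = pd.read_csv("FSDKaggle2019.meta/test_post_competition.csv")  # assumed path
public = test_meta[test_meta["usage"] == "Public"]    # clips scored on the public leaderboard
private = test_meta[test_meta["usage"] == "Private"]  # clips scored on the private leaderboard
print(len(public), "public clips,", len(private), "private clips")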
Acoustic mismatch
As mentioned before, FSDKaggle2019 uses audio clips from two sources:
FSD: curated train set and test set, and
YFCC: noisy train set.
While the sources of audio (Freesound and Flickr) are collaboratively contributed and quite diverse in themselves, a certain acoustic mismatch can be expected between FSD and YFCC. We conjecture this mismatch arises from a variety of reasons. For example, through acoustic inspection of a small sample of both data sources, we find a higher percentage of high-quality recordings in FSD. In addition, audio clips in Freesound are typically recorded with the purpose of capturing audio, which is not necessarily the case in YFCC.
This mismatch can have an impact on the evaluation, considering that most of the train data come from YFCC, while all test data are drawn from FSD. This constraint (i.e., noisy training data coming from a different web audio source than the test set) is sometimes a real-world condition.
LICENSE
All clips in FSDKaggle2019 are released under Creative Commons (CC) licenses. To facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses.
Curated train set and test set. All clips in Freesound are released under different modalities of Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. The licenses are specified in the files train_curated_post_competition.csv and test_post_competition.csv. These licenses can be CC0, CC-BY, CC-BY-NC and CC Sampling+.
Noisy train set. Similarly, the licenses of the soundtracks from Flickr used in FSDKaggle2019 are specified in the file train_noisy_post_competition.csv. These licenses can be CC-BY and CC BY-SA.
In addition, FSDKaggle2019 as a whole is the result of a curation process and it has an additional license. FSDKaggle2019 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2019.doc zip file.
FILES & DOWNLOAD
FSDKaggle2019 can be downloaded as a series of zip files with the following directory structure:
root
│
└───FSDKaggle2019.audio_train_curated/ Audio clips in the curated train set
│
└───FSDKaggle2019.audio_train_noisy/ Audio clips in the noisy
Silencio’s POI Noise-Level Dataset provides noise profiles segmented by business type, such as restaurants, gyms, nightlife, offices, and more, based on over 10 million POI check-ins worldwide. This dataset enables competitor benchmarking and market analysis by revealing how different types of businesses operate within diverse acoustic environments.
Use this dataset to:
• Benchmark competitors based on environmental soundscapes
• Analyze customer experience factors linked to noise levels (e.g., how easy it is to have a conversation within the POI, or how full the venue is)
• Gain deeper insights into urban commercial environments
Delivered via CSV or S3 bucket. AI-driven insights will soon expand this dataset’s capabilities. Fully anonymized and GDPR-compliant.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Noise Round 1 NRA Data. Published by Environmental Protection Agency. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0).This is the results of the noise mapping (round 1) of the major roads carried for the EPA under EU Directive 2002/49/EC. The directive is implemented in Ireland by the Environmental Noise Regulations 2006 (SI 140/2006)....
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is a sound dataset for malfunctioning industrial machine investigation and inspection with domain shifts due to changes in operational and environmental conditions (MIMII DUE). The dataset consists of normal and abnormal operating sounds of five different types of industrial machines, i.e., fans, gearboxes, pumps, slide rails, and valves. The data for each machine type include six subsets called "sections", and each section roughly corresponds to a single product. Each section consists of data from two domains, called the source domain and the target domain, with different conditions such as operating speed and environmental noise. This dataset is a subset of the dataset for DCASE 2021 Challenge Task 2, so it is entirely the same as the data included in the development dataset and the additional training dataset. For more information, please see this paper and the pages of the development dataset and the task description for DCASE 2021 Challenge Task 2.
Baseline system
Two simple baseline systems are available on the GitHub repositories [URL] and [URL]. The baseline systems provide a simple entry-level approach that gives reasonable performance on the dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
Conditions of use
This dataset was made by Hitachi, Ltd. and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Publication
If you use this dataset, please cite the following paper:
Ryo Tanabe, Harsh Purohit, Kota Dohi, Takashi Endo, Yuki Nikaido, Toshiki Nakamura, and Yohei Kawaguchi, "MIMII DUE: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection with Domain Shifts due to Changes in Operational and Environmental Conditions," arXiv preprint arXiv: 2105.02702, 2021. [URL]
Feedback
If there is any problem, please contact us:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Noise attenuation is a key step in seismic data processing to enhance desired signal features, minimize artifacts, and avoid misinterpretation. However, traditional attenuation methods are often time-consuming and require expert parameter selection. Deep learning can successfully suppress various types of noise via a trained neural network, potentially saving time and effort while avoiding mistakes. In this study, we tested a U-net method to assess its usefulness in attenuating repetitive coherent events (e.g., pumpjack noise) and to investigate the influence of gain methods on denoising quality. We used the U-net method because it preserves fine-scale information during training. Its performance is controlled by network parameters and improved by minimizing misfits between true data and network estimates. A gain method is necessary to avoid the network's parameter optimization being biased toward large values in the data. We first generated synthetic seismic data with added noise for training. Next, we recovered amplitudes using an automatic gain control (AGC) or a 2D AGC (using adjacent traces' amplitudes). Then, a back-propagation algorithm minimized the Euclidean-norm cost function to optimize the network parameters for better performance. The update step size and direction were determined using an adaptive momentum optimization method. Finally, we removed the gain effect and evaluated the denoising quality using a normalized root-mean-square error (RMSE). Based on RMSE, the data pre-processed by the 2D AGC performed better, with the RMSE decreasing from 0.225 to 0.09. We also assessed the limitations of the network when source wavelets or noise differed from the training set. The denoising quality of the trained network was sensitive to changes in the wavelet and noise type: the noisy data in the limitation test set were not substantially improved. The trained network was also tested on seismic field data collected at Hussar, Alberta, by the CREWES Project. The data contained not only excellent reflection events but also substantial pumpjack noise on some shot gathers. We were able to significantly reduce the noise (comparing favorably with traditional techniques), allowing considerably greater reflection continuity. U-net noise reduction techniques show considerable promise in seismic data processing.
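As a rough Python illustration of the gain and evaluation steps described above (not the authors' code; the window length, epsilon, and the RMSE normalization convention are assumptions), a trace-wise AGC and a normalized RMSE could look like:

import numpy as np

def agc(trace, window=101, eps=1e-10):
    """Sliding-window automatic gain control: scale each sample by the local RMS amplitude."""
    pad = window // 2
    padded = np.pad(trace, pad, mode="reflect")
    rms = np.sqrt(np.convolve(padded ** 2, np.ones(window) / window, mode="valid"))
    gain = 1.0 / (rms + eps)
    return trace * gain, gain

def normalized_rmse(true_data, estimate):
    """Root-mean-square error normalized by the peak amplitude of the true data."""
    return np.sqrt(np.mean((true_data - estimate) ** 2)) / (np.max(np.abs(true_data)) + 1e-10)

# Apply the gain before feeding a trace to the network, then undo it afterwards
noisy_trace = np.random.randn(1000)
gained, gain = agc(noisy_trace)
denoised = gained            # placeholder for the trained network's output
restored = denoised / gain   # "remove the gain effect" prior to computing the RMSE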
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AeroSonicDB (YPAD-0523): Labelled audio dataset for acoustic detection and classification of aircraft
Version 1.1 (September 2023)
Publication
When using this data in an academic work, please reference the DOI and version.
Description
AeroSonicDB:YPAD-0523 is a specialised dataset of ADS-B labelled audio clips for research in the fields of environmental noise attribution and machine listening, particularly acoustic detection and classification of low-flying aircraft. Audio files in this dataset were recorded at locations in close proximity to a flight path approaching or departing Adelaide International Airport’s (ICAO code: YPAD) primary runway, 05/23. Recordings are initially labelled from radio (ADS-B) messages received from the aircraft overhead, then human verified and annotated with the first and final moments at which the target aircraft is audible.
A total of 1,895 audio clips are distributed across two top-level classes, “Aircraft” (8.87 hours) and “Silence” (3.52 hours). The aircraft class is further broken down into four subclasses, which broadly describe the structure of the aircraft and its propulsion mechanism. A variety of additional “airframe” features are provided to give researchers finer control of the dataset and the opportunity to develop ontologies specific to their own use case.
For convenience, the dataset has been split into training (10.04 hours) and testing (2.35 hours) subsets, with the training set further split into 5 distinct folds for cross-validation. These splits are performed to prevent data-leakage between folds and the test set, ensuring samples collected in the same recording session (distinct in time, location and microphone) are assigned to the same fold.
Researchers may find applications for this dataset in a number of fields; particularly aircraft noise isolation and noise monitoring in an urban environment, development of passive acoustic systems to assist radar technology, and understanding the sources of aircraft noise to help manufacturers design less-noisy aircraft.
Audio data
ADS-B (Automatic Dependent Surveillance–Broadcast) messages transmitted directly from aircraft are used to automatically trigger, capture and label audio samples. A 60-second recording is triggered when an aircraft transmits a message indicating it is within a specified distance of the recording device (see “Location data” below for specifics). The resulting audio file is labelled with the unique ICAO identifier code for the aircraft, as well as its last reported altitude, date, time, location and microphone. The recording is then human verified and annotated with timestamps for the first and last moments the aircraft is audible. In total, AeroSonicDB contains 625 recordings of low-altitude aircraft - varying in length from 18 to 60 seconds, for a total of 8.87 hours of aircraft audio.
A collection of urban background noise without aircraft (“silence”) is included with the dataset as a means of distinguishing location-specific environmental noises from aircraft noises. Ten-second background-noise, or “silence”, recordings are triggered only when no aircraft are broadcasting that they are within a specified distance of the recording device (see “Location data” below). These “silence” recordings are also human verified to ensure no aircraft noise is present. The dataset contains 1,270 clips of silence/urban background noise.
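The distance check that drives both kinds of trigger can be sketched as follows (an illustrative assumption, not the authors' capture pipeline; the per-location trigger distances are given under "Location data" below):

import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def aircraft_in_range(aircraft_lat, aircraft_lon, mic_lat, mic_lon, trigger_km):
    """True if the last reported ADS-B position is within the location's trigger distance."""
    return haversine_km(aircraft_lat, aircraft_lon, mic_lat, mic_lon) <= trigger_km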
Location data
Recordings have been collected from three (3) locations. GPS coordinates for each location are provided in the "locations.json" file. In order to protect privacy, coordinates are given for a road or public space near the recording device rather than its exact location.
Location: 0
Situated in a suburban environment approximately 15.5 km north-east of the start/end of the runway. For Adelaide, typical south-westerly winds bring most arriving aircraft past this location on approach. Winds from the north or east will cause aircraft to take off to the north-east; however, not all departing aircraft will maintain a course that triggers a recording at this location. The "trigger distance" for this location is set to 3 km to ensure both small/slower aircraft and large/faster aircraft are captured within a sixty-second recording.
"Silence" or ambient background noises at this location include; cars, motorbikes, light-trucks, garbage trucks, power-tools, lawn mowers, construction sounds, sirens, people talking, dogs barking and a wide range of Australian native birds (New Holland Honeyeaters, Wattlebirds, Australian Magpies, Australian Ravens, Spotted Doves, Rainbow Lorikeets and others).
Location: 1
Situated approximately 500 m south-east of the south-eastern end of the runway, this location is near recreational areas (golf course, skate park and parklands), with a busy road/highway in between the location and the runway. This location features heavy winds and road traffic, as well as people talking, walking and riding, and also birds such as the Australian Magpie and Noisy Miner. The trigger distance for this location is set to 1 km. Due to their low altitude, aircraft are louder but audible for a shorter time compared to "Location 0".
Location: 2
As an alternative to "Location 1", this location is situated approximately 950m south-east of the end of the runway. This location has a wastewater facility to the north, a residential area to the south and a popular beach to the west. This location offers greater wind protection and further distance from airport and highway noises. Ambient background sounds feature close proximity cars and motorbikes, cyclists, people walking, nail guns and other construction sounds, as well as the local birds mentioned above.
Aircraft metadata
Supplementary "airframe" metadata for all aircraft has been gathered to help broaden the research possibilities from this dataset. Airframe information was collected and cross-checked from a number of open-source databases. The author has no reason to beleive any significant errors exist in the "aircraft_meta" files, however future versions of this dataset plan to obtain aircraft information directly from ICAO (International Civil Aviation Organization) to ensure a single, verifiable source of information.
Class/subclass ontology (minutes of recordings)
0. no aircraft (202)
    0: no aircraft (202)
1. aircraft (214)
    1: piston-propeller aeroplane (12)
    2: turbine-propeller aeroplane (37)
    3: turbine-fan aeroplane (163)
    4: rotorcraft (1.6)
The subclasses are a combination of the "airframe" and "engtype" features. Piston and Turboshaft rotorcraft/helicopters have been combined into a single subclass due to the small number of samples.
Data splits
Audio recordings have been split into training (81%) and test (19%) sets. The training set has further been split into 5 folds, giving researchers a common split for 5-fold cross-validation to ensure reproducibility and comparable results. Data leakage into the test set has been avoided by keeping test recordings disjoint from the training set by time and location, meaning samples in the test set for a particular location were recorded after any samples included in the training set for that location.
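A minimal Python sketch of consuming these splits with pandas (the column names "train-test" and "fold" are listed under "Columns/Labels" below; the values "train"/"test" and the fold encoding are assumptions):

import pandas as pd

meta = pd.read_csv("sample_meta.csv")
test = meta[meta["train-test"] == "test"]
train = meta[meta["train-test"] == "train"]

for k in range(1, 6):
    # Compare as strings in case the fold column is stored as text
    val_fold = train[train["fold"].astype(str) == str(k)]
    train_folds = train[train["fold"].astype(str) != str(k)]
    print(f"fold {k}: {len(train_folds)} train clips, {len(val_fold)} validation clips")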
Labelled data
The entire dataset (training and test) is referenced and labelled in the “sample_meta.csv” file. Each row contains a reference to a unique recording, its meta information, annotations and airframe features.
Alternatively, these labels can be derived directly from the filename of the sample (see below). The “aircraft_meta.csv” and “aircraft_meta.json” files can be used to reference aircraft specific features - such as; manufacturer, engine type, ICAO type designator etc. (see “Columns/Labels” below for all features).
File naming convention
Audio samples are in WAV format, with some metadata stored in the filename (see the parsing sketch after the examples below).
Basic Convention
“Aircraft ID + Date + Time + Location ID + Microphone ID”
“XXXXXX_YYYY-MM-DD_hh-mm-ss_X_X”
Sample with aircraft
{hex_id} _ {date} _ {time} _ {location_id} _ {microphone_id} . {file_ext}
7C7CD0_2023-05-09_12-42-55_2_1.wav
Sample without aircraft
“Silence” files are denoted with six (6) leading zeros rather than an aircraft hex code. All relevant metadata for “silence” samples are contained in the audio filename, and again in the accompanying “sample_meta.csv”.
000000 _ {date} _ {time} _ {location_id} _ {microphone_id} . {file_ext}
000000_2023-05-09_12-30-55_2_1.wav
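A minimal Python sketch for parsing either filename form (field names follow the convention above; nothing here is part of the dataset itself):

from pathlib import Path

def parse_filename(path):
    """Split an AeroSonicDB filename into its labelled fields."""
    stem = Path(path).stem  # e.g. "7C7CD0_2023-05-09_12-42-55_2_1"
    hex_id, date, time, location_id, mic_id = stem.split("_")
    return {
        "hex_id": hex_id,                 # "000000" marks a silence clip
        "is_silence": hex_id == "000000",
        "date": date,
        "time": time.replace("-", ":"),
        "location_id": int(location_id),
        "microphone_id": int(mic_id),
    }

print(parse_filename("7C7CD0_2023-05-09_12-42-55_2_1.wav"))
print(parse_filename("000000_2023-05-09_12-30-55_2_1.wav"))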
Columns/Labels
(found in sample_meta.csv, aircraft_meta.csv/json files)
train-test: Train-test split (train, test)
fold: Digit from 1 to 5 splitting the training data 5 ways (else test)
filename: The filename of the audio recording
date: Date of the recording
time: Time of the recording
location: ID for the location of the recording
mic: ID of the microphone used
class: Top-level label for the recording (eg. 0 = No aircraft, 1 =
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Noise Round 2 Road (Day). Published by Environmental Protection Agency. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0).This is a polygon dataset of the strategic noise mapping of roads, which were identified as those roads exceeding the flow threshold of 3 million passages per year, in the form of noise contours for the Lden (day) and Lnight (night) periods for Dublin and Cork agglomerations and the major roads outside of the agglomerations. The Db Value represents the average decibel value during the day time....
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is the "development dataset" for the DCASE 2022 Challenge Task 2 "Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques".
The data consists of the normal/anomalous operating sounds of seven types of real/toy machines. Each recording is a single-channel, 10-second audio clip that includes both a machine's operating sound and environmental noise. The following seven types of real/toy machines are used in this task:
Overview of the task
Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial intelligence (AI)-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.
This task is the follow-up to DCASE 2020 Task 2 and DCASE 2021 Task 2. The task this year is to detect anomalous sounds under three main conditions:
1. Only normal sound clips are provided as training data (i.e., unsupervised learning scenario). In real-world factories, anomalies rarely occur and are highly diverse. Therefore, exhaustive patterns of anomalous sounds are impossible to create or collect and unknown anomalous sounds that were not observed in the given training data must be detected. This condition is the same as in DCASE 2020 Task 2 and DCASE 2021 Task 2.
2. Factors other than anomalies change the acoustic characteristics between training and test data (i.e., domain shift). In real-world cases, operational conditions of machines or environmental noise often differ between the training and testing phases. For example, the operation speed of a conveyor can change due to seasonal demand, or environmental noise can fluctuate depending on the states of surrounding machines. This condition is the same as in DCASE 2021 Task 2.
3. In test data, samples unaffected by domain shifts (source domain data) and those affected by domain shifts (target domain data) are mixed, and the source/target domain of each sample is not specified. Therefore, the model must detect anomalies regardless of the domain (i.e., domain generalization).
Definition
We first define the key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."
Dataset
This dataset consists of three sections for each machine type (Sections 00, 01, and 02), and each section is a complete set of training and test data. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training, (ii) ten clips of normal sounds in the target domain for training, and (iii) 100 clips each of normal and anomalous sounds for the test. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.
File names and attribute csv files
File names and attribute csv files provide reference labels for each clip. The given reference labels for each training/test clip include the machine type, section index, normal/anomaly information, and attributes regarding conditions other than normal/anomaly. The machine type is given by the directory name. The section index is given by the respective file names. For the datasets other than the evaluation dataset, the normal/anomaly information and the attributes are also given by the respective file names. Attribute csv files are provided for easy access to the attributes that cause domain shifts. In these files, the file names, the names of the parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv) are listed. Each row takes the following format:
[filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
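A minimal Python sketch for reading rows in that format (the file path in the comment is a placeholder; the alternating name/value pairing follows the layout shown above):

import csv

def read_attributes(path):
    """Map each clip filename to its domain-shift parameters {d1p: d1v, d2p: d2v, ...}."""
    attributes = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            filename, rest = row[0], row[1:]
            attributes[filename] = dict(zip(rest[0::2], rest[1::2]))
    return attributes

# attrs = read_attributes("dev_data/fan/attributes_00.csv")  # hypothetical file name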
Recording procedure
Normal/anomalous operating sounds of the machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging the target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings from a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset to explain the details of the recording procedure by the submission deadline.
Directory structure
- /dev_data
- /fan
- /train (only normal clips)
- /section_00_source_train_normal_0000_
Baseline system
Two baseline systems are available on the GitHub repositories baseline_ae and baseline_mobile_net_v2. The baseline systems provide a simple entry-level approach that gives reasonable performance on the Task 2 dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
Condition of use
This dataset was created jointly by Hitachi, Ltd. and NTT Corporation and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Citation
If you use this dataset, please cite all the following three papers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The current dataset consists of three main folders:
01-Stimuli/: Contains the three sets of noises (white noise, bump noise, MPS noise) for the 12 study participants (S01 to S12).
02-Raw-data/fastACI/: Contains the raw data as obtained for each participant, which are also available within the GitHub repository of the fastACI toolbox, using the same directory tree. The results for each (anonymised) participant (under: publ_osses2022b/data_SXX/1-experimental_results/) include their audiometric thresholds (folder: audiometry), the results for the Intellitest speech test (folder: intellitest), and for the phoneme-in-noise test /aba/-/ada/ for the three noises (savegame files in MAT format).
02-Raw-data/ACI_sim/: Contains the raw data as obtained for the artificial listener, i.e., the model osses2022a.m (available within the fastACI toolbox). Twelve sets of simulations (using the waveforms of participants S01 to S12) were run for the three types of test noises. The results of the simulations of the phoneme-in-noise test are stored in the savegame MAT files. The template derived from 100 repetitions of /aba/ and /ada/ at an SNR of -6 dB in white noise is also included (template-osses2022a-speechACI_Logatome-abda-S43M-trial-1-v1-white-2022-7-15-N-0100.mat). The same template was used in all simulations.
03-Post-proc-data/: Auditory classification images (ACIs) derived from the participants' data (folder: ACI_exp) and from the simulations (folder: ACI_sim). For each participant (or artificial listener) there are three ACIs (MAT files), one for each of the corresponding noises. Cross predictions are also included, with performance predictions across 'participants' (Crosspred.mat, 12 cross predictions for each noise) or across 'noises' (Crosspred-noise.mat, 3 cross predictions for each participant). The cross predictions all have the same names but are stored in dedicated directories.
Use these data:
Download all these data and place them in a local directory on your computer. If you have MATLAB and you have downloaded a local copy of the fastACI toolbox (open access at: GitHub), you can recreate the figures of our paper.
After initialising the toolbox (type 'startup_fastACI;', without quotation marks, in MATLAB), type any of the following commands to recreate the figure you want. To recreate the figures in the main text:
publ_osses2022b_JASA_figs('fig1','zenodo');
publ_osses2022b_JASA_figs('fig2a','zenodo');
publ_osses2022b_JASA_figs('fig2b','zenodo');
publ_osses2022b_JASA_figs('fig3','zenodo');
publ_osses2022b_JASA_figs('fig4','zenodo');
publ_osses2022b_JASA_figs('fig5','zenodo');
publ_osses2022b_JASA_figs('fig6','zenodo');
publ_osses2022b_JASA_figs('fig7','zenodo');
publ_osses2022b_JASA_figs('fig8','zenodo');
publ_osses2022b_JASA_figs('fig8b','zenodo');
publ_osses2022b_JASA_figs('fig9','zenodo');
publ_osses2022b_JASA_figs('fig9b','zenodo');
publ_osses2022b_JASA_figs('fig10','zenodo');
To generate the figures of the supplementary materials (Appendix in the BioRxiv preprint):
publ_osses2022b_JASA_figs('fig1_suppl','zenodo');
publ_osses2022b_JASA_figs('fig2_suppl','zenodo');
publ_osses2022b_JASA_figs('fig3_suppl','zenodo');
publ_osses2022b_JASA_figs('fig3b_suppl','zenodo');
publ_osses2022b_JASA_figs('fig4_suppl','zenodo');
publ_osses2022b_JASA_figs('fig4b_suppl','zenodo');
publ_osses2022b_JASA_figs('fig5_suppl','zenodo');
publ_osses2022b_JASA_figs('fig5b_suppl','zenodo');
References:
Preprint: Alejandro Osses, Léo Varnet. "A microscopic investigation of the effect of random envelope fluctuations on phoneme-in-noise perception." BioRxiv.
fastACI toolbox: Alejandro Osses, Léo Varnet. fastACI toolbox: the MATLAB toolbox for investigating auditory perception using reverse correlation (v1.2). Zenodo. doi:10.5281/zenodo.7314014. Supplement to: https://github.com/aosses-tue/fastACI/tree/v1.2
Noise-level profiles by business type (restaurants, gyms, nightlife, etc.) from 10M+ POI check-ins across 200+ countries. Perfect for competitor benchmarking, brand environment analysis, and market research. CSV or S3 delivery.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
References
- Jeong, Gangwon, Umberto Villa, and Mark A. Anastasio. "Revisiting the joint estimation of initial pressure and speed-of-sound distributions in photoacoustic computed tomography with consideration of canonical object constraints." Photoacoustics (2025): 100700.
- Park, Seonyeong, et al. "Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer." Journal of Biomedical Optics 28.6 (2023): 066002.
Overview
- This dataset includes 80 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for photoacoustic computed tomography (PACT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in PACT studies are described in the publications cited above.
- The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories:
> Type A - The breast is almost entirely fatty
> Type B - There are scattered areas of fibroglandular density in the breast
> Type C - The breast is heterogeneously dense
> Type D - The breast is extremely dense
- Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom.
File Name Format
- Each data file is stored as a .mat file. The filenames follow the format {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D) and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type and "510022534" is the sample ID.
File Contents
- Each file contains the following variables:
> "type": Breast type
> "p0": Initial pressure distribution [Pa]
> "sos": Speed-of-sound map [mm/μs]
> "att": Acoustic attenuation (power-law prefactor) map [dB/(MHzʸ mm)]
> "y": Power-law exponent
> "pressure_lossless": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, under the assumption of a lossless medium (corresponding to Studies I, II, and III).
> "pressure_lossy": Simulated noiseless pressure data obtained by numerically solving the first-order acoustic wave equation using the k-space pseudospectral method, incorporating a power-law acoustic absorption model to account for medium losses (corresponding to Study IV).
- The pressure data were simulated using a ring-array transducer consisting of 512 receiving elements uniformly distributed along a ring with a radius of 72 mm.
- Note: These pressure data are noiseless simulations. In Studies II–IV of the referenced paper, additive Gaussian i.i.d. noise was added to the measurement data. Users may add similar noise to the provided data as needed for their own studies.
- In Study I, all spatial maps (e.g., sos) have dimensions of 512 × 512 pixels, with a pixel size of 0.32 mm × 0.32 mm.
- In Studies II and III, all spatial maps (e.g., sos) have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
- In Study IV, both the sos and att maps have dimensions of 1024 × 1024 pixels, with a pixel size of 0.16 mm × 0.16 mm.
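A minimal Python sketch for inspecting one slice (the filename is taken from the example above; whether scipy.io.loadmat or an HDF5 reader such as h5py is required depends on the MAT-file version, which is not stated here):

from scipy.io import loadmat

data = loadmat("D510022534.mat")
p0 = data["p0"]                        # initial pressure distribution [Pa]
sos = data["sos"]                      # speed-of-sound map [mm/us]
att = data["att"]                      # power-law attenuation prefactor map
pressure = data["pressure_lossless"]   # noiseless simulated ring-array data (512 elements)
print(p0.shape, sos.shape, pressure.shape)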
— type C map for Ln: areas where the limit value of 62 dB(A) is exceeded on the facade of sensitive buildings (dwellings, education and health establishments)
— Ln: weighted noise annoyance index for the night period (22h-6h)
— dB(A): unit of measurement expressing an intensity level (decibel) weighted according to the physiological characteristics of the human ear.
Sources: cee Méd2009 C_StDenis C_GlDGaulle_Location C_DDI_LN_62_Dept974_V6_finite C_nationals C_StDenis_ Tags C_departments C_COMMUNE_cartelie
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
INSPIRE Strategic noise maps. Noise agglomerations. Published by Environmental Protection Agency. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0).These are the results of the noise mapping (round 3) of the Dublin and Cork Agglomerations carried out for the EPA under EU Directive 2002/49/EC. The directive is implemented in Ireland by the Environmental Noise Regulations 2006 (SI 140/2006)....