License: MIT (https://opensource.org/licenses/MIT)
This dataset is derived from the UrbanSound8K dataset, which contains 8,732 labeled sound clips across 10 urban sound classes. The original dataset has been processed to extract key audio features that are commonly used in machine learning and deep learning applications for sound classification.
The original dataset contains 8,732 samples, but this processed version contains 8,674, so 58 samples were lost during processing.
We extracted 35 columns representing various audio characteristics. The target variable is the class column, which contains the sound category.
ClassID / Class Name:
- 0 = Air Conditioner
- 1 = Car Horn
- 2 = Children Playing
- 3 = Dog Bark
- 4 = Drilling
- 5 = Engine Idling
- 6 = Gun Shot
- 7 = Jackhammer
- 8 = Siren
- 9 = Street Music
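The 35 extracted columns are not enumerated here, so the following is only an illustrative sketch of how clip-level audio features of this kind are commonly computed with librosa; the specific features and summary statistics chosen below are assumptions, not this dataset's actual recipe.

```python
import numpy as np
import librosa

def extract_clip_features(path, duration=4.0):
    """Illustrative per-clip feature vector (NOT the dataset's exact 35 columns)."""
    y, sr = librosa.load(path, duration=duration)             # mono, default 22050 Hz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, n_frames)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # (1, n_frames)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)    # (1, n_frames)
    zcr = librosa.feature.zero_crossing_rate(y)               # (1, n_frames)
    # Average over time so every clip maps to one fixed-length row.
    return np.hstack([f.mean(axis=1) for f in (mfcc, centroid, rolloff, zcr)])
```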
This dataset is a processed version of UrbanSound8K. If you use it, please cite the original dataset:
📄 J. Salamon, C. Jacoby, and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research," 22nd ACM International Conference on Multimedia, Orlando, USA, Nov. 2014.
🔗 More Info: UrbanSound8K website, https://urbansounddataset.weebly.com/urbansound8k.html
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy. For a detailed description of the dataset and how it was compiled, please refer to our paper.
All excerpts are taken from field recordings uploaded to www.freesound.org. The files are pre-sorted into ten folds (folders named fold1-fold10) to help in the reproduction of and comparison with the automatic classification results reported in the article above.
In addition to the sound excerpts, a CSV file containing metadata about each excerpt is also provided.
8732 audio files of urban sounds (see description above) in WAV format. The sampling rate, bit depth, and number of channels are the same as those of the original file uploaded to Freesound (and hence may vary from file to file).
UrbanSound8k.csv
This file contains meta-data information about every audio file in the dataset. This includes:
slice_file_name: The name of the audio file, in the format [fsID]-[classID]-[occurrenceID]-[sliceID].wav, where (a parsing sketch follows this list):
[fsID] = the Freesound ID of the recording from which this excerpt (slice) is taken
[classID] = a numeric identifier of the sound class (see the description of classID below for further details)
[occurrenceID] = a numeric identifier to distinguish different occurrences of the sound within the original recording
[sliceID] = a numeric identifier to distinguish different slices taken from the same occurrence
fsID: The Freesound ID of the recording from which this excerpt (slice) is taken
start: The start time of the slice in the original Freesound recording
end: The end time of the slice in the original Freesound recording
salience: A (subjective) salience rating of the sound. 1 = foreground, 2 = background.
fold: The fold number (1-10) to which this file has been allocated.
classID: A numeric identifier of the sound class:
0 = air_conditioner
1 = car_horn
2 = children_playing
3 = dog_bark
4 = drilling
5 = engine_idling
6 = gun_shot
7 = jackhammer
8 = siren
9 = street_music
class: The class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music.
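As a sanity check, the fields encoded in each file name can be recovered programmatically. A minimal sketch, where the CSV path and the example file name are assumptions:

```python
import pandas as pd

# Load the metadata table (the path relative to the dataset root is an assumption).
meta = pd.read_csv("UrbanSound8k.csv")

def parse_slice_name(name):
    """Split '[fsID]-[classID]-[occurrenceID]-[sliceID].wav' into integers."""
    stem = name.rsplit(".", 1)[0]
    fs_id, class_id, occurrence_id, slice_id = (int(p) for p in stem.split("-"))
    return fs_id, class_id, occurrence_id, slice_id

# Hypothetical example: a dog_bark (classID 3) slice.
print(parse_slice_name("100032-3-0-0.wav"))  # -> (100032, 3, 0, 0)
```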
Since releasing the dataset we have noticed a couple of common mistakes that could invalidate your results, potentially leading to manuscripts being rejected or the publication of incorrect results. To avoid this, please read the following carefully:
Don't reshuffle the data! Use the predefined 10 folds. Why? If you reshuffle the data (e.g. combine the data from all folds and generate a random train/test split) you will be incorrectly placing related samples in both the train and test sets, leading to inflated scores that don't represent your model's performance on unseen data. Put simply, your results will be wrong. Your results will NOT be comparable to previous results in the literature, meaning any claims to an improvement on previous research will be invalid. Even if you don't reshuffle the data, evaluating using different splits (e.g. 5-fold cross validation) will mean your results are not comparable to previous research.
Don't evaluate on just one split! Use 10-fold cross validation over the predefined folds and average the scores. Why? Not all the splits are equally "easy". That is, models tend to obtain much higher scores when trained on folds 1-9 and tested on fold 10 than when (e.g.) trained on folds 2-10 and tested on fold 1. For this reason, it is important to evaluate your model on each of the 10 splits and report the average accuracy, as in the sketch below. Again, otherwise your results will NOT be comparable to previous results in the literature.
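A minimal sketch of the evaluation loop the two points above prescribe; train_model and evaluate are hypothetical placeholders for your own routines, and the CSV path is an assumption:

```python
import numpy as np
import pandas as pd

meta = pd.read_csv("UrbanSound8k.csv")

fold_scores = []
for held_out in range(1, 11):                      # one pass per predefined fold
    train_df = meta[meta["fold"] != held_out]
    test_df = meta[meta["fold"] == held_out]
    model = train_model(train_df)                  # hypothetical training routine
    fold_scores.append(evaluate(model, test_df))   # hypothetical scoring routine

print("Mean accuracy over the 10 predefined folds:", np.mean(fold_scores))
```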
We kindly request that articles and other works in which this dataset is used cite the following paper:
J. Salamon, C. Jacoby and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research", 22nd ACM International Conference on Multimedia, Orlando USA, Nov. 2014.
More information at https://urbansounddataset.weebly.com/urbansound8k.html
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
Please visit urbansounddataset.weebly.com for a detailed description.
We kindly request that articles and other works in which this dataset is used cite the following paper:
J. Salamon, C. Jacoby and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research", 22nd ACM International Conference on Multimedia, Orlando USA, Nov. 2014.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
This dataset contains mel spectrograms of the 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy. All excerpts are taken from field recordings uploaded to www.freesound.org. In other words, this dataset contains images of the sounds in the UrbanSound8K dataset.
License: GNU GPL v2.0 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
Urban sound clips converted using the mel-frequency cepstrum. Original dataset: https://www.kaggle.com/chrisfilo/urbansound8k. All credits to the original authors.
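A minimal sketch of such a conversion with librosa; the number of coefficients and the example path are assumptions, since the author's exact settings are not stated:

```python
import librosa

# Hypothetical input path; n_mfcc=40 is an assumed setting, not the author's.
y, sr = librosa.load("fold1/100032-3-0-0.wav")
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)  # shape: (40, n_frames)
```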
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Processing Details
The audio files from the original dataset have been converted to log-mel spectrograms using the following parameters:
Mel frequency bins: 128 (n_mels=128)
Power-to-dB conversion: librosa.power_to_db() with reference to maximum power
Library used: librosa for audio processing
Processing Pipeline
1. Load audio data from the original Urban Sounds 8K dataset
2. Extract mel spectrograms using librosa.feature.melspectrogram()
3. Convert to log scale using…

See the full description on the dataset page: https://huggingface.co/datasets/EthanGLEdwards/urbansounds_melspectrograms.
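Given the stated parameters (n_mels=128, librosa.power_to_db with reference to maximum power), the pipeline plausibly looks like the sketch below; the hop length, FFT size, and the truncated third step are filled in with librosa defaults as an assumption:

```python
import numpy as np
import librosa

def to_log_mel(path):
    """Log-mel spectrogram per the stated parameters; other settings are assumed defaults."""
    y, sr = librosa.load(path)                                    # 1. load audio
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # 2. mel spectrogram
    return librosa.power_to_db(mel, ref=np.max)                   # 3. convert to dB (log scale)
```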
This dataset was created by Raghav Rawat.
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
The IDMT-DESED-FL and IDMT-URBAN-FL datasets enable research in sound event detection (SED) within a federated learning (FL) context. They consist of sound events sourced from the well-known DESED and URBAN-8K datasets. Each source dataset contains sound events from ten classes for the use cases of SED in domestic and urban environments, respectively. To simulate an FL scenario, the source events are mixed with background noise to generate 30,000 ten-second soundscapes, which are partitioned across 100 edge devices. Each soundscape is generated by mixing up to five (possibly overlapping) sound events with background noise. Both datasets come in independent and identically distributed (IID) and non-IID versions, the latter providing a more realistic, real-world-like distribution of event classes.
Due to the size of the datasets, with this download we provide the scripts and details necessary to generate the FL datasets using the source material from DESED, URBAN-8K, and IUSB.
See the accompanying paper and the README contained in the data folder for further details.
TensorFlow could not read the original wave files, so I re-wrote them locally and re-uploaded. For my own purposes (https://www.kaggle.com/tariqblecher/speech-enhancement-tensorflow-unet-softmask) I re-wrote the data at a sampling rate of 8 kHz.
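A minimal sketch of such a re-write using librosa and soundfile; the author's actual script is not shown, so the file names and the 16-bit PCM subtype are assumptions:

```python
import librosa
import soundfile as sf

# Load with resampling to 8 kHz, then write a clean PCM WAV.
y, sr = librosa.load("original.wav", sr=8000)       # librosa resamples on load
sf.write("rewritten.wav", y, sr, subtype="PCM_16")  # 16-bit subtype is an assumption
```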