Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
The ESC dataset is a collection of short environmental recordings available in a unified format (5-second-long clips, 44.1 kHz, single channel, Ogg Vorbis compressed @ 192 kbit/s). All clips have been extracted from public field recordings available through the Freesound.org project. Please see the README files for a detailed attribution list. The dataset is available under the terms of the Creative Commons Attribution-NonCommercial license. The dataset consists of three parts:
ESC-50: a labeled set of 2,000 environmental recordings (50 classes, 40 clips per class).
ESC-10: a labeled set of 400 environmental recordings (10 classes, 40 clips per class); a subset of ESC-50, created initially as a proof-of-concept/standardized selection of easy recordings.
ESC-US: an unlabeled dataset of 250,000 environmental recordings (5-second-long clips), suitable for unsupervised pre-training.
The ESC-US dataset, although not hand-annotated, includes the labels (tags) submitted by the original uploading users, which could potentially be used for weakly-supervised learning (noisy and/or missing labels). The ESC-10 and ESC-50 datasets have been prearranged into 5 uniformly sized folds so that clips extracted from the same original source recording are always contained in a single fold. The labeled datasets are also available as GitHub projects: ESC-50 | ESC-10. For a more thorough description and analysis, please see the original paper and the supplementary IPython notebook. The goal of this project is to facilitate open research initiatives in the field of environmental sound classification, as publicly available datasets in this domain are still quite scarce.
Acknowledgments: I would like to thank Frederic Font Corbera for his help in using the Freesound API.
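A minimal sketch of using the prearranged 5 folds for cross-validation, assuming the metadata layout of the ESC-50 GitHub project (a meta/esc50.csv file with filename, fold, and target columns; adjust paths and column names if your copy differs):

```python
import pandas as pd

# Fold assignments ship with the dataset; clips from the same source
# recording never straddle a fold boundary, so this split avoids leakage.
meta = pd.read_csv("ESC-50-master/meta/esc50.csv")

for held_out in range(1, 6):  # folds are numbered 1 through 5
    train = meta[meta["fold"] != held_out]
    test = meta[meta["fold"] == held_out]
    print(f"fold {held_out}: {len(train)} train / {len(test)} test clips")
```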
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
The ESC-50 dataset is a collection of environmental sound recordings spanning 50 classes, each with 40 recordings. The recordings are of high quality and have been carefully labeled. The ESC-50 dataset has been used to train and evaluate a variety of environmental sound classification algorithms.
ESC50
Dataset Summary
The ESC-50 dataset is a labeled collection of 2,000 environmental audio recordings suitable for benchmarking methods of environmental sound classification. It comprises 2,000 five-second clips covering 50 classes of natural, human, and domestic sounds, drawn from Freesound.org.
Data Instances
An example of 'train' looks as follows. { "audio": { "path": "ESC-50-master/audio/4-143118-B-7.wav", "array"… See the full description on the dataset page: https://huggingface.co/datasets/yangwang825/esc50.
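A minimal sketch of loading this mirror with the Hugging Face datasets library; the dataset ID comes from the page above, while the split and feature names follow the example instance and are otherwise assumptions to verify against the dataset card:

```python
from datasets import load_dataset

ds = load_dataset("yangwang825/esc50", split="train")

example = ds[0]
print(example["audio"]["path"])         # e.g. "ESC-50-master/audio/4-143118-B-7.wav"
print(example["audio"]["array"].shape)  # decoded waveform as a NumPy array
```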
This is a dataset of vocal imitations, in which the rhythm and pitch of environmental sounds from ESC-50 [1] are replicated or mimicked by voice; it can be used in various tasks involving environmental sounds. The dataset consists of 9,920 vocal imitations (8 imitators per environmental sound). Each imitator is a Japanese speaker. All audio data are 48 kHz/16-bit WAV files.
Each audio file is named as follows:
vocal_imitation/SpeakerID/FileName_SpeakerID.wav
FileName is the original audio file name in ESC-50, and SpeakerID is the ID of the imitator. We recorded vocal imitations for a subset of the sound events in ESC-50; a list of the sound events used is provided in EventList.csv.
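A hypothetical helper for unpacking this naming scheme; the function name and the example path are illustrative, not part of the dataset:

```python
from pathlib import Path

def parse_imitation_path(path: str) -> tuple[str, str]:
    p = Path(path)
    speaker_id = p.parent.name            # the SpeakerID directory
    file_name = p.stem.rsplit("_", 1)[0]  # drop the trailing _SpeakerID suffix
    return speaker_id, file_name

# Illustrative path following vocal_imitation/SpeakerID/FileName_SpeakerID.wav
speaker, original = parse_imitation_path("vocal_imitation/spk01/1-100032-A-0_spk01.wav")
```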
Note that this dataset does not contain the environmental sound files themselves; they can be obtained from the ESC-50 dataset.
The materials may be used free of charge for research purposes, but please refrain from redistribution or from use that is offensive to public order and morals. If you want to use the dataset for commercial purposes, please contact us (Yuki Okamoto or Keisuke Imoto).
If you use this dataset, please cite as follows:
Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryotaro Nagase, Takahiro Fukumori, and Yoichi Yamashita, "Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 411-415, 2024.
If there are any problems, please contact us.
[1] K. J. Piczak, "ESC: Dataset for Environmental Sound Classification," in Proc. 23rd ACM International Conference on Multimedia, 2015, pp. 1015–1018.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Forest environmental sound classification is one use case of ESC that has been widely experimented with to identify illegal activities inside a forest. With no public datasets specific to forest sounds available, there is a need for a benchmark forest environment sound dataset. With this motivation, FSC22 was created as a public benchmark dataset, using audio samples collected from Freesound.org.
This dataset includes 2,025 labeled sound clips, each 5 s long. All audio samples are distributed between six major parent-level classes: Mechanical Sounds, Animal Sounds, Environmental Sounds, Vehicle Sounds, Forest Threat Sounds, and Human Sounds. Further, each class is divided into subclasses that capture specific sounds falling under the main category. Overall, the dataset taxonomy consists of 34 classes. For the first phase of dataset creation, 75 audio samples were collected for each of 27 classes.
We expect that this dataset will help research communities with their work in the forest acoustic monitoring and classification domain.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains 2000 spectrogram images converted from the ESC50 audio dataset, which includes 50 categories of environmental sounds such as dog barking, thunder, and clock ticking.
These images are created using the Adaf-Spectrogram method, an adaptive frequency-axis spectrogram representation proposed to improve deep learning performance in classification tasks.
For more information, visit the Adaf-Spectrogram project.
🔧 Spectrogram Generation
All audio clips were processed using the Short-Time Fourier Transform (STFT) to generate time–frequency representations. The transformation was implemented using the scipy library, specifically the scipy.signal module.
The power spectrogram was computed as np.abs(spectrogram)**2 (not in decibel or amplitude scale). After conversion, each spectrogram image was resized to 128×128 pixels using high-quality resampling (LANCZOS filter) via the Pillow (PIL) library.
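A minimal sketch of this pipeline, assuming default STFT parameters and a simple 8-bit normalization step before saving; the Adaf-Spectrogram project may use different settings, and the file names are illustrative:

```python
import numpy as np
from scipy import signal
from scipy.io import wavfile
from PIL import Image

rate, audio = wavfile.read("clip.wav")                      # illustrative file name
f, t, Zxx = signal.stft(audio.astype(np.float64), fs=rate)  # complex STFT
power = np.abs(Zxx) ** 2                                    # linear power, not dB

# Scale to 8-bit grayscale (assumed step), then resize with LANCZOS.
img = (255 * power / power.max()).astype(np.uint8)
Image.fromarray(img).resize((128, 128), Image.LANCZOS).save("clip.png")
```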
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is part of the article titled "A tinyML for risk identification for people with hearing loss" and was developed to train an embedded system called SSIES (Support System for Identifying Environmental Sounds). The goal of the system is to assist individuals with hearing loss by identifying environmental sounds that signal emergencies or require immediate attention. The dataset is primarily based on selected audio samples from the ESC-50 database, including siren, horn, and baby cry sounds. An additional class, "scream," was sourced from Freesound.org and self-recordings, as it was not included in the original ESC-50 set. To minimize false positives, a fifth class called "X" was added, consisting of common non-emergency environmental sounds. The audio files were processed to match the constraints of an embedded environment by downsampling to 16 kHz, trimming to 1–1.5 seconds, and applying data augmentation techniques (pitch and speed variations) using the librosa library. This optimized dataset enables the training of lightweight tinyML models capable of real-time emergency sound recognition.
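A minimal sketch of this preprocessing with librosa; the specific shift and stretch values and the file names are illustrative assumptions, not the article's exact settings:

```python
import librosa
import soundfile as sf

y, sr = librosa.load("siren.wav", sr=16000)  # resample to 16 kHz on load
y = y[: int(1.5 * sr)]                       # trim to at most 1.5 seconds

pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # pitch variation
stretched = librosa.effects.time_stretch(y, rate=1.1)       # speed variation

sf.write("siren_pitch.wav", pitched, sr)
sf.write("siren_speed.wav", stretched, sr)
```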
Dataset Details: Audio Source Separation
The dataset for audio source separation is created by combining four different datasets, ensuring diverse and representative audio classes.
Dataset Composition
• Individual audio sources were extracted for each class.
• These sources were mixed in all possible combinations to generate 1,000 mixed WAV files (see the sketch after this list).
• Each mixed file is accompanied by its corresponding true source signals.
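A hypothetical sketch of the mixing step, summing one clip per class at a fixed length and writing the mixture alongside its true sources; the file names, padding length, and combination scheme are illustrative only:

```python
import numpy as np
import soundfile as sf

def mix_sources(paths, out_prefix, length=80000):
    sources, sr = [], None
    for p in paths:
        y, sr = sf.read(p)
        if y.ndim > 1:
            y = y.mean(axis=1)  # collapse to mono
        sources.append(np.pad(y, (0, max(0, length - len(y))))[:length])
    sf.write(f"{out_prefix}_mix.wav", np.sum(sources, axis=0), sr)
    for i, s in enumerate(sources):  # keep the true source signals
        sf.write(f"{out_prefix}_src{i}.wav", s, sr)

mix_sources(["speech.wav", "music.wav", "env.wav", "traffic.wav"], "mix0001")
```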
Source Datasets
1. Speech – LibriVox
2. Music – MUSDB18
3. Environmental Sounds – ESC-50
4. Traffic Sounds – UrbanSound8K
This dataset is designed to support research in audio source separation, machine learning, and signal processing.