8 datasets found
  1. Data from: ESC: Dataset for Environmental Sound Classification

    • dataverse.harvard.edu
    application/x-gzip +1
    Updated Oct 18, 2015
    Cite
    Harvard Dataverse (2015). ESC: Dataset for Environmental Sound Classification [Dataset]. http://doi.org/10.7910/DVN/YDEPUT
    Explore at:
    Available download formats: application/x-gzip (2219994), zip (47241189)
    39 scholarly articles cite this dataset (View in Google Scholar)
    Dataset updated
    Oct 18, 2015
    Dataset provided by
    Harvard Dataverse
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    The ESC dataset is a collection of short environmental recordings available in a unified format (5-second clips, 44.1 kHz, single channel, Ogg Vorbis compressed at 192 kbit/s). All clips have been extracted from public field recordings available through the Freesound.org project; please see the README files for a detailed attribution list. The dataset is available under the terms of the Creative Commons Attribution-NonCommercial license. It consists of three parts:

    • ESC-50: a labeled set of 2,000 environmental recordings (50 classes, 40 clips per class),
    • ESC-10: a labeled set of 400 environmental recordings (10 classes, 40 clips per class), a subset of ESC-50 created initially as a proof-of-concept/standardized selection of easy recordings,
    • ESC-US: an unlabeled set of 250,000 environmental recordings (5-second clips), suitable for unsupervised pre-training.

    The ESC-US dataset, although not hand-annotated, includes the labels (tags) submitted by the original uploading users, which could potentially be used for weakly supervised learning (noisy and/or missing labels). The ESC-10 and ESC-50 datasets have been prearranged into 5 uniformly sized folds so that clips extracted from the same original source recording are always contained in a single fold. The labeled datasets are also available as GitHub projects: ESC-50 | ESC-10. For a more thorough description and analysis, please see the original paper and the supplementary IPython notebook. The goal of this project is to facilitate open research in environmental sound classification, as publicly available datasets in this domain are still quite scarce.

    Acknowledgments: I would like to thank Frederic Font Corbera for his help in using the Freesound API.
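The prearranged 5-fold structure described above lends itself to leave-one-fold-out cross-validation. A minimal sketch, assuming per-clip metadata records that carry the fold number (the field names and toy records here are illustrative, not the dataset's actual files):

```python
# Sketch of fold-based evaluation on ESC-50/ESC-10: hold out one of the
# 5 prearranged folds at a time, so clips from the same source recording
# never appear on both sides of the split.
def fold_split(metadata, test_fold):
    """Split clip records into train/test by the prearranged fold number."""
    train = [m for m in metadata if m["fold"] != test_fold]
    test = [m for m in metadata if m["fold"] == test_fold]
    return train, test

# Toy metadata standing in for the real per-clip records (hypothetical names):
metadata = [{"filename": f"clip{i}.ogg", "fold": (i % 5) + 1} for i in range(10)]

for test_fold in range(1, 6):  # 5-fold cross-validation
    train, test = fold_split(metadata, test_fold)
    # no clip ever appears in both splits
    assert not {m["filename"] for m in train} & {m["filename"] for m in test}
```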

  2. ESC-50

    • datasets.activeloop.ai
    • opendatalab.com
    • +1 more
    deeplake
    Updated Oct 13, 2015
    Cite
    Karol Piczak (2015). ESC-50 [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/esc-50-dataset/
    Explore at:
    Available download formats: deeplake
    Dataset updated
    Oct 13, 2015
    Authors
    Karol Piczak
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Dataset funded by
    Google
    Description

    The ESC-50 dataset is a collection of environmental sound recordings comprising 50 classes, each with 40 clips (2,000 recordings in total). The recordings are of high quality and have been carefully labeled. ESC-50 has been used to train and evaluate a variety of environmental sound classification algorithms.

  3. esc50

    • huggingface.co
    Updated Dec 16, 2017
    Cite
    Yang Wang (2017). esc50 [Dataset]. https://huggingface.co/datasets/yangwang825/esc50
    Explore at:
    Dataset updated
    Dec 16, 2017
    Authors
    Yang Wang
    Description

    ESC50

      Dataset Summary
    

    The ESC-50 dataset is a labeled collection of 2,000 environmental audio recordings suitable for benchmarking methods of environmental sound classification. It comprises five-second clips from 50 classes spanning natural, human, and domestic sounds, all drawn from Freesound.org.

      Data Instances
    

    An example of 'train' looks as follows. { "audio": { "path": "ESC-50-master/audio/4-143118-B-7.wav", "array"… See the full description on the dataset page: https://huggingface.co/datasets/yangwang825/esc50.
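The example path above ("4-143118-B-7.wav") follows the upstream ESC-50 filename convention, FOLD-SOURCEID-TAKE-TARGET.wav. A small parser sketch, assuming that convention holds for every clip:

```python
# Decode ESC-50 metadata from a clip's filename.
# Convention (per the upstream ESC-50 README): FOLD-SOURCEID-TAKE-TARGET.wav
def parse_esc50_name(path):
    stem = path.rsplit("/", 1)[-1].removesuffix(".wav")
    fold, source_id, take, target = stem.split("-")
    return {"fold": int(fold), "source_id": source_id,
            "take": take, "target": int(target)}

info = parse_esc50_name("ESC-50-master/audio/4-143118-B-7.wav")
# info["fold"] == 4, info["target"] == 7 (class index, here "insects")
```

This lets the fold and class label be recovered without loading any audio, which is handy when building the cross-validation splits.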

  4. ESC-50-Voice: Dataset of vocal imitation for environmental sound in ESC-50

    • zenodo.org
    bin, csv
    Updated Jun 12, 2024
    Cite
    Yuki Okamoto; Keisuke Imoto; Shinnosuke Takamichi; Ryotaro Nagase; Takahiro Fukumori; Yoichi Yamashita (2024). ESC-50-Voice: Dataset of vocal imitation for environmental sound in ESC-50 [Dataset]. http://doi.org/10.5281/zenodo.11385662
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yuki Okamoto; Keisuke Imoto; Shinnosuke Takamichi; Ryotaro Nagase; Takahiro Fukumori; Yoichi Yamashita
    Description

    This is a dataset of vocal imitations, in which speakers replicate the rhythm and pitch of environmental sounds from ESC-50 [1] by voice; it can be used in various tasks involving environmental sounds. The dataset consists of 9,920 vocal imitations (8 imitators per environmental sound). Each imitator is a Japanese speaker. All audio data are 48 kHz/16-bit WAV files.

    Each audio file is named as follows:

    vocal_imitation/SpeakerID/FileName_SpeakerID.wav

    FileName is the original audio file name in ESC-50, and SpeakerID is the ID of the imitator. We recorded vocal imitations for a subset of the sound events in ESC-50; the list of sound events used can be obtained from EventList.csv.

    Note that this dataset does not contain environmental sound files, which can be obtained from ESC-50. Environmental sounds in ESC-50 are available here.
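The naming scheme above can be unpacked programmatically. A sketch, assuming speaker IDs contain no underscores (the speaker ID and file name in the example are hypothetical placeholders, not actual files from the dataset):

```python
# Parse vocal_imitation/SpeakerID/FileName_SpeakerID.wav into its parts.
# Assumes SpeakerID contains no underscores, so the trailing "_SpeakerID"
# suffix can be split off reliably.
def parse_imitation_path(path):
    parts = path.split("/")
    speaker_id = parts[-2]
    stem = parts[-1].removesuffix(".wav")
    if not stem.endswith("_" + speaker_id):
        raise ValueError(f"unexpected file name: {path}")
    original_name = stem[: -(len(speaker_id) + 1)]  # strip "_SpeakerID"
    return {"speaker": speaker_id, "original": original_name}

# Hypothetical example path:
info = parse_imitation_path("vocal_imitation/spk01/1-100032-A-0_spk01.wav")
# info["original"] recovers the ESC-50 file name, so each imitation can be
# paired with its source environmental sound.
```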

    Terms of use

    The materials may be used free of charge for research purposes, but please refrain from redistribution or from uses that offend public order and morals. If you want to use the dataset for commercial purposes, please contact us (Yuki Okamoto or Keisuke Imoto).

    Citation

    If you use this dataset, please cite as follows:

    Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryotaro Nagase, Takahiro Fukumori, and Yoichi Yamashita, "Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 411-415, 2024.

    Feedback

    If there is any problem, please contact us.

    [1] K. J. Piczak, "ESC: Dataset for environmental sound classification," in Proc. 23rd ACM International Conference on Multimedia, 2015, pp. 1015–1018.

  5. FSC22 Dataset

    • kaggle.com
    Updated Sep 20, 2022
    + more versions
    Cite
    IRMIOT22 (2022). FSC22 Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/4213460
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    IRMIOT22
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Forest environmental sound classification is one ESC use case that has been widely studied for identifying illegal activities inside forests. With no public datasets specific to forest sounds available, a benchmark forest environment sound dataset was needed. With this motivation, FSC22 was created as a public benchmark dataset using audio samples collected from Freesound.org.

    This dataset includes 2,025 labeled sound clips, each 5 seconds long. The audio samples are distributed across six major parent-level classes: mechanical sounds, animal sounds, environmental sounds, vehicle sounds, forest threat sounds, and human sounds. Each class is further divided into subclasses that capture specific sounds falling under the main category. Overall, the dataset taxonomy consists of 34 classes. For the first phase of dataset creation, 75 audio samples were collected for each of 27 classes.

    We expect that this dataset will help research communities with their work in the forest acoustic monitoring and classification domain.

  6. ESC50_Adaf_Spectrograms_128(Power)

    • kaggle.com
    Updated Jul 28, 2025
    Cite
    dingyan Chen (2025). ESC50_Adaf_Spectrograms_128(Power) [Dataset]. https://www.kaggle.com/datasets/dingyan0418/esc50-adaf-spectrograms-128power
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 28, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    dingyan Chen
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains 2000 spectrogram images converted from the ESC50 audio dataset, which includes 50 categories of environmental sounds such as dog barking, thunder, and clock ticking.

    These images are created using the Adaf-Spectrogram method, an adaptive frequency-axis spectrogram representation proposed to improve deep learning performance in classification tasks.

    For more information, visit the Adaf-Spectrogram project.

    Spectrogram Generation

    All audio clips were processed using the Short-Time Fourier Transform (STFT) to generate time–frequency representations. The transformation was implemented using the scipy library, specifically the scipy.signal module.

    • Sampling rate: 44,100 Hz
    • Window length (nperseg): 2048 samples
    • Overlap (noverlap): 1536 samples, which corresponds to a 75% overlap
    • Adaptive band: Adaptive segmentation into N_sec = 128 sections, based on frequency-wise energy distribution
    • Colormap: Viridis
    • Spectrogram type: Power spectrogram computed as np.abs(spectrogram)**2 (not in decibel or amplitude scale)

    After conversion, each spectrogram image was resized to 128×128 pixels using high-quality resampling (LANCZOS filter) via the Pillow (PIL) library.
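The STFT settings listed above can be sketched with scipy.signal. This is a minimal reconstruction under the stated parameters (the 1-second test tone is a stand-in for a real 5-second ESC-50 clip; the adaptive 128-section band segmentation and image export are not reproduced here):

```python
import numpy as np
from scipy import signal

fs = 44100                                   # sampling rate from the list above
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)            # 440 Hz test tone, 1 s

# STFT with nperseg=2048 and noverlap=1536 (75% overlap), as specified.
f, times, Zxx = signal.stft(x, fs=fs, nperseg=2048, noverlap=1536)

# Power spectrogram, not dB or amplitude scale.
power = np.abs(Zxx) ** 2

# power has nperseg // 2 + 1 == 1025 frequency bins per frame; resizing the
# rendered image to 128x128 (e.g. with PIL's LANCZOS filter) would follow.
```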

  7. m

    SAISE_Dataset

    • data.mendeley.com
    Updated May 13, 2025
    Cite
    Cristia Bautista (2025). SAISE_Dataset [Dataset]. http://doi.org/10.17632/39g2tmxwcf.1
    Explore at:
    Dataset updated
    May 13, 2025
    Authors
    Cristia Bautista
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the article titled "A tinyML for risk identification for people with hearing loss" and was developed to train an embedded system called SSIES (Support System for Identifying Environmental Sounds). The goal of the system is to assist individuals with hearing loss by identifying environmental sounds that signal emergencies or require immediate attention. The dataset is primarily based on selected audio samples from the ESC-50 database, including siren, horn, and baby-cry sounds. An additional class, "scream," was sourced from Freesound.org and self-recordings, as it was not included in the original ESC-50 set. To minimize false positives, a fifth class called "X" was added, consisting of common non-emergency environmental sounds. The audio files were processed to match the constraints of an embedded environment by downsampling to 16 kHz, trimming to 1–1.5 seconds, and applying data-augmentation techniques (pitch and speed variations) using the librosa library. This optimized dataset enables the training of lightweight tinyML models capable of real-time emergency sound recognition.
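The downsampling and speed-variation steps described above can be illustrated without the full pipeline. The authors used librosa; this numpy-only sketch uses plain linear interpolation as a rough stand-in for librosa's filtered resampling, just to show the length arithmetic:

```python
import numpy as np

# Crude resampler: linear interpolation onto a new time grid.
# (librosa.resample applies proper anti-aliasing filtering; this does not.)
def resample_linear(y, orig_sr, target_sr):
    n_out = int(round(len(y) * target_sr / orig_sr))
    return np.interp(np.linspace(0, len(y) - 1, n_out), np.arange(len(y)), y)

# Speed variation by resampling: rate > 1 shortens the clip (faster playback).
def speed_change(y, rate):
    return resample_linear(y, orig_sr=1, target_sr=1 / rate)

clip = np.random.randn(44100)                  # 1 s mock clip at 44.1 kHz
clip16k = resample_linear(clip, 44100, 16000)  # downsampled to 16 kHz
fast = speed_change(clip16k, rate=1.1)         # ~10% shorter clip
```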

  8. audio-source-separation

    • kaggle.com
    Updated Feb 4, 2025
    Cite
    freecolabgpu (2025). audio-source-separation [Dataset]. https://www.kaggle.com/datasets/freecolabgpu/audio-source-separation/suggestions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    freecolabgpu
    Description

    Dataset Details: Audio Source Separation

    The dataset for audio source separation is created by combining four different datasets, ensuring diverse and representative audio classes.

    Dataset Composition

    •  Individual audio sources were extracted for each class.
    •  These sources were mixed in all possible combinations to generate 1,000 mixed WAV files.
    •  Each mixed file is accompanied by its corresponding true source signals.
    

    Source Datasets

    1. Speech – LibriVox
    2. Music – MUSDB18
    3. Environmental Sounds – ESC-50
    4. Traffic Sounds – UrbanSound8K
    

    This dataset is designed to support research in audio source separation, machine learning, and signal processing.
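The mixing procedure described above (sources combined in all possible multi-source combinations, with true sources kept as targets) can be sketched as follows. Class names and clip lengths here are illustrative, not the dataset's actual files:

```python
import itertools
import numpy as np

# Mock per-class source clips standing in for the four source datasets.
rng = np.random.default_rng(0)
sources = {name: rng.standard_normal(16000)   # 1 s mock clips
           for name in ["speech", "music", "environment", "traffic"]}

# Mix every combination of two or more classes; each mixture keeps its
# ground-truth component signals for supervised separation training.
mixtures = []
for r in range(2, len(sources) + 1):
    for combo in itertools.combinations(sources, r):
        mix = np.sum([sources[c] for c in combo], axis=0)
        mixtures.append({"classes": combo, "mix": mix,
                         "true_sources": [sources[c] for c in combo]})

# 4 classes -> C(4,2) + C(4,3) + C(4,4) = 6 + 4 + 1 = 11 mixtures
```

In the real dataset this enumeration is repeated over many clips per class to reach the stated 1,000 mixed WAV files.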
