https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Gopi Durgaprasad
Released under CC0: Public Domain
Description: This dataset contains the embeddings and captions used as the development dataset for DCASE 2024 Challenge Task 7, 'Environmental Sound Scene Synthesis'. The embeddings are derived from 60 different 4-second audio files (mono, 32-bit, 32 kHz) and are stored in the 'embeddings.tar.xz' file. The caption for each audio file can be found in 'caption.csv'. The dataset does not include the audio files themselves, only the embeddings. Three types of embeddings are provided: VGGish (vggish), MS-CLAP (clap-2023), and PANNs CNN14 Wavegram-Logmel (panns-wavegram-logmel). Only the PANNs CNN14 Wavegram-Logmel (panns-wavegram-logmel) embeddings are used for evaluation in the challenge. For further details, please refer to the challenge website.
Contact: Modan Tailleur, modan.tailleur@ls2n.fr; Mathieu Lagrange, mathieu.lagrange@ls2n.fr
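As a rough illustration of how the two files described above might be inspected, the sketch below lists the contents of a .tar.xz archive and reads a caption CSV using only the Python standard library. The internal layout of 'embeddings.tar.xz' and the column names of 'caption.csv' are not stated in the description, so the helper names, the assumption that the embedding type appears somewhere in each member path, and the use of the CSV header as dictionary keys are all hypothetical.

```python
import csv
import tarfile

# Embedding type identifiers quoted in the dataset description.
EMBEDDING_TYPES = ("vggish", "clap-2023", "panns-wavegram-logmel")


def list_embedding_files(archive_path):
    """Return the sorted names of regular files stored in a .tar.xz archive."""
    # "r:xz" opens an LZMA-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:xz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def group_by_embedding_type(member_names, types=EMBEDDING_TYPES):
    """Group member paths by embedding type.

    Assumption: the type identifier appears in the member path
    (for example as a directory name). The real layout may differ.
    """
    groups = {t: [] for t in types}
    for name in member_names:
        for t in types:
            if t in name:
                groups[t].append(name)
                break
    return groups


def read_captions(csv_path):
    """Read a caption CSV into a list of dicts keyed by its header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `group_by_embedding_type(list_embedding_files("embeddings.tar.xz"))` would then show how many files exist per embedding type, provided the path assumption holds for the actual archive.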
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This repository provides easy access to open-source soundscape datasets of bird sounds, specifically optimized for few-shot classification.
soundscapes.zip contains evaluation soundscape datasets from the BIRB benchmark (https://arxiv.org/abs/2312.07439). The recordings are downsampled to 16 kHz, preprocessed with CNN14 from PANNs (https://arxiv.org/abs/1912.10211) to select the 6-second window with the highest bird activation, and converted to PyTorch (.pt) format so they are easy to use when evaluating deep neural networks.
These preprocessed datasets are employed in the work "Domain-Invariant Representation Learning of Bird Sounds" (https://arxiv.org/abs/2409.08589), which evaluates the few-shot learning capabilities of deep learning models trained on focal recordings (e.g., Xeno-Canto) and tested on soundscape recordings.
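The window-selection step mentioned above can be sketched as a sliding-window search over per-frame activation scores. This is only an illustration: the description does not say how the CNN14 outputs are aggregated, so the summed score, the function names, and the idea of one score per fixed-length frame are assumptions. At 16 kHz, a 6-second window corresponds to 96,000 samples.

```python
def best_window_start(frame_scores, window_frames):
    """Return the start index of the window with the highest summed score.

    frame_scores: one activation value per frame (hypothetical input,
    e.g. a framewise "bird" probability from a pretrained tagger).
    """
    if window_frames <= 0:
        raise ValueError("window_frames must be positive")
    if len(frame_scores) <= window_frames:
        return 0  # recording is no longer than one window: keep it from the start
    current = sum(frame_scores[:window_frames])
    best_sum, best_start = current, 0
    for start in range(1, len(frame_scores) - window_frames + 1):
        # Slide by one frame: add the entering frame, drop the leaving one.
        current += frame_scores[start + window_frames - 1] - frame_scores[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start


def crop_samples(samples, start_frame, window_frames, samples_per_frame):
    """Cut the selected window out of the raw sample sequence."""
    begin = start_frame * samples_per_frame
    return samples[begin:begin + window_frames * samples_per_frame]
```

With one score per second, for example, `window_frames` would be 6 and `samples_per_frame` would be 16000. Ties are resolved in favour of the earliest window.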
pow.pt: The validation dataset consists of 16,047 examples across 43 classes and is organized as a dictionary with 'data' and 'label' keys representing bird sounds and their corresponding labels. Storing the entire validation dataset in a single tensor enables rapid loading and efficient processing, significantly accelerating the validation process. Classes with only one example are removed, as they are insufficient for one-shot classification tasks. Source: https://zenodo.org/records/4656848#.Y7ijhOxudhE
Each test dataset is structured with multiple subfolders, each labeled with an eBird species code to represent data for a specific bird species.
ssw/: Contains 50,760 examples across 96 classes. Source: https://zenodo.org/records/7079380#.Y7ijHOxudhE
coffee_farms/: Contains 6,952 examples across 89 classes. Source: https://zenodo.org/records/7525349#.ZB8z_-xudhE
hawaii/: Contains 59,583 examples across 27 classes. Source: https://zenodo.org/records/7078499#.Y7ijPuxudhE
high_sierras/: Contains 10,296 examples across 19 classes. Source: https://zenodo.org/records/7525805#.ZB8zsexudhE
sierras_kahl/: Contains 20,147 examples across 56 classes. Source: https://zenodo.org/records/7050014#.Y7ijWexudhE
peru/: Contains 14,768 examples across 132 classes. Source: https://zenodo.org/records/7079124#.Y7iis-xudhE
Code and detailed instructions, including data loading, model implementation, and few-shot evaluation, can be found at: https://github.com/ilyassmoummad/ProtoCLR
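The folder layout and the singleton-class rule described above can be sketched with the standard library alone. This is not the loading code from the ProtoCLR repository. The helper names are hypothetical, the `*.pt` file pattern is an assumption based on the stated PyTorch format, and the labels are assumed to be a plain Python list (a label tensor would first need converting, for example with `tolist()`).

```python
from collections import Counter
from pathlib import Path


def index_species_folders(root, pattern="*.pt"):
    """Map each species subfolder name (eBird code) to its sorted example files.

    Assumption: every example is stored as one file matching `pattern`
    directly inside its species folder.
    """
    index = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            index[folder.name] = sorted(folder.glob(pattern))
    return index


def drop_singleton_classes(labels):
    """Return the indices of examples whose class has at least two examples.

    Mirrors the rule stated for the validation set: a class with a single
    example cannot supply both a support and a query item.
    """
    counts = Counter(labels)
    return [i for i, label in enumerate(labels) if counts[label] >= 2]
```

For a test set such as ssw/, `len(index_species_folders("ssw"))` should match the number of classes listed above if the layout assumption holds, and summing the list lengths should give the number of examples.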