3 datasets found

stage1-Cnn14-16k-valid2-weights
kaggle.com
Updated Oct 13, 2020
Cite
Gopi Durgaprasad (2020). stage1-Cnn14-16k-valid2-weights [Dataset]. https://www.kaggle.com/gopidurgaprasad/stage1-cnn14-16k-valid2-weights/discussion
Dataset updated
Oct 13, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Gopi Durgaprasad
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Gopi Durgaprasad

Released under CC0: Public Domain

Data from: DCASE 2024 Challenge Task 7 Development Dataset : Environmental...
explore.openaire.eu
zenodo.org
Updated Mar 25, 2024
Cite
Keunwoo Choi; Laurie M. Heller; Keisuke Imoto; Mathieu Lagrange; Junwon Lee; Brian McFee; Yuki Okamoto; Modan Tailleur (2024). DCASE 2024 Challenge Task 7 Development Dataset : Environmental Sound Scene Synthesis [Dataset]. http://doi.org/10.5281/zenodo.10869643
Unique identifier
https://doi.org/10.5281/zenodo.10869643
Dataset updated
Mar 25, 2024
Authors
Keunwoo Choi; Laurie M. Heller; Keisuke Imoto; Mathieu Lagrange; Junwon Lee; Brian McFee; Yuki Okamoto; Modan Tailleur
Description
This dataset comprises the embeddings and captions used as the development dataset for DCASE 2024 Challenge Task 7, "Environmental Sound Scene Synthesis". The embeddings are derived from 60 different 4-second audio files (mono, 32-bit, 32 kHz) and are contained in the 'embeddings.tar.xz' file. Captions corresponding to each audio file can be found in 'caption.csv'. The dataset does not include the audio files, only the embeddings.

Three types of embeddings are provided: VGGish (vggish), MS-CLAP (clap-2023), and PANNs CNN14 Wavegram-Logmel (panns-wavegram-logmel). Only the PANNs CNN14 Wavegram-Logmel (panns-wavegram-logmel) embeddings are used for evaluation in the challenge. For further details, please refer to the challenge website.

Contact
Modan Tailleur, modan.tailleur@ls2n.fr
Mathieu Lagrange, mathieu.lagrange@ls2n.fr
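As a rough illustration of working with the two files named above, the following minimal sketch lists the contents of the archive and reads the caption file using only the Python standard library. The internal layout of 'embeddings.tar.xz' and the column names of 'caption.csv' are not documented in this record, so the code makes no assumption about them: it only reports what it finds. The function names are hypothetical.

```python
import csv
import tarfile


def list_embedding_files(archive_path):
    """Return the names of the regular files stored in a .tar.xz archive."""
    with tarfile.open(archive_path, mode="r:xz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


def read_caption_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by its header row.

    Assumption: the file has a header row. The actual column names are
    whatever the file provides; none are hard-coded here.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `list_embedding_files("embeddings.tar.xz")` first is a cheap way to see how the three embedding types are organized before extracting anything.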
Soundscape Datasets for Few-Shot Bird Sound Classification
zenodo.org
zip
Updated Oct 29, 2024
Cite
Moummad Ilyass (2024). Soundscape Datasets for Few-Shot Bird Sound Classification [Dataset]. http://doi.org/10.5281/zenodo.13994373
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.13994373
Dataset updated
Oct 29, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Moummad Ilyass
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This repository provides easy access to open-source soundscape datasets of bird sounds, specifically optimized for few-shot classification.

soundscapes.zip contains the evaluation soundscape datasets from the BIRB benchmark (https://arxiv.org/abs/2312.07439). The recordings are downsampled to 16 kHz, preprocessed with CNN14 from PANNs (https://arxiv.org/abs/1912.10211) to select the 6-second window with the highest bird activation, and converted to PyTorch (.pt) format to make them easier to use when evaluating deep neural networks.
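The window-selection step can be sketched as a sliding-window search over per-frame activation scores. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the bird activation has already been computed as one score per frame, and it scores each window by the sum of its frames, a criterion the description does not specify. The function name and the frame rate in the usage note are hypothetical.

```python
def best_window_start(scores, window_frames):
    """Return the start index of the contiguous window with the largest total score.

    scores: sequence of per-frame activation values (assumed precomputed).
    window_frames: window length in frames, e.g. 6 seconds times the frame rate.
    Ties are resolved in favour of the earliest window.
    """
    if window_frames <= 0 or window_frames > len(scores):
        raise ValueError("window_frames must be between 1 and len(scores)")

    # Total of the first window, then slide one frame at a time,
    # adding the frame that enters and subtracting the frame that leaves.
    current = sum(scores[:window_frames])
    best_sum, best_start = current, 0
    for start in range(1, len(scores) - window_frames + 1):
        current += scores[start + window_frames - 1] - scores[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start
```

For example, with a hypothetical rate of 100 frames per second, `best_window_start(scores, 600)` would give the first frame of a 6-second window, which can then be converted back to a sample offset at 16 kHz.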

These preprocessed datasets are employed in the work "Domain-Invariant Representation Learning of Bird Sounds" (https://arxiv.org/abs/2409.08589), which evaluates the few-shot learning capabilities of deep learning models trained on focal recordings (e.g., Xeno-Canto) and tested on soundscape recordings.

Dataset Structure

Validation Dataset

POW (pow.pt): The validation dataset consists of 16,047 examples across 43 classes and is organized as a dictionary with 'data' and 'label' keys representing bird sounds and their corresponding labels. Storing the entire validation dataset in a single tensor enables rapid loading and efficient processing, significantly accelerating the validation process. Classes with only one example are removed, as they are insufficient for one-shot classification tasks. Source: https://zenodo.org/records/4656848#.Y7ijhOxudhE
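The removal of single-example classes described above can be sketched as a simple count-and-filter pass over the labels. This is a minimal illustration, not the authors' code; the function name is hypothetical, and the labels are treated as a plain Python sequence rather than the tensor stored under the 'label' key.

```python
from collections import Counter


def keep_multi_example_indices(labels):
    """Return the indices of examples whose class occurs more than once.

    A class with a single example cannot supply both a support example
    and a query example, so it is dropped.
    """
    counts = Counter(labels)
    return [index for index, label in enumerate(labels) if counts[label] > 1]
```

The returned indices can be used to select the matching rows from both the 'data' and 'label' entries so the two stay aligned.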

Test Datasets

Each test dataset is structured with multiple subfolders, each labeled with an eBird species code to represent data for a specific bird species.

SSW (ssw/): Contains 50,760 examples across 96 classes. Source: https://zenodo.org/records/7079380#.Y7ijHOxudhE

NES (coffee_farms/): Contains 6,952 examples across 89 classes. Source: https://zenodo.org/records/7525349#.ZB8z_-xudhE

UHH (hawaii/): Contains 59,583 examples across 27 classes. Source: https://zenodo.org/records/7078499#.Y7ijPuxudhE

HSN (high_sierras/): Contains 10,296 examples across 19 classes. Source: https://zenodo.org/records/7525805#.ZB8zsexudhE

SNE (sierras_kahl/): Contains 20,147 examples across 56 classes. Source: https://zenodo.org/records/7050014#.Y7ijWexudhE

PER (peru/): Contains 14,768 examples across 132 classes. Source: https://zenodo.org/records/7079124#.Y7iis-xudhE
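Given the one-subfolder-per-species layout described above, the example and class counts of a test dataset can be checked with a short directory walk. This is a sketch under assumptions: it assumes each example is a separate file with a `.pt` extension placed directly inside its species subfolder, which the description implies but does not state explicitly. The function name is hypothetical.

```python
from pathlib import Path


def count_examples_per_class(root, pattern="*.pt"):
    """Map each subfolder name (an eBird species code) to its number of files.

    root: path to one test dataset directory, such as the extracted ssw/ folder.
    pattern: glob pattern for example files (assumed to be *.pt).
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for path in class_dir.glob(pattern) if path.is_file()
            )
    return counts
```

Comparing `len(counts)` and `sum(counts.values())` against the class and example totals listed above is a quick integrity check after unzipping.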

Code and detailed instructions, including data loading, model implementation, and few-shot evaluation, can be found at: https://github.com/ilyassmoummad/ProtoCLR