3 datasets found
  1. FSDKaggle2019

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Eduardo Fonseca (2020). FSDKaggle2019 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3612636
    Dataset provided by
    Xavier Serra
    Eduardo Fonseca
    Frederic Font
    Manoj Plakal
    Daniel P. W. Ellis
    Description

    FSDKaggle2019 is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology. FSDKaggle2019 has been used for the DCASE Challenge 2019 Task 2, which was run as a Kaggle competition titled Freesound Audio Tagging 2019.

    Citation

    If you use the FSDKaggle2019 dataset or part of it, please cite our DCASE 2019 paper:

    Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra. "Audio tagging with noisy labels and minimal supervision". Proceedings of the DCASE 2019 Workshop, NYC, US (2019)

    You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2019.

    Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017

    Data curators

    Eduardo Fonseca, Manoj Plakal, Xavier Favory, Jordi Pons

    Contact

    You are welcome to contact Eduardo Fonseca at eduardo.fonseca@upf.edu should you have any questions.

    ABOUT FSDKaggle2019

    Freesound Dataset Kaggle 2019 (or FSDKaggle2019 for short) is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology [1]. FSDKaggle2019 was used for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2019. Please visit the DCASE2019 Challenge Task 2 website for more information. This task was hosted on the Kaggle platform as a competition titled Freesound Audio Tagging 2019. It was organized by researchers from the Music Technology Group (MTG) of Universitat Pompeu Fabra (UPF) and from the Sound Understanding team at Google AI Perception. The competition was intended to provide insight into the development of broadly applicable sound event classifiers able to cope with label noise and minimal supervision.

    FSDKaggle2019 employs audio clips from the following sources:

    Freesound Dataset (FSD): a dataset being collected at the MTG-UPF based on Freesound content organized with the AudioSet Ontology

    The soundtracks of a pool of Flickr videos taken from the Yahoo Flickr Creative Commons 100M dataset (YFCC)

    The audio data is labeled using a vocabulary of 80 labels from Google’s AudioSet Ontology [1], covering diverse topics: Guitar and other Musical Instruments, Percussion, Water, Digestive, Respiratory sounds, Human voice, Human locomotion, Hands, Human group actions, Insect, Domestic animals, Glass, Liquid, Motor vehicle (road), Mechanisms, Doors, and a variety of Domestic sounds. The full list of categories can be inspected in vocabulary.csv (see Files & Download below). The goal of the task was to build a multi-label audio tagging system that can predict appropriate label(s) for each audio clip in a test set.
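
    As an illustration, the vocabulary can be loaded with a few lines of Python. This is a minimal sketch assuming vocabulary.csv ships without a header row and with three columns (integer index, label name, AudioSet machine identifier); verify the layout against the downloaded file.

    import pandas as pd

    # Assumed layout: no header row; columns are (index, label, AudioSet mid).
    vocab = pd.read_csv("vocabulary.csv", header=None,
                        names=["index", "label", "mid"])
    print(len(vocab))             # expected: 80 classes
    print(vocab["label"].head())  # first few label names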

    What follows is a summary of some of the most relevant characteristics of FSDKaggle2019. Nevertheless, it is highly recommended to read our DCASE 2019 paper for a more in-depth description of the dataset and how it was built.

    Ground Truth Labels

    The ground truth labels are provided at the clip level and express the presence of a sound category in the audio clip; hence, they can be considered weak labels or tags. Audio clips have variable lengths (roughly from 0.3 to 30s).

    The audio content from FSD has been manually labeled by humans following a data labeling process using the Freesound Annotator platform. Most labels have inter-annotator agreement but not all of them. More details about the data labeling process and the Freesound Annotator can be found in [2].

    The YFCC soundtracks were labeled using automated heuristics applied to the audio content and metadata of the original Flickr clips. Hence, a substantial amount of label noise can be expected. The label noise can vary widely in amount and type depending on the category, including in- and out-of-vocabulary noises. More information about some of the types of label noise that can be encountered is available in [3].

    Specifically, FSDKaggle2019 features three types of label quality, one for each set in the dataset:

    curated train set: correct (but potentially incomplete) labels

    noisy train set: noisy labels

    test set: correct and complete labels

    Further details can be found below in the sections for each set.

    Format

    All audio clips are provided as uncompressed PCM 16 bit, 44.1 kHz, mono audio files.
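
    As a quick sanity check, the format of any downloaded clip can be verified with the soundfile library. The path below is hypothetical; substitute any clip from the downloaded audio directories.

    import soundfile as sf

    # Hypothetical path; substitute any clip from the downloaded archives.
    info = sf.info("FSDKaggle2019.audio_train_curated/example.wav")
    print(info.samplerate)  # expected: 44100
    print(info.channels)    # expected: 1 (mono)
    print(info.subtype)     # expected: 'PCM_16'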

    DATA SPLIT

    FSDKaggle2019 consists of two train sets and one test set. The idea is to limit the supervision provided for training (i.e., the manually-labeled, hence reliable, data), thus promoting approaches to deal with label noise.

    Curated train set

    The curated train set consists of manually-labeled data from FSD.

    Number of clips/class: 75, except in a few cases (where there are fewer)

    Total number of clips: 4970

    Avg number of labels/clip: 1.2

    Total duration: 10.5 hours

    The duration of the audio clips ranges from 0.3 to 30s due to the diversity of the sound categories and the preferences of Freesound users when recording/uploading sounds. Labels are correct but potentially incomplete: a few of these audio clips may contain additional acoustic material beyond the provided ground truth label(s).

    Noisy train set

    The noisy train set is a larger set of noisy web audio data from Flickr videos taken from the YFCC dataset [5].

    Number of clips/class: 300

    Total number of clips: 19,815

    Avg number of labels/clip: 1.2

    Total duration: ~80 hours

    The duration of the audio clips ranges from 1s to 15s, with the vast majority lasting 15s. Labels are automatically generated and purposefully noisy. No human validation is involved. The label noise can vary widely in amount and type depending on the category, including in- and out-of-vocabulary noises.

    Considering the numbers above, the per-class data distribution available for training is, for most of the classes, 300 clips from the noisy train set and 75 clips from the curated train set. This means 80% noisy / 20% curated at the clip level, while at the duration level the proportion is more extreme considering the variable-length clips.

    Test set

    The test set is used for system evaluation and consists of manually-labeled data from FSD.

    Number of clips/class: between 50 and 150

    Total number of clips: 4481

    Avg number of labels/clip: 1.4

    Total duration: 12.9 hours

    The acoustic material present in the test set clips is labeled exhaustively using the aforementioned vocabulary of 80 classes. Most labels have inter-annotator agreement, but not all of them. Except for human error, the labels are correct and complete with respect to the target vocabulary; nonetheless, a few clips could still contain additional (unlabeled) acoustic content that is out of the vocabulary.

    During the DCASE2019 Challenge Task 2, the test set was split into two subsets, for the public and private leaderboards, and only the data corresponding to the public leaderboard was provided. This package includes the full test set with all the test labels. To allow comparison with previous work, the file test_post_competition.csv includes a flag indicating the corresponding leaderboard (public or private) for each test clip (see more info in Files & Download below).
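
    For example, the two leaderboard subsets can be recovered with pandas. This sketch assumes the flag lives in a column named "usage" with values such as "Public" and "Private"; check the actual header of test_post_competition.csv after download.

    import pandas as pd

    test = pd.read_csv("test_post_competition.csv")
    public = test[test["usage"] == "Public"]    # assumed column name/values
    private = test[test["usage"] == "Private"]
    print(len(public), len(private))  # the two subsets sum to 4,481 clips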

    Acoustic mismatch

    As mentioned before, FSDKaggle2019 uses audio clips from two sources:

    FSD: curated train set and test set, and

    YFCC: noisy train set.

    While the sources of audio (Freesound and Flickr) are collaboratively contributed and quite diverse in themselves, a certain acoustic mismatch can be expected between FSD and YFCC. We conjecture that this mismatch arises for a variety of reasons. For example, through acoustic inspection of a small sample of both data sources, we find a higher percentage of high-quality recordings in FSD. In addition, audio clips in Freesound are typically recorded with the purpose of capturing audio, which is not necessarily the case in YFCC.

    This mismatch can have an impact on the evaluation, considering that most of the train data come from YFCC, while all test data are drawn from FSD. This constraint (i.e., noisy training data coming from a different web audio source than the test set) is sometimes a real-world condition.

    LICENSE

    All clips in FSDKaggle2019 are released under Creative Commons (CC) licenses. To facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses.

    Curated train set and test set. All clips in Freesound are released under different modalities of Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. The licenses are specified in the files train_curated_post_competition.csv and test_post_competition.csv. These licenses can be CC0, CC-BY, CC-BY-NC and CC Sampling+.

    Noisy train set. Similarly, the licenses of the soundtracks from Flickr used in FSDKaggle2019 are specified in the file train_noisy_post_competition.csv. These licenses can be CC-BY and CC BY-SA.

    In addition, FSDKaggle2019 as a whole is the result of a curation process and it has an additional license. FSDKaggle2019 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2019.doc zip file.
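
    For attribution, the per-clip license mapping can be tabulated directly from the post-competition CSVs. A minimal sketch, assuming the files expose a "license" column (the column name is an assumption; verify against the downloaded files):

    import pandas as pd

    curated = pd.read_csv("train_curated_post_competition.csv")
    # Assumed column name; counts clips per license modality.
    print(curated["license"].value_counts())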

    FILES & DOWNLOAD

    FSDKaggle2019 can be downloaded as a series of zip files with the following directory structure:

    root
    │
    └───FSDKaggle2019.audio_train_curated/   Audio clips in the curated train set
    │
    └───FSDKaggle2019.audio_train_noisy/     Audio clips in the noisy train set

  2. ARCA23K

    • data.niaid.nih.gov
    • paperswithcode.com
    Updated Feb 25, 2022
    Cite
    Cao, Yin (2022). ARCA23K [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5117900
    Dataset provided by
    Iqbal, Turab
    Bailey, Andrew
    Wang, Wenwu
    Cao, Yin
    Plumbley, Mark D.
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ARCA23K is a dataset of labelled sound events created to investigate real-world label noise. It contains 23,727 audio clips originating from Freesound, and each clip belongs to one of 70 classes taken from the AudioSet ontology. The dataset was created using an entirely automated process with no manual verification of the data. For this reason, many clips are expected to be labelled incorrectly.

    In addition to ARCA23K, this release includes a companion dataset called ARCA23K-FSD, which is a single-label subset of the FSD50K dataset. ARCA23K-FSD contains the same sound classes as ARCA23K and the same number of audio clips per class. As it is a subset of FSD50K, each clip and its label have been manually verified. Note that only the ground truth data of ARCA23K-FSD is distributed in this release. To download the audio clips, please visit the Zenodo page for FSD50K.

    A paper has been published detailing how the dataset was constructed. See the Citing section below.

    The source code used to create the datasets is available: https://github.com/tqbl/arca23k-dataset

    Characteristics

    ARCA23K(-FSD) is divided into:

    A training set containing 17,979 clips (39.6 hours for ARCA23K).

    A validation set containing 2,264 clips (5.0 hours).

    A test set containing 3,484 clips (7.3 hours).

    There are 70 sound classes in total. Each class belongs to the AudioSet ontology.

    Each audio clip was sourced from the Freesound database. Other than format conversions (e.g. resampling), the audio clips have not been modified.

    The duration of the audio clips varies from 0.3 seconds to 30 seconds.

    All audio clips are mono 16-bit WAV files sampled at 44.1 kHz.

    Based on listening tests (details in paper), 46.4% of the training examples are estimated to be labelled incorrectly. Among the incorrectly-labelled examples, 75.9% are estimated to be out-of-vocabulary.
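
    As a starting point, the splits can be loaded and their sizes checked with pandas. This is a sketch assuming the ground truth unpacks to per-split CSV files named as below; the file names are an assumption, so check the actual archive contents.

    import pandas as pd

    # Assumed file names inside the ground-truth archive.
    splits = {name: pd.read_csv(f"ARCA23K.ground_truth/{name}.csv")
              for name in ("train", "val", "test")}
    for name, df in splits.items():
        print(name, len(df))  # expected: 17979 / 2264 / 3484 clips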

    Sound Classes

    The list of sound classes is given below. They are grouped based on the top-level superclasses of the AudioSet ontology.

    Music

    Acoustic guitar

    Bass guitar

    Bowed string instrument

    Crash cymbal

    Electric guitar

    Gong

    Harp

    Organ

    Piano

    Rattle (instrument)

    Scratching (performance technique)

    Snare drum

    Trumpet

    Wind chime

    Wind instrument, woodwind instrument

    Sounds of things

    Boom

    Camera

    Coin (dropping)

    Computer keyboard

    Crack

    Dishes, pots, and pans

    Drawer open or close

    Drill

    Gunshot, gunfire

    Hammer

    Keys jangling

    Knock

    Microwave oven

    Printer

    Sawing

    Scissors

    Skateboard

    Slam

    Splash, splatter

    Squeak

    Tap

    Thump, thud

    Toilet flush

    Train

    Water tap, faucet

    Whoosh, swoosh, swish

    Writing

    Zipper (clothing)

    Natural sounds

    Crackle

    Stream

    Waves, surf

    Wind

    Human sounds

    Burping, eructation

    Chewing, mastication

    Child speech, kid speaking

    Clapping

    Cough

    Crying, sobbing

    Fart

    Female singing

    Female speech, woman speaking

    Finger snapping

    Giggle

    Male speech, man speaking

    Run

    Screaming

    Walk, footsteps

    Animal

    Bark

    Cricket

    Livestock, farm animals, working animals

    Meow

    Rattle

    Source-ambiguous sounds

    Crumpling, crinkling

    Crushing

    Tearing

    License and Attribution

    This release is licensed under the Creative Commons Attribution 4.0 International License.

    The audio clips distributed as part of ARCA23K were sourced from Freesound and have their own Creative Commons license. The license information and attribution for each audio clip can be found in ARCA23K.metadata/train.json, which also includes the original Freesound URLs.
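
    A hedged sketch of collecting that attribution information follows; the per-clip fields used below ("license", "url") and the clip-ID-to-record layout are assumptions about the JSON structure, so inspect the file to confirm the actual keys.

    import json

    with open("ARCA23K.metadata/train.json") as f:
        metadata = json.load(f)  # assumed: dict mapping clip IDs to records

    for clip_id, entry in list(metadata.items())[:5]:
        print(clip_id, entry.get("license"), entry.get("url"))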

    The files under ARCA23K-FSD.ground_truth/ are an adaptation of the ground truth data provided as part of FSD50K, which is licensed under the Creative Commons Attribution 4.0 International License. The curators of FSD50K are Eduardo Fonseca, Xavier Favory, Jordi Pons, Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano, and Sara Fernandez.

    Citing

    If you wish to cite this work, please cite the following paper:

    T. Iqbal, Y. Cao, A. Bailey, M. D. Plumbley, and W. Wang, “ARCA23K: An audio dataset for investigating open-set label noise”, in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), 2021, Barcelona, Spain, pp. 201–205.

    BibTeX:

    @inproceedings{Iqbal2021,
        author = {Iqbal, T. and Cao, Y. and Bailey, A. and Plumbley, M. D. and Wang, W.},
        title = {{ARCA23K}: An audio dataset for investigating open-set label noise},
        booktitle = {Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)},
        pages = {201--205},
        year = {2021},
        address = {Barcelona, Spain},
    }

  3. AdA Filmontology Annotation Data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 15, 2023
    Cite
    Buzal, Anton (2023). AdA Filmontology Annotation Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8328662
    Dataset provided by
    Scherer, Thomas
    Grotkopp, Matthias
    Pedro Prado, João
    Pfeilschifter, Yvonne
    Agt-Rickauer, Henning
    Bakels, Jan-Hendrik
    Stratil, Jasper
    Zorko, Rebecca
    Buzal, Anton
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    AdA Project Public Data Release

    This repository holds public data provided by the AdA project (Affektrhetoriken des Audiovisuellen - BMBF eHumanities Research Group Audio-Visual Rhetorics of Affect).

    See: http://www.ada.cinepoetics.fu-berlin.de/en/index.html

    The data is made accessible under the terms of the Creative Commons Attribution-ShareAlike 3.0 License. The data can also be accessed at the project's public GitHub repository: https://github.com/ProjectAdA/public

    Further explanations of the data can be found on our AdA project website: https://projectada.github.io/. See also the peer-reviewed data paper for this dataset, which is under review for publication in NECSUS_European Journal of Media Studies and will be available from https://necsus-ejms.org/ and https://mediarep.org.

    The data currently includes:

    AdA Filmontology

    The latest public release of the AdA Filmontology: https://github.com/ProjectAdA/public/tree/master/ontology

    A vocabulary of film-analytical terms and concepts for fine-grained semantic video annotation.

    The vocabulary is also available online in our triplestore: https://ada.cinepoetics.org/resource/2021/05/19/eMAEXannotationMethod.html

    Advene Annotation Template

    The latest public release of the template for the Advene annotation software: https://github.com/ProjectAdA/public/tree/master/advene_template

    The template makes the developed semantic vocabulary available in the Advene software, with ready-to-use annotation tracks and predefined values.

    To use the template, you must install Advene: https://www.advene.org/

    Annotation Data

    The latest public releases of our annotation datasets based on the AdA vocabulary: https://github.com/ProjectAdA/public/tree/master/annotations

    The dataset of news reports, documentaries, and feature films on the topic of "financial crisis" contains more than 92,000 manual and semi-automatic annotations authored in the open-source software Advene (Aubert/Prié 2005) by expert annotators, as well as more than 400,000 automatically generated annotations for wider corpus exploration. The annotations are published as Linked Open Data under the CC BY-SA 3.0 license and are available as RDF triples in Turtle exports (.ttl files) and in Advene's non-proprietary azp file format, which allows instant access through the graphical interface of the software.

    The annotation data can also be queried at our public SPARQL Endpoint: http://ada.filmontology.org/sparql
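
    A minimal sketch of querying the endpoint over plain HTTP follows; the triple-count query is a generic example, not taken from the AdA documentation.

    import requests

    query = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"
    resp = requests.get(
        "http://ada.filmontology.org/sparql",
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    resp.raise_for_status()
    # Standard SPARQL JSON results format.
    print(resp.json()["results"]["bindings"][0]["n"]["value"])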

    The dataset furthermore contains sample files for two different export capabilities of the web application AdA Annotation Explorer: 1) all manual annotations of the type "field size" throughout the corpus as CSV files, and 2) static HTML exports of different queries conducted in the AdA Annotation Explorer.

    Manuals

    The dataset includes various manuals and documentation in German and English: https://github.com/ProjectAdA/public/tree/master/manuals

    "AdA Filmontology – Levels, Types, Values": an overview over all analytical concepts and their definitions.

    "Manual: Annotating with Advene and the AdA Filmontology". A manual on the usage of Advene and the AdA Annotation Explorer that provides the basics for annotating audiovisual aesthetics and visualizing them.

    "Notes on collaborative annotation with the AdA Filmontology"
