22 datasets found
  1. FSDKaggle2018

    • data.niaid.nih.gov
    • opendatalab.com
    • +1 more
    Updated Jan 24, 2020
    + more versions
    Cite
    Xavier Serra (2020). FSDKaggle2018 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2552859
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Xavier Favory
    Jordi Pons
    Manoj Plakal
    Frederic Font
    Xavier Serra
    Daniel P. W. Ellis
    Eduardo Fonseca
    Description

    FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology. FSDKaggle2018 has been used for the DCASE Challenge 2018 Task 2, which was run as a Kaggle competition titled Freesound General-Purpose Audio Tagging Challenge.

    Citation

    If you use the FSDKaggle2018 dataset or part of it, please cite our DCASE 2018 paper:

    Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra. "General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline". Proceedings of the DCASE 2018 Workshop (2018)

    You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2018.

    Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017

    Contact

    You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.

    About this dataset

    Freesound Dataset Kaggle 2018 (or FSDKaggle2018 for short) is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology [1]. FSDKaggle2018 has been used for the Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2018. Please visit the DCASE2018 Challenge Task 2 website for more information. This Task was hosted on the Kaggle platform as a competition titled Freesound General-Purpose Audio Tagging Challenge. It was organized by researchers from the Music Technology Group of Universitat Pompeu Fabra, and from Google Research’s Machine Perception Team.

    The goal of this competition was to build an audio tagging system that can categorize an audio clip as belonging to one of a set of 41 diverse categories drawn from the AudioSet Ontology.

    All audio samples in this dataset are gathered from Freesound [2] and are provided here as uncompressed PCM 16 bit, 44.1 kHz, mono audio files. Note that because Freesound content is collaboratively contributed, recording quality and techniques can vary widely.
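The stated audio format (16-bit PCM, 44.1 kHz, mono) can be checked per file with Python's standard `wave` module. The following is a minimal sketch, not part of the dataset tooling: the helper names `describe_wav`, `matches_expected_format`, and `make_silent_clip` are hypothetical, and the silent clip is generated in memory purely so the example runs without the dataset.

```python
import io
import wave

# Format stated in the dataset description: 16-bit PCM, 44.1 kHz, mono.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "frame_rate_hz": 44100}


def describe_wav(wav_source):
    """Read header fields from a WAV path or file-like object."""
    with wave.open(wav_source, "rb") as reader:
        return {
            "channels": reader.getnchannels(),
            "sample_width_bytes": reader.getsampwidth(),
            "frame_rate_hz": reader.getframerate(),
            "duration_s": reader.getnframes() / reader.getframerate(),
        }


def matches_expected_format(info):
    """True if channels, sample width, and rate match the stated format."""
    return all(info[key] == value for key, value in EXPECTED.items())


def make_silent_clip(n_frames=44100, channels=1):
    """Build a silent in-memory WAV clip (illustrative stand-in for a real file)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(2)
        writer.setframerate(44100)
        writer.writeframes(b"\x00\x00" * n_frames * channels)
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_clip())
print(info, matches_expected_format(info))
```

With the real dataset, `describe_wav` would be called on a path inside the audio folders instead of the in-memory clip; the `duration_s` field is also a convenient way to confirm the reported clip-length range.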

    The ground truth data provided in this dataset has been obtained after a data labeling process which is described below in the Data labeling process section. FSDKaggle2018 clips are unequally distributed in the following 41 categories of the AudioSet Ontology:

    "Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation", "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell", "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart", "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong", "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock", "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors", "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone", "Trumpet", "Violin_or_fiddle", "Writing".

    Some other relevant characteristics of FSDKaggle2018:

    The dataset is split into a train set and a test set.

    The train set is meant to be for system development and includes ~9.5k samples unequally distributed among 41 categories. The minimum number of audio samples per category in the train set is 94, and the maximum 300. The duration of the audio samples ranges from 300ms to 30s due to the diversity of the sound categories and the preferences of Freesound users when recording sounds. The total duration of the train set is roughly 18h.

    Out of the ~9.5k samples from the train set, ~3.7k have manually-verified ground truth annotations and ~5.8k have non-verified annotations. The non-verified annotations of the train set have a quality estimate of at least 65-70% in each category. Check out the Data labeling process section below for more information about this aspect.

    Non-verified annotations in the train set are properly flagged in train.csv so that participants can opt to use this information during the development of their systems.

    The test set is composed of 1.6k samples with manually-verified annotations and with a category distribution similar to that of the train set. The total duration of the test set is roughly 2h.

    All audio samples in this dataset have a single label (i.e. are only annotated with one label). Check out the Data labeling process section below for more information about this aspect. A single label should be predicted for each file in the test set.

    Data labeling process

    The data labeling process started from a manual mapping between Freesound tags and AudioSet Ontology categories (or labels), which was carried out by researchers at the Music Technology Group, Universitat Pompeu Fabra, Barcelona. Using this mapping, a number of Freesound audio samples were automatically annotated with labels from the AudioSet Ontology. These annotations can be understood as weak labels since they express the presence of a sound category in an audio sample.

    Then, a data validation process was carried out in which a number of participants listened to the annotated sounds and manually assessed the presence/absence of an automatically assigned sound category, according to the AudioSet category description.

    Audio samples in FSDKaggle2018 are only annotated with a single ground truth label (see train.csv). A total of 3,710 annotations included in the train set of FSDKaggle2018 have been manually validated as present and predominant (some with inter-annotator agreement but not all of them). This means that in most cases there is no additional acoustic material other than the labeled category. In a few cases there may be some additional sound events, but these additional events won't belong to any of the 41 categories of FSDKaggle2018.

    The rest of the annotations have not been manually validated and therefore some of them could be inaccurate. Nonetheless, we have estimated that at least 65-70% of the non-verified annotations per category in the train set are indeed correct. It can happen that some of these non-verified audio samples present several sound sources even though only one label is provided as ground truth. These additional sources are typically out of the set of the 41 categories, but in a few cases they could be within.

    More details about the data labeling process can be found in [3].

    License

    FSDKaggle2018 has licenses at two different levels, as explained next.

    All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. For attribution purposes and to facilitate attribution of these files to third parties, we include a relation of the audio clips included in FSDKaggle2018 and their corresponding license. The licenses are specified in the files train_post_competition.csv and test_post_competition_scoring_clips.csv.

    In addition, FSDKaggle2018 as a whole is the result of a curation process and it has an additional license. FSDKaggle2018 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2018.doc zip file.

    Files

    FSDKaggle2018 can be downloaded as a series of zip files with the following directory structure:

    root
    │
    └───FSDKaggle2018.audio_train/                  Audio clips in the train set
    │
    └───FSDKaggle2018.audio_test/                   Audio clips in the test set
    │
    └───FSDKaggle2018.meta/                         Files for evaluation setup
    │   │
    │   └───train_post_competition.csv              Data split and ground truth for the train set
    │   │
    │   └───test_post_competition_scoring_clips.csv Ground truth for the test set
    │
    └───FSDKaggle2018.doc/
        │
        └───README.md                               The dataset description file you are reading
        │
        └───LICENSE-DATASET                         License of FSDKaggle2018 dataset as a whole

    NOTE: the original train.csv file provided during the competition has been updated with more metadata (licenses, Freesound ids, etc.) into train_post_competition.csv. Likewise, the original test.csv that was not public during the competition is now available with ground truth and metadata as test_post_competition_scoring_clips.csv. The file name test_post_competition_scoring_clips.csv refers to the fact that only the 1600 clips used for systems' ranking are included. During the competition, an additional subset of padding clips was added in order to prevent undesired practices. This padding subset (that was never used for systems' ranking) is no longer included in the dataset (see our DCASE 2018 paper for more details.)

    Each row (i.e. audio clip) of the train_post_competition.csv file contains the following information:

    fname: the file name

    label: the audio classification label (ground truth)

    manually_verified: Boolean (1 or 0) flag to indicate whether or not that annotation has been manually verified; see description above for more info

    freesound_id: the Freesound id for the audio clip

    license: the license for the audio clip
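The `manually_verified` flag makes it possible to separate verified from non-verified training annotations. The following is a minimal sketch using Python's standard `csv` module. Only the column names are taken from the description above; the sample rows, the license strings, and the helper name `split_by_verification` are made up for illustration.

```python
import csv
import io

# Made-up rows for illustration; only the column names come from the description.
SAMPLE_CSV = """fname,label,manually_verified,freesound_id,license
a.wav,Bark,1,101,CC0
b.wav,Flute,0,102,CC-BY
c.wav,Bark,0,103,CC-BY
"""


def split_by_verification(csv_file):
    """Return (verified, unverified) lists of row dicts from an open CSV file."""
    verified, unverified = [], []
    for row in csv.DictReader(csv_file):
        # The flag is described as a Boolean stored as 1 or 0.
        if row["manually_verified"].strip() == "1":
            verified.append(row)
        else:
            unverified.append(row)
    return verified, unverified


verified, unverified = split_by_verification(io.StringIO(SAMPLE_CSV))
print(len(verified), len(unverified))
```

To apply this to the real file, pass `open("train_post_competition.csv", newline="")` in place of the in-memory sample, assuming the file has been extracted from the FSDKaggle2018.meta archive.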

    Each row (i.e. audio clip) of the test_post_competition_scoring_clips.csv file contains the following information:

    fname: the file name

    label: the audio classification label (ground truth)

    usage: string that indicates to which Kaggle leaderboard the clip was associated during the competition: Public or Private

    freesound_id: the Freesound id for the audio clip

    license: the license for the audio clip
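Since the `usage` column records the leaderboard each test clip belonged to, the Public/Private split can be recovered by counting its values. This is a small sketch under the same assumptions as before: the column names come from the description, while the sample rows and the helper name `clips_per_leaderboard` are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration; only the column names come from the description.
SAMPLE_CSV = """fname,label,usage,freesound_id,license
x.wav,Cello,Public,201,CC0
y.wav,Knock,Private,202,CC-BY
z.wav,Cello,Private,203,CC0
"""


def clips_per_leaderboard(csv_file):
    """Count test clips per value of the `usage` column (Public or Private)."""
    return Counter(row["usage"] for row in csv.DictReader(csv_file))


counts = clips_per_leaderboard(io.StringIO(SAMPLE_CSV))
print(counts["Public"], counts["Private"])
```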

    Baseline System

    A CNN baseline system for FSDKaggle2018 is available at

  2. clip-image-embedding-sgd-epoch3-batch3800

    • kaggle.com
    zip
    Updated Apr 20, 2023
    Cite
    qiexi fan (2023). clip-image-embedding-sgd-epoch3-batch3800 [Dataset]. https://www.kaggle.com/datasets/qiexifan/clip-image-embedding-sgd-epoch3-batch3800/code
    Explore at:
    zip (3754291012 bytes)
    Dataset updated
    Apr 20, 2023
    Authors
    qiexi fan
    Description

    Dataset

    This dataset was created by qiexi fan

    Contents

  3. FSDKaggle2019

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Jan 24, 2020
    + more versions
    Cite
    Eduardo Fonseca (2020). FSDKaggle2019 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3612636
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Manoj Plakal
    Frederic Font
    Xavier Serra
    Daniel P. W. Ellis
    Eduardo Fonseca
    Description

    FSDKaggle2019 is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology. FSDKaggle2019 has been used for the DCASE Challenge 2019 Task 2, which was run as a Kaggle competition titled Freesound Audio Tagging 2019.

    Citation

    If you use the FSDKaggle2019 dataset or part of it, please cite our DCASE 2019 paper:

    Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra. "Audio tagging with noisy labels and minimal supervision". Proceedings of the DCASE 2019 Workshop, NYC, US (2019)

    You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2019.

    Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017

    Data curators

    Eduardo Fonseca, Manoj Plakal, Xavier Favory, Jordi Pons

    Contact

    You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.

    ABOUT FSDKaggle2019

    Freesound Dataset Kaggle 2019 (or FSDKaggle2019 for short) is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology [1]. FSDKaggle2019 was used for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2019. Please visit the DCASE2019 Challenge Task 2 website for more information. This Task was hosted on the Kaggle platform as a competition titled Freesound Audio Tagging 2019. It was organized by researchers from the Music Technology Group (MTG) of Universitat Pompeu Fabra (UPF), and from the Sound Understanding team at Google AI Perception. The competition intended to provide insight towards the development of broadly-applicable sound event classifiers able to cope with label noise and minimal supervision conditions.

    FSDKaggle2019 employs audio clips from the following sources:

    Freesound Dataset (FSD): a dataset being collected at the MTG-UPF based on Freesound content organized with the AudioSet Ontology

    The soundtracks of a pool of Flickr videos taken from the Yahoo Flickr Creative Commons 100M dataset (YFCC)

    The audio data is labeled using a vocabulary of 80 labels from Google’s AudioSet Ontology [1], covering diverse topics: Guitar and other Musical Instruments, Percussion, Water, Digestive, Respiratory sounds, Human voice, Human locomotion, Hands, Human group actions, Insect, Domestic animals, Glass, Liquid, Motor vehicle (road), Mechanisms, Doors, and a variety of Domestic sounds. The full list of categories can be inspected in vocabulary.csv (see Files & Download below). The goal of the task was to build a multi-label audio tagging system that can predict appropriate label(s) for each audio clip in a test set.

    What follows is a summary of some of the most relevant characteristics of FSDKaggle2019. Nevertheless, it is highly recommended to read our DCASE 2019 paper for a more in-depth description of the dataset and how it was built.

    Ground Truth Labels

    The ground truth labels are provided at the clip-level, and express the presence of a sound category in the audio clip, hence can be considered weak labels or tags. Audio clips have variable lengths (roughly from 0.3 to 30s).

    The audio content from FSD has been manually labeled by humans following a data labeling process using the Freesound Annotator platform. Most labels have inter-annotator agreement but not all of them. More details about the data labeling process and the Freesound Annotator can be found in [2].

    The YFCC soundtracks were labeled using automated heuristics applied to the audio content and metadata of the original Flickr clips. Hence, a substantial amount of label noise can be expected. The label noise can vary widely in amount and type depending on the category, including in- and out-of-vocabulary noises. More information about some of the types of label noise that can be encountered is available in [3].

    Specifically, FSDKaggle2019 features three types of label quality, one for each set in the dataset:

    curated train set: correct (but potentially incomplete) labels

    noisy train set: noisy labels

    test set: correct and complete labels

    Further details can be found below in the sections for each set.

    Format

    All audio clips are provided as uncompressed PCM 16 bit, 44.1 kHz, mono audio files.

    DATA SPLIT

    FSDKaggle2019 consists of two train sets and one test set. The idea is to limit the supervision provided for training (i.e., the manually-labeled, hence reliable, data), thus promoting approaches to deal with label noise.

    Curated train set

    The curated train set consists of manually-labeled data from FSD.

    Number of clips/class: 75 except in a few cases (where there are fewer)

    Total number of clips: 4970

    Avg number of labels/clip: 1.2

    Total duration: 10.5 hours

    The duration of the audio clips ranges from 0.3 to 30s due to the diversity of the sound categories and the preferences of Freesound users when recording/uploading sounds. Labels are correct but potentially incomplete. It can happen that a few of these audio clips present additional acoustic material beyond the provided ground truth label(s).

    Noisy train set

    The noisy train set is a larger set of noisy web audio data from Flickr videos taken from the YFCC dataset [5].

    Number of clips/class: 300

    Total number of clips: 19,815

    Avg number of labels/clip: 1.2

    Total duration: ~80 hours

    The duration of the audio clips ranges from 1s to 15s, with the vast majority lasting 15s. Labels are automatically generated and purposefully noisy. No human validation is involved. The label noise can vary widely in amount and type depending on the category, including in- and out-of-vocabulary noises.

    Considering the numbers above, the per-class data distribution available for training is, for most of the classes, 300 clips from the noisy train set and 75 clips from the curated train set. This means 80% noisy / 20% curated at the clip level, while at the duration level the proportion is more extreme considering the variable-length clips.
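The 80% / 20% figure follows directly from the per-class counts, and the same arithmetic can be repeated with the totals quoted in the set summaries. The sketch below uses only numbers stated above; the noisy-set duration is given as approximately 80 hours, so the duration share is an estimate.

```python
# Per-class clip counts stated for most classes.
noisy_per_class = 300
curated_per_class = 75
noisy_share_per_class = noisy_per_class / (noisy_per_class + curated_per_class)

# Total clip counts stated for each train set.
noisy_clips = 19815
curated_clips = 4970
noisy_share_clips = noisy_clips / (noisy_clips + curated_clips)

# Total durations in hours; the noisy figure is approximate (~80 hours).
noisy_hours = 80.0
curated_hours = 10.5
noisy_share_duration = noisy_hours / (noisy_hours + curated_hours)

print(f"per class: {noisy_share_per_class:.0%} noisy")
print(f"all clips: {noisy_share_clips:.1%} noisy")
print(f"duration:  {noisy_share_duration:.1%} noisy")
```

The duration share comes out higher than the clip share, which matches the remark that the proportion is more extreme at the duration level.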

    Test set

    The test set is used for system evaluation and consists of manually-labeled data from FSD.

    Number of clips/class: between 50 and 150

    Total number of clips: 4481

    Avg number of labels/clip: 1.4

    Total duration: 12.9 hours

    The acoustic material present in the test set clips is labeled exhaustively using the aforementioned vocabulary of 80 classes. Most labels have inter-annotator agreement but not all of them. Barring human error, the label(s) are correct and complete considering the target vocabulary; nonetheless, a few clips could still present additional (unlabeled) acoustic content out of the vocabulary.

    During the DCASE2019 Challenge Task 2, the test set was split into two subsets, for the public and private leaderboards, and only the data corresponding to the public leaderboard was provided. In this current package you will find the full test set with all the test labels. To allow comparison with previous work, the file test_post_competition.csv includes a flag to determine the corresponding leaderboard (public or private) for each test clip (see more info in Files & Download below).

    Acoustic mismatch

    As mentioned before, FSDKaggle2019 uses audio clips from two sources:

    FSD: curated train set and test set, and

    YFCC: noisy train set.

    While the sources of audio (Freesound and Flickr) are collaboratively contributed and pretty diverse themselves, a certain acoustic mismatch can be expected between FSD and YFCC. We conjecture this mismatch comes from a variety of reasons. For example, through acoustic inspection of a small sample of both data sources, we find a higher percentage of high quality recordings in FSD. In addition, audio clips in Freesound are typically recorded with the purpose of capturing audio, which is not necessarily the case in YFCC.

    This mismatch can have an impact on the evaluation, considering that most of the train data come from YFCC, while all test data are drawn from FSD. This constraint (i.e., noisy training data coming from a different web audio source than the test set) is sometimes a real-world condition.

    LICENSE

    All clips in FSDKaggle2019 are released under Creative Commons (CC) licenses. For attribution purposes and to facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses.

    Curated train set and test set. All clips in Freesound are released under different modalities of Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. The licenses are specified in the files train_curated_post_competition.csv and test_post_competition.csv. These licenses can be CC0, CC-BY, CC-BY-NC and CC Sampling+.

    Noisy train set. Similarly, the licenses of the soundtracks from Flickr used in FSDKaggle2019 are specified in the file train_noisy_post_competition.csv. These licenses can be CC-BY and CC BY-SA.

    In addition, FSDKaggle2019 as a whole is the result of a curation process and it has an additional license. FSDKaggle2019 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2019.doc zip file.

    FILES & DOWNLOAD

    FSDKaggle2019 can be downloaded as a series of zip files with the following directory structure:

    root
    │
    └───FSDKaggle2019.audio_train_curated/ Audio clips in the curated train set
    │
    └───FSDKaggle2019.audio_train_noisy/   Audio clips in the noisy

  4. clip-features

    • kaggle.com
    zip
    Updated Oct 4, 2022
    Cite
    lannguyen (2022). clip-features [Dataset]. https://www.kaggle.com/datasets/teoem2k/clipfeatures
    Explore at:
    zip (127652827 bytes)
    Dataset updated
    Oct 4, 2022
    Authors
    lannguyen
    Description

    Dataset

    This dataset was created by lannguyen

    Contents

  5. SDIP CLIP TFRecords dataset 4

    • kaggle.com
    zip
    Updated May 3, 2023
    + more versions
    Cite
    junseonglee11 (2023). SDIP CLIP TFRecords dataset 4 [Dataset]. https://www.kaggle.com/datasets/junseonglee11/sdip-clip-tfrecords-dataset-4
    Explore at:
    zip (15713161091 bytes)
    Dataset updated
    May 3, 2023
    Authors
    junseonglee11
    Description

    Dataset

    This dataset was created by junseonglee11

    Contents

  6. sdip-tfrecord-clip-sd2gpt2

    • kaggle.com
    zip
    Updated May 1, 2023
    + more versions
    Cite
    motono0223 (2023). sdip-tfrecord-clip-sd2gpt2 [Dataset]. https://www.kaggle.com/datasets/motono0223/sdip-tfrecord-clip-sd2gpt2/code
    Explore at:
    zip (115307953 bytes)
    Dataset updated
    May 1, 2023
    Authors
    motono0223
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by motono0223

    Released under CC0: Public Domain

    Contents

  7. sdip-tfrecord-clip-laion2b-part0300-0399

    • kaggle.com
    Updated May 1, 2023
    + more versions
    Cite
    motono0223 (2023). sdip-tfrecord-clip-laion2b-part0300-0399 [Dataset]. https://www.kaggle.com/datasets/motono0223/sdip-tfrecord-clip-laion2b-part0300-0399/code
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 1, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    motono0223
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by motono0223

    Released under CC0: Public Domain

    Contents

  8. sdip-tfrecord-clip-diffdb2m-th095

    • kaggle.com
    zip
    Updated May 1, 2023
    Cite
    motono0223 (2023). sdip-tfrecord-clip-diffdb2m-th095 [Dataset]. https://www.kaggle.com/motono0223/sdip-tfrecord-clip-diffdb2m-th095
    Explore at:
    zip (2303184923 bytes)
    Dataset updated
    May 1, 2023
    Authors
    motono0223
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by motono0223

    Released under CC0: Public Domain

    Contents

  9. two-clip-88

    • kaggle.com
    zip
    Updated Sep 23, 2022
    Cite
    RocketScience (2022). two-clip-88 [Dataset]. https://www.kaggle.com/datasets/ystrangex/twoclip88/suggestions?status=pending&yourSuggestions=true
    Explore at:
    zip (1456842721 bytes)
    Dataset updated
    Sep 23, 2022
    Authors
    RocketScience
    Description

    Dataset

    This dataset was created by RocketScience

    Contents

  10. CLIP-PACKAGE-WEIGHT

    • kaggle.com
    zip
    Updated Jun 4, 2023
    Cite
    ForcewithMe (2023). CLIP-PACKAGE-WEIGHT [Dataset]. https://www.kaggle.com/datasets/forcewithme/clip-package-weight/suggestions
    Explore at:
    zip (216539900 bytes)
    Dataset updated
    Jun 4, 2023
    Authors
    ForcewithMe
    Description

    Dataset

    This dataset was created by ForcewithMe

    Contents

  11. clip-vit-large-patch14

    • kaggle.com
    zip
    Updated Apr 10, 2023
    + more versions
    Cite
    Dmitriy Gerasimov (2023). clip-vit-large-patch14 [Dataset]. https://www.kaggle.com/datasets/dmitriygerasimov/clip-vit-large-patch14/suggestions?status=pending&yourSuggestions=true
    Explore at:
    zip (1088659557 bytes)
    Dataset updated
    Apr 10, 2023
    Authors
    Dmitriy Gerasimov
    Description

    Dataset

    This dataset was created by Dmitriy Gerasimov

    Contents

  12. clip-image-embedding-sgd-epoch0-batch88032

    • kaggle.com
    zip
    Updated Apr 18, 2023
    + more versions
    Cite
    qiexi fan (2023). clip-image-embedding-sgd-epoch0-batch88032 [Dataset]. https://www.kaggle.com/datasets/qiexifan/clip-image-embedding-sgd-epoch0-batch88032
    Explore at:
    zip (2343328806 bytes)
    Dataset updated
    Apr 18, 2023
    Authors
    qiexi fan
    Description

    Dataset

    This dataset was created by qiexi fan

    Contents

  13. clip-image-embedding-third-epoch

    • kaggle.com
    Updated Apr 30, 2023
    Cite
    qiexi fan (2023). clip-image-embedding-third-epoch [Dataset]. https://www.kaggle.com/datasets/qiexifan/clip-image-embedding-third-epoch/code
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    qiexi fan
    Description

    Dataset

    This dataset was created by qiexi fan

    Contents

  14. CLIP Pretrained

    • kaggle.com
    zip
    Updated Mar 30, 2021
    Cite
    Btbpanda (2021). CLIP Pretrained [Dataset]. https://www.kaggle.com/datasets/btbpanda/clip-pretrained
    Explore at:
    zip (2341764459 bytes)
    Dataset updated
    Mar 30, 2021
    Authors
    Btbpanda
    Description

    Dataset

    This dataset was created by Btbpanda

    Contents

  15. clip-image-coyo-1k

    • kaggle.com
    zip
    Updated Jun 26, 2024
    + more versions
    Cite
    anant jain1223 (2024). clip-image-coyo-1k [Dataset]. https://www.kaggle.com/datasets/anantjain1223/clip-image-coyo-1k
    Explore at:
    zip (2963794 bytes)
    Dataset updated
    Jun 26, 2024
    Authors
    anant jain1223
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by anant jain1223

    Released under MIT

    Contents

  16. d_electra-l_clip_x2_wiki103_mlm_whole_lr5e-5

    • kaggle.com
    zip
    Updated Feb 25, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Splend1dChan(燦爛) (2022). d_electra-l_clip_x2_wiki103_mlm_whole_lr5e-5 [Dataset]. https://www.kaggle.com/datasets/a24998667/d-electra-l-clip-x2-wiki103-mlm-whole-lr5e-5
    Explore at:
    zip (24888040219 bytes)
    Dataset updated
    Feb 25, 2022
    Authors
    Splend1dChan(燦爛)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Splend1dChan(燦爛)

    Released under CC0: Public Domain

    Contents

  17. sdip-clip-laion2b-lowsim-chunk00

    • kaggle.com
    zip
    Updated May 3, 2023
    Cite
    motono0223 (2023). sdip-clip-laion2b-lowsim-chunk00 [Dataset]. https://www.kaggle.com/datasets/motono0223/sdip-clip-laion2b-lowsim-chunk00
    Explore at:
    zip (250 bytes)
    Dataset updated
    May 3, 2023
    Authors
    motono0223
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by motono0223

    Released under CC0: Public Domain

    Contents

  18. sdip-clip-laion2b-lowsim-chunk04

    • kaggle.com
    zip
    Updated May 3, 2023
    Cite
    motono0223 (2023). sdip-clip-laion2b-lowsim-chunk04 [Dataset]. https://www.kaggle.com/datasets/motono0223/sdip-clip-laion2b-lowsim-chunk04
    Explore at:
    zip (15778790168 bytes)
    Dataset updated
    May 3, 2023
    Authors
    motono0223
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by motono0223

    Released under CC0: Public Domain

    Contents

  19. sdip-clip-laion2b-highsim-chunk13

    • kaggle.com
    zip
    Updated May 3, 2023
    + more versions
    Cite
    motono0223 (2023). sdip-clip-laion2b-highsim-chunk13 [Dataset]. https://www.kaggle.com/motono0223/sdip-clip-laion2b-highsim-chunk13
    Explore at:
    zip (15403506475 bytes)
    Dataset updated
    May 3, 2023
    Authors
    motono0223
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by motono0223

    Released under CC0: Public Domain

    Contents

  20. sdip-clip-laion2b-highsim-chunk12

    • kaggle.com
    zip
    Updated May 3, 2023
    + more versions
    motono0223 (2023). sdip-clip-laion2b-highsim-chunk12 [Dataset]. https://www.kaggle.com/datasets/motono0223/sdip-clip-laion2b-highsim-chunk12
    Explore at:
    Available download formats: zip (15390945451 bytes)
    Dataset updated
    May 3, 2023
    Authors
    motono0223
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by motono0223

    Released under CC0: Public Domain

    Contents

Xavier Serra (2020). FSDKaggle2018 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2552859

FSDKaggle2018

Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
Xavier Favory
Jordi Pons
Manoj Plakal
Frederic Font
Xavier Serra
Daniel P. W. Ellis
Eduardo Fonseca
Description

FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology. FSDKaggle2018 has been used for the DCASE Challenge 2018 Task 2, which was run as a Kaggle competition titled Freesound General-Purpose Audio Tagging Challenge.

Citation

If you use the FSDKaggle2018 dataset or part of it, please cite our DCASE 2018 paper:

Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra. "General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline". Proceedings of the DCASE 2018 Workshop (2018)

You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2018.

Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017

Contact

You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.

About this dataset

Freesound Dataset Kaggle 2018 (or FSDKaggle2018 for short) is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology [1]. FSDKaggle2018 was used for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2018. Please visit the DCASE2018 Challenge Task 2 website for more information. This task was hosted on the Kaggle platform as a competition titled Freesound General-Purpose Audio Tagging Challenge. It was organized by researchers from the Music Technology Group of Universitat Pompeu Fabra and from Google Research’s Machine Perception Team.

The goal of this competition was to build an audio tagging system that can categorize an audio clip as belonging to one of a set of 41 diverse categories drawn from the AudioSet Ontology.

All audio samples in this dataset are gathered from Freesound [2] and are provided here as uncompressed PCM 16 bit, 44.1 kHz, mono audio files. Note that because Freesound content is collaboratively contributed, recording quality and techniques can vary widely.
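
The stated format (16-bit PCM, 44.1 kHz, mono) can be checked per clip with Python's standard `wave` module. The following is a minimal sketch under stated assumptions: the helper names `matches_fsd_format`, `clip_duration_seconds`, and `write_silent_clip` and the file name `example.wav` are hypothetical, and the demo writes its own short silent clip rather than reading a real dataset file.

```python
import os
import tempfile
import wave


def matches_fsd_format(path):
    """Return True if the WAV file is 16-bit PCM, 44.1 kHz, mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getsampwidth() == 2          # 2 bytes per sample = 16 bit
            and wav.getframerate() == 44100  # 44.1 kHz sample rate
            and wav.getnchannels() == 1      # mono
        )


def clip_duration_seconds(path):
    """Return the clip length in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_silent_clip(path, seconds=0.3, rate=44100, channels=1):
    """Write a silent 16-bit PCM clip; used only to make the demo runnable."""
    n_frames = round(seconds * rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * n_frames * channels)


# Hypothetical demo file, not a clip from the dataset.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "example.wav")
write_silent_clip(demo_path)
print(matches_fsd_format(demo_path), clip_duration_seconds(demo_path))
```

Running the same two helpers over the extracted audio folders would be one way to confirm the format and the 300 ms to 30 s duration range before training.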

The ground truth data provided in this dataset has been obtained after a data labeling process which is described below in the Data labeling process section. FSDKaggle2018 clips are unequally distributed in the following 41 categories of the AudioSet Ontology:

"Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation", "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell", "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart", "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong", "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock", "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors", "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone", "Trumpet", "Violin_or_fiddle", "Writing".

Some other relevant characteristics of FSDKaggle2018:

The dataset is split into a train set and a test set.

The train set is meant for system development and includes ~9.5k samples unequally distributed among 41 categories. The minimum number of audio samples per category in the train set is 94, and the maximum is 300. The duration of the audio samples ranges from 300 ms to 30 s due to the diversity of the sound categories and the preferences of Freesound users when recording sounds. The total duration of the train set is roughly 18 h.

Out of the ~9.5k samples from the train set, ~3.7k have manually-verified ground truth annotations and ~5.8k have non-verified annotations. The non-verified annotations of the train set have a quality estimate of at least 65-70% in each category. See the Data labeling process section below for more information about this aspect.

Non-verified annotations in the train set are properly flagged in train.csv so that participants can opt to use this information during the development of their systems.

The test set is composed of 1.6k samples with manually-verified annotations and with a category distribution similar to that of the train set. The total duration of the test set is roughly 2 h.

All audio samples in this dataset have a single label (i.e. are only annotated with one label). See the Data labeling process section below for more information about this aspect. A single label should be predicted for each file in the test set.

Data labeling process

The data labeling process started from a manual mapping between Freesound tags and AudioSet Ontology categories (or labels), which was carried out by researchers at the Music Technology Group, Universitat Pompeu Fabra, Barcelona. Using this mapping, a number of Freesound audio samples were automatically annotated with labels from the AudioSet Ontology. These annotations can be understood as weak labels since they express the presence of a sound category in an audio sample.

Then, a data validation process was carried out in which a number of participants listened to the annotated sounds and manually assessed the presence/absence of an automatically assigned sound category, according to the AudioSet category description.

Audio samples in FSDKaggle2018 are only annotated with a single ground truth label (see train.csv). A total of 3,710 annotations included in the train set of FSDKaggle2018 have been manually validated as present and predominant (some with inter-annotator agreement but not all of them). This means that in most cases there is no additional acoustic material other than the labeled category. In a few cases there may be some additional sound events, but these additional events won't belong to any of the 41 categories of FSDKaggle2018.

The rest of the annotations have not been manually validated and therefore some of them could be inaccurate. Nonetheless, we have estimated that at least 65-70% of the non-verified annotations per category in the train set are indeed correct. It can happen that some of these non-verified audio samples present several sound sources even though only one label is provided as ground truth. These additional sources are typically out of the set of the 41 categories, but in a few cases they could be within.

More details about the data labeling process can be found in [3].

License

FSDKaggle2018 has licenses at two different levels, as explained next.

All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. For attribution purposes and to facilitate attribution of these files to third parties, we include a list of the audio clips included in FSDKaggle2018 and their corresponding licenses. The licenses are specified in the files train_post_competition.csv and test_post_competition_scoring_clips.csv.

In addition, FSDKaggle2018 as a whole is the result of a curation process and it has an additional license. FSDKaggle2018 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2018.doc zip file.

Files

FSDKaggle2018 can be downloaded as a series of zip files with the following directory structure:

root
│
└───FSDKaggle2018.audio_train/    Audio clips in the train set
│
└───FSDKaggle2018.audio_test/     Audio clips in the test set
│
└───FSDKaggle2018.meta/           Files for evaluation setup
│   │
│   └───train_post_competition.csv                 Data split and ground truth for the train set
│   │
│   └───test_post_competition_scoring_clips.csv    Ground truth for the test set
│
└───FSDKaggle2018.doc/
    │
    └───README.md          The dataset description file you are reading
    │
    └───LICENSE-DATASET    License of FSDKaggle2018 dataset as a whole

NOTE: the original train.csv file provided during the competition has been updated with more metadata (licenses, Freesound ids, etc.) into train_post_competition.csv. Likewise, the original test.csv that was not public during the competition is now available with ground truth and metadata as test_post_competition_scoring_clips.csv. The file name test_post_competition_scoring_clips.csv refers to the fact that only the 1,600 clips used for systems' ranking are included. During the competition, an additional subset of padding clips was added in order to prevent undesired practices. This padding subset (that was never used for systems' ranking) is no longer included in the dataset (see our DCASE 2018 paper for more details).

Each row (i.e. audio clip) of the train_post_competition.csv file contains the following information:

fname: the file name

label: the audio classification label (ground truth)

manually_verified: Boolean (1 or 0) flag to indicate whether or not that annotation has been manually verified; see description above for more info

freesound_id: the Freesound id for the audio clip

license: the license for the audio clip
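
As a hedged illustration of how these columns might be used, the sketch below separates manually verified clips from non-verified ones and counts clips per label using only Python's standard library. The sample rows, including the file names, ids, and license strings, are made up to mimic the documented column layout, and the function name `split_by_verification` is hypothetical; a real run would open train_post_competition.csv instead of the in-memory string.

```python
import csv
import io
from collections import Counter

# Made-up rows that only mimic the documented columns; not real dataset entries.
SAMPLE_TRAIN_CSV = """fname,label,manually_verified,freesound_id,license
a.wav,Bark,1,1001,CC0
b.wav,Bark,0,1002,Attribution
c.wav,Flute,1,1003,CC0
d.wav,Cello,0,1004,Attribution
"""


def split_by_verification(csv_file):
    """Split rows into (verified, unverified) using the manually_verified flag."""
    verified, unverified = [], []
    for row in csv.DictReader(csv_file):
        # The flag is documented as 1 or 0; csv yields it as a string.
        if row["manually_verified"] == "1":
            verified.append(row)
        else:
            unverified.append(row)
    return verified, unverified


verified, unverified = split_by_verification(io.StringIO(SAMPLE_TRAIN_CSV))
clips_per_label = Counter(row["label"] for row in verified + unverified)
print(len(verified), len(unverified), dict(clips_per_label))
```

Keeping the two subsets separate makes it straightforward to train on verified clips only, or to treat the non-verified labels as noisier supervision.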

Each row (i.e. audio clip) of the test_post_competition_scoring_clips.csv file contains the following information:

fname: the file name

label: the audio classification label (ground truth)

usage: string that indicates to which Kaggle leaderboard the clip was associated during the competition: Public or Private

freesound_id: the Freesound id for the audio clip

license: the license for the audio clip
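
Since each test clip carries one ground truth label and a Public or Private usage flag, predictions can be scored separately per leaderboard subset. The sketch below is only an illustration: it uses plain accuracy, which is an assumption here and not necessarily the competition's official metric, and the sample rows, the predictions dictionary, and the function name `accuracy_by_usage` are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows that only mimic the documented columns; not real dataset entries.
SAMPLE_TEST_CSV = """fname,label,usage,freesound_id,license
t1.wav,Meow,Public,2001,CC0
t2.wav,Gong,Private,2002,CC0
t3.wav,Oboe,Private,2003,Attribution
"""

# Hypothetical system output: one predicted label per file name.
HYPOTHETICAL_PREDICTIONS = {"t1.wav": "Meow", "t2.wav": "Gong", "t3.wav": "Flute"}


def accuracy_by_usage(csv_file, predictions):
    """Return the fraction of correctly labeled clips per usage value."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(csv_file):
        total[row["usage"]] += 1
        if predictions.get(row["fname"]) == row["label"]:
            correct[row["usage"]] += 1
    return {usage: correct[usage] / total[usage] for usage in total}


scores = accuracy_by_usage(io.StringIO(SAMPLE_TEST_CSV), HYPOTHETICAL_PREDICTIONS)
print(scores)
```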

Baseline System

A CNN baseline system for FSDKaggle2018 is available at
