4 datasets found
  1. Spoken-COCO - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). Spoken-COCO - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/spoken-coco
    Description

    Spoken-COCO is a large-scale dataset of audio and text pairs.

  2. SpokenCOCO

    • kaggle.com
    Updated Feb 5, 2023
    Cite
    JingJIngHuHu (2023). SpokenCOCO [Dataset]. https://www.kaggle.com/datasets/jingjinghuhu/spokencoco/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    JingJIngHuHu
    Description

    Dataset

    This dataset was created by JingJIngHuHu


  3. Synthetically Spoken COCO

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Grzegorz Chrupała (2020). Synthetically Spoken COCO [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_794832
    Dataset provided by
    Grzegorz Chrupała
    Afra Alishahi
    Lieke Gelderloos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetically Spoken COCO

    Version 1.0

    This dataset contains synthetically generated spoken versions of MS COCO [1] captions. It was created as part of the research reported in [5]. The speech was generated using gTTS [2]. The dataset consists of the following files:

    • dataset.json: Captions associated with MS COCO images. This information comes from [3].
    • sentid.txt: List of caption IDs. This file can be used to locate MFCC features of the MP3 files in the numpy array stored in dataset.mfcc.npy.
    • mp3.tgz: MP3 files with the audio. Each file name corresponds to a caption ID in dataset.json and in sentid.txt.
    • dataset.mfcc.npy: Numpy array with the Mel Frequency Cepstral Coefficients extracted from the audio. Each row corresponds to a caption. The order of the captions corresponds to the ordering in the file sentid.txt. MFCCs were extracted using [4].
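    The file list above implies that a caption's features are found by position: the line number of its ID in sentid.txt is the row number in dataset.mfcc.npy. A minimal sketch of that lookup follows. The helper names (read_sent_ids, build_row_index, mfcc_for_caption) are hypothetical, the one-ID-per-line layout of sentid.txt is an assumption, and toy lists stand in for the real files.

```python
def read_sent_ids(path):
    """Read caption IDs from a text file.

    Assumption: sentid.txt holds one caption ID per line.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_row_index(sent_ids):
    """Map each caption ID to its row position in the feature array."""
    return {sent_id: row for row, sent_id in enumerate(sent_ids)}


def mfcc_for_caption(features, row_index, caption_id):
    """Return the feature row for one caption ID."""
    return features[row_index[caption_id]]


# Toy stand-ins for sentid.txt and dataset.mfcc.npy (not real data).
sent_ids = ["101", "205", "77"]
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

row_index = build_row_index(sent_ids)
row = mfcc_for_caption(features, row_index, "205")
```

    With the real files, sent_ids would come from read_sent_ids("sentid.txt") and features from numpy.load("dataset.mfcc.npy"). Whether that call also needs allow_pickle=True depends on how the array was saved, which the description does not state.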

    [1] http://mscoco.org/dataset/#overview
    [2] https://pypi.python.org/pypi/gTTS
    [3] https://github.com/karpathy/neuraltalk
    [4] https://github.com/jameslyons/python_speech_features
    [5] https://arxiv.org/abs/1702.01991

  4. SPEECH-COCO Dataset

    • paperswithcode.com
    Updated Feb 22, 2021
    Cite
    William Havard; Laurent Besacier; Olivier Rosec (2021). SPEECH-COCO Dataset [Dataset]. https://paperswithcode.com/dataset/speech-coco
    Authors
    William Havard; Laurent Besacier; Olivier Rosec
    Description

    SPEECH-COCO contains speech captions generated using text-to-speech (TTS) synthesis, resulting in 616,767 spoken captions (more than 600 hours) paired with images.
