4 datasets found

Spoken-COCO - Dataset - LDM
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). Spoken-COCO - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/spoken-coco
Description
Spoken-COCO is a large-scale dataset of audio and text pairs.
SpokenCOCO
kaggle.com
Updated Feb 5, 2023
Cite
JingJIngHuHu (2023). SpokenCOCO [Dataset]. https://www.kaggle.com/datasets/jingjinghuhu/spokencoco/discussion
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
JingJIngHuHu
Description
This dataset was created by JingJIngHuHu.
Synthetically Spoken COCO
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Grzegorz Chrupała (2020). Synthetically Spoken COCO [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_794832
Dataset provided by
Grzegorz Chrupała
Afra Alishahi
Lieke Gelderloos
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Synthetically Spoken COCO

Version 1.0

This dataset contains synthetically generated spoken versions of MS COCO [1] captions. It was created as part of the research reported in [5]. The speech was generated using gTTS [2]. The dataset consists of the following files:

dataset.json: Captions associated with MS COCO images. This information comes from [3].

sentid.txt: List of caption IDs. This file can be used to locate MFCC features of the MP3 files in the numpy array stored in dataset.mfcc.npy.

mp3.tgz: MP3 files with the audio. Each file name corresponds to a caption ID in dataset.json and in sentid.txt.

dataset.mfcc.npy: Numpy array with the Mel Frequency Cepstral Coefficients (MFCCs) extracted from the audio. Each row corresponds to a caption. The order of the captions corresponds to the ordering in the file sentid.txt. MFCCs were extracted using [4].
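The row correspondence between sentid.txt and dataset.mfcc.npy described above can be sketched in Python. This is a minimal illustration, not the authors' code: the helper names are hypothetical, sentid.txt is assumed to hold one caption ID per line, and passing allow_pickle=True to np.load is an assumption (the array may store one variable-length feature matrix per caption).

```python
def parse_caption_ids(text):
    """Return caption IDs in file order, assuming one ID per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_row_index(caption_ids):
    """Map each caption ID to its row position in the feature array."""
    return {caption_id: row for row, caption_id in enumerate(caption_ids)}


def features_for_caption(caption_id, row_index, features):
    """Return the feature row for one caption (KeyError if the ID is unknown)."""
    return features[row_index[caption_id]]


if __name__ == "__main__":
    import numpy as np

    with open("sentid.txt", encoding="utf-8") as handle:
        ids = parse_caption_ids(handle.read())

    # Assumption: the array may be an object array (one MFCC matrix per
    # caption), in which case NumPy requires allow_pickle=True to load it.
    mfcc = np.load("dataset.mfcc.npy", allow_pickle=True)

    index = build_row_index(ids)
    first = features_for_caption(ids[0], index, mfcc)
    print(ids[0], getattr(first, "shape", None))
```

IDs are kept as strings here; if the caption IDs in dataset.json are stored as integers, convert before looking them up.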

[1] http://mscoco.org/dataset/#overview
[2] https://pypi.python.org/pypi/gTTS
[3] https://github.com/karpathy/neuraltalk
[4] https://github.com/jameslyons/python_speech_features
[5] https://arxiv.org/abs/1702.01991
SPEECH-COCO Dataset
paperswithcode.com
Updated Feb 22, 2021
Cite
William Havard; Laurent Besacier; Olivier Rosec (2021). SPEECH-COCO Dataset [Dataset]. https://paperswithcode.com/dataset/speech-coco
Authors
William Havard; Laurent Besacier; Olivier Rosec
Description
SPEECH-COCO contains speech captions generated using text-to-speech (TTS) synthesis, resulting in 616,767 spoken captions (more than 600 hours) paired with images.