This dataset was created by JingJIngHuHu
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetically Spoken COCO
Version 1.0
This dataset contains synthetically generated spoken versions of MS COCO [1] captions. It was created as part of the research reported in [5]. The speech was generated using gTTS [2]. The dataset consists of the following files:
[1] http://mscoco.org/dataset/#overview
[2] https://pypi.python.org/pypi/gTTS
[3] https://github.com/karpathy/neuraltalk
[4] https://github.com/jameslyons/python_speech_features
[5] https://arxiv.org/abs/1702.01991
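As a rough illustration of how a single caption can be turned into speech with gTTS [2], the sketch below synthesizes one caption to an MP3 file. It is not the pipeline used to build this dataset: the helper names `audio_path` and `synthesize_caption`, the `caption_<id>.mp3` naming scheme, and the output directory are assumptions made for the example. Only the `gTTS(text=..., lang=...)` constructor and its `save()` method come from the gTTS package, which needs network access to run.

```python
from pathlib import Path


def audio_path(out_dir, caption_id):
    """Build the output path for one caption (hypothetical naming scheme)."""
    return Path(out_dir) / f"caption_{caption_id:07d}.mp3"


def synthesize_caption(text, out_file, lang="en"):
    """Synthesize one caption to an MP3 file with gTTS (requires network access)."""
    # Imported lazily so the path helper above works without gTTS installed.
    from gtts import gTTS

    out_file = Path(out_file)
    # Create the target directory if it does not exist yet.
    out_file.parent.mkdir(parents=True, exist_ok=True)
    gTTS(text=text, lang=lang).save(str(out_file))
    return out_file
```

Under these assumptions, calling `synthesize_caption("A man riding a wave on top of a surfboard.", audio_path("audio", 1))` would write the spoken caption to `audio/caption_0000001.mp3`. The caption text here is only a placeholder; real captions would be read from the MS COCO annotation files [1].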
SPEECH-COCO contains speech captions generated using text-to-speech (TTS) synthesis, resulting in 616,767 spoken captions (more than 600 hours of audio) paired with images.