MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Libri-Light is a collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio.
Libri-light is a large dataset of 60K hours of unlabelled speech from audiobooks in English. It is a benchmark for the training of automatic speech recognition (ASR) systems with limited or no supervision.
https://choosealicense.com/licenses/cc0-1.0/
collabora/librilight-webdataset dataset hosted on Hugging Face and contributed by the HF Datasets community
speed/libri-light dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Libriheavy-HQ
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context. Libriheavy is a labeled version of Libri-Light. Libriheavy-HQ replaces the default Libri-Light audio files with the highest-quality versions available from LibriVox, without re-encoding them. In most cases, this means upgrading the source audio from a 64 kbps .mp3 to a 128 kbps .mp3.
Overview
This is the Libriheavy-HQ dataset, adapted for the datasets… See the full description on the dataset page: https://huggingface.co/datasets/mythicinfinity/Libriheavy-HQ.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Libriheavy is a labeled version of Libri-Light. Read our paper or see https://github.com/k2-fsa/libriheavy for more details.
Citation
@misc{kang2023libriheavy, title={Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context}, author={Wei Kang and Xiaoyu Yang and Zengwei Yao and Fangjun Kuang and Yifan Yang and Liyong Guo and Long Lin and Daniel Povey}… See the full description on the dataset page: https://huggingface.co/datasets/pkufool/libriheavy.