6 datasets found

Libri-Light
opendatalab.com
paperswithcode.com
zip
Updated Mar 24, 2023
Cite
Facebook AI Research (2023). Libri-Light [Dataset]. https://opendatalab.com/OpenDataLab/Libri-Light
Available download formats: zip (35013347043 bytes)
Dataset updated
Mar 24, 2023
Dataset provided by
Facebook AI Research
PSL Research University
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Libri-Light is a collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio.
test_libri_light
huggingface.co
Updated Oct 25, 2024
+ more versions
Cite
Farzad (2024). test_libri_light [Dataset]. https://huggingface.co/datasets/farzadab/test_libri_light
Dataset updated
Oct 25, 2024
Authors
Farzad
Description
Libri-light is a large dataset of 60K hours of unlabelled speech from audiobooks in English. It is a benchmark for the training of automatic speech recognition (ASR) systems with limited or no supervision.
librilight-webdataset
huggingface.co
Updated Sep 1, 2024
+ more versions
Cite
Collabora (2024). librilight-webdataset [Dataset]. https://huggingface.co/datasets/collabora/librilight-webdataset
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 1, 2024
Dataset authored and provided by
Collabora (http://collabora.com/)
License
https://choosealicense.com/licenses/cc0-1.0/
Description
The collabora/librilight-webdataset dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
libri-light
huggingface.co
Updated Jul 11, 2025
Cite
speed (2025). libri-light [Dataset]. https://huggingface.co/datasets/speed/libri-light
Dataset updated
Jul 11, 2025
Authors
speed
Description
The speed/libri-light dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
Libriheavy-HQ
huggingface.co
Updated Jul 27, 2024
Cite
Mythic Infinity (2024). Libriheavy-HQ [Dataset]. https://huggingface.co/datasets/mythicinfinity/Libriheavy-HQ
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 27, 2024
Dataset authored and provided by
Mythic Infinity
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Card for Libriheavy-HQ

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context. Libriheavy is a labeled version of Libri-Light. Libriheavy-HQ replaces the default Libri-Light audio files with the highest-quality versions available from LibriVox, without re-encoding them. In most cases, this consists of an upgrade of the source audio from a 64 kbps .mp3 to a 128 kbps .mp3.

Overview

This is the Libriheavy-HQ dataset, adapted for the datasets… See the full description on the dataset page: https://huggingface.co/datasets/mythicinfinity/Libriheavy-HQ.
libriheavy
huggingface.co
Updated Jun 9, 2025
Cite
pkufool (2025). libriheavy [Dataset]. https://huggingface.co/datasets/pkufool/libriheavy
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 9, 2025
Authors
pkufool
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

Libriheavy is a labeled version of Libri-Light; read our paper for more details. See also https://github.com/k2-fsa/libriheavy.

Citation

@misc{kang2023libriheavy, title={Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context}, author={Wei Kang and Xiaoyu Yang and Zengwei Yao and Fangjun Kuang and Yifan Yang and Liyong Guo and Long Lin and Daniel Povey}… See the full description on the dataset page: https://huggingface.co/datasets/pkufool/libriheavy.