6 datasets found
  1. Libri-Light

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Mar 24, 2023
    Cite
    Facebook AI Research (2023). Libri-Light [Dataset]. https://opendatalab.com/OpenDataLab/Libri-Light
    Available download formats: zip (35013347043 bytes)
    Dataset provided by
    Facebook AI Research
    PSL Research University
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Libri-Light is a collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio.

  2. test_libri_light

    • huggingface.co
    Updated Oct 25, 2024
    Cite
    Farzad (2024). test_libri_light [Dataset]. https://huggingface.co/datasets/farzadab/test_libri_light
    Authors
    Farzad
    Description

    Libri-light is a large dataset of 60K hours of unlabelled speech from audiobooks in English. It is a benchmark for the training of automatic speech recognition (ASR) systems with limited or no supervision.

  3. librilight-webdataset

    • huggingface.co
    Updated Sep 1, 2024
    Cite
    Collabora (2024). librilight-webdataset [Dataset]. https://huggingface.co/datasets/collabora/librilight-webdataset
    Dataset authored and provided by
    Collabora (http://collabora.com/)
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    The collabora/librilight-webdataset dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  4. libri-light

    • huggingface.co
    Updated Jul 11, 2025
    Cite
    speed (2025). libri-light [Dataset]. https://huggingface.co/datasets/speed/libri-light
    Authors
    speed
    Description

    The speed/libri-light dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  5. Libriheavy-HQ

    • huggingface.co
    Updated Jul 27, 2024
    Cite
    Mythic Infinity (2024). Libriheavy-HQ [Dataset]. https://huggingface.co/datasets/mythicinfinity/Libriheavy-HQ
    Dataset authored and provided by
    Mythic Infinity
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for Libriheavy-HQ

    Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context. Libriheavy is a labeled version of Libri-Light. Libriheavy-HQ replaces the default Libri-Light audio files with the highest-quality versions available from LibriVox, without re-encoding them. In most cases, this amounts to an upgrade of the source audio from a 64 kbps .mp3 to a 128 kbps .mp3.

      Overview
    

    This is the Libriheavy-HQ dataset, adapted for the datasets… See the full description on the dataset page: https://huggingface.co/datasets/mythicinfinity/Libriheavy-HQ.

  6. libriheavy

    • huggingface.co
    Updated Jun 9, 2025
    Cite
    pkufool (2025). libriheavy [Dataset]. https://huggingface.co/datasets/pkufool/libriheavy
    Authors
    pkufool
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

    Libriheavy is a labeled version of Libri-Light. See the paper and https://github.com/k2-fsa/libriheavy for more details.

      Citation
    

    @misc{kang2023libriheavy, title={Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context}, author={Wei Kang and Xiaoyu Yang and Zengwei Yao and Fangjun Kuang and Yifan Yang and Liyong Guo and Long Lin and Daniel Povey}… See the full description on the dataset page: https://huggingface.co/datasets/pkufool/libriheavy.
