4 datasets found

librispeech_asr
huggingface.co
Updated Jun 3, 2024
Citation: OpenSLR (2024). librispeech_asr [Dataset]. https://huggingface.co/datasets/openslr/librispeech_asr
Dataset authored and provided by: OpenSLR
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for librispeech_asr

Dataset Summary

LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.
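As a quick check on the figures above (roughly 1000 hours of 16 kHz audio), the sketch below computes a clip's duration as number of samples divided by sampling rate and sums durations into hours. It assumes each example carries an "audio" dict with "array" and "sampling_rate" keys, as decoded Hugging Face Datasets audio usually does; the function names are hypothetical, not part of any library.

```python
# Minimal sketch: estimate audio duration from LibriSpeech-style examples.
# Assumption (not from the dataset card): each example has an "audio" dict
# with "array" (the samples) and "sampling_rate" keys.
from typing import Iterable, Mapping

SAMPLING_RATE_HZ = 16_000  # LibriSpeech audio is 16 kHz


def clip_seconds(audio: Mapping) -> float:
    """Duration of one decoded clip in seconds: samples / sampling rate."""
    return len(audio["array"]) / audio["sampling_rate"]


def corpus_hours(examples: Iterable[Mapping]) -> float:
    """Sum clip durations over an iterable of examples and convert to hours."""
    return sum(clip_seconds(ex["audio"]) for ex in examples) / 3600


def samples_for_hours(hours: float, rate_hz: int = SAMPLING_RATE_HZ) -> int:
    """Number of samples needed to hold `hours` of audio at `rate_hz`."""
    return int(hours * 3600 * rate_hz)


# About 1000 hours at 16 kHz is on the order of 5.76e10 samples.
print(samples_for_hours(1000))
```

Against the real corpus one might pass a streamed slice, e.g. itertools.islice(load_dataset("openslr/librispeech_asr", "clean", split="validation", streaming=True), 100); the "clean" config and "validation" split names are assumptions to verify on the dataset page.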

Supported Tasks and Leaderboards

automatic-speech-recognition, audio-speaker-identification: The dataset can be used to train a model for Automatic… See the full description on the dataset page: https://huggingface.co/datasets/openslr/librispeech_asr.
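For the automatic-speech-recognition task, results on LibriSpeech are usually reported as word error rate (WER). The card excerpt above does not define the metric, so the following is a plain word-level Levenshtein sketch, not any leaderboard's official scorer; it performs no text normalization (casing, punctuation), which is left to the caller.

```python
# Minimal sketch of word error rate (WER) via word-level edit distance.
# Not taken from the dataset card; normalization is left to the caller.
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous DP row: distances between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("the cat sat", "the bat sat"))
```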
librispeech_pc
huggingface.co
Updated Nov 14, 2024
Citation: Yoo-Min Jung (2024). librispeech_pc [Dataset]. https://huggingface.co/datasets/yoom618/librispeech_pc
Authors: Yoo-Min Jung
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Merges LibriSpeech audio files with the punctuation- and capitalization-restored transcripts from LibriSpeech-PC. I referred to the original LibriSpeech dataset loading script from Hugging Face Datasets (https://huggingface.co/datasets/openslr/librispeech_asr). If you have already downloaded the LibriSpeech dataset via load_dataset('openslr/librispeech_asr'), the script reuses the extracted audio files from the local directory instead of downloading them a second time (only tested in my local environment, though).
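The merge described above can be pictured as a join on the LibriSpeech utterance ID, which has the form speaker-chapter-utterance (e.g. 1272-128104-0000). This is a hypothetical sketch, not the librispeech_pc loading script: merge_by_id, split_utterance_id, and the shapes of the two input dictionaries are illustrative assumptions.

```python
# Minimal sketch: join audio paths and restored transcripts on utterance ID.
# Hypothetical shapes: audio_paths maps ID -> local audio path, pc_texts maps
# ID -> punctuated, capitalized transcript. NOT the librispeech_pc script.
from typing import Dict, List


def split_utterance_id(utt_id: str) -> Dict[str, str]:
    """LibriSpeech IDs look like '<speaker>-<chapter>-<utterance>'."""
    speaker, chapter, utterance = utt_id.split("-")
    return {"speaker_id": speaker, "chapter_id": chapter, "utterance": utterance}


def merge_by_id(audio_paths: Dict[str, str], pc_texts: Dict[str, str]) -> List[dict]:
    """Join on utterance ID, keeping only IDs present in both sources."""
    merged = []
    for utt_id in sorted(audio_paths.keys() & pc_texts.keys()):
        record = {"id": utt_id, "audio": audio_paths[utt_id], "text": pc_texts[utt_id]}
        record.update(split_utterance_id(utt_id))
        merged.append(record)
    return merged


audio = {"1272-128104-0000": "/data/1272-128104-0000.flac"}
texts = {"1272-128104-0000": "Example text."}
print(merge_by_id(audio, texts))
```

Keeping only the intersection of IDs is one design choice; a real loader might instead report IDs that appear in only one source.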
esb-datasets-test-only
huggingface.co
Updated Sep 9, 2023
Citation: Hugging Face for Audio (2023). esb-datasets-test-only [Dataset]. https://huggingface.co/datasets/hf-audio/esb-datasets-test-only
Dataset provided by: Hugging Face (https://huggingface.co/)
Authors: Hugging Face for Audio
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All eight datasets in ESB can be downloaded and prepared with a single line of code using the Hugging Face Datasets library:

from datasets import load_dataset
librispeech = load_dataset("esb/datasets", "librispeech", split="train")

"esb/datasets": the repository namespace. This is fixed for all ESB datasets.

"librispeech": the dataset name. Change this to any one of the eight datasets in ESB to download that dataset.

split="train": the split. Set this to one of… See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only.
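A small sketch of the same call, assuming the repository ID and config name shown in the card (this listing's own page is hf-audio/esb-datasets-test-only, so the ID may differ). esb_load_args is a hypothetical helper; streaming=True is a standard load_dataset option that returns an iterable dataset instead of downloading the whole split first.

```python
# Minimal sketch: build the arguments for the one-line ESB load.
# esb_load_args is hypothetical, not part of the datasets library.
from typing import Any, Dict, Tuple

ESB_REPO = "esb/datasets"  # fixed repository namespace for all ESB datasets (per the card)


def esb_load_args(
    name: str, split: str = "train", streaming: bool = False
) -> Tuple[Tuple[str, str], Dict[str, Any]]:
    """Return (positional args, keyword args) for datasets.load_dataset."""
    if not name:
        raise ValueError("dataset name must be non-empty")
    kwargs: Dict[str, Any] = {"split": split}
    if streaming:
        kwargs["streaming"] = True  # yield examples lazily instead of downloading up front
    return (ESB_REPO, name), kwargs


args, kwargs = esb_load_args("librispeech", split="train", streaming=True)
print(args, kwargs)
# With the datasets library installed and network access:
#   from datasets import load_dataset
#   librispeech = load_dataset(*args, **kwargs)
```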
esc-datasets
huggingface.co
Updated Oct 1, 2022
Citation: ESC Benchmark (2022). esc-datasets [Dataset]. https://huggingface.co/datasets/esc-bench/esc-datasets
Dataset authored and provided by: ESC Benchmark
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All eight datasets in ESC can be downloaded and prepared with a single line of code using the Hugging Face Datasets library:

from datasets import load_dataset
librispeech = load_dataset("esc-benchmark/esc-datasets", "librispeech", split="train")

"esc-benchmark": the repository namespace. This is fixed for all ESC datasets.

"librispeech": the dataset name. Change this to any one of the eight datasets in ESC to download that dataset.

split="train": the split. Set this… See the full description on the dataset page: https://huggingface.co/datasets/esc-bench/esc-datasets.
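Because only the dataset name changes between ESC datasets, loading several of them is a loop over names. The sketch injects the loader so it can run offline; in real use one would pass datasets.load_dataset. The repository ID is copied from the card and may be outdated, since the page URL uses esc-bench/esc-datasets. load_many and fake_loader are hypothetical names.

```python
# Minimal sketch: load several ESC configurations by changing only the name.
# The loader is injected; pass datasets.load_dataset in real use.
from typing import Any, Callable, Dict, Iterable

ESC_REPO = "esc-benchmark/esc-datasets"  # as written in the card; page URL uses esc-bench


def load_many(
    names: Iterable[str],
    split: str,
    loader: Callable[..., Any],
    repo: str = ESC_REPO,
) -> Dict[str, Any]:
    """Call loader(repo, name, split=split) once per name and collect results."""
    return {name: loader(repo, name, split=split) for name in names}


def fake_loader(repo: str, name: str, split: str) -> str:
    """Offline stand-in for load_dataset that just records the request."""
    return f"{repo}:{name}:{split}"


print(load_many(["librispeech"], "train", fake_loader))
```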

According to Google Scholar, 19 scholarly articles cite librispeech_asr (openslr/librispeech_asr).
