12 datasets found

h
librispeech_asr
huggingface.co
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
OpenSLR (2024). librispeech_asr [Dataset]. https://huggingface.co/datasets/openslr/librispeech_asr
Explore at:
Dataset updated
Jun 3, 2024
Dataset authored and provided by
OpenSLR
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for librispeech_asr

Dataset Summary

LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.

Supported Tasks and Leaderboards

automatic-speech-recognition, audio-speaker-identification: The dataset can be used to train a model for Automatic… See the full description on the dataset page: https://huggingface.co/datasets/openslr/librispeech_asr.
T
librispeech
tensorflow.org
opendatalab.com
+2more
Updated Dec 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). librispeech [Dataset]. https://www.tensorflow.org/datasets/catalog/librispeech
Explore at:
Dataset updated
Dec 11, 2024
Description
LibriSpeech is a corpus of approximately 1000 hours of read English speech with sampling rate of 16 kHz, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.

It's recommended to use lazy audio decoding for faster reading and smaller dataset size: - install tensorflow_io library: pip install tensorflow-io - enable lazy decoding: tfds.load('librispeech', builder_kwargs={'config': 'lazy_decode'})

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('librispeech', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
librispeech-spkid-corpus
kaggle.com
Updated Nov 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tommy NgX (2020). librispeech-spkid-corpus [Dataset]. https://www.kaggle.com/tommyngx/librispeech-spkid-corpus/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 24, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Tommy NgX
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Tommy NgX

Released under CC0: Public Domain

Contents

LibriSpeech ASR corpus
h
libris_clean_100
huggingface.co
Updated Apr 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Binh Nguyen (2023). libris_clean_100 [Dataset]. https://huggingface.co/datasets/nguyenvulebinh/libris_clean_100
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 8, 2023
Authors
Binh Nguyen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for librispeech_asr

Dataset Summary

LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.

Supported Tasks and Leaderboards

automatic-speech-recognition, audio-speaker-identification: The dataset can be used to train a model for Automatic… See the full description on the dataset page: https://huggingface.co/datasets/nguyenvulebinh/libris_clean_100.
h
reborn-uasr_multilingual-librispeech-no-silence-100hr
huggingface.co
Updated Mar 20, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Liang-Hsuan Tseng (2024). reborn-uasr_multilingual-librispeech-no-silence-100hr [Dataset]. https://huggingface.co/datasets/andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr
Explore at:
Dataset updated
Mar 20, 2024
Authors
Liang-Hsuan Tseng
Description
LibriSpeech is a corpus of approximately 1000 hours of read English speech with sampling rate of 16 kHz, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned

This dataset is the 100-hour subset of LibriSpeech 'train-clean-100' split, with silence removed. Additionally, all the dev and test sets are included for fair comparison and evaluation if needed. The dataset is prepared by the Reborn UASR team. Arxiv paper link: https://arxiv.org/abs/2402.03988
librispeech-spkid-corpus
kaggle.com
zip
Updated Nov 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tommy NgX (2020). librispeech-spkid-corpus [Dataset]. https://www.kaggle.com/tommyngx/librispeech-spkid-corpus
Explore at:
zip(2377735057 bytes)Available download formats
Dataset updated
Nov 24, 2020
Authors
Tommy NgX
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
LibriSpeech ASR corpus
Translation Augmented LibriSpeech Corpus
zenodo.org
zip
Updated Jul 10, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ali Can Kocabiyikoglu; Alexandre Bérard; Laurent Besacier; Olivier Kraif; Ali Can Kocabiyikoglu; Alexandre Bérard; Laurent Besacier; Olivier Kraif (2022). Translation Augmented LibriSpeech Corpus [Dataset]. http://doi.org/10.5281/zenodo.6482585
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6482585
Dataset updated
Jul 10, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Ali Can Kocabiyikoglu; Alexandre Bérard; Laurent Besacier; Olivier Kraif; Ali Can Kocabiyikoglu; Alexandre Bérard; Laurent Besacier; Olivier Kraif
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Large scale (>200h) and publicly available read audio book corpus. This corpus is an augmentation of LibriSpeech ASR Corpus (1000h) and contains English utterances (from audiobooks) automatically aligned with French text. Our dataset offers ~236h of speech aligned to translated text. Speech recordings and source texts are originally from Gutenberg Project, which is a digital library of public domain books read by volunteers. Our augmentation of LibriSpeech is straightforward: we automatically aligned e-books in a foreign language (French) with English utterances of LibriSpeech. We gathered open domain e-books in French and extracted individual chapters available in LibriSpeech Corpus. Furthermore, we aligned chapters in French with English utterances in order to provide a corpus of speech recordings aligned with their translations.

====================================================

Large scale (>200h) and publicly available read audio book corpus. This corpus is an augmentation of LibriSpeech ASR Corpus (1000h)[1] and contains English utterances (from audiobooks) automatically aligned with French text. Our dataset offers ~236h of speech aligned to translated text.

Overview of the corpus:
+----------+-------+--------------+----------------+
| Chapters | Books | Duration (h) | Total Segments |
+----------+-------+--------------+----------------+
| 1408 | 247 | ~236h | 131395 |
+----------+-------+--------------+----------------+

Speech recordings and source texts are originally from Gutenberg Project[2], which is a digital library of public domain books read by volunteers. Our augmentation of LibriSpeech is straightforward: we automatically aligned e-books in a foreign language (French) with English utterances of LibriSpeech.

We gathered open domain e-books in French and extracted individual chapters available in LibriSpeech Corpus. Furthermore, we aligned chapters in French with English utterances in order to provide a corpus of speech recordings aligned with their translations. Our corpus is licensed under a Creative Commons Attribution 4.0 License.

Further information on how the corpus was obtained can be found in [3].

Details on the 100h subset:
===========================

This 100h subset was specifically designed for direct speech translation training and evaluation.
It was used for the first time in [4] (end-to-end automatic speech recognition of audiobooks).
In this subset, we extracted the best 100h according to cross language alignment scores. Dev and Test sets are composed of clean speech segments only.
Since English (source) transcriptions are initially available for LibriSpeech, we also translated them using Google Translate. To summarize, for each utterance of our corpus, the following quadruplet is available: English speech signal, English transcription (should not be used for direct speech translation experiments), French text translation 1 (from alignment of e-books) and translation 2 (from MT of English transcripts).

+---------+----------+--------+-----------------------------+-----------------+
| Corpus | Total | | Source(per seg) | Target(per seg) |
+---------+----------+--------+-----------------------------+-----------------+
| | segments | hours | frames | chars | (sub)words | chars |
+---------+----------+--------+--------+-------+------------+-----------------+
| train 1 | 47271 | 100:00 | 762 | 111 | 20.7 | 143 |
| train 2 | | | | | | 126 |
+---------+----------+--------+--------+-------+------------+-----------------+
| dev | 1071 | 2:00 | 673 | 93 | 17.9 | 110 |
+---------+----------+--------+--------+-------+------------+-----------------+
| test | 2048 | 3:44 | 657 | 95 | 18.3 | 112 |
+---------+----------+--------+--------+-------+------------+-----------------+

The following archives correspond to the 100h subset used in [4]:

For audio files:

- train_100h.zip (~8.7GB)
- dev.zip(~180MB)
- test.zip(~330MB)
- train_130h_additional.zip (~10.6GB)

For aligned text files:

- train_100h_txt.zip
- dev_txt.zip
- test_txt.zip
- train130h_additional_txt.zip

Other archives provided:
========================

Following archives are available to download for other potential use of the corpus:

- database.zip(~50MB): Database describing the corpus (sqlite3)
- alignments.zip(~1.86GB): All of the intermediate processing files created in the cross-lingual alignment process along with English and French raw e-books
- audio_files.zip(~23GB): All of the speech segments organized as books and chapters
- interface.zip(~72MB): Contains static html files for alignment visualisation. With the interface, speech utterances can be listened while visualizing each sentence alignment

Note: In order to listen to speech segments with the html interface, 'audio_files' folder should be placed inside the 'Interface' folder
./Interface
./audio_files (audio_files.zip)
./css (interface.zip)
./js (interface.zip)
(..)

Github Page
===========
We provide a python script to interact with the database and to extract the corpus with different queries. This script along with all of the code used for the alignment process can be found at:
https://github.com/alicank/Translation-Augmented-LibriSpeech-Corpus

Detailed Corpus Structure
=========================

Folders name convention corresponds to book id's from LibriSpeech and Gutenberg projects. For instance folder name "11" corresponds to the id number of "Alice's Adventures in Wonderland by Lewis Carroll" in both Gutenberg Project and LibriSpeech Project.

This corpus is composed of three sections:
- Audio Files: resegmented audio files for each book id in the project
- HTML alignment visualisation interface : HTML visualisation for textual alignments with audio files avaliable to listen
- Alignments folder: all of the processing steps: pre-processing, alignment, forced transcriptions, forced alignments, etc.

-Interface
- audio_files/ : folder contains ~130.000 audio segments aligned with their translations
- book id/
- Chapter id/
- book_id-chapter_id-sentence_number.wav
- reader_id-chapter_id-sentence_number.wav **if the corpus comes from the dev/test pool of LibriSpeech**

- Alignments/ : Folder contains processing steps used in different alignment stages (reading [3] is mandatory to understand where these files come from)

- en/ : Folder contains preprocessing steps for English chapters used before alignment

- fr/ Folder contains preprocessing steps for French chapters used before alignment

- ls_book_id.txt (Gutenberg original text)
- lc_book_id.format (pdf,epub,txt,...)

- db/ Folder contains the database containing alignments, metadata and other information
-TA-LibriSpeechCorpus.sqlite3

index.html (Main html page of the Interface)

Database Structure
==================

Corpus is provided with different tables containing useful information provided with the corpus. Database structure is organized as follows:

Alignment Tables
- alignments: Table containing transcriptions, textual alignments and name of the audio file associated with a given alignment. Each row corresponds to an aligned sentence.
- audio: Table that contains duration of each speech segment (seconds)
- alignments_evaluations: 200 sentences manually annotated (for alignement evaluation see [3])
- alignments_excluded: Table used to mark sentences to be excluded from the corpus (bad alignments)
- alignments_gTranslate: automatic translation output from Google translate for each segment (transcriptions)
- alignments_scores: different cross lingual alignment score calculations provided with the corpus which could be used to sort the corpus from highest scores to the lowest

Metadata Tables
- Table librispeech: This table contains all the books from LibriSpeech project for which a downloadable link could be found (might be a dead/wrong link if it disappeared after our work)
- Table csv,clean100,other: Metadata completion for books provided with LibriSpeech project.
- Table nosLivres: some French e-book links gathered from http://www.nosLivres.net

References
==========

[1] Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015, April). Librispeech: an ASR corpus based on public domain audio books. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp. 5206-5210). IEEE.
[2] https://www.gutenberg.org/
[3] Ali Can Kocabiyikoglu, Laurent Besacier and Olivier Kraif, "Augmenting LibriSpeech with French Translations : A Multimodal Corpus for Direct Speech Translation Evaluation" in submitted to LREC, 2018.
[4] Aléxandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu and Olivier Pietquin, "End-to-End Automatic Speech Translation of Audiobooks" in submitted to ICASSP, 2018.
h
librispeech-alignments
huggingface.co
Updated Mar 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kim Gilkey (2024). librispeech-alignments [Dataset]. https://huggingface.co/datasets/gilkeyio/librispeech-alignments
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 12, 2024
Authors
Kim Gilkey
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Librispeech Alignments

Librispeech with alignments generated by the Montreal Forced Aligner. The original alignments in TextGrid format can be found here

Dataset Details Dataset Description

Librispeech is a corpus of read English speech, designed for training and evaluating automatic speech recognition (ASR) systems. The dataset contains 1000 hours of 16kHz read English speech derived from audiobooks. The Montreal Forced Aligner (MFA) was used… See the full description on the dataset page: https://huggingface.co/datasets/gilkeyio/librispeech-alignments.
Z
Crowdsourced LibriTTS Speech Prominence Annotations
data.niaid.nih.gov
Updated Dec 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Morrison, Max (2023). Crowdsourced LibriTTS Speech Prominence Annotations [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10402792
Explore at:
Dataset updated
Dec 18, 2023
Dataset authored and provided by
Morrison, Max
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset corresponding to the ICASSP 2024 paper "Crowdsourced and Automatic Speech Prominence Estimation" [link]

This dataset is useful for training machine learning models to perform automatic emphasis annotaiton, as well as downstream tasks such as emphasis-controlled TTS, emotion recognition, and text summarization. The dataset is described in Section 3 (Emphasis Annotation Dataset). The contents of this section are copied below for convenience.

We used our crowdsourced annotation system to perform human annotation on one eighth of the train-clean-100 partition of the LibriTTS [1] dataset. Specifically, participants annotated 3,626 utterances with a total length of 6.42 hours and 69,809 words from 18 speakers (9 male and 9 female). We collected at least one annotation of all 3,626 utterances, at least two annotations of 2,259 of those utterances, at least four annotations of 974 utterances, and at least eight annotations of 453 utterances. We did this in order to explore (in Section 6) whether it is more cost-effective to train a system on multiple annotations of fewer utterances or fewer annotations of more utterances. We paid 298 annotators to annotate batches of 20 utterances, where each batch takes approximately 15 minutes. We paid $3.34 for each completed batch (estimated $13.35 per hour). Annotators each annotated between one and six batches. We recruited on MTurk US residents with an approval rating of at least 99 and at least 1000 approved tasks. Today, microlabor platforms like MTurk are plagued by automated task-completion software agents (bots) that randomly fill out surveys. We filtered out bots by excluding annotations from an additional 107 annotators that marked more than 2/3 of words as emphasized in eight or more utterances of the 20 utterances in a batch. Annotators who fail the bot filter are blocked from performing further annotation. We also recorded participants' native country and language, but note these may be unreliable as many MTurk workers use VPNs to subvert IP region filters on MTurk [2].

The average Cohen Kappa score for annotators with at least one overlapping utterance is 0.226 (i.e., ``Fair'' agreement)---but not all annotators annotate the same utterances, and this overemphasizes pairs of annotators with low overlap. Therefore, we use a one-parameter logistic model (i.e., a Rasch model) computed via py-irt [3], which predicts heldout annotations from scores of overlapping annotators with 77.7% accuracy (50% is random).

The structure of this dataset is a single JSON file of word-aligned emphasis annotations. The JSON references file stems of the LibriTTS dataset, which can be found here. All code used in the creation of the dataset can be found here. The format of the JSON file is as follows.

{

{ "annotations": [ { "score": [ , , ... ], "stem": , "words": [ [ , ,

], [ , ,

], ... ] }, ... ], "country": , "language": }, ... }

[1] Zen et al., “LibriTTS: A corpus derived from LibriSpeech for text-to-speech,” in Interspeech, 2019.[2] Moss et al., “Bots or inattentive humans? Identifying sources of low-quality data in online platforms,” PsyArXiv preprint PsyArXiv:wr8ds, 2021.[3] John Patrick Lalor and Pedro Rodriguez, “py-irt: A scalable item response theory library for Python,” INFORMS Journal on Computing, 2023.
s
Dataset in support of the thesis 'Speech enhancement by using deep learning...
eprints.soton.ac.uk
Updated Jul 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cui, Jianqiao; Bleeck, Stefan; Panayotov, Vassil; Chen, Guoguo; Povey, Daniel; Khudanpur, Sanjeev; Snyder, David (2024). Dataset in support of the thesis 'Speech enhancement by using deep learning algorithms' [Dataset]. http://doi.org/10.5258/SOTON/D3161
Explore at:
Unique identifier
https://doi.org/10.5258/SOTON/D3161
Dataset updated
Jul 19, 2024
Dataset provided by
University of Southampton
Authors
Cui, Jianqiao; Bleeck, Stefan; Panayotov, Vassil; Chen, Guoguo; Povey, Daniel; Khudanpur, Sanjeev; Snyder, David
Description
The source code and audio datasets of my PhD project. 1. https://www.openslr.org/12 LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned. Acoustic models, trained on this data set, are available at kaldi-asr.org and language models, suitable for evaluation can be found at http://www.openslr.org/11/. For more information, see the paper "LibriSpeech: an ASR corpus based on public domain audio books", Vassil Panayotov, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, ICASSP 2015 2.https://www.openslr.org/17 MUSAN is a corpus of music, speech, and noise recordings. This work was supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1232825 and by Spoken Communications. You can cite the data using the following BibTeX entry: @misc{musan2015, author = {David Snyder and Guoguo Chen and Daniel Povey}, title = {{MUSAN}: {A} {M}usic, {S}peech, and {N}oise {C}orpus}, year = {2015}, eprint = {1510.08484}, note = {arXiv:1510.08484v1} } 3. source_code.zip The program from parts of my PhD project. 4.SJ_EXP.zip The program of the subjective experiment corresponding to the last chapter.
h
librispeech_asr_devclean
huggingface.co
Updated Dec 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Saisamarth Taluri (2023). librispeech_asr_devclean [Dataset]. https://huggingface.co/datasets/saital/librispeech_asr_devclean
Explore at:
Dataset updated
Dec 1, 2023
Authors
Saisamarth Taluri
Description
LibriSpeech is a corpus of approximately 1000 hours of read English speech with sampling rate of 16 kHz, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.87
h
librispeech_clean_train_test
huggingface.co
Updated Jan 29, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cody Long (2024). librispeech_clean_train_test [Dataset]. https://huggingface.co/datasets/cjccl/librispeech_clean_train_test
Explore at:
Dataset updated
Jan 29, 2024
Authors
Cody Long
Description
LibriSpeech is a corpus of approximately 1000 hours of read English speech with sampling rate of 16 kHz, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.87
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

OpenSLR (2024). librispeech_asr [Dataset]. https://huggingface.co/datasets/openslr/librispeech_asr

librispeech_asr

LibriSpeech

openslr/librispeech_asr

Explore at:

16 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Jun 3, 2024

Dataset authored and provided by

OpenSLR

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Dataset Card for librispeech_asr

  Dataset Summary

LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.

  Supported Tasks and Leaderboards

automatic-speech-recognition, audio-speaker-identification: The dataset can be used to train a model for Automatic… See the full description on the dataset page: https://huggingface.co/datasets/openslr/librispeech_asr.

Clear search

Close search

Google apps

Main menu

librispeech_asr

librispeech

librispeech-spkid-corpus

Dataset

Contents

libris_clean_100

reborn-uasr_multilingual-librispeech-no-silence-100hr

librispeech-spkid-corpus

Translation Augmented LibriSpeech Corpus

librispeech-alignments

Crowdsourced LibriTTS Speech Prominence Annotations

Dataset in support of the thesis 'Speech enhancement by using deep learning...

librispeech_asr_devclean

librispeech_clean_train_test

librispeech_asrSee More Versions

LibriSpeech

openslr/librispeech_asr

librispeech_asr