36 datasets found
  1. libritts

    • huggingface.co
    Updated Feb 9, 2024
    Cite
    Mythic Infinity (2024). libritts [Dataset]. https://huggingface.co/datasets/mythicinfinity/libritts
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 9, 2024
    Dataset authored and provided by
    Mythic Infinity
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for LibriTTS

    LibriTTS is a multi-speaker English corpus of approximately 585 hours of read English speech at 24kHz sampling rate, prepared by Heiga Zen with the assistance of Google Speech and Google Brain team members. The LibriTTS corpus is designed for TTS research. It is derived from the original materials (mp3 audio files from LibriVox and text files from Project Gutenberg) of the LibriSpeech corpus.

      Overview
    

    This is the LibriTTS dataset, adapted… See the full description on the dataset page: https://huggingface.co/datasets/mythicinfinity/libritts.

  2. libritts-aligned

    • huggingface.co
    Updated Mar 9, 2024
    Cite
    Christoph Minixhofer (2024). libritts-aligned [Dataset]. https://huggingface.co/datasets/cdminix/libritts-aligned
    Dataset updated
    Mar 9, 2024
    Authors
    Christoph Minixhofer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset used for loading TTS spectrograms and waveform audio with alignments and a number of configurable "measures", which are extracted from the raw audio.

  3. libritts_r

    • huggingface.co
    Updated Feb 2, 2024
    Cite
    Mythic Infinity (2024). libritts_r [Dataset]. https://huggingface.co/datasets/mythicinfinity/libritts_r
    Explore at:
    Croissant
    Dataset updated
    Feb 2, 2024
    Dataset authored and provided by
    Mythic Infinity
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for LibriTTS-R

    LibriTTS-R [1] is a sound-quality-improved version of the LibriTTS corpus (http://www.openslr.org/60/), which is a multi-speaker English corpus of approximately 585 hours of read English speech at a 24 kHz sampling rate, published in 2019.

      Overview
    

    This is the LibriTTS-R dataset, adapted for the datasets library.

      Usage
    
    
    
    
    
      Splits
    

    There are 7 splits (dots replace dashes from the original dataset, to comply with hf naming… See the full description on the dataset page: https://huggingface.co/datasets/mythicinfinity/libritts_r.
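The dash-to-dot renaming mentioned above can be sketched as follows. This is a minimal illustration, assuming the seven subset names are the ones distributed with the original corpus and that the renaming is a plain character substitution; the helper names are hypothetical and are not part of the dataset card.

```python
# Subset names of the original LibriTTS / LibriTTS-R release (assumed list).
ORIGINAL_SPLITS = [
    "dev-clean", "dev-other",
    "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
]


def to_hub_split_name(original: str) -> str:
    """Replace dashes with dots, as the dataset card describes (hypothetical helper)."""
    return original.replace("-", ".")


def to_original_split_name(hub_name: str) -> str:
    """Inverse mapping: dots back to dashes (hypothetical helper)."""
    return hub_name.replace(".", "-")


if __name__ == "__main__":
    for name in ORIGINAL_SPLITS:
        print(f"{name} -> {to_hub_split_name(name)}")
```

A name produced this way would then be passed as the `split` argument of `datasets.load_dataset`; the exact configuration names are documented on the dataset page and are not assumed here.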

  4. Libri TTS dev

    • kaggle.com
    Updated Nov 13, 2020
    Cite
    Luiz Felipe de Barros Jordão Costa (2020). Libri TTS dev [Dataset]. https://www.kaggle.com/luizfelipebjcosta/libri-tts-dev/code
    Explore at:
    Croissant
    Dataset updated
    Nov 13, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Luiz Felipe de Barros Jordão Costa
    Description

    This dataset is a subset of a minimal version of Google's LibriTTS dataset; for more information on the LibriTTS dataset, see this article. It is a minimal version because it contains only the text and audio files, that is, the basics you need to train a text-to-speech model. It is also only a subset, because Kaggle limits dataset size. To access the "full minimal dataset", see the list below:

      1. Libri TTS train clean 100 (from the file train-clean-100 of the dataset)
      2. Libri TTS train clean 360 part 1 (from the first half of the file train-clean-360)
      3. Libri TTS train clean 360 part 2 (from the second part of the same file)
      4. Libri TTS train other 500 part 1 (from the first part of the file train-other-500)
      5. Libri TTS train other 500 part 2 (from the same file)
      6. Libri TTS test (from the files test-clean and test-other)
      7. Libri TTS dev (this dataset, from the files dev-clean and dev-other)

  5. LibriTTS

    • kaggle.com
    zip
    Updated May 16, 2025
    Cite
    Prateek Narain (2025). LibriTTS [Dataset]. https://www.kaggle.com/datasets/prateeknarain/libritts
    Explore at:
    Available download formats: zip (15443581764 bytes)
    Dataset updated
    May 16, 2025
    Authors
    Prateek Narain
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Prateek Narain

    Released under Apache 2.0

    Contents

  6. LibriTTS-raw

    • huggingface.co
    Updated Apr 18, 2024
    Cite
    Ahmed Zain (2024). LibriTTS-raw [Dataset]. https://huggingface.co/datasets/azain/LibriTTS-raw
    Dataset updated
    Apr 18, 2024
    Authors
    Ahmed Zain
    Description

    azain/LibriTTS-raw dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. libritts_r_tags_tagged_10k_generated

    • huggingface.co
    Updated Apr 10, 2024
    Cite
    Parler TTS (2024). libritts_r_tags_tagged_10k_generated [Dataset]. https://huggingface.co/datasets/parler-tts/libritts_r_tags_tagged_10k_generated
    Explore at:
    Croissant
    Dataset updated
    Apr 10, 2024
    Dataset authored and provided by
    Parler TTS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Annotated LibriTTS-R

    This dataset is an annotated version of LibriTTS-R [1], a sound-quality-improved version of the LibriTTS corpus, which is a multi-speaker English corpus of approximately 585 hours of read English speech at a 24 kHz sampling rate, published in 2019. The text_description column provides natural-language annotations of speaker and utterance characteristics, generated using the Data-Speech repository.… See the full description on the dataset page: https://huggingface.co/datasets/parler-tts/libritts_r_tags_tagged_10k_generated.

  8. voices-libritts

    • huggingface.co
    Updated Aug 2, 2025
    Cite
    SDialog (2025). voices-libritts [Dataset]. https://huggingface.co/datasets/sdialog/voices-libritts
    Dataset updated
    Aug 2, 2025
    Dataset authored and provided by
    SDialog
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    LibriTTS Speaker Voices & Embeddings

      Dataset Description
    

    This dataset provides a collection of speaker voice samples from the LibriTTS corpus. For each speaker, a single 30-second audio clip is provided, created by concatenating their speech segments. The dataset is designed for tasks such as speaker identification, speaker verification, and as a voice bank for Text-to-Speech (TTS) models, particularly for voice cloning. In addition to the audio files and their metadata… See the full description on the dataset page: https://huggingface.co/datasets/sdialog/voices-libritts.
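The idea of building one fixed-length clip per speaker by concatenating speech segments can be sketched as below. This is only an illustration under stated assumptions: the function name is hypothetical, segments are plain sequences of samples, the 24000 Hz default follows the LibriTTS sampling rate, and nothing here reflects how the dataset authors actually selected or ordered segments.

```python
from typing import Iterable, List, Sequence


def build_voice_clip(
    segments: Iterable[Sequence[float]],
    sample_rate: int = 24000,    # assumption: LibriTTS audio is 24 kHz
    clip_seconds: float = 30.0,  # clip length stated in the description
) -> List[float]:
    """Concatenate segments in order and cut the result at the target length.

    Returns fewer samples than requested if the speaker has too little audio.
    """
    target = int(sample_rate * clip_seconds)
    clip: List[float] = []
    for segment in segments:
        remaining = target - len(clip)
        if remaining <= 0:
            break
        # Take only as many samples as are still needed.
        clip.extend(segment[:remaining])
    return clip


if __name__ == "__main__":
    # Toy input: three 12-second "segments" at a 10 Hz sample rate.
    toy = [[0.1] * 120, [0.2] * 120, [0.3] * 120]
    clip = build_voice_clip(toy, sample_rate=10, clip_seconds=30)
    print(len(clip))
```

Truncating the last segment keeps every clip at the same length, which is convenient when clips are batched for speaker-embedding extraction.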

  9. LibriTTS-Enhanced

    • huggingface.co
    Updated Jul 15, 2025
    Cite
    huanpm (2025). LibriTTS-Enhanced [Dataset]. https://huggingface.co/datasets/tong0/LibriTTS-Enhanced
    Dataset updated
    Jul 15, 2025
    Authors
    huanpm
    Description

    LibriTTS Enhanced Dataset

    Enhanced version of LibriTTS dataset for speech enhancement research.

  10. ESPnet2 pretrained model,...

    • explore.openaire.eu
    Updated Sep 22, 2021
    Cite
    Kan-Bayashi (2021). ESPnet2 pretrained model, kan-bayashi/libritts_tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave, fs=22050, lang=en [Dataset]. http://doi.org/10.5281/zenodo.5521416
    Dataset updated
    Sep 22, 2021
    Authors
    Kan-Bayashi
    Description

    This model was trained by kan-bayashi using libritts/tts1 recipe in espnet.

    Python API

    See https://github.com/espnet/espnet_model_zoo

    Evaluate in the recipe

        git clone https://github.com/espnet/espnet
        cd espnet
        git checkout 628b46282537ce532d613d6bafb75e826e8455de
        pip install -e .
        cd egs2/libritts/tts1
        # Download the model file here
        ./run.sh --skip_data_prep false --skip_train true --download_model kan-bayashi/libritts_tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave

    Config

        config: ./conf/tuning/train_xvector_vits.yaml
        print_config: false
        log_level: INFO
        dry_run: false
        iterator_type: sequence
        output_dir: exp/tts_train_xvector_vits_raw_phn_tacotron_g2p_en_no_space
        ngpu: 1
        seed: 777
        num_workers: 4
        num_att_plot: 3
        dist_backend: nccl
        dist_init_method: env://
        dist_world_size: 4
        dist_rank: 0
        local_rank: 0
        dist_master_addr: localhost
        dist_master_port: 60056
        dist_launcher: null
        multiprocessing_distributed: true
        unused_parameters: true
        sharded_ddp: false
        cudnn_enabled: true
        cudnn_benchmark: false
        cudnn_deterministic: false
        collect_stats: false
        write_collected_feats: false
        max_epoch: 100
        patience: null
        val_scheduler_criterion:
        - valid
        - loss
        early_stopping_criterion:
        - valid
        - loss
        - min
        best_model_criterion:
        -   - train
            - total_count
            - max
        keep_nbest_models: 10
        grad_clip: -1
        grad_clip_type: 2.0
        grad_noise: false
        accum_grad: 1
        no_forward_run: false
        resume: true
        train_dtype: float32
        use_amp: false
        log_interval: 50
        use_tensorboard: true
        use_wandb: false
        wandb_project: null
        wandb_id: null
        wandb_entity: null
        wandb_name: null
        wandb_model_log_interval: -1
        detect_anomaly: false
        pretrain_path: null
        init_param: []
        ignore_init_mismatch: false
        freeze_param: []
        num_iters_per_epoch: 10000
        batch_size: 20
        valid_batch_size: null
        batch_bins: 5000000
        valid_batch_bins: null
        train_shape_file:
        - exp/tts_stats_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/train/text_shape.phn
        - exp/tts_stats_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/train/speech_shape
        valid_shape_file:
        - exp/tts_stats_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/valid/text_shape.phn
        - exp/tts_stats_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/valid/speech_shape
        batch_type: numel
        valid_batch_type: null
        fold_length:
        - 150
        - 204800
        sort_in_batch: descending
        sort_batch: descending
        multiple_iterator: false
        chunk_length: 500
        chunk_shift_ratio: 0.5
        num_cache_chunks: 1024
        train_data_path_and_name_and_type:
        -   - dump/22k/raw/train-clean-460/text
            - text
            - text
        -   - dump/22k/raw/train-clean-460/wav.scp
            - speech
            - sound
        -   - dump/22k/xvector/train-clean-460/xvector.scp
            - spembs
            - kaldi_ark
        valid_data_path_and_name_and_type:
        -   - dump/22k/raw/dev-clean/text
            - text
            - text
        -   - dump/22k/raw/dev-clean/wav.scp
            - speech
            - sound
        -   - dump/22k/xvector/dev-clean/xvector.scp
            - spembs
            - kaldi_ark
        allow_variable_data_keys: false
        max_cache_size: 0.0
        max_cache_fd: 32
        valid_max_cache_size: null
        optim: adamw
        optim_conf:
            lr: 0.0002
            betas:
            - 0.8
            - 0.99
            eps: 1.0e-09
            weight_decay: 0.0
        scheduler: exponentiallr
        scheduler_conf:
            gamma: 0.999875
        optim2: adamw
        optim2_conf:
            lr: 0.0002
            betas:
            - 0.8
            - 0.99
            eps: 1.0e-09
            weight_decay: 0.0
        scheduler2: exponentiallr
        scheduler2_conf:
            gamma: 0.999875
        generator_first: false
        token_list: - - - AH0 - T - N - D - S - R - L - IH1 - DH - M - K - Z - EH1
            - AE1 - IH0 - AH1 - W - ',' - HH - ER0 - P - IY1 - V - F - B - UW1 - AA1
            - AY1 - AO1 - . - EY1 - IY0 - OW1 - NG - G - SH - Y - AW1 - CH - ER1
            - UH1 - TH - JH - '''' - '?' - OW0 - EH2 - '!' - IH2 - OY1 - EY2 - AY2
            - EH0 - UW0 - AA2 - AE2 - OW2 - AO2 - AE0 - AH2 - ZH - AA0 - UW2 - IY2
            - AY0 - AO0 - AW2 - EY0 - UH2 - ER2 - AW0 - '...' - UH0 - OY2 - . . .
            - OY0 - . . . . - .. - . ... - . . - . . . . . - .. .. - '... .' -
        odim: null
        model_conf: {}
        use_preprocessor: true
        token_type: phn
        bpemodel: null
        non_linguistic_symbols: null
        cleaner: tacotron
        g2p: g2p_en_no_space
        feats_extract: linear_spectrogram
        feats_extract_conf:
            n_fft: 1024
            hop_length: 256
            win_length: null
        normalize: null
        normalize_conf: {}
        tts: vits
        tts_conf:
            generator_type: vits_generator
            generator_params:
                hidden_channels: 192
                spks: -1
                spk_embed_dim: 512
                global_channels: 256
                segment_size: 32
                text_encoder_attention_heads: 2
                text_encoder_ffn_expand: 4
                text_encoder_blocks: 6
                text_encoder_positionwise_layer_type: conv1d
                text_encoder_positionwise_conv_kernel_size: 3
                text_encoder_positional_encoding_layer_type: rel_pos
                text_encoder_self_attention_layer_type: rel_selfattn
                text_encoder_activation_type: swish
                text_encoder_normalize_before: true
                text_encoder_dropout_rate: 0.1
                text_encoder_positional_dropout_rate: 0.0
                text_encoder_attention_dropout_rate: 0.1
                use_macaron_style_in_text_encoder: true
                use_conformer_conv_in_text_encoder: false
                text_encoder_conformer_kernel_size: -1
                decoder_kernel_size: 7
                decoder_channels: 512
                decoder_upsample_scales:
                - 8
                - 8
                - 2
                - 2
                decoder_upsample_kernel_sizes:
                - 16
                - 16
                - 4
                - 4
                decoder_resblock_kernel_sizes:
                - 3
                - 7
                - 11
                decoder_resblock_dilations:
                -   - 1
                    - 3
                    - 5
                -   - 1
                    - 3
                    - 5
                -   - 1
                    - 3
                    - 5
                use_weight...
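A few of the numbers in the config excerpt above can be cross-checked with simple arithmetic. The sketch below copies the values from the excerpt and the model title; the function names are hypothetical, and the learning-rate estimate assumes the exponential scheduler is stepped once per epoch, which the excerpt itself does not state.

```python
import math

# Values copied from the config excerpt and the model title.
FS = 22050                      # "fs=22050" in the model title
HOP_LENGTH = 256                # feats_extract_conf.hop_length
UPSAMPLE_SCALES = [8, 8, 2, 2]  # decoder_upsample_scales
BASE_LR = 0.0002                # optim_conf.lr
GAMMA = 0.999875                # scheduler_conf.gamma
MAX_EPOCH = 100                 # max_epoch


def total_upsampling(scales):
    """Product of the decoder upsampling factors (waveform samples per frame)."""
    return math.prod(scales)


def frame_shift_ms(hop_length, fs):
    """Duration of one spectrogram frame hop in milliseconds."""
    return 1000.0 * hop_length / fs


def lr_after(steps, base_lr=BASE_LR, gamma=GAMMA):
    """Learning rate after `steps` exponential decay steps.

    Assumption: one decay step per epoch.
    """
    return base_lr * gamma ** steps


if __name__ == "__main__":
    print("upsampling factor:", total_upsampling(UPSAMPLE_SCALES))
    print("frame shift (ms):", round(frame_shift_ms(HOP_LENGTH, FS), 2))
    print("lr after max_epoch:", lr_after(MAX_EPOCH))
```

The product of the upsampling scales equals the hop length, so each frame of the linear spectrogram corresponds to one hop of generated waveform.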

  11. LibriTTS-358-samples

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Ahmed Zain (2024). LibriTTS-358-samples [Dataset]. https://huggingface.co/datasets/azain/LibriTTS-358-samples
    Dataset updated
    Sep 12, 2024
    Authors
    Ahmed Zain
    Description

    azain/LibriTTS-358-samples dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. libritts-r-text-tags-v4

    • huggingface.co
    Updated Feb 14, 2024
    Cite
    Yoach Lacombe (2024). libritts-r-text-tags-v4 [Dataset]. https://huggingface.co/datasets/ylacombe/libritts-r-text-tags-v4
    Explore at:
    Croissant
    Dataset updated
    Feb 14, 2024
    Authors
    Yoach Lacombe
    Description

    ylacombe/libritts-r-text-tags-v4 dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. libritts

    • huggingface.co
    Updated Sep 25, 2024
    Cite
    meraki (2024). libritts [Dataset]. https://huggingface.co/datasets/cmeraki/libritts
    Explore at:
    Croissant
    Dataset updated
    Sep 25, 2024
    Authors
    meraki
    Description

    cmeraki/libritts dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. 3-LibriTTS-sample

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Nikhil Kumar Sharma (2024). 3-LibriTTS-sample [Dataset]. https://huggingface.co/datasets/Nikhil20Sharma/3-LibriTTS-sample
    Explore at:
    Croissant
    Dataset updated
    Sep 12, 2024
    Authors
    Nikhil Kumar Sharma
    Description

    Nikhil20Sharma/3-LibriTTS-sample dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. libritts-r-mimi

    • huggingface.co
    Updated Dec 31, 2024
    Cite
    Jacob Keisling (2024). libritts-r-mimi [Dataset]. https://huggingface.co/datasets/jkeisling/libritts-r-mimi
    Explore at:
    Croissant
    Dataset updated
    Dec 31, 2024
    Authors
    Jacob Keisling
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LibriTTS-R Mimi encoding

    This dataset converts all audio in the dev.clean, test.clean, train.100 and train.360 splits of the LibriTTS-R dataset from waveforms to tokens in Kyutai's Mimi neural codec. These tokens are intended as targets for DualAR audio models, but they also let you download all the audio in ~50-100x less space, if you are comfortable decoding later with rustymimi or Transformers. This does NOT contain the original audio; please use the regular LibriTTS-R for… See the full description on the dataset page: https://huggingface.co/datasets/jkeisling/libritts-r-mimi.

  16. 200-dialogues-voices-libritts

    • huggingface.co
    Updated Jul 30, 2025
    Cite
    SDialog (2025). 200-dialogues-voices-libritts [Dataset]. https://huggingface.co/datasets/sdialog/200-dialogues-voices-libritts
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    SDialog
    Description

    200 dialogues generated using SDialog:

    ExpO0O5 > DoPaCo > 001 001: both roles use gemma3:27b-it-qat as LLM only doctor gets truncated '?'

    Split without persona overlap:

      train set: doc 0-59, pat 0-119
      dev set: doc 60-79, pat 120-139
      test set: doc 80-99, pat 140-199
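The claim that the splits share no personas can be checked mechanically from the ranges listed above. The sketch below is only an illustration: it assumes the listed ranges are inclusive index ranges for doctor ("doc") and patient ("pat") personas, and the names `SPLITS` and `overlapping_pairs` are hypothetical.

```python
from itertools import combinations

# Persona index ranges from the description, assumed inclusive on both ends.
SPLITS = {
    "train": {"doc": range(0, 60), "pat": range(0, 120)},
    "dev": {"doc": range(60, 80), "pat": range(120, 140)},
    "test": {"doc": range(80, 100), "pat": range(140, 200)},
}


def overlapping_pairs(splits, role):
    """Return the pairs of split names that share at least one persona index."""
    named_ranges = [(name, ranges[role]) for name, ranges in splits.items()]
    shared = []
    for (name_a, range_a), (name_b, range_b) in combinations(named_ranges, 2):
        if set(range_a) & set(range_b):
            shared.append((name_a, name_b))
    return shared


if __name__ == "__main__":
    for role in ("doc", "pat"):
        total = sum(len(ranges[role]) for ranges in SPLITS.values())
        print(role, "personas:", total, "overlaps:", overlapping_pairs(SPLITS, role))
```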

      Audio Setup:
    

    Database of voices built from the LibriTTS dataset
    IndexTTS model for utterance generation
    dScaper for channel and metadata creation
    PyRoomAcoustics for spatialization of the audio

  17. libritts-r-filtered-speaker-descriptions

    • huggingface.co
    Cite
    Ragman Teodora, libritts-r-filtered-speaker-descriptions [Dataset]. https://huggingface.co/datasets/TeodoraR/libritts-r-filtered-speaker-descriptions
    Authors
    Ragman Teodora
    Description

    TeodoraR/libritts-r-filtered-speaker-descriptions dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. LibriTTS-dev-clean-16khz-mono-loudnorm-100-random-samples-2024-04-18-17-34-39-similarities...

    • huggingface.co
    Updated Apr 18, 2024
    Cite
    Ahmed Zain (2024). LibriTTS-dev-clean-16khz-mono-loudnorm-100-random-samples-2024-04-18-17-34-39-similarities [Dataset]. https://huggingface.co/datasets/azain/LibriTTS-dev-clean-16khz-mono-loudnorm-100-random-samples-2024-04-18-17-34-39-similarities
    Explore at:
    Croissant
    Dataset updated
    Apr 18, 2024
    Authors
    Ahmed Zain
    Description

    azain/LibriTTS-dev-clean-16khz-mono-loudnorm-100-random-samples-2024-04-18-17-34-39-similarities dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. libritts-r-mhubert-2000units

    • huggingface.co
    Cite
    Ryota Komatsu, libritts-r-mhubert-2000units [Dataset]. https://huggingface.co/datasets/ryota-komatsu/libritts-r-mhubert-2000units
    Authors
    Ryota Komatsu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ryota-komatsu/libritts-r-mhubert-2000units dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. libritts-r-test-clean

    • huggingface.co
    Cite
    Jaehoon Kang, libritts-r-test-clean [Dataset]. https://huggingface.co/datasets/morateng/libritts-r-test-clean
    Authors
    Jaehoon Kang
    Description

    morateng/libritts-r-test-clean dataset hosted on Hugging Face and contributed by the HF Datasets community

