34 datasets found
  1. vctk

    • tensorflow.org
    Updated Dec 6, 2022
    Cite
    (2022). vctk [Dataset]. http://doi.org/10.7488/ds/2645
    Dataset updated
    Dec 6, 2022
    Description

    This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage and an elicitation paragraph used for the speech accent archive.

    Note that the 'p315' text was lost due to a hard disk error.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('vctk', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  2. VCTK

    • huggingface.co
    Updated Oct 15, 2024
    Cite
    Vedat Baday (2024). VCTK [Dataset]. https://huggingface.co/datasets/badayvedat/VCTK
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 15, 2024
    Authors
    Vedat Baday
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    badayvedat/VCTK dataset hosted on Hugging Face and contributed by the HF Datasets community
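
    To load this copy with the Hugging Face datasets library, a minimal sketch (the "train" split name is an assumption; audio decoding may require the datasets[audio] extra):

    from datasets import load_dataset

    # Hypothetical split name; check the dataset page for the actual splits.
    ds = load_dataset("badayvedat/VCTK", split="train")
    print(ds[0])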

  3. VCTK (CSTR VCTK Corpus)

    • opendatalab.com
    zip
    Updated Dec 23, 2022
    Cite
    University of Edinburgh (2022). VCTK (CSTR VCTK Corpus) [Dataset]. https://opendatalab.com/OpenDataLab/VCTK
    Explore at:
    Available download formats: zip (15,224,193,249 bytes)
    Dataset updated
    Dec 23, 2022
    Dataset provided by
    University of Edinburgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage and an elicitation paragraph used for the speech accent archive.

    The newspaper texts were taken from Herald Glasgow, with permission from Herald & Times Group. Each speaker has a different set of newspaper texts, selected by a greedy algorithm that increases the contextual and phonetic coverage. The details of the text selection algorithm are described in the following paper: C. Veaux, J. Yamagishi and S. King, "The voice bank corpus: Design, collection and data analysis of a large regional accent speech database," https://doi.org/10.1109/ICSDA.2013.6709856.

    The rainbow passage and elicitation paragraph are the same for all speakers. The rainbow passage can be found at the International Dialects of English Archive (http://web.ku.edu/~idea/readings/rainbow.htm). The elicitation paragraph is identical to the one used for the speech accent archive (http://accent.gmu.edu), whose details can be found at http://www.ualberta.ca/~aacl2009/PDFs/WeinbergerKunath2009AACL.pdf.

    All speech data was recorded using an identical setup: an omni-directional microphone (DPA 4035) and a small-diaphragm condenser microphone with very wide bandwidth (Sennheiser MKH 800), at a 96 kHz sampling frequency and 24 bits, in a hemi-anechoic chamber at the University of Edinburgh. (Two speakers, p280 and p315, had technical issues with the MKH 800 recordings.) All recordings were converted to 16 bits, downsampled to 48 kHz, and manually end-pointed.

  4. speech-to-text-wavenet VCTK training checkpoint

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Ryan Baumann (2023). speech-to-text-wavenet VCTK training checkpoint [Dataset]. http://doi.org/10.6084/m9.figshare.4555483.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ryan Baumann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a training checkpoint for speech-to-text-wavenet trained against the VCTK dataset. Training terminated after 20 epochs with a loss of 8.72. Training batch size was reduced from 16 to 4, and revision 91758811f of speech-to-text-wavenet was used, with tensorflow 0.12.1 and sugartensor 0.0.2.3. Training was run in Python 2.7.13 on OS X 10.11.6 with CUDA 7.5 on an NVIDIA GeForce GTX 780M.

  5. vctk

    • huggingface.co
    Updated Feb 14, 2024
    Cite
    Sanchit Gandhi (2024). vctk [Dataset]. https://huggingface.co/datasets/sanchit-gandhi/vctk
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 14, 2024
    Authors
    Sanchit Gandhi
    Description

    Dataset Card for "vctk"

    More Information needed

  6. VCTK Corpus (v0.92)

    • kaggle.com
    Updated Aug 13, 2023
    Cite
    Desty Nova (2023). VCTK Corpus (v0.92) [Dataset]. https://www.kaggle.com/datasets/destynova/vctk-corpus-092
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 13, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Desty Nova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Desty Nova

    Released under Attribution 4.0 International (CC BY 4.0)

    Contents

  7. vctk

    • huggingface.co
    Cite
    Jacob Paulsen, vctk [Dataset]. https://huggingface.co/datasets/jspaulsen/vctk
    Authors
    Jacob Paulsen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    VCTK

    This is a processed clone of the VCTK dataset with leading and trailing silence removed using Silero VAD. A fixed 25 ms of padding has been added to both ends of each audio clip to (hopefully) improve training and fine-tuning. The original dataset is available at: https://datashare.ed.ac.uk/handle/10283/3443.

      Reproducing
    

    This repository notably lacks a requirements.txt file. There's likely a missing dependency or two, but roughly: pydub tqdm torch torchaudio… See the full description on the dataset page: https://huggingface.co/datasets/jspaulsen/vctk.
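
    For the trimming step described above, a minimal sketch using the snakers4/silero-vad torch.hub entry point (the input file name is hypothetical; this illustrates the technique, not the author's exact script):

    import torch
    import torch.nn.functional as F
    import torchaudio

    # Silero VAD via torch.hub; utils bundles get_speech_timestamps and friends
    model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
    get_speech_timestamps, _, read_audio, _, _ = utils

    SR = 16000             # Silero VAD operates on 8/16 kHz audio
    PAD = int(0.025 * SR)  # the fixed 25 ms of padding per side

    wav = read_audio('p225_001.wav', sampling_rate=SR)  # hypothetical clip
    ts = get_speech_timestamps(wav, model, sampling_rate=SR)
    if ts:
        speech = wav[ts[0]['start']:ts[-1]['end']]  # drop leading/trailing silence
        padded = F.pad(speech, (PAD, PAD))          # re-add 25 ms of silence per side
        torchaudio.save('p225_001_trim.wav', padded.unsqueeze(0), SR)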

  8. Data from: PodcastMix - a dataset for separating music and speech in...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 12, 2022
    Cite
    Nicolas Schmidt (2022). PodcastMix - a dataset for separating music and speech in podcasts [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5552352
    Dataset updated
    Sep 12, 2022
    Dataset provided by
    Marius Miron
    Nicolas Schmidt
    Jordi Pons
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce PodcastMix, a dataset formalizing the task of separating background music and foreground speech in podcasts. It contains audio files at 44.1 kHz and the corresponding metadata. For further details, check the following paper and the associated GitHub repository:

    N. Schmidt, J. Pons, M. Miron, "Score-informed source separation for multi-channel orchestral recordings", submitted to ICASSP (2022)

    https://github.com/MTG/Podcastmix

    This dataset contains four parts (we highlight the content of the Zenodo archives in brackets):

    [metadata] PodcastMix-synth train: a large and diverse training set that is programmatically generated (with a validation partition). The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.

    [metadata] PodcastMix-synth test: a programmatically generated test set with reference stems to compute evaluation metrics. The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.

    [audio and metadata] PodcastMix-real with-reference: a test set of real podcasts with reference stems to compute evaluation metrics. The podcasts are recorded by one of the authors and the music comes from the FMA dataset.

    [audio and metadata] PodcastMix-real no-reference: a test set of real podcasts with only the podcast mixes, for subjective evaluation. The podcasts are compiled from the internet.

    This dataset was created by Nicolas Schmidt and Marius Miron (Music Technology Group, Universitat Pompeu Fabra, Barcelona) and Jordi Pons (Dolby Labs). This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

    Please acknowledge PodcastMix in academic research. When the present dataset is used for academic research, we would highly appreciate it if authors cite the following publication:

    N. Schmidt, J. Pons, M. Miron, "Score-informed source separation for multi-channel orchestral recordings", submitted to ICASSP (2022)

    The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the UPF is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the dataset or any part of it.

    PURPOSES. The data is processed for the general purpose of carrying out research development and innovation studies, works or projects. In particular, but without limitation, the data is processed for the purpose of communicating with Licensee regarding any administrative and legal / judicial purposes.

  9. DR-VCTK (Device Recorded VCTK)

    • opendatalab.com
    zip
    Updated Sep 21, 2022
    Cite
    National Institute of Informatics (2022). DR-VCTK (Device Recorded VCTK) [Dataset]. https://opendatalab.com/OpenDataLab/DR-VCTK
    Explore at:
    Available download formats: zip (1,794,915,001 bytes)
    Dataset updated
    Sep 21, 2022
    Dataset provided by
    National Institute of Informatics
    Özyeğin University
    University of Edinburgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a new variant of the voice cloning toolkit (VCTK) dataset: device-recorded VCTK (DR-VCTK), where the high-quality speech signals recorded in a semi-anechoic chamber using professional audio devices are played back and re-recorded in office environments using relatively inexpensive consumer devices.

  10. vctk data

    • kaggle.com
    Updated Jan 25, 2024
    Cite
    Kate Dmitrieva (2024). vctk data [Dataset]. https://www.kaggle.com/datasets/katedmitrieva/vctk-data/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kate Dmitrieva
    Description

    Dataset

    This dataset was created by Kate Dmitrieva

    Contents

  11. ## SUPERSEDED: THIS DATASET HAS BEEN REPLACED. ## Noisy speech database for...

    • find.data.gov.scot
    • dtechtive.com
    • +1more
    txt, zip
    Updated Mar 22, 2016
    Cite
    University of Edinburgh. School of Informatics. Centre for Speech Technology Research (CSTR) (2016). ## SUPERSEDED: THIS DATASET HAS BEEN REPLACED. ## Noisy speech database for training speech enhancement algorithms and TTS models [Dataset]. http://doi.org/10.7488/ds/1356
    Explore at:
    Available download formats: zip (147.1 MB), zip (821.6 MB), txt (0.0166 MB), zip (5.934 MB), zip (912.7 MB), zip (162.6 MB), zip (0.3533 MB)
    Dataset updated
    Mar 22, 2016
    Dataset provided by
    University of Edinburgh. School of Informatics. Centre for Speech Technology Research (CSTR)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SUPERSEDED: THIS DATASET HAS BEEN REPLACED by the one at https://doi.org/10.7488/ds/2117.

    Clean and noisy parallel speech database. The database was designed to train and test speech enhancement methods that operate at 48 kHz. A more detailed description can be found in the paper associated with the database. Some of the noises were obtained from the DEMAND database, available at http://parole.loria.fr/DEMAND/. The speech data was obtained from the Voice Banking Corpus, available at http://homepages.inf.ed.ac.uk/jyamagis/release/VCTK-Corpus.tar.gz.

  12. vctk

    • huggingface.co
    Updated Mar 18, 2025
    + more versions
    Cite
    Yifan Yang (2025). vctk [Dataset]. https://huggingface.co/datasets/yfyeung/vctk
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 18, 2025
    Authors
    Yifan Yang
    Description

    yfyeung/vctk dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. VCTK-files

    • kaggle.com
    Updated May 20, 2022
    Cite
    Smeu Stefan (2022). VCTK-files [Dataset]. https://www.kaggle.com/datasets/smeustefan/vctkfiles
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 20, 2022
    Dataset provided by
    Kaggle
    Authors
    Smeu Stefan
    Description

    Dataset

    This dataset was created by Smeu Stefan

    Contents

  14. Test dataset for separation of speech, traffic sounds, wind noise, and...

    • live.european-language-grid.eu
    audio wav
    Updated Apr 24, 2024
    Cite
    (2024). Test dataset for separation of speech, traffic sounds, wind noise, and general sounds [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7681
    Explore at:
    Available download formats: audio wav
    Dataset updated
    Apr 24, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was generated as part of the paper: "DCUnet-Based Multi-Model Approach for Universal Sound Separation," K. Arendt, A. Szumaczuk, B. Jasik, K. Piaskowski, P. Masztalski, M. Matuszewski, K. Nowicki, P. Zborowski. It contains various sounds from the Audio Set [1] and spoken utterances from the VCTK [2] and DNS [3] datasets.

    Contents:

    sr_8k/  mix_clean/ s1/ s2/ s3/ s4/
    sr_16k/ mix_clean/ s1/ s2/ s3/ s4/
    sr_48k/ mix_clean/ s1/ s2/ s3/ s4/

    Each top-level directory contains 512 audio samples at a different sampling rate (sr_8k: 8 kHz, sr_16k: 16 kHz, sr_48k: 48 kHz). The audio samples for each sampling rate are different, as they were generated randomly and separately. Each directory contains 5 subdirectories:

    • mix_clean: mixed sources
    • s1: source #1 (general sounds)
    • s2: source #2 (speech)
    • s3: source #3 (traffic sounds)
    • s4: source #4 (wind noise)

    The sound mixtures were generated by adding s2, s3 and s4 to s1 with SNRs ranging from -10 to 10 dB w.r.t. s1.

    REFERENCES:

    [1] Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter, "Audio Set: An ontology and human-labeled dataset for audio events," in Proc. IEEE ICASSP 2017, New Orleans, LA, 2017.

    [2] Christophe Veaux, Junichi Yamagishi, and Kirsten MacDonald, "CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit [sound]," https://doi.org/10.7488/ds/1994, University of Edinburgh, The Centre for Speech Technology Research (CSTR), 2017.

    [3] Chandan K. A. Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, and Johannes Gehrke, "The INTERSPEECH 2020 deep noise suppression challenge: Datasets, subjective speech quality and testing framework," 2020.
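
    The SNR-based mixing above follows a standard recipe: scale the added source so the power ratio between s1 and the scaled source matches the target SNR. A minimal sketch in generic NumPy (the signals here are hypothetical placeholders):

    import numpy as np

    def mix_at_snr(s1: np.ndarray, s2: np.ndarray, snr_db: float) -> np.ndarray:
        """Add s2 to s1, scaled so 10*log10(P_s1 / P_s2_scaled) == snr_db."""
        p1 = np.mean(s1 ** 2)
        p2 = np.mean(s2 ** 2) + 1e-12          # guard against silent sources
        gain = np.sqrt(p1 / (p2 * 10 ** (snr_db / 10.0)))
        return s1 + gain * s2

    # e.g. add speech (s2) to general sounds (s1) at -10 dB w.r.t. s1
    rng = np.random.default_rng(0)
    s1, s2 = rng.standard_normal(48000), rng.standard_normal(48000)
    mixture = mix_at_snr(s1, s2, snr_db=-10.0)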

  15. Processed VCTK

    • kaggle.com
    Updated Mar 27, 2021
    Cite
    Alexandr Ivanov (2021). Processed VCTK [Dataset]. https://www.kaggle.com/alexandrivanov13/processed-vctk/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alexandr Ivanov
    Description

    Dataset

    This dataset was created by Alexandr Ivanov

    Contents

  16. vctk

    • huggingface.co
    Updated May 13, 2025
    Cite
    Ruhullah Shaikh (2025). vctk [Dataset]. https://huggingface.co/datasets/ruhullah1/vctk
    Dataset updated
    May 13, 2025
    Authors
    Ruhullah Shaikh
    Description

    ruhullah1/vctk dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. Wilska Microphone Enhancement Dataset

    • zenodo.org
    Updated Jul 12, 2025
    Cite
    Esteban Andrés Gómez Mellado; Tom Bäckström (2025). Wilska Microphone Enhancement Dataset [Dataset]. http://doi.org/10.5281/zenodo.15330652
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Esteban Andrés Gómez Mellado; Tom Bäckström
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Wilska Microphone Enhancement Dataset

    A dataset of parallel fullband speech simultaneously recorded by various microphones in the Wilska Multichannel Anechoic Chamber at the Acoustics Lab of Aalto University, Espoo, Finland.

    The source data comes from the VCTK dataset. It was played back using a high-quality professional full-range loudspeaker and recorded by two headset microphones and six studio microphones. The total amount of data per microphone is 23 hours, 11 minutes and 13 seconds. All data was recorded at 48 kHz and 24-bit, and later compressed to FLAC (lossless compression).

    Data structure

    The data is structured as follows:

    • headset-mics: Folder containing headset microphone recordings. Each folder contains data recorded by a single microphone.
    • studio-mics: Folder containing studio microphone recordings. Each folder contains data recorded by a single microphone.

    Each folder contains the same number of files. Files with the same name across folders are parallel recordings, meaning that they have identical content and the same number of samples, but captured by each corresponding microphone.

    Each file uses the following naming convention:

    pXXX_YYY_ZZ.flac

    where:

    • pXXX is the speaker ID, matching exactly the IDs used in the VCTK dataset.
    • YYY is the utterance ID, also consistent with the VCTK dataset.
    • ZZ is the segment ID. Utterances were trimmed to remove silent sections, and some were split into multiple segments as a result (e.g., p225_009_00.flac and p225_009_01.flac).

    Each folder contains a meta.csv file with metadata for its corresponding subset, including a SHA256 hash to verify data integrity. These were generated using sndls with the following command:

    sndls audio --csv meta.csv -e .flac -r --sha256
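
    For an independent integrity check without sndls, a minimal sketch (the meta.csv column names 'path' and 'sha256' are assumptions; inspect the header for the actual names):

    import csv
    import hashlib
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        """Stream a file through SHA-256 in 1 MiB chunks."""
        h = hashlib.sha256()
        with path.open('rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                h.update(chunk)
        return h.hexdigest()

    root = Path('headset-mics/mic1')  # hypothetical subset folder
    with (root / 'meta.csv').open(newline='') as f:
        for row in csv.DictReader(f):
            ok = sha256_of(root / row['path']) == row['sha256']
            print(row['path'], 'OK' if ok else 'MISMATCH')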

  18. VCTK 22050HZ DENOISED TRIMMED SILENCE NORMALIZED

    • kaggle.com
    Updated Jul 1, 2022
    Cite
    KC La (2022). VCTK 22050HZ DENOISED TRIMMED SILENCE NORMALIZED [Dataset]. https://www.kaggle.com/datasets/kcla1100/vctk-22050/suggestions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 1, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    KC La
    License

    CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by KC La

    Released under CC0: Public Domain

    Contents

  19. vctk-full

    • huggingface.co
    Updated Aug 15, 2012
    Cite
    AudioConFit (2012). vctk-full [Dataset]. https://huggingface.co/datasets/confit/vctk-full
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 15, 2012
    Dataset authored and provided by
    AudioConFit
    Description

    confit/vctk-full dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. Creating speech zones with self-distributing acoustic swarms (Simulated +...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 8, 2023
    Cite
    Chen, Tuochao (2023). Creating speech zones with self-distributing acoustic swarms (Simulated + Clutter) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8219719
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Chen, Tuochao
    Itani, Malek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets used in the paper: "Creating speech zones with self-distributing acoustic swarms"

    This deposit contains 2 distinct datasets:

    A dataset of speech mixtures containing 2-5 speakers simulated using PyRoomAcoustics. The dataset consists of 8000 training mixtures, 500 validation mixtures and 1000 testing mixtures.

    A dataset of speech mixtures containing 3-5 speakers created from synchronized recordings in reverberant rooms with objects cluttering the table. The dataset consists of 500 testing mixtures.

    The source sounds are various utterances from the VCTK dataset. For the real-world data, the utterances are played over a Rokono Bass+ Mini Speaker. The recordings are captured by an array of 7 microphones carried by our robotic swarm as it distributes itself across the table. The real-world recordings were subjected to audio compression and decompression using the Opus codec to enable multiple simultaneous streams.
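
    As a rough sketch of how such mixtures can be simulated with PyRoomAcoustics (room geometry, positions, and signals below are hypothetical placeholders, not the paper's configuration):

    import numpy as np
    import pyroomacoustics as pra

    fs = 16000
    room = pra.ShoeBox([6.0, 4.0, 3.0], fs=fs, max_order=17)  # reverberant shoebox room

    rng = np.random.default_rng(0)
    for _ in range(3):  # e.g. a 3-speaker mixture
        signal = rng.standard_normal(2 * fs)  # stand-in for a VCTK utterance
        pos = rng.uniform([0.5, 0.5, 1.2], [5.5, 3.5, 1.8])
        room.add_source(pos.tolist(), signal=signal)

    # 7 microphones in a circle at table height, mirroring the swarm's mic count
    mic_xy = pra.circular_2D_array(center=[3.0, 2.0], M=7, phi0=0.0, radius=0.2)
    mics = np.vstack([mic_xy, np.full(7, 0.75)])
    room.add_microphone_array(pra.MicrophoneArray(mics, fs))

    room.simulate()
    mixture = room.mic_array.signals  # shape: (7, num_samples)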

    Please see the README for more information. Please see the related identifiers for other datasets.
