56 datasets found
  1. The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

    • academictorrents.com
    bittorrent
    Updated Dec 25, 2016
    + more versions
    Cite
    (2016). The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus [Dataset]. https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3
    Explore at:
    3 scholarly articles cite this dataset (View in Google Scholar)
    Available download formats: bittorrent (440207227 bytes)
    Dataset updated
    Dec 25, 2016
    Authors
    None
    License

    https://academictorrents.com/nolicensespecified

    Description

    The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT) Training and Test Data. The TIMIT corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT has resulted from the joint efforts of several sites under sponsorship from the Defense Advanced Research Projects Agency - Information Science and Technology Office (DARPA-ISTO). Text corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), Stanford Research Institute (SRI), and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and has been maintained, verified, and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST). This file contains a brief description of the TIMIT Speech Corpus. Additional information including the referenced material and some relevant reprints of articles may be found in the printed documentation which is also available from NTIS (NTIS# PB91-100354).

  2. TIMIT - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 16, 2024
    Cite
    (2024). TIMIT - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/timit
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set was used to report the results. It contains 192 utterances from 24 speakers, excluding the validation set.
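    A minimal Python sketch (not part of this record) of assembling the training split described above from a local copy of the corpus; the TIMIT/TRAIN directory layout and upper-case file extension are assumptions about the install.

        # Enumerate TIMIT training utterances, excluding the SA sentences as
        # described above. Assumes a local copy laid out as
        # TIMIT/TRAIN/<dialect>/<speaker>/<utterance>.WAV (path is hypothetical).
        from pathlib import Path

        def timit_utterances(root, subset):
            """Yield (speaker, wav_path) pairs, skipping the SA calibration sentences."""
            for wav in sorted(Path(root, subset).rglob("*.WAV")):
                if wav.stem.upper().startswith("SA"):
                    continue  # SA1/SA2 are read by every speaker and are excluded
                yield wav.parent.name, wav

        train = list(timit_utterances("TIMIT", "TRAIN"))
        print(len(train), "utterances")  # expected: 3,696 from 462 speakers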

  3. TIMIT-TTS: a Text-to-Speech Dataset for Synthetic Speech Detection

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 21, 2022
    Cite
    Davide Salvi; Brian Hosler; Paolo Bestagini; Matthew C. Stamm; Stefano Tubaro (2022). TIMIT-TTS: a Text-to-Speech Dataset for Synthetic Speech Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6560158
    Explore at:
    Dataset updated
    Sep 21, 2022
    Dataset provided by
    Drexel University, USA
    Politecnico di Milano, Italy
    Authors
    Davide Salvi; Brian Hosler; Paolo Bestagini; Matthew C. Stamm; Stefano Tubaro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the rapid development of deep learning techniques, the generation and counterfeiting of multimedia material are becoming increasingly straightforward to perform. At the same time, sharing fake content on the web has become so simple that malicious users can create unpleasant situations with minimal effort. Forged media are also getting more and more complex, with manipulated videos (e.g., deepfakes, where both the visual and audio contents can be counterfeited) taking over the scene from still images. The multimedia forensic community has addressed the possible threats that this situation implies by developing detectors that verify the authenticity of multimedia objects. However, the vast majority of these tools analyze only one modality at a time. This was not a problem as long as still images were the most widely edited media, but now that manipulated videos are becoming customary, performing monomodal analyses could be reductive. Nonetheless, the literature lacks multimodal detectors (systems that consider both audio and video components). This is due to the difficulty of developing them, but also to the scarcity of datasets containing forged multimodal data on which to train and test the designed algorithms.

    In this paper we focus on the generation of an audio-visual deepfake dataset. First, we present a general pipeline for synthesizing speech deepfake content from a given real or fake video, facilitating the creation of counterfeit multimodal material. The proposed method uses Text-to-Speech (TTS) and Dynamic Time Warping (DTW) techniques to achieve realistic speech tracks. Then, we use the pipeline to generate and release TIMIT-TTS, a synthetic speech dataset containing the most cutting-edge methods in the TTS field. This can be used as a standalone audio dataset, or combined with DeepfakeTIMIT and VidTIMIT video datasets to perform multimodal research. Finally, we present numerous experiments to benchmark the proposed dataset in both monomodal (i.e., audio) and multimodal (i.e., audio and video) conditions. This highlights the need for multimodal forensic detectors and more multimodal deepfake data.
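    As a rough illustration of the alignment step named above (a sketch, not the authors' pipeline), DTW over MFCC features can warp a synthetic utterance onto the timing of the original track; the file names and sample rate below are assumptions.

        # Illustrative only: align a TTS utterance to the original audio track
        # with Dynamic Time Warping over MFCC features, using librosa.
        import librosa

        ref, sr = librosa.load("original.wav", sr=16000)   # hypothetical file names
        syn, _ = librosa.load("tts_output.wav", sr=16000)

        ref_mfcc = librosa.feature.mfcc(y=ref, sr=sr)
        syn_mfcc = librosa.feature.mfcc(y=syn, sr=sr)

        # Accumulated cost matrix and optimal warping path between the sequences.
        D, wp = librosa.sequence.dtw(X=ref_mfcc, Y=syn_mfcc, metric="euclidean")
        print("warping path length:", len(wp))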

    For the initial version of TIMIT-TTS (v1.0):

    Arxiv: https://arxiv.org/abs/2209.08000

    TIMIT-TTS Database v1.0: https://zenodo.org/record/6560159

  4. Timit

    • huggingface.co
    Updated Feb 20, 2023
    Cite
    Carlos Calva (2023). Timit [Dataset]. https://huggingface.co/datasets/kimetsu/Timit
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 20, 2023
    Authors
    Carlos Calva
    License

    https://choosealicense.com/licenses/other/

    Description

    kimetsu/Timit dataset hosted on Hugging Face and contributed by the HF Datasets community
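    A hedged sketch of loading this dataset with the Hugging Face datasets library; the available splits and columns depend on how kimetsu/Timit was uploaded, so inspect them before relying on any layout.

        # Load the community-hosted dataset and inspect its structure.
        from datasets import load_dataset

        ds = load_dataset("kimetsu/Timit")
        print(ds)  # shows the available splits and features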

  5. TIMIT dataset - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 16, 2024
    Cite
    (2024). TIMIT dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/timit-dataset
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The dataset used in this paper covers a phonetically and phonologically local allophonic distribution in English: voiceless stops surface as aspirated word-initially before a stressed vowel (e.g., in ["phIt] 'pit'), except when a sibilant [s] precedes the stop (e.g., in ["spIt] 'spit').
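    A toy sketch (not part of the dataset) of the allophonic rule just stated, over a simple phone-list representation of a word:

        # Voiceless stops /p t k/ aspirate word-initially before a stressed
        # vowel, unless preceded by [s]. Representation here is hypothetical.
        VOICELESS_STOPS = {"p", "t", "k"}

        def is_aspirated(phones, stop_index, before_stressed_vowel):
            """True when a voiceless stop is word-initial and precedes a stressed vowel."""
            if phones[stop_index] not in VOICELESS_STOPS or not before_stressed_vowel:
                return False
            return stop_index == 0  # an initial [s], as in 'spit', blocks aspiration

        assert is_aspirated(["p", "I", "t"], 0, True)           # 'pit'  -> aspirated [ph]
        assert not is_aspirated(["s", "p", "I", "t"], 1, True)  # 'spit' -> plain [p]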

  6. Data from: TIMIT Acoustic-Phonetic Continuous Speech Corpus

    • resodate.org
    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    J. S. Garofolo; L. F. Lamel; W. M. Fisher; J. G. Fiscus; D. S. Pallett (2024). TIMIT Acoustic-Phonetic Continuous Speech Corpus [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdGltaXQtYWNvdXN0aWMtcGhvbmV0aWMtY29udGludW91cy1zcGVlY2gtY29ycHVz
    Explore at:
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    J. S. Garofolo; L. F. Lamel; W. M. Fisher; J. G. Fiscus; D. S. Pallett
    Description

    The TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM contains a large collection of speech samples from 630 speakers (438 male and 192 female).

  7. The QUT-NOISE Databases and Protocols

    • researchdata.edu.au
    Updated 2010
    + more versions
    Cite
    Dean David; Sridharan Sridha; Vogt Robert; Mason Michael (2010). The QUT-NOISE Databases and Protocols [Dataset]. http://doi.org/10.4225/09/58819f7a21a21
    Explore at:
    Dataset updated
    2010
    Dataset provided by
    Queensland University of Technology
    Authors
    Dean David; Sridharan Sridha; Vogt Robert; Mason Michael
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The QUT-NOISE Databases and Protocols

    Overview

    This distribution contains the QUT-NOISE database and the code required to create the QUT-NOISE-TIMIT database from the QUT-NOISE database and a locally installed copy of the TIMIT database. It also contains code to create the QUT-NOISE-SRE protocol on top of an existing speaker recognition evaluation database (such as NIST evaluations). Further information on the QUT-NOISE and QUT-NOISE-TIMIT databases is available in our paper:


    D. Dean, S. Sridharan, R. Vogt, M. Mason (2010), "The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms," in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.

    This paper is also available in the file: docs/Dean2010, The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithm.pdf, distributed with this database.

    Further information on the QUT-NOISE-SRE protocol is available in our paper:
    D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015), "The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition," in Proceedings of Interspeech 2015, September, Dresden, Germany.

    Licensing

    The QUT-NOISE data itself is licensed CC-BY-SA, and the code required to create the QUT-NOISE-TIMIT database and QUT-NOISE-SRE protocols is licensed under the BSD license. Please consult the appropriate LICENSE.txt files (in the code and QUT-NOISE directories) for more information. To attribute this database, please include the following citation:


    D. Dean, S. Sridharan, R. Vogt, M. Mason (2010), "The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms," in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.

    If your work is based upon the QUT-NOISE-SRE, please also include this citation:
    D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015), "The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition," in Proceedings of Interspeech 2015, September, Dresden, Germany.

    Download and Installation

    Download the following QUT-NOISE*.zip files (a checksum-verification sketch follows the list):

    • (26.7 MB, md5sum: 672461fd88782e9ea10d5c2cb7a84196)
    • (1.6 GB, md5sum: f87fb213c0e1c439e1b727fb258ef2cd)
    • (1.7 GB, md5sum: d680118b4517e1257a9263b99d1ac401)
    • (1.4 GB, md5sum: d99572ae1c118b749c1ffdb2e0cf0d2e)
    • (1.4 GB, md5sum: fe107ab341e6bc75de3a32c69344190e)
    • (1.6 GB, md5sum: 68d5ebc2e60cb07927cc4d33cdf2f017)
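    A small checksum helper (not part of the QUT-NOISE distribution; the archive name below is hypothetical, since the file names are not reproduced in this listing) for verifying the md5sums above after download:

        # Compute an md5 digest in streaming fashion and compare it with the
        # checksum published in the list above.
        import hashlib

        def md5sum(path, chunk_size=1 << 20):
            h = hashlib.md5()
            with open(path, "rb") as f:
                while block := f.read(chunk_size):
                    h.update(block)
            return h.hexdigest()

        print(md5sum("QUT-NOISE.zip") == "672461fd88782e9ea10d5c2cb7a84196")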

    Creating QUT-NOISE-TIMIT

    Obtaining TIMIT

    In order to construct the QUT-NOISE-TIMIT database from the QUT-NOISE data supplied here you will need to obtain a copy of the TIMIT database from the Linguistic Data Consortium (LDC). If you just want to use the QUT-NOISE database, or you wish to combine it with different speech data, TIMIT is not required.

    Creating QUT-NOISE-TIMIT

    • Once you have obtained TIMIT, download a copy of the required MATLAB toolbox and install it in your MATLABPATH.
    • Run matlab in the QUT-NOISE/code directory, and run the function: createQUTNOISETIMIT('/location/of/timit-cd/timit'). This will create the QUT-NOISE-TIMIT database in the QUT-NOISE/QUT-NOISE-TIMIT directory.
    • If you wish to verify that the QUT-NOISE-TIMIT database matches that evaluated in our original paper, please check that the md5sums (use md5sum on unix-based OSes) match those in the QUT-NOISE-TIMIT/md5sum.txt file.
    • Using the QUT-NOISE-SRE protocol
      • The code related to the QUT-NOISE-SRE protocol can be used in two ways:
        1. To create a collection of noisy audio files across the scenarios in the QUT-NOISE database at different noise levels, or,
        2. To recreate a list of file names based on the QUT-NOISE-SRE protocol produced by another researcher, having already done (1). This allows existing research to be reproduced without having to send large volumes of audio around.
      • If you are interested in creating your own noisy database from an existing SRE database (1 above), please look at the example script exampleQUTNOISESRE.sh in the QUT-NOISE/code directory. You will need to make some modifications, but it should give you the right idea.
      • If you are interested in creating our QUT-NOISE-NIST2008 database published at Interspeech 2015, you can find the list of created noisy files in the QUT-NOISE-NIST2008.train.short2.list and QUT-NOISE-NIST2008.test.short3.list files in the QUT-NOISE/code directory.
      • These files can be recreated as follows (provided you have access to the NIST2008 SRE data):

        Run matlab in the QUT-NOISE/code directory, and run the following functions:

        % The final argument is truncated in the source; the value below is a
        % hypothetical output location for the generated noisy files.
        createQUTNOISESREfiles('NIST2008.train.short2.list', ...
                               'QUT-NOISE-NIST2008.train.short2.list', ...
                               '/path/to/output')

  8. TIMIT Corpus

    • resodate.org
    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    Jennifer Williams; Yi Zhao; Erica Cooper; Junichi Yamagishi (2024). TIMIT Corpus [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdGltaXQtY29ycHVz
    Explore at:
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Jennifer Williams; Yi Zhao; Erica Cooper; Junichi Yamagishi
    Description

    The TIMIT corpus is a large database of speech recordings used for speaker recognition and speech synthesis tasks.

  9. TIMIT-corpus

    • kaggle.com
    zip
    Updated Nov 16, 2017
    Cite
    NLTK Data (2017). TIMIT-corpus [Dataset]. https://www.kaggle.com/nltkdata/timitcorpus
    Explore at:
    Available download formats: zip (22232888 bytes)
    Dataset updated
    Nov 16, 2017
    Dataset authored and provided by
    NLTK Data
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    The canonical metadata on NLTK:

  10. The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CDROM

    • resodate.org
    • service.tib.eu
    Updated Jan 2, 2025
    Cite
    John S. Garofolo; Lori F. Lamel; William M. Fisher; Jonathon G. Fiscus; David S. Pallett; Nancy L. Dahlgren (2025). The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CDROM [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdGhlLWRhcnBhLXRpbWl0LWFjb3VzdGljLXBob25ldGljLWNvbnRpbnVvdXMtc3BlZWNoLWNvcnB1cy1jZHJvbQ==
    Explore at:
    Dataset updated
    Jan 2, 2025
    Dataset provided by
    Leibniz Data Manager
    Authors
    John S. Garofolo; Lori F. Lamel; William M. Fisher; Jonathon G. Fiscus; David S. Pallett; Nancy L. Dahlgren
    Description

    The TIMIT Acoustic-Phonetic Continuous Speech Corpus CDROM is a widely used dataset for speech recognition tasks.

  11. TCD-Timit - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 16, 2024
    Cite
    (2024). TCD-Timit - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/tcd-timit
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    TCD-Timit is an audio-visual corpus of continuous speech.

  12. timit-corpus

    • kaggle.com
    zip
    Updated Nov 24, 2020
    Cite
    Tommy NgX (2020). timit-corpus [Dataset]. https://www.kaggle.com/datasets/tommyngx/timit-corpus/code
    Explore at:
    Available download formats: zip (869007403 bytes)
    Dataset updated
    Nov 24, 2020
    Authors
    Tommy NgX
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Tommy NgX

    Released under CC0: Public Domain

    Contents

    Speaker Verification Corpus

  13. DARPA-TIMIT - Dataset - NCHC Dataset Platform

    • scidm.nchc.org.tw
    Updated Oct 10, 2020
    Cite
    (2020). DARPA-TIMIT - Dataset - 國網中心Dataset平台 [Dataset]. https://scidm.nchc.org.tw/dataset/darpa-timit
    Explore at:
    Dataset updated
    Oct 10, 2020
    Description

    The TIMIT corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT has resulted from the joint efforts of several sites under sponsorship from the Defense Advanced Research Projects Agency - Information Science and Technology Office (DARPA-ISTO). Text corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), Stanford Research Institute (SRI), and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and has been maintained, verified, and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST). This file contains a brief description of the TIMIT Speech Corpus. Additional information including the referenced material and some relevant reprints of articles may be found in the printed documentation which is also available from NTIS (NTIS# PB91-100354).

  14. VidTIMIT Audio-Video Dataset

    • zenodo.org
    • kaggle.com
    pdf, zip
    Updated Aug 4, 2024
    Cite
    Conrad Sanderson (2024). VidTIMIT Audio-Video Dataset [Dataset]. http://doi.org/10.5281/zenodo.158963
    Explore at:
    Available download formats: zip, pdf
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Conrad Sanderson
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    See http://conradsanderson.id.au/vidtimit/ for details.

    Summary: Video and corresponding audio recordings of 43 people, reciting short sentences. Useful for research on topics such as automatic lip reading, multi-view face recognition, multi-modal speech recognition and person identification.

  15. DeepPredSpeech: computational models of predictive speech coding based on deep learning

    • zenodo.org
    • data.europa.eu
    bin, zip
    Updated Jan 24, 2020
    Cite
    Thomas Hueber; Eric Tatulli; Laurent Girin; Jean-Luc Schwartz (2020). DeepPredSpeech: computational models of predictive speech coding based on deep learning [Dataset]. http://doi.org/10.5281/zenodo.3528068
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Thomas Hueber; Eric Tatulli; Laurent Girin; Jean-Luc Schwartz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all data, source code, pre-trained computational predictive models and experimental results related to:

    Hueber T., Tatulli E., Girin L., Schwartz J-L., "How predictive can be predictions in the neurocognitive processing of auditory and audiovisual speech? A deep learning study." (bioRxiv preprint https://doi.org/10.1101/471581).

    • Raw data are extracted from the publicly available database NTCD-TIMIT (10.5281/zenodo.260228).
      • Audio recordings are available in the audio_clean/ directory
      • Post-processed lip image sequences are available in the lips_roi/ directory (67x67 pixels, 8-bit, obtained by lossless inverse DCT-2D transform from the DCT features available in the original repository of NTCD-TIMIT)
      • Phonetic segmentation (extracted from NTCD-TIMIT original zenodo repository) is available in the HTK MLF file volunteer_labelfiles.mlf
    • Audio features (MFCC-spectrogram and log-spectrogram) are available in the mfcc_16k/ and fft_16k/ directories.
    • Models (audio-only, video-only and audiovisual, based on deep feed-forward neural networks and/or convolutional neural networks, in .h5 format, trained with the Keras 2.0 toolkit) and data normalization parameters (in .dat scikit-learn format) are available in the models_mfcc/ and models_logspectro/ directories; a hedged loading sketch follows this list
    • Predicted and target (ground truth) MFCC-spectro (resp. log-spectro) for the test databases (1909 sentences), and for the different values of \(\tau_p\) or \(\tau_f\), are available in the pred_testdb_mfccspectro/ (resp. pred_testdb_logspectro/) directories
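    A hedged loading sketch, not the authors' code: the file names below are hypothetical, and models trained with Keras 2.0 may need a compatible Keras version to deserialize.

        # Load one of the distributed .h5 models and run it on precomputed
        # MFCC features; adapt the shapes to the actual files in models_mfcc/.
        import numpy as np
        from tensorflow.keras.models import load_model

        model = load_model("models_mfcc/audio_model.h5")  # hypothetical file name
        features = np.load("mfcc_16k/sentence_0001.npy")  # hypothetical feature file
        prediction = model.predict(features[np.newaxis, ...])
        print(prediction.shape)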

    Source code for extracting audio features, training and evaluating the models is available on GitHub https://github.com/thueber/DeepPredSpeech/

    All directories have been zipped before upload.

    Feel free to contact me for more details.

    Thomas Hueber, Ph. D., CNRS research fellow, GIPSA-lab, Grenoble, France, thomas.hueber@gipsa-lab.fr

  16. Speech and Noise Corpora for Pitch Estimation of Human Speech

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 30, 2020
    Cite
    Bastian Bechtold (2020). Speech and Noise Corpora for Pitch Estimation of Human Speech [Dataset]. http://doi.org/10.5281/zenodo.3920591
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 30, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bastian Bechtold
    Description

    This dataset contains common speech and noise corpora for evaluating fundamental frequency estimation algorithms, packaged as convenient JBOF dataframes. Each corpus is freely available on its own and allows redistribution; the references below enumerate them.

    These files are published as part of my dissertation, "Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods", and in support of the Replication Dataset for Fundamental Frequency Estimation.
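    For context, a minimal sketch of the task these corpora evaluate, using the YIN estimator as implemented in librosa; the file name and search range below are assumptions, not part of the dataset.

        # Estimate the fundamental frequency contour of a speech recording.
        import librosa

        y, sr = librosa.load("speech.wav", sr=None)     # hypothetical file
        f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)   # typical adult speech range
        print(f0.shape)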

    References:

    1. John Kominek and Alan W Black. CMU ARCTIC database for speech synthesis, 2003.
    2. Paul C Bagshaw, Steven Hiller, and Mervyn A Jack. Enhanced Pitch Tracking and the Processing of F0 Contours for Computer Aided Intonation Teaching. In EUROSPEECH, 1993.
    3. F Plante, Georg F Meyer, and William A Ainsworth. A Pitch Extraction Reference Database. In Fourth European Conference on Speech Communication and Technology, pages 837–840, Madrid, Spain, 1995.
    4. Alan Wrench. MOCHA MultiCHannel Articulatory database: English, November 1999.
    5. Gregor Pirker, Michael Wohlmayr, Stefan Petrik, and Franz Pernkopf. A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario. page 4, 2011.
    6. John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren, and Victor Zue. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 1993.
    7. Andrew Varga and Herman J.M. Steeneken. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3):247–251, July 1993.
    8. David B. Dean, Sridha Sridharan, Robert J. Vogt, and Michael W. Mason. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. Proceedings of Interspeech 2010, 2010.

  17. QUT-NOISE-TIMIT - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 16, 2024
    Cite
    (2024). QUT-NOISE-TIMIT - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/qut-noise-timit
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The QUT-NOISE-TIMIT dataset targets speech enhancement; it consists of clean speech combined with noise.
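    A rough sketch of the clean-speech-plus-noise mixing such a corpus is built from; this is an assumed generic recipe, not the actual QUT-NOISE-TIMIT code, which is the MATLAB distribution described under entry 7.

        # Mix speech and noise at a requested SNR (dB). Assumes the noise
        # recording is at least as long as the speech signal.
        import numpy as np

        def mix_at_snr(speech, noise, snr_db):
            noise = noise[: len(speech)]          # trim noise to the speech length
            p_speech = np.mean(speech ** 2)
            p_noise = np.mean(noise ** 2)
            gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
            return speech + gain * noise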

  18. timit

    • huggingface.co
    Cite
    Quach Minh Tuan, timit [Dataset]. https://huggingface.co/datasets/nh0znoisung/timit
    Explore at:
    Authors
    Quach Minh Tuan
    Description

    nh0znoisung/timit dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. MIT & TIMIT Preprocess for Speaker Recognition

    • kaggle.com
    zip
    Updated Nov 24, 2020
    Cite
    Tommy NgX (2020). MIT & TIMIT Preprocess for Speaker Recognition [Dataset]. https://www.kaggle.com/datasets/tommyngx/pre-mit-timit
    Explore at:
    Available download formats: zip (493659227 bytes)
    Dataset updated
    Nov 24, 2020
    Authors
    Tommy NgX
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Tommy NgX

    Released under CC0: Public Domain

    Contents

  20. DeepfakeTIMIT

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, txt
    Updated Apr 21, 2021
    Cite
    Pavel Korshunov; Sébastien Marcel (2021). DeepfakeTIMIT [Dataset]. http://doi.org/10.34777/s09v-3340
    Explore at:
    Available download formats: application/gzip, txt
    Dataset updated
    Apr 21, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pavel Korshunov; Sébastien Marcel
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    DeepfakeTIMIT is a database of videos where faces are swapped using the open source GAN-based approach (adapted from here: https://github.com/shaoanlu/faceswap-GAN), which, in turn, was developed from the original autoencoder-based Deepfake algorithm.

    When creating the database, we manually selected 16 similar looking pairs of people from the publicly available VidTIMIT database. For each of the 32 subjects, we trained two different models: a lower-quality (LQ) model with 64 x 64 input/output size, and a higher-quality (HQ) model with 128 x 128 size (see the available images for an illustration). Since there are 10 videos per person in the VidTIMIT database, we generated 320 videos for each version, resulting in 620 total videos with swapped faces. For the audio, we kept the original audio track of each video, i.e., no manipulation was done to the audio channel.

    Any publication (e.g., conference paper, journal article, technical report, book chapter, etc.) resulting from the usage of DeepfakeTIMIT must cite the following paper:

    P. Korshunov and S. Marcel,
    DeepFakes: a New Threat to Face Recognition? Assessment and Detection.
    arXiv and Idiap Research Report

    Any publication (e.g., conference paper, journal article, technical report, book chapter, etc.) resulting from the usage of VidTIMIT and subsequently DeepfakeTIMIT must also cite the following paper:

    C. Sanderson and B.C. Lovell,
    Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference.
    Lecture Notes in Computer Science (LNCS), Vol. 5558, pp. 199-208, 2009.
