Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
The TIMIT dataset is a corpus of read speech. It consists of recordings of 630 speakers, each reading 10 phonetically balanced sentences. The dataset is divided into a training set of 462 speakers and a test set of 168 speakers.
The TIMIT corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT has resulted from the joint efforts of several sites under sponsorship from the Defense Advanced Research Projects Agency - Information Science and Technology Office (DARPA-ISTO). Text corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), Stanford Research Institute (SRI), and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and has been maintained, verified, and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST). This file contains a brief description of the TIMIT Speech Corpus. Additional information including the referenced material and some relevant reprints of articles may be found in the printed documentation which is also available from NTIS (NTIS# PB91-100354).
TIMIT contains a total of 6300 sentences: 10 sentences spoken by each of 630 speakers from 8 major dialect regions of the United States. Table 1 shows the number of speakers for the 8 dialect regions, broken down by sex, with percentages given in parentheses. A speaker's dialect region is the geographical area of the U.S. where they lived during their childhood years. The geographical areas correspond to recognized dialect regions in the U.S. (Language Files, Ohio State University Linguistics Dept., 1982), with the exception of the Western region (dr7), in which dialect boundaries are not known with any confidence, and dialect region 8, whose speakers moved around a lot during their childhood.
Table 1: Dialect distribution of speakers

Dialect
Region (dr)    #Male        #Female      Total
-----------    ---------    ---------    ----------
    1           31 (63%)     18 (37%)     49 (8%)
    2           71 (70%)     31 (30%)    102 (16%)
    3           79 (77%)     23 (23%)    102 (16%)
    4           69 (69%)     31 (31%)    100 (16%)
    5           62 (63%)     36 (37%)     98 (16%)
    6           30 (65%)     16 (35%)     46 (7%)
    7           74 (74%)     26 (26%)    100 (16%)
    8           22 (67%)     11 (33%)     33 (5%)
-----------    ---------    ---------    ----------
    8          438 (70%)    192 (30%)    630 (100%)
The dialect regions are:
dr1: New England
dr2: Northern
dr3: North Midland
dr4: South Midland
dr5: Southern
dr6: New York City
dr7: Western
dr8: Army Brat (moved around)
The text material in the TIMIT prompts (found in the file "prompts.doc") consists of 2 dialect "shibboleth" sentences designed at SRI, 450 phonetically-compact sentences designed at MIT, and 1890 phonetically-diverse sentences selected at TI. The dialect sentences (the SA sentences) were meant to expose the dialectal variants of the speakers and were read by all 630 speakers. The phonetically-compact sentences were designed to provide a good coverage of pairs of phones, with extra occurrences of phonetic contexts thought to be either difficult or of particular interest. Each speaker read 5 of these sentences (the SX sentences) and each text was spoken by 7 different speakers. The phonetically-diverse sentences (the SI sentences) were selected from existing text sources - the Brown Corpus (Kučera and Francis, 1967) and the Playwrights Dialog (Hultzen et al., 1964) - so as to add diversity in sentence types and phonetic contexts. The selection criteria maximized the variety of allophonic contexts found in the texts. Each speaker read 3 of these sentences, with each sentence being read by only a single speaker. Table 2 summarizes the speech material in TIMIT.
Table 2: TIMIT speech material

Sentence Type    #Sentences    #Speakers    Total    #Sentences/Speaker
-------------    ----------    ---------    -----    ------------------
Dialect  (SA)         2           630        1260            2
Compact  (SX)       450             7        3150            5
Diverse  (SI)      1890             1        1890            3
-------------    ----------    ---------    -----    ------------------
Total              2342                      6300           10
Suggested Training/Test Subdivision
The speech material has been subdivided into portions for training and testing. The criteria for the subdivision are described in the file "testset.doc". THIS SUBDIVISION HAS NO RELATION TO THE DATA DISTRIBUTED ON THE PROTOTYPE VERSION OF THE CDROM.
The test data has a core portion containing 24 speakers, 2 males and 1 female from each dialect region. The core test speakers are shown in Table 3. Each speaker read a different set of SX sentences. Thus the core test material contains 192 sentences, 5 SX and 3 SI for each speaker, each having a distinct text prompt. (A short file-selection sketch follows Table 3.)
Table 3: The core test set of 24 speakers
Dialect    Male          Female
-------    ----------    ------
   1       DAB0, WBT0    ELC0
   2       TAS1, WEW0    PAS0
   3       JMP0, LNT0    PKT0
   4       LLL0, TLS0    JLM0
   5       BPM0, KLT0    NLP0
   6       CMJ0, JDH0    MGD0
   7       GRT0, NJM0    DHC0
   8       JLN0, PAM0    MLD0
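Because TIMIT names speaker directories with a sex prefix plus the speaker ID (e.g. mdab0, felc0), the core test utterances can be pulled from a local copy of the corpus with a short script. Below is a minimal Python sketch, assuming the usual lower-case test/<dialect>/<speaker> directory layout; the function name and paths are illustrative, not part of the corpus documentation.

    import os

    # Core test speakers from Table 3, with the sex prefix (M/F) used in TIMIT directory names.
    CORE_MALE = ["DAB0", "WBT0", "TAS1", "WEW0", "JMP0", "LNT0", "LLL0", "TLS0",
                 "BPM0", "KLT0", "CMJ0", "JDH0", "GRT0", "NJM0", "JLN0", "PAM0"]
    CORE_FEMALE = ["ELC0", "PAS0", "PKT0", "JLM0", "NLP0", "MGD0", "DHC0", "MLD0"]
    CORE = {("m" + s).lower() for s in CORE_MALE} | {("f" + s).lower() for s in CORE_FEMALE}

    def core_test_wavs(timit_root):
        """Yield .wav paths for the 24 core test speakers, excluding the SA sentences."""
        for dirpath, _, filenames in os.walk(os.path.join(timit_root, "test")):
            if os.path.basename(dirpath).lower() not in CORE:
                continue
            for name in filenames:
                # Keep the 5 SX and 3 SI sentences; the 2 SA dialect sentences are not
                # part of the 192-sentence core test material.
                if name.lower().endswith(".wav") and not name.lower().startswith("sa"):
                    yield os.path.join(dirpath, name)

    # len(list(core_test_wavs("/path/to/timit"))) should come out to 192 (24 speakers x 8 texts).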
A more extensive test set was obtained by including the sentences from all speakers that read any of the SX texts included in the core test set. In doing so, no sentence text appears in both the training and test sets. This complete test set contains a total of 168 speakers and 1344 utterances, accounting for about 27% of the total speech material. The resulting dialect distribution of the 168-speaker test set is given in Table 4. The complete test material contains 624 distinct texts.
Table 4: Dialect distribution for complete test set
Dialect    #Male    #Female    Total
-------    -----    -------    -----
   1          7         4        11
   2         18         8        26
   3         23         3        26
   4         16        16        32
   5         17        11        28
   6          8         3        11
   7         15         8        23
   8          8         3        11
-------    -----    -------    -----
Total       112        56       168
CDROM TIMIT Directory and File Structure
The speech and associated data are organized on the CD-ROM according to the following hierarchy:
/
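The hierarchy listing is truncated above; in the published corpus each utterance is a group of sibling files (a .wav waveform plus .txt, .wrd, and .phn transcriptions) under usage/dialect/speaker directories. A minimal Python sketch of indexing such a tree, assuming the common lower-case train/dr1/fcjf0-style layout:

    import os
    from collections import defaultdict

    def index_timit(root):
        """Map (usage, dialect, speaker, sentence_id) -> {extension: path}."""
        index = defaultdict(dict)
        for dirpath, _, filenames in os.walk(root):
            parts = dirpath.split(os.sep)
            if len(parts) < 3:
                continue
            usage, dialect, speaker = parts[-3], parts[-2], parts[-1]  # e.g. train, dr1, fcjf0
            for name in filenames:
                stem, ext = os.path.splitext(name)
                if ext.lower() in {".wav", ".txt", ".wrd", ".phn"}:
                    index[(usage, dialect, speaker, stem.lower())][ext.lower()] = os.path.join(dirpath, name)
        return index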
The TIMIT corpus of read speech was developed to provide speech data for acoustic-phonetic research and for the evaluation of automatic speech recognition systems. TIMIT contains high-quality recordings of 630 speakers covering 8 major American English dialects, with each speaker reading up to 10 phonetically rich sentences. More information on the TIMIT dataset can be found in the README available here: https://catalog.ldc.upenn.edu/docs/LDC93S1/readme.txt
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the rapid development of deep learning techniques, the generation and counterfeiting of multimedia material are becoming increasingly straightforward to perform. At the same time, sharing fake content on the web has become so simple that malicious users can create unpleasant situations with minimal effort. Forged media are also getting more and more complex, with manipulated videos (e.g., deepfakes, where both the visual and audio contents can be counterfeited) overtaking still images. The multimedia forensic community has addressed the possible threats that this situation implies by developing detectors that verify the authenticity of multimedia objects. However, the vast majority of these tools analyze only one modality at a time. This was not a problem as long as still images were the most widely edited media, but now that manipulated videos are becoming customary, performing monomodal analyses could be reductive. Nonetheless, the literature lacks multimodal detectors (systems that consider both audio and video components). This is due to the difficulty of developing them, but also to the scarcity of datasets containing forged multimodal data on which to train and test the designed algorithms.
In this paper we focus on the generation of an audio-visual deepfake dataset. First, we present a general pipeline for synthesizing speech deepfake content from a given real or fake video, facilitating the creation of counterfeit multimodal material. The proposed method uses Text-to-Speech (TTS) and Dynamic Time Warping (DTW) techniques to achieve realistic speech tracks. Then, we use the pipeline to generate and release TIMIT-TTS, a synthetic speech dataset generated with the most cutting-edge methods in the TTS field. It can be used as a standalone audio dataset, or combined with the DeepfakeTIMIT and VidTIMIT video datasets to perform multimodal research. Finally, we present numerous experiments to benchmark the proposed dataset in both monomodal (i.e., audio-only) and multimodal (i.e., audio and video) conditions. This highlights the need for multimodal forensic detectors and for more multimodal deepfake data.
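The pipeline's alignment step pairs TTS output with the original track via DTW. As a rough illustration (not the authors' exact implementation), the alignment path can be computed with librosa; file names, sampling rate, and the MFCC front end here are assumptions:

    import librosa

    # Load the reference (video) audio and the raw TTS output at a common sampling rate.
    ref, sr = librosa.load("reference.wav", sr=16000)   # placeholder path
    tts, _ = librosa.load("tts_output.wav", sr=16000)   # placeholder path

    # Compare MFCC sequences and compute the DTW cost matrix and warping path.
    X = librosa.feature.mfcc(y=ref, sr=sr, n_mfcc=13)
    Y = librosa.feature.mfcc(y=tts, sr=sr, n_mfcc=13)
    D, wp = librosa.sequence.dtw(X=X, Y=Y, metric="euclidean")

    # wp holds (reference_frame, tts_frame) index pairs; time-stretching the TTS audio
    # along this path yields a speech track synchronized with the original video.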
For the initial version of TIMIT-TTS (v1.0), see:
Arxiv: https://arxiv.org/abs/2209.08000
TIMIT-TTS Database v1.0: https://zenodo.org/record/6560159
The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI) and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST). The TIMIT corpus transcriptions have been hand verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included as well as written documentation.
This file contains documentation for STC-TIMIT 1.0, Linguistic Data Consortium (LDC) catalog number LDC2008S03 and ISBN 1-58563-468-9. STC-TIMIT 1.0 is a telephone version of the TIMIT Acoustic-Phonetic Continuous Speech Corpus, LDC93S1 (TIMIT). TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English reading ten phonetically rich sentences. Created in 1993, TIMIT was designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. Since that time, several corpora have been developed using the TIMIT database: NTIMIT, LDC93S2 (transmitting TIMIT recordings through a telephone handset and over various channels in the NYNEX telephone network and redigitizing them); CTIMIT, LDC96S30 (passing TIMIT files through cellular telephone circuits); FFMTIMIT, LDC96S32 (re-recording TIMIT files with a free-field microphone); and HTIMIT, LDC98S67 (re-recording a subset of TIMIT files through different telephone handsets). What differentiates STC-TIMIT 1.0 from other TIMIT-derived corpora is that the entire TIMIT database was passed through an actual telephone channel in a single call. Thus, a single type of channel distortion and noise affects the whole database. The process was managed using a Dialogic switchboard for the calling and receiving ends. No transducer (microphone) was employed; the original digital signal was converted to analog using the switchboard's D/A converter, transmitted through a telephone channel and converted back to digital format before recording. As a result, the only distortion introduced is that of the telephone channel itself.
The STC-TIMIT 1.0 database is organized in the same manner as the original TIMIT corpus: 4620 files belonging to the training partition and 1680 files belonging to the test partition. Files were recorded using an 8 kHz sampling frequency and mu-law encoding. Additionally, four sets of two calibration tones were generated. These were passed through the telephone line approximately at the start of each quarter of the database (both the source and recorded calibration tones in each set are provided). The calibration tones are: a 2 sec. 1 kHz tone, and a 2 sec. sweep tone from 10 Hz to 4000 Hz. Utterances in STC-TIMIT 1.0 are time-aligned with those of TIMIT to an average precision of 0.125 ms (1 sample) by maximizing the cross-correlation between pairs of files from each corpus. Thus, labels from TIMIT may be used for STC-TIMIT 1.0, and the effects of telephone channels may be studied on a frame-by-frame basis.
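The sample-level alignment described above (maximizing cross-correlation between corpus pairs) is easy to reproduce. A minimal sketch, assuming the two signals are numpy arrays at the same sampling rate; this illustrates the idea rather than the tool used to build the corpus:

    import numpy as np
    from scipy.signal import correlate

    def best_lag(x, y):
        """Return the shift (in samples) that best aligns y to x via cross-correlation."""
        c = correlate(x, y, mode="full")
        return int(np.argmax(c)) - (len(y) - 1)

    # At the corpus's 8 kHz sampling rate, one sample corresponds to the quoted
    # 0.125 ms alignment precision.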
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The QUT-NOISE Databases and Protocols
Overview
This distribution contains the QUT-NOISE database and the code required to create the QUT-NOISE-TIMIT database from the QUT-NOISE database and a locally installed copy of the TIMIT database. It also contains code to create the QUT-NOISE-SRE protocol on top of an existing speaker recognition evaluation database (such as NIST evaluations). Further information on the QUT-NOISE and QUT-NOISE-TIMIT databases is available in our paper:
D. Dean, S. Sridharan, R. Vogt, M. Mason (2010) The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms, in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.
This paper is also available in the file: docs/Dean2010, The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithm.pdf, distributed with this database.
Further information on the QUT-NOISE-SRE protocol is available in our paper: D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015) The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition. In Proceedings of Interspeech 2015, September, Dresden, Germany.
Licensing
The QUT-NOISE data itself is licensed CC-BY-SA, and the code required to create the QUT-NOISE-TIMIT database and QUT-NOISE-SRE protocols is licensed under the BSD license. Please consult the appropriate LICENSE.txt files (in the code and QUT-NOISE directories) for more information. To attribute this database, please include the following citation:
D. Dean, S. Sridharan, R. Vogt, M. Mason (2010) The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms, in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.
If your work is based upon the QUT-NOISE-SRE, please also include this citation: D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015) The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition. In Proceedings of Interspeech 2015, September, Dresden, Germany.
Download and Installation
Download the following QUT-NOISE*.zip files and verify their integrity against the md5sums shown (a small checksum sketch follows the list):
QUT_NOISE.zip (26.7 MB, md5sum: 672461fd88782e9ea10d5c2cb7a84196)
QUT_NOISE_CAFE.zip (1.6 GB, md5sum: f87fb213c0e1c439e1b727fb258ef2cd)
QUT_NOISE_CAR.zip (1.7 GB, md5sum: d680118b4517e1257a9263b99d1ac401)
QUT_NOISE_HOME.zip (1.4 GB, md5sum: d99572ae1c118b749c1ffdb2e0cf0d2e)
QUT_NOISE_REVERB.zip (1.4 GB, md5sum: fe107ab341e6bc75de3a32c69344190e)
QUT_NOISE_STREET.zip (1.6 GB, md5sum: 68d5ebc2e60cb07927cc4d33cdf2f017)
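The checksums can be verified with md5sum on unix-based systems, or with a few lines of Python; the expected digest below is the one listed for QUT_NOISE.zip:

    import hashlib

    def md5_of(path, chunk_size=1 << 20):
        """Compute the MD5 digest of a file without reading it all into memory."""
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                digest.update(block)
        return digest.hexdigest()

    assert md5_of("QUT_NOISE.zip") == "672461fd88782e9ea10d5c2cb7a84196"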
Creating QUT-NOISE-TIMIT
Obtaining TIMIT
In order to construct the QUT-NOISE-TIMIT database from the QUT-NOISE data supplied here you will need to obtain a copy of the TIMIT database from the Linguistic Data Consortium. If you just want to use the QUT-NOISE database, or you wish to combine it with different speech data, TIMIT is not required.
Creating QUT-NOISE-TIMIT
Once you have obtained TIMIT, download a copy of VOICEBOX: Speech Processing Toolbox for MATLAB and install it on your MATLABPATH.
Run matlab in the QUT-NOISE/code directory, and run the function: createQUTNOISETIMIT('/location/of/timit-cd/timit'). This will create the QUT-NOISE-TIMIT database in the QUT-NOISE/QUT-NOISE-TIMIT directory.
If you wish to verify that the QUT-NOISE-TIMIT database matches that evaluated in our original paper, please check that the md5sums (use md5sum on unix-based OSes) match those in the QUT-NOISE-TIMIT/md5sum.txt file.
Using the QUT-NOISE-SRE protocol
The code related to the QUT-NOISE-SRE protocol can be used in two ways:
To create a collection of noisy audio files across the scenarios in the QUT-NOISE database at different noise levels, or
To recreate a list of file names based on the QUT-NOISE-SRE protocol produced by another researcher, having already done (1). This allows existing research to be reproduced without having to send large volumes of audio around.
If you are interested in creating your own noisy database from an existing SRE database (1 above), please look at the example script exampleQUTNOISESRE.sh in the QUT-NOISE/code directory. You will need to make some modifications, but it should give you the right idea.
If you are interested in creating our QUT-NOISE-NIST2008 database published at Interspeech 2015, you can find the list of created noisy files in the QUT-NOISE-NIST2008.train.short2.list and QUT-NOISE-NIST2008.test.short3.list files in the QUT-NOISE/code directory.
These files can be recreated as follows (provided you have access to the NIST2008 SRE data):
Run matlab in the QUT-NOISE/code directory, and run the following function (the final argument, truncated in the original listing, presumably points at your local copy of the NIST2008 source audio; the path below is a placeholder):
createQUTNOISESREfiles('NIST2008.train.short2.list', ...
                       'QUT-NOISE-NIST2008.train.short2.list', ...
                       '/location/of/NIST2008/data')  % placeholder path to local NIST2008 audio
Abstract

Introduction

Global TIMIT Mandarin Chinese was developed by the Linguistic Data Consortium and Shanghai Jiao Tong University and consists of approximately five hours of read speech and transcripts in Mandarin Chinese. The Global TIMIT project aimed to create a series of corpora in a variety of languages with a similar set of key features as in the original TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1), which was designed for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. Specifically, these features included:

A large number of fluently-read sentences, containing a representative sample of phonetic, lexical, syntactic, semantic, and pragmatic patterns
A relatively large number of speakers
Time-aligned lexical and phonetic transcription of all utterances
Some sentences read by all speakers, others read by a few speakers, and others read by just one speaker

Data

Global TIMIT Mandarin Chinese consists of 50 speakers reading 120 sentences selected from Chinese Gigaword Fifth Edition (LDC2011T13). Among the 120 sentences read by each speaker, 20 sentences were read by all speakers, 40 sentences were read by 10 speakers, and 60 sentences were read by just one speaker, for a total of 3220 sentence types. The corpus was recorded at Shanghai Jiao Tong University, China. Speakers (25 female, 25 male) were students at the university and all achieved Class 2 Level 1 or better on the Putonghua Shuiping Ceshi (the national standard Mandarin proficiency test). All speech data are presented as 16 kHz, 16-bit flac-compressed wav files. Each file has accompanying phone and word segmentation files, as well as Praat TextGrid files.
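Each utterance ships as 16 kHz flac audio with phone and word segmentations plus Praat TextGrids. A minimal reading sketch, assuming the third-party soundfile and textgrid Python packages; the file names are illustrative, and the actual tier names follow the LDC release:

    import soundfile as sf   # pip install soundfile
    import textgrid          # pip install textgrid

    audio, sample_rate = sf.read("utterance.flac")          # 16 kHz, 16-bit audio
    tg = textgrid.TextGrid.fromFile("utterance.TextGrid")   # time-aligned annotations

    for tier in tg:          # e.g. phone and word tiers
        for interval in tier:
            print(tier.name, interval.minTime, interval.maxTime, interval.mark)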
This version of the TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) has all the waveform files formatted with ms-wav / RIFF headers, to make the corpus more accessible to a wider audience. The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI) and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST). The TIMIT corpus transcriptions have been hand verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included as well as written documentation.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The canonical metadata on NLTK:
<package id="timit" name="TIMIT Corpus Sample"
sample="True"
license="This corpus sample is Copyright 1993 Linguistic Data Consortium, and is distributed under the terms of the Creative Commons Attribution, Non-Commercial, ShareAlike license. http://creativecommons.org/"
webpage="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1"
unzip="1"
/>
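Once the sample is fetched through NLTK's downloader, it can be explored directly from Python; a brief sketch (output naturally depends on the local install):

    import nltk
    nltk.download("timit")                  # fetch the TIMIT corpus sample
    from nltk.corpus import timit

    utterance = timit.utteranceids()[0]     # utterance ids combine speaker and sentence
    print(timit.words(utterance))           # word-level transcription
    print(timit.phones(utterance))          # phone-level transcription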
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
See http://conradsanderson.id.au/vidtimit/ for details.
Summary: Video and corresponding audio recordings of 43 people, reciting short sentences. Useful for research on topics such as automatic lip reading, multi-view face recognition, multi-modal speech recognition and person identification.
TIMITPhones: TIMIT Phoneme Dataset
This corpus is a phoneme-level derivative of the original TIMIT Acoustic-Phonetic Continuous Speech Corpus. Each entry pairs a 1-second waveform excerpt with a single phoneme label taken from the 61-phone TIMIT inventory (mappings to the 39-phone and broad-class sets are also provided). This version is designed for quick prototyping of phoneme classifiers or for probing acoustic representations.
Supported Tasks and Leaderboards
Automatic… See the full description on the dataset page: https://huggingface.co/datasets/IParraMartin/TIMITPhones.
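The dataset can be loaded with the Hugging Face datasets library; a minimal sketch in which the split and field names are assumptions to be checked against the dataset card:

    from datasets import load_dataset

    ds = load_dataset("IParraMartin/TIMITPhones", split="train")  # split name assumed
    example = ds[0]
    print(example.keys())   # e.g. waveform and phoneme-label fields, per the dataset card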
This dataset contains common speech and noise corpora for evaluating fundamental frequency estimation algorithms, packaged as convenient JBOF dataframes. Each corpus is freely available on its own and permits redistribution:
These files are published as part of my dissertation, "Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods", and in support of the Replication Dataset for Fundamental Frequency Estimation.
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Tommy NgX
Released under CC0: Public Domain
The Deepfake-TIMIT dataset contains 100,000 images of faces manipulated using Deepfakes.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
DeepfakeTIMIT is a database of videos where faces are swapped using an open-source GAN-based approach (adapted from https://github.com/shaoanlu/faceswap-GAN), which was in turn developed from the original autoencoder-based deepfake algorithm.
When creating the database, we manually selected 16 similar-looking pairs of people from the publicly available VidTIMIT database. For each of the 32 subjects, we trained two different models: a lower-quality (LQ) model with 64 x 64 input/output size, and a higher-quality (HQ) model with 128 x 128 size (see the available images for an illustration). Since there are 10 videos per person in the VidTIMIT database, we generated 320 videos for each version, resulting in 640 total videos with swapped faces. For the audio, we kept the original audio track of each video, i.e., no manipulation was done to the audio channel.
Any publication (e.g., conference paper, journal article, technical report, book chapter) resulting from the usage of DeepfakeTIMIT must cite the following paper:
P. Korshunov and S. Marcel,
DeepFakes: a New Threat to Face Recognition? Assessment and Detection.
arXiv and Idiap Research Report
Any publication (e.g., conference paper, journal article, technical report, book chapter) resulting from the usage of VidTIMIT and subsequently DeepfakeTIMIT must also cite the following paper:
C. Sanderson and B.C. Lovell,
Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference.
Lecture Notes in Computer Science (LNCS), Vol. 5558, pp. 199-208, 2009.
Abstract

Introduction

The Audiovisual Database of Spoken American English, Linguistic Data Consortium (LDC) catalog number LDC2009V01 and ISBN 1-58563-496-4, was developed at Butler University, Indianapolis, IN in 2007 for use by a variety of researchers to evaluate speech production and speech recognition. It contains approximately seven hours of audiovisual recordings of fourteen American English speakers producing syllables, word lists and sentences used in both academic and clinical settings. All talkers were from the North Midland dialect region -- roughly defined as Indianapolis and north within the state of Indiana -- and had lived in that region for the majority of the time from birth to 18 years of age. Each participant read 238 different words and 166 different sentences. The sentences spoken were drawn from the following sources:

Central Institute for the Deaf (CID) Everyday Sentences (Lists A-J)
Northwestern University Auditory Test No. 6 (Lists I-IV)
Vowels in /hVd/ context (separate words)
Texas Instruments/Massachusetts Institute of Technology (TIMIT) sentences

The CID Everyday Sentences were created in the 1950s from a sample developed by the Armed Forces National Research Committee on Hearing and Bio-Acoustics. They are considered to represent everyday American speech and have the following characteristics: the vocabulary is appropriate to adults; the words appear with high frequency in one or more of the well-known word counts of the English language; proper names and proper nouns are not used; common non-slang idioms and contractions are used freely; phonetic loading and "tongue-twisting" are avoided; redundancy is high; the level of abstraction is low; and grammatical structure varies freely. Northwestern University Auditory Test No. 6 is a phonemically-balanced set of monosyllabic English words used clinically to test speech perception in adults with hearing loss. The /hVd/ vowel list was created to elicit all of the vowel sounds of American English. The TIMIT sentences are a subset (34 sentences) of the 2342 phonetically-rich sentences read by speakers in the TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. TIMIT was designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT speakers were from eight dialect regions of the United States. The Audiovisual Database of Spoken American English will be of interest in various disciplines: to linguists for studies of phonetics, phonology, and prosody of American English; to speech scientists for investigations of motor speech production and auditory-visual speech perception; to engineers and computer scientists for investigations of machine audio-visual speech recognition (AVSR); and to speech and hearing scientists for clinical purposes, such as the examination and improvement of speech perception by listeners with hearing loss.

Data

Participants were recorded individually during a single session. A participant first completed a statement of informed consent and a questionnaire to gather biographical data and then was asked by the experimenter to mark his or her Indiana hometown on a state map. The experimenter and participant then moved to a small, sound-treated studio where the participant was seated in front of three navy blue baffles. A laptop computer was elevated to eye-level on a speaker stand and placed approximately 50-60 cm in front of the participant.
Prompts were presented to the participant in a Microsoft PowerPoint presentation. The experimenter was seated directly next to the participant, but outside the camera angle, and advanced the PowerPoint slides at a comfortable pace. Participants were recorded with a Panasonic DVC-80 digital video camera to miniDV digital video cassette tapes. All participants wore a Sennheiser MKE-2060 directional/cardioid lapel microphone throughout the recordings. Each speaker produced a total of 94 segmented files which were converted from Final Cut Express to Quicktime (.mov) files and then saved in the appropriately marked folder. If a speaker mispronounced a sentence or word during the recording process, the mispronunciations were edited out of the segments to be archived. The remaining parts of the recording, including the correct repetition of each prompt, were then sequenced together to create a continuous and complete segment. The fourteen participants were between 19 and 61 years of age (with a mean age of 30 years) and native speakers of American English.