100+ datasets found
  1. Data from: SNABI database for continuous speech recognition 1.2

    • live.european-language-grid.eu
    binary format
    Updated Mar 1, 2002
    Cite
    (2002). SNABI database for continuous speech recognition 1.2 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/20237
    Explore at:
    binary format
    Dataset updated
    Mar 1, 2002
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The SNABI speech database can be used to train continuous speech recognition for Slovene language. The database comprises 1530 sentences, 150 words and the alphabet. 132 speakers were recorded, each reading 200 sentences or more. This resulted in more than 15,000 recordings of speech signal contained in the database. The recordings were done in studio (SNABI SI_SSQ) and through a telephone line (SNABI SI_SFN).

  2. Speaker Recognition - CMU ARCTIC

    • kaggle.com
    zip
    Updated Nov 21, 2022
    Cite
    Gabriel Lins (2022). Speaker Recognition - CMU ARCTIC [Dataset]. https://www.kaggle.com/datasets/mrgabrielblins/speaker-recognition-cmu-arctic
    Explore at:
    zip (1354293783 bytes)
    Dataset updated
    Nov 21, 2022
    Authors
    Gabriel Lins
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description
    • Can you predict which speaker is talking?
    • Can you predict what they are saying?

    This dataset makes both of these possible. Perfect for a school project, research project, or resume builder.

    File information

    • train.csv - file containing all the data you need for training, with 4 columns: id (file id), file_path (path to the .wav file), speech (transcription of the audio file), and speaker (target column)
    • test.csv - file containing all the data you need to test your model (20% of the total audio files); it has the same columns as train.csv
    • train/ - folder with training data, subdivided into one folder per speaker
      • aew/ - Folder containing audio files in .wav format for speaker aew
      • ...
    • test/ - Folder containing audio files for test data.

    Column description

    • id - file id (string)
    • file_path - file path to the .wav file (string)
    • speech - transcription of the audio file (string)
    • speaker - speaker name; use this as the target variable if you are doing audio classification (string)
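
    The snippet below is a minimal sketch of getting started with these files. It assumes the Kaggle layout described above (train.csv next to the train/ folder, file_path pointing at the .wav files) and uses pandas and scipy; the file and column names come from the listing, everything else is an assumption rather than part of the dataset documentation.

    # Minimal sketch: load the metadata table and one utterance.
    import pandas as pd
    from scipy.io import wavfile

    train = pd.read_csv("train.csv")            # columns: id, file_path, speech, speaker
    print(train["speaker"].value_counts())      # utterances per speaker (the target classes)

    rate, audio = wavfile.read(train.loc[0, "file_path"])   # read one .wav file
    print(rate, audio.shape, train.loc[0, "speech"])        # sample rate, samples, transcript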

    More Details

    The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, US-English single-speaker databases designed for unit selection speech synthesis research. A detailed report on the structure and content of the databases, the recording environment, etc. is available as Carnegie Mellon University Language Technologies Institute Tech Report CMU-LTI-03-177.

    The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers.

    The 1132 sentence prompt list is available from cmuarctic.data

    The distributions include 16 kHz waveforms and simultaneous EGG signals. Full phonetic labeling was performed with CMU Sphinx using the FestVox-based labeling scripts. Complete runnable Festival voices are included with the database distributions as examples, though better voices can be made by improving the labeling, etc.

    Acknowledgements

    This work was partially supported by the U.S. National Science Foundation under Grant No. 0219687, "ITR/CIS Evaluation and Personalization of Synthetic Voices". Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

  3. AccentDB - Core & Extended

    • kaggle.com
    zip
    Updated Feb 17, 2021
    Cite
    Sparsh Gupta (2021). AccentDB - Core & Extended [Dataset]. https://www.kaggle.com/imsparsh/accentdb-core-extended
    Explore at:
    zip (6738893005 bytes)
    Dataset updated
    Feb 17, 2021
    Authors
    Sparsh Gupta
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A Database of Non-Native English Accents to Assist Neural Speech Recognition

    Context

    AccentDB is a multi-pairwise parallel corpus of structured and labelled accented speech. It contains speech samples from speakers of 4 non-native accents of English (8 speakers, 4 Indian languages); and also has a compilation of 4 native accents of English (4 countries, 13 speakers) and a metropolitan Indian accent (2 speakers). The dataset available here corresponds to release titled accentdb_extended on

  4. AURORA-5

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Aug 16, 2017
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). AURORA-5 [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-AURORA-CD0005/
    Explore at:
    Dataset updated
    Aug 16, 2017
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. The AURORA-5 database has been developed mainly to investigate the influence of hands-free speech input in noisy room environments on the performance of automatic speech recognition. Furthermore, two test conditions are included to study the influence of transmitting the speech over a mobile communication system. The earlier three Aurora experiments focused on additive noise and the influence of some telephone frequency characteristics. Aurora-5 tries to cover all effects as they occur in realistic application scenarios. The focus was put on two scenarios. The first is hands-free speech input in the noisy car environment, with the intention of either controlling devices in the car itself or retrieving information from a remote speech server over the telephone. The second covers hands-free speech input in an office or a living room to control, e.g., a telephone device or some audio/video equipment.

    The AURORA-5 database contains the following data:

    • Artificially distorted versions of the recordings from adult speakers in the TI-Digits speech database, downsampled to a sampling frequency of 8000 Hz. The distortions consist of: additive background noise, the simulation of hands-free speech input in rooms, and the simulation of transmitting speech over cellular telephone networks.
    • A subset of recordings from the meeting recorder project at the International Computer Science Institute. The recordings contain sequences of digits uttered by different speakers in hands-free mode in a meeting room.
    • A set of scripts for running recognition experiments on the above-mentioned speech data. The experiments are based on the freely available software package HTK; HTK itself is not part of this resource.

    Further information is also available at the following address: http://aurora.hsnr.de

  5. ASR database ARTUR 0.1 (audio)

    • live.european-language-grid.eu
    binary format
    Updated Nov 30, 2022
    + more versions
    Cite
    (2022). ASR database ARTUR 0.1 (audio) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/20861
    Explore at:
    binary format
    Dataset updated
    Nov 30, 2022
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840 hours are transcribed, while the remaining 195 hours are without transcription. The data is divided into 4 parts:

    1. approx. 520 hours of read speech, which includes the reading of pre-defined sentences selected from the Gigafida 2.0 corpus (http://hdl.handle.net/11356/1320); each sentence is contained in one file; speakers are demographically balanced; spelling is included in special files; all with manual transcriptions;
    2. approx. 204 hours of public speech, which includes media recordings, online recordings of conferences, workshops, education videos, etc.; 56 hours are manually transcribed;
    3. approx. 110 hours of private speech, which includes monologues and dialogues between two persons, recorded for the purposes of the speech database; the speakers are demographically balanced; two subsets for domain-specific ASR (i.e., smart-home and face-description) are included; 63 hours are manually transcribed;
    4. approx. 201 hours of parliamentary speech, which includes recordings from the Slovene National Assembly, all with manual transcriptions.

    Audio files are WAV, 44.1 kHz, PCM, 16-bit, mono.

    This entry includes the recordings only; transcriptions are available at http://hdl.handle.net/11356/1718.

  6. AURORA Project database - Subset of SpeechDat-Car - Spanish database -...

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Aug 16, 2017
    + more versions
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). AURORA Project database - Subset of SpeechDat-Car - Spanish database - Evaluation Package [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-AURORA-CD0003_02/
    Explore at:
    Dataset updated
    Aug 16, 2017
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008. The two work items within ETSI are:

    • ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
    • ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm

    This database is a subset of the SpeechDat-Car database in the Spanish language, which was collected as part of the European Union funded SpeechDat-Car project. It contains isolated and connected Spanish digits spoken in the following noise and driving conditions inside a car:

    1. Quiet environment: stopped car, motor running.
    2. Low noise: town traffic + low-speed rough road.
    3. High noise: high-speed good road.

  7. Data from: Written and spoken digits database for multimodal learning

    • zenodo.org
    bin
    Updated Jan 20, 2021
    + more versions
    Cite
    Lyes Khacef; Laurent Rodriguez; Benoit Miramond (2021). Written and spoken digits database for multimodal learning [Dataset]. http://doi.org/10.5281/zenodo.3515935
    Explore at:
    bin
    Dataset updated
    Jan 20, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lyes Khacef; Laurent Rodriguez; Benoit Miramond
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database description:

    The written and spoken digits database is not a new database but a constructed database from existing ones, in order to provide a ready-to-use database for multimodal fusion.

    The written digits database is the original MNIST handwritten digits database [1] with no additional processing. It consists of 70000 images (60000 for training and 10000 for test) of 28 x 28 = 784 dimensions.

    The spoken digits database was extracted from Google Speech Commands [2], an audio dataset of spoken words that was proposed to train and evaluate keyword spotting systems. It consists of 105829 utterances of 35 words, amongst which 38908 utterances of the ten digits (34801 for training and 4107 for test). A pre-processing was done via the extraction of the Mel Frequency Cepstral Coefficients (MFCC) with a framing window size of 50 ms and frame shift size of 25 ms. Since the speech samples are approximately 1 s long, we end up with 39 time slots. For each one, we extract 12 MFCC coefficients with an additional energy coefficient. Thus, we have a final vector of 39 x 13 = 507 dimensions. Standardization and normalization were applied on the MFCC features.
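
    For illustration, the following is a minimal sketch of the described MFCC front-end. It assumes 16 kHz Speech Commands audio, a 50 ms window (800 samples), a 25 ms hop (400 samples) and 13 coefficients with the first one standing in for the energy term; these parameter translations and the use of librosa are assumptions, not the authors' actual pre-processing script.

    # Sketch of the MFCC pre-processing described above (assumptions noted in the text).
    import numpy as np
    import librosa

    y, sr = librosa.load("spoken_digit.wav", sr=16000)    # hypothetical ~1 s utterance
    y = librosa.util.fix_length(y, size=sr)               # pad/trim to exactly 1 s
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=800, win_length=800, hop_length=400, center=False)
    mfcc = (mfcc - mfcc.mean()) / mfcc.std()              # standardize the features
    vector = mfcc.T[:39].reshape(-1)                      # 39 frames x 13 coefficients = 507 dims
    print(vector.shape)                                   # (507,)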

    To construct the multimodal digits dataset, we associated written and spoken digits of the same class, respecting the initial partitioning in [1] and [2] for the training and test subsets. Since we have fewer samples for the spoken digits, we duplicated some random samples to match the number of written digits and obtain a multimodal digits database of 70000 samples (60000 for training and 10000 for test).

    The dataset is provided in six files as described below. Therefore, if a shuffle is performed on the training or test subsets, it must be performed in unison, with the same order for the written digits, spoken digits and labels (see the sketch after the file list).

    Files:

    • data_wr_train.npy: 60000 samples of 784-dimensional written digits for training;
    • data_sp_train.npy: 60000 samples of 507-dimensional spoken digits for training;
    • labels_train.npy: 60000 labels for the training subset;
    • data_wr_test.npy: 10000 samples of 784-dimensional written digits for test;
    • data_sp_test.npy: 10000 samples of 507-dimensional spoken digits for test;
    • labels_test.npy: 10000 labels for the test subset.
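
    As a minimal sketch of the unison shuffle mentioned above (array shapes as stated in the file list; the random seed is arbitrary):

    # Load the training files and shuffle the two modalities and the labels in unison.
    import numpy as np

    wr_train = np.load("data_wr_train.npy")   # (60000, 784) written digits
    sp_train = np.load("data_sp_train.npy")   # (60000, 507) spoken digits
    y_train = np.load("labels_train.npy")     # (60000,) class labels

    perm = np.random.default_rng(0).permutation(len(y_train))
    wr_train, sp_train, y_train = wr_train[perm], sp_train[perm], y_train[perm]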

    References:

    1. LeCun, Y. & Cortes, C. (1998), “MNIST handwritten digit database”.
    2. Warden, P. (2018), “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition”.
  8. Speech and Noise Corpora for Pitch Estimation of Human Speech

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 30, 2020
    Cite
    Bastian Bechtold (2020). Speech and Noise Corpora for Pitch Estimation of Human Speech [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3920590
    Explore at:
    Dataset updated
    Jun 30, 2020
    Dataset provided by
    Jade Hochschule
    Authors
    Bastian Bechtold
    Description

    This dataset contains common speech and noise corpora for evaluating fundamental frequency estimation algorithms as convenient JBOF dataframes. Each corpus is available freely on its own, and allows redistribution:

    CMU-ARCTIC (BSD license) [1]

    FDA (free to download) [2]

    KEELE (free for noncommercial use) [3]

    MOCHA-TIMIT (free for noncommercial use) [4]

    PTDB-TUG (ODBL license) [5]

    NOISEX (free to download) [7]

    QUT-NOISE (CC-BY-SA license) [8]

    These files are published as part of my dissertation, "Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods", and in support of the Replication Dataset for Fundamental Frequency Estimation.

    References:

    John Kominek and Alan W Black. CMU ARCTIC database for speech synthesis, 2003.

    Paul C Bagshaw, Steven Hiller, and Mervyn A Jack. Enhanced Pitch Tracking and the Processing of F0 Contours for Computer Aided Intonation Teaching. In EUROSPEECH, 1993.

    F Plante, Georg F Meyer, and William A Ainsworth. A Pitch Extraction Reference Database. In Fourth European Conference on Speech Communication and Technology, pages 837–840, Madrid, Spain, 1995.

    Alan Wrench. MOCHA MultiCHannel Articulatory database: English, November 1999.

    Gregor Pirker, Michael Wohlmayr, Stefan Petrik, and Franz Pernkopf. A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario. page 4, 2011.

    John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren, and Victor Zue. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 1993.

    Andrew Varga and Herman J.M. Steeneken. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3):247–251, July 1993.

    David B. Dean, Sridha Sridharan, Robert J. Vogt, and Michael W. Mason. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. Proceedings of Interspeech 2010, 2010.

  9. NST

    • huggingface.co
    + more versions
    Cite
    Nasjonalbiblioteket AI Lab, NST [Dataset]. https://huggingface.co/datasets/NbAiLab/NST
    Explore at:
    Dataset authored and provided by
    Nasjonalbiblioteket AI Lab
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Norwegian. In this version, the organization of the data has been altered to improve the usefulness of the database.

    The acoustic databases described below were developed by the firm Nordisk språkteknologi holding AS (NST), which went bankrupt in 2003. In 2006, a consortium consisting of the University of Oslo, the University of Bergen, the Norwegian University of Science and Technology, the Norwegian Language Council and IBM bought the bankruptcy estate of NST, in order to ensure that the language resources developed by NST were preserved. In 2009, the Norwegian Ministry of Culture charged the National Library of Norway with the task of creating a Norwegian language bank, which they initiated in 2010. The resources from NST were transferred to the National Library in May 2011, and are now made available in Språkbanken, for the time being without any further modification. Språkbanken is open for feedback from users about how the resources can be improved, and we are also interested in improved versions of the databases that users wish to share with other users. Please send response and feedback to sprakbanken@nb.no.

  10. Data from: The TORGO database of acoustic and articulatory speech from...

    • incluset.com
    Updated 2011
    Cite
    Frank Rudzicz; Aravind Kumar Namasivayam; Talya Wolff (2011). The TORGO database of acoustic and articulatory speech from speakers with dysarthria [Dataset]. https://incluset.com/datasets
    Explore at:
    Dataset updated
    2011
    Authors
    Frank Rudzicz; Aravind Kumar Namasivayam; Talya Wolff
    Measurement technique
    It consists of aligned acoustics and measured 3D articulatory features from speakers with either cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS).
    Description

    This database was originally created as a resource for developing advanced automatic speech recognition models that are better suited to the needs of people with dysarthria.

  11. EmoDB Dataset

    • kaggle.com
    zip
    Updated Sep 24, 2020
    + more versions
    Cite
    Piyush Agnihotri (2020). EmoDB Dataset [Dataset]. https://www.kaggle.com/datasets/piyushagni5/berlin-database-of-emotional-speech-emodb/code
    Explore at:
    zip (39877079 bytes)
    Dataset updated
    Sep 24, 2020
    Authors
    Piyush Agnihotri
    Description

    Emo-DB Database

    The EMODB database is a freely available German emotional speech database, created by the Institute of Communication Science, Technical University of Berlin, Germany. Ten professional speakers (five males and five females) participated in the data recording. The database contains a total of 535 utterances covering seven emotions: 1) anger; 2) boredom; 3) anxiety; 4) happiness; 5) sadness; 6) disgust; and 7) neutral. The data was recorded at a 48 kHz sampling rate and then down-sampled to 16 kHz.

    Additional Information

    Every utterance is named according to the same scheme:

    • Positions 1-2: number of speaker
    • Positions 3-5: code for text
    • Position 6: emotion (the letter stands for the German emotion word)
    • Position 7: if there are more than two versions these are numbered a, b, c ....

    Example: 03a01Fa.wav is the audio file from speaker 03 speaking text a01 with the emotion "Freude" (happiness). A small parsing sketch follows the emotion codes below.

    Information about the speakers

    • 03 - male, 31 years old
    • 08 - female, 34 years
    • 09 - female, 21 years
    • 10 - male, 32 years
    • 11 - male, 26 years
    • 12 - male, 30 years
    • 13 - female, 32 years
    • 14 - female, 35 years
    • 15 - male, 25 years
    • 16 - female, 31 years


    Code of emotions:

    Letter - emotion (English) / letter - emotion (German):

    • A - anger / W - Ärger (Wut)
    • B - boredom / L - Langeweile
    • D - disgust / E - Ekel
    • F - anxiety/fear / A - Angst
    • H - happiness / F - Freude
    • S - sadness / T - Trauer
    • N - neutral version
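
    The sketch below decodes a file name according to the scheme above, using the German letter codes that appear in the file names (as in the "Freude" example). It is an illustration, not part of the dataset's own tooling.

    # Decode an EmoDB file name such as "03a01Fa.wav".
    EMOTIONS = {"W": "anger", "L": "boredom", "E": "disgust",
                "A": "anxiety/fear", "F": "happiness", "T": "sadness", "N": "neutral"}

    def parse_emodb_name(filename: str) -> dict:
        stem = filename.rsplit(".", 1)[0]
        return {
            "speaker": stem[0:2],           # positions 1-2: speaker number
            "text": stem[2:5],              # positions 3-5: text code
            "emotion": EMOTIONS[stem[5]],   # position 6: German emotion letter
            "version": stem[6:] or None,    # position 7: version letter, if present
        }

    print(parse_emodb_name("03a01Fa.wav"))
    # {'speaker': '03', 'text': 'a01', 'emotion': 'happiness', 'version': 'a'}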

    Inspiration

    Emotion classification from speech is of increasing interest in the speech processing field. The objective of emotion classification is to classify different emotions from the speech signal. A person's emotional state affects the speech production mechanism: breathing rate and muscle tension change from the neutral condition, so the resulting speech signal may have characteristics that differ from those of neutral speech.

    The performance of speech recognition or speaker recognition decreases significantly if the model is trained on neutral speech and tested on emotional speech. So, as machine learning enthusiasts, we can start working on speaker emotion recognition problems and come up with good, robust models.

  12. Conversational Skills in Language Learning Games: A Speech Recognition...

    • data.mendeley.com
    • figshare.com
    Updated Dec 8, 2023
    Cite
    Murat Kuvvetli (2023). Conversational Skills in Language Learning Games: A Speech Recognition Technology Dataset [Dataset]. http://doi.org/10.17632/bhvd9z5jjr.1
    Explore at:
    Dataset updated
    Dec 8, 2023
    Authors
    Murat Kuvvetli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The "SpeechRec_LanguageLearning_ConversationalSkills" dataset is a collection of data generated in a game-based language learning environment, aiming to explore the impact of Speech Recognition Technology (SRT) on the development of conversational skills. The dataset encompasses speaking test results conducted within the context of language learning games utilizing SRT.

  13. Data from: Speech databases of typical children and children with SLI

    • figshare.com
    • live.european-language-grid.eu
    • +1 more
    zip
    Updated Dec 15, 2016
    Cite
    Pavel Grill (2016). Speech databases of typical children and children with SLI [Dataset]. http://doi.org/10.6084/m9.figshare.2360626.v2
    Explore at:
    zip
    Dataset updated
    Dec 15, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Pavel Grill
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our Laboratory of Artificial Neural Network Applications (LANNA) at the Czech Technical University in Prague (head of the laboratory: professor Jana Tučková) collaborates on a project with the Department of Paediatric Neurology, 2nd Faculty of Medicine of Charles University in Prague, and with the Motol University Hospital (head of clinic: professor Vladimír Komárek), which focuses on the study of children with SLI.

    The speech database contains two subgroups of recordings of children's speech from different types of speakers. The first subgroup (healthy) consists of recordings of children without speech disorders; the second subgroup (patients) consists of recordings of children with SLI. These children have different degrees of severity (1 - mild, 2 - moderate, and 3 - severe); the speech therapists and specialists from Motol Hospital decided upon this classification. The children's speech was recorded in the period 2003-2013. These databases were commonly created in a schoolroom or a speech therapist's consulting room, in the presence of surrounding background noise. This situation simulates the natural environment in which the children live, and is important for capturing the normal behavior of children.

    The database of healthy children's speech was created as a referential database for the computer processing of children's speech. It was recorded on a SONY digital Dictaphone (sampling frequency fs = 16 kHz, 16-bit resolution, stereo mode, standardized wav format) and on an MD SONY MZ-N710 (sampling frequency fs = 44.1 kHz, 16-bit resolution, stereo mode, standardized wav format). The corpus was recorded in the natural environment of a schoolroom and in a clinic. This subgroup contains a total of 44 native Czech participants (15 boys, 29 girls) aged 4 to 12 years, and was recorded during the period 2003-2005.

    The database of children with SLI was recorded in a private speech therapist's office. The children's speech was captured by means of a SHURE lapel microphone using the solution by the company AVID (MBox - USB AD/DA converter and ProTools LE software) on an Apple laptop (iBook G4). The sound recordings are saved in the standardized wav format; the sampling frequency is set to 44.1 kHz with 16-bit resolution in mono mode. This subgroup contains a total of 54 native Czech participants (35 boys, 19 girls) aged 6 to 12 years, and was recorded during the period 2009-2013.

    This package contains wav data sets for developing and testing methods for detecting children with SLI.

    Software pack:

    • FORANA - original software developed for formant analysis, based on the MATLAB programming environment. Its development was mainly driven by the need to complete formant analysis correctly and to fully automate the process of extracting formants from recorded speech signals. Development of this application is still running. The software was developed in the LANNA at CTU FEE in Prague.
    • LABELING - a program used for segmentation of the speech signal; it is part of the SOMLab program system. The software was developed in the LANNA at CTU FEE in Prague.
    • PRAAT - acoustic analysis software, created by Paul Boersma and David Weenink of the Institute of Phonetic Sciences of the University of Amsterdam. Home page: http://www.praat.org or http://www.fon.hum.uva.nl/praat/.
    • openSMILE - a feature extraction tool that enables you to extract large audio feature spaces in real time. It combines features from Music Information Retrieval and Speech Processing; SMILE is an acronym for Speech & Music Interpretation by Large-space Extraction. It is written in C++ and is available both as a standalone command-line executable and as a dynamic library. The main features of openSMILE are its capability of on-line incremental processing and its modularity: feature extractor components can be freely interconnected to create new and custom features, all via a simple configuration file, and new components can be added via an easy binary plugin interface and a comprehensive API. Citation: Florian Eyben, Martin Wöllmer, Björn Schuller: "openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor", in Proc. ACM Multimedia (MM), ACM, Florence, Italy, ISBN 978-1-60558-933-6, pp. 1459-1462, October 2010. doi:10.1145/1873951.1874246

  14. Data from: Dysarthric speech database for universal access research

    • incluset.com
    Updated 2007
    Cite
    Heejin Kim; Mark Allan Hasegawa-Johnson; Adrienne Perlman; Jon Gunderson; Thomas S Huang; Kenneth Watkin; Simone Frame (2007). Dysarthric speech database for universal access research [Dataset]. https://incluset.com/datasets
    Explore at:
    Dataset updated
    2007
    Authors
    Heejin Kim; Mark Allan Hasegawa-Johnson; Adrienne Perlman; Jon Gunderson; Thomas S Huang; Kenneth Watkin; Simone Frame
    Measurement technique
    Participants read a variety of words, including digits, letters, computer commands, common words, and uncommon words from Project Gutenberg novels.
    Description

    This dataset was collected to enhance research into speech recognition systems for dysarthric speech.

  15. Data from: The Whitaker database of dysarthric (cerebral palsy) speech

    • incluset.com
    Updated 1992
    Cite
    J. R. Deller; M. S. Liu; L. J. Ferrier; P. Robichaud (1992). The Whitaker database of dysarthric (cerebral palsy) speech [Dataset]. https://incluset.com/datasets
    Explore at:
    Dataset updated
    1992
    Authors
    J. R. Deller; M. S. Liu; L. J. Ferrier; P. Robichaud
    Measurement technique
    Participants read the TI-46 word list and the Grandfather word list. The database was created in an effort to provide a standard collection of speech which may be used internationally for the development of recognition technology and in studies of speech perception or articulation.
    Description

    This dataset was collected to enhance research into speech recognition systems for dysarthric speech. It consists of 19275 isolated-word utterances from speakers with dysarthria due to cerebral palsy.

  16. NST Norwegian ASR Database (16 kHz)

    • data.europa.eu
    • data.norge.no
    Cite
    NST Norwegian ASR Database (16 kHz) [Dataset]. https://data.europa.eu/data/datasets/sbr-13?locale=en
    Explore at:
    application/x-gtar, application/pdf
    License

    https://hdl.handle.net/21.11146/13/.well-known/skolem/bead126e-5168-3e2b-8959-07eaf5d458d5

    Description

    This database was originally developed by Nordic Language Technology in the 1990s in order to facilitate automatic speech recognition (ASR) in Norwegian. A reorganized and more user-friendly version of this database is also available from The Language Bank. Type "sbr-54" in the search bar to find the updated version.

  17. The QUT-NOISE Databases and Protocols

    • researchdata.edu.au
    Updated 2010
    + more versions
    Cite
    Dean David; Sridharan Sridha; Vogt Robert; Mason Michael (2010). The QUT-NOISE Databases and Protocols [Dataset]. http://doi.org/10.4225/09/58819f7a21a21
    Explore at:
    Dataset updated
    2010
    Dataset provided by
    Queensland University of Technology
    Authors
    Dean David; Sridharan Sridha; Vogt Robert; Mason Michael
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The QUT-NOISE Databases and Protocols

    Overview

    This distribution contains the QUT-NOISE database and the code required to create the QUT-NOISE-TIMIT database from the QUT-NOISE database and a locally installed copy of the TIMIT database. It also contains code to create the QUT-NOISE-SRE protocol on top of an existing speaker recognition evaluation database (such as NIST evaluations). Further information on the QUT-NOISE and QUT-NOISE-TIMIT databases is available in our paper:


    D. Dean, S. Sridharan, R. Vogt, M. Mason (2010), The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms, in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.

    This paper is also available in the file: docs/Dean2010, The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithm.pdf, distributed with this database.

    Further information on the QUT-NOISE-SRE protocol is available in our paper:
    D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015). In Proceedings of Interspeech 2015, September, Dresden, Germany.

    Licensing

    The QUT-NOISE data itself is licensed CC-BY-SA, and the code required to create the QUT-NOISE-TIMIT database and QUT-NOISE-SRE protocols is licensed under the BSD license. Please consult the appropriate LICENSE.txt files (in the code and QUT-NOISE directories) for more information. To attribute this database, please include the following citation:


    D. Dean, S. Sridharan, R. Vogt, M. Mason (2010), The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms, in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.

    If your work is based upon the QUT-NOISE-SRE, please also include this citation:
    D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015). In Proceedings of Interspeech 2015, September, Dresden, Germany.

    Download and Installation

    Download the following QUT-NOISE*.zip files (a checksum-verification sketch follows this list):

    • (26.7 MB, md5sum: 672461fd88782e9ea10d5c2cb7a84196)
    • (1.6 GB, md5sum: f87fb213c0e1c439e1b727fb258ef2cd)
    • (1.7 GB, md5sum: d680118b4517e1257a9263b99d1ac401)
    • (1.4 GB, md5sum: d99572ae1c118b749c1ffdb2e0cf0d2e)
    • (1.4 GB, md5sum: fe107ab341e6bc75de3a32c69344190e)
    • (1.6 GB, md5sum: 68d5ebc2e60cb07927cc4d33cdf2f017)
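
    The original link text for these archives is not preserved in this listing, so the sketch below simply hashes whatever QUT-NOISE*.zip files are present locally; the file names are placeholders, and the comparison against the md5sums above is left to the reader.

    # Compute md5 digests of the downloaded archives for comparison with the list above.
    import glob
    import hashlib

    def md5(path, chunk_size=1 << 20):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            while block := f.read(chunk_size):
                digest.update(block)
        return digest.hexdigest()

    for zip_path in sorted(glob.glob("QUT-NOISE*.zip")):
        print(zip_path, md5(zip_path))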

    Creating QUT-NOISE-TIMIT

    Obtaining TIMIT

    In order to construct the QUT-NOISE-TIMIT database from the QUT-NOISE data supplied here you will need to obtain a copy of the TIMIT database from the . If you just want to use the QUT-NOISE database, or you wish to combine it with different speech data, TIMIT is not required.

    Creating QUT-NOISE-TIMIT

    • Once you have obtained TIMIT, download and install a copy of and install it in your MATLABPATH.
    • Run matlab in the QUT-NOISE/code directory, and run the function: createQUTNOISETIMIT('/location/of/timit-cd/timit'). This will create the QUT-NOISE-TIMIT database in the QUT-NOISE/QUT-NOISE-TIMIT directory.
    • If you wish to verify that the QUT-NOISE-TIMIT database matches that evaluated in our original paper, please check that the md5sums (use md5sum on unix-based OSes) match those in the QUT-NOISE-TIMIT/md5sum.txt file.
    • Using the QUT-NOISE-SRE protocol
      • The code related to the QUT-NOISE-SRE protocol can be used in two ways:
        1. To create a collection of noisy audio files across the scenarios in the QUT-NOISE database at different noise levels, or,
        2. To recreate a list of file names based on the QUT-NOISE-SRE protocol produced by another researcher, having already done (1). This allows existing research to be reproduced without having to send large volumes of audio around.
      • If you are interested in creating your own noisy database from an existing SRE database (1 above), please look at the example script exampleQUTNOISESRE.sh in the QUT-NOISE/code directory. You will need to make some modifications, but it should give you the right idea.
      • If you are interested in creating our QUT-NOISE-NIST2008 database published at Interspeech 2015, you can find the list of created noisy files in the QUT-NOISE-NIST2008.train.short2.list and QUT-NOISE-NIST2008.test.short3.list files in the QUT-NOISE/code directory.
      • These files can be recreated as follows (provided you have access to the NIST2008 SRE data):

        Run matlab in the QUT-NOISE/code directory, and run the following functions:

        createQUTNOISESREfiles('NIST2008.train.short2.list', ...
        'QUT-NOISE-NIST2008.train.short2.list', ...
        '
  18. Data from: DiPCo -- Dinner Party Corpus

    • zenodo.org
    application/gzip, pdf
    Updated Jul 11, 2024
    Cite
    Maarten Van Segbroeck; Ahmed Zaid; Ksenia Kutsenko; Cirenia Huerta; Tinh Nguyen; Xuewen Luo; Björn Hoffmeister; Jan Trmal; Maurizio Omologo; Roland Maas (2024). DiPCo -- Dinner Party Corpus [Dataset]. http://doi.org/10.21437/interspeech.2020-2800
    Explore at:
    application/gzip, pdf
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maarten Van Segbroeck; Ahmed Zaid; Ksenia Kutsenko; Cirenia Huerta; Tinh Nguyen; Xuewen Luo; Björn Hoffmeister; Jan Trmal; Maurizio Omologo; Roland Maas
    License

    https://cdla.io/permissive-1-0

    Description

    We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment. The corpus was created by recording multiple groups of four Amazon employee volunteers having a natural conversation in English around a dining table. The participants were recorded by a single-channel close-talk microphone and by five far-field 7-microphone array devices positioned at different locations in the recording room. The dataset contains the audio recordings and human-labeled transcripts of a total of 10 sessions with a duration between 15 and 45 minutes. The corpus was created to advance the field of noise-robust and distant speech processing and is intended to serve as a public research and benchmarking data set.

  19. nst

    • huggingface.co
    Updated Feb 1, 2024
    + more versions
    Cite
    KTH (2024). nst [Dataset]. https://huggingface.co/datasets/KTH/nst
    Explore at:
    Dataset updated
    Feb 1, 2024
    Dataset authored and provided by
    KTH
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Swedish. In this updated version, the organization of the data has been altered to improve the usefulness of the database.

    In the original version of the material, the files were organized in a specific folder structure where the folder names were meaningful. However, the file names were not meaningful, and there were also cases of files with identical names in different folders. This proved to be impractical, since users had to keep the original folder structure in order to use the data. The files have been renamed, such that the file names are unique and meaningful regardless of the folder structure. The original metadata files were in spl format. These have been converted to JSON format. The converted metadata files are also anonymized and the text encoding has been converted from ANSI to UTF-8.
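
    As a possible way to pull the reorganized corpus programmatically, the sketch below uses the Hugging Face datasets library with the repository id from the citation above; whether a configuration name is required, and which splits exist, are assumptions to be checked against the dataset card.

    # Stream a first record from the KTH/nst dataset (config/split names may need adjusting).
    from datasets import load_dataset

    nst = load_dataset("KTH/nst", split="train", streaming=True)
    print(next(iter(nst)))   # one example with its audio and converted JSON metadata fields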

  20. Indonesian Media Audio Database

    • gts.ai
    json
    Updated Jan 31, 2024
    Cite
    GTS (2024). Indonesian Media Audio Database [Dataset]. https://gts.ai/case-study/indonesian-media-audio-database-custom-ai-data-collection/
    Explore at:
    json
    Dataset updated
    Jan 31, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Our project, “Indonesian Media Audio Database,” is designed to establish a rich and diverse dataset tailored for training advanced machine learning models in language processing, speech recognition, and cultural analysis.
