100+ datasets found
  1. Data from: TSP speech database

    • service.tib.eu
    • resodate.org
    Updated Dec 16, 2024
    Cite
    (2024). TSP speech database [Dataset]. https://service.tib.eu/ldmservice/dataset/tsp-speech-database
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The TSP speech database is a dataset of speech recordings.

  2. Serbian emotional speech database (GEES)

    • live.european-language-grid.eu
    • catalogue.elra.info
    audio format
    Updated Apr 9, 2016
    Cite
    (2016). Serbian emotional speech database (GEES) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1480
    Explore at:
    Available download formats: audio
    Dataset updated
    Apr 9, 2016
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    The database contains recordings from six actors, three of each gender. The following emotions have been recorded: neutral, anger, happiness, sadness and fear. The database consists of 32 isolated words, 30 short semantically neutral sentences, 30 long semantically neutral sentences and one passage of 79 words. The overall size of the database is 2,790 recordings (93 items Ɨ 6 speakers Ɨ 5 emotions), or approximately 3 hours of speech. Statistical evaluation of the database shows full phonetic balance with respect to the phonetic statistics of the Serbian language, and the statistics of other speech segments (syllables, consonant sets, accents) agree with the overall statistics of Serbian. The GEES database was recorded in an anechoic studio at a 22,050 Hz sampling frequency.

  3. Russian Speech Database

    • live.european-language-grid.eu
    • catalogue.elra.info
    audio format
    Cite
    Russian Speech Database [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1512
    Explore at:
    Available download formats: audio
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    The STC Russian speech database was recorded in 1996-1998. Its main purpose is to investigate individual speaker variability and to validate speaker recognition algorithms. The database was recorded through a 16-bit Vibra-16 Creative Labs sound card at an 11,025 Hz sampling rate. It contains Russian read speech from 89 different speakers (54 male, 35 female): 70 speakers with 15 or more sessions, 10 speakers with 10 or more sessions, and 9 speakers with fewer than 10 sessions. The speakers were recorded in Saint Petersburg, are aged 18-62, and are all native speakers. The corpus consists of 5 sentences; each speaker read each sentence carefully but fluently 15 times on different dates over a period of 1-3 months. The corpus contains a total of 6,889 utterances on 2 volumes, with a total size of 700 MB of uncompressed data. The signal of each utterance is stored as a separate file (approx. 126 KB); the data for one speaker totals approximately 9,500 KB, and the average utterance duration is about 5 seconds. A file gives information about the speakers (age and gender). The orthography and phonetic transcription of the corpus are given in separate files containing the prompted sentences and their transcription in IPA. The signal files are raw files without any header: 16 bits per sample, linear, 11,025 Hz sample frequency. The recording conditions were as follows:
    • Microphone: dynamic omnidirectional high-quality microphone, distance to mouth 5-10 cm
    • Environment: office room
    • Sampling rate: 11,025 Hz
    • Resolution: 16 bit
    • Sound board: Creative Labs Vibra-16
    • Means of delivery: CD-ROM
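    Since the signal files are headerless raw PCM, they can be read by interpreting the bytes directly. A minimal Python/NumPy sketch, assuming little-endian byte order (typical for PC sound cards, but worth confirming against the documentation) and a hypothetical filename:

        import numpy as np

        SAMPLE_RATE = 11025  # per the database documentation

        def read_raw_utterance(path):
            # Headerless 16-bit linear PCM -> float32 in [-1, 1].
            samples = np.fromfile(path, dtype="<i2")  # little-endian assumed
            return samples.astype(np.float32) / 32768.0

        audio = read_raw_utterance("utterance_001.raw")  # hypothetical filename
        print(f"{len(audio) / SAMPLE_RATE:.1f} s at {SAMPLE_RATE} Hz")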

  4. Farsdat (Farsi Speech Database)

    • catalog.elra.info
    • live.european-language-grid.eu
    Updated Mar 7, 2016
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2016). Farsdat (Farsi Speech Database) [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-S0112/
    Explore at:
    Dataset updated
    Mar 7, 2016
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The Persian Speech Database Farsdat comprises the recordings of 300 Iranian speakers, who differ from each other with regard to age, sex, education level, and dialect (10 dialect regions of Iran were represented: Tehrani, Torki, Esfahani, Jonubi, Shomali, Khorassani, Baluchi, Kordi, Lori, and Yazdi). Each speaker uttered 20 sentences in two sessions, and 100 of these speakers uttered 110 isolated words. 6,000 utterances, including 386 phonetically balanced sentences, were manually segmented and labelled phonetically and phonemically using IPA characters. The acoustic signal is stored in standard WAV files, so it can be used by any application software. The sampling frequency is 22.5 kHz and the signal-to-noise ratio is 34 dB. Ambiguities in segmentation were resolved by reference to the corresponding spectrograms extracted from a KAY DSP Sona-Graph 5500.

  5. Dysarthria Detection

    • kaggle.com
    zip
    Updated Aug 8, 2022
    Cite
    Bilal Hungund (2022). Dysarthria Detection [Dataset]. https://www.kaggle.com/datasets/iamhungundji/dysarthria-detection
    Explore at:
    Available download formats: zip (162985010 bytes)
    Dataset updated
    Aug 8, 2022
    Authors
    Bilal Hungund
    Description

    The TORGO database of dysarthric articulation consists of aligned acoustics and measured 3D articulatory features from speakers with either cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS), which are two of the most prevalent causes of speech disability (Kent and Rosen, 2004), and matched controls. This database, called TORGO, is the result of a collaboration between the Computer Science and Speech-Language Pathology departments at the University of Toronto and the Holland-Bloorview Kids Rehab hospital in Toronto.

    This dataset contains 2,000 samples in total, covering dysarthric males, dysarthric females, non-dysarthric males, and non-dysarthric females (500 each).

    The original TORGO database contains 18 GB of data. To download it, and for more information on the data, please refer to the following link: http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html

    This database should be used only for academic purposes.

    Database / Licence Reference: Rudzicz, F., Namasivayam, A.K., Wolff, T. (2012) The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation, 46(4), pages 523-541.

    Data Information:

    It contains four folders, described below:
    • dysarthria_female: 500 samples of dysarthric female audio recorded in different sessions.
    • dysarthria_male: 500 samples of dysarthric male audio recorded in different sessions.
    • non_dysarthria_female: 500 samples of non-dysarthric female audio recorded in different sessions.
    • non_dysarthria_male: 500 samples of non-dysarthric male audio recorded in different sessions.

    data.csv columns:
    • filename: audio file path
    • is_dysarthria: non-dysarthria or dysarthria
    • gender: male or female
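    As a quick sanity check, the CSV can be loaded and tallied against the folder counts above. A minimal sketch in Python (pandas), using the column names from the dataset card:

        import pandas as pd

        # Column names follow the dataset card: filename, is_dysarthria, gender.
        df = pd.read_csv("data.csv")

        print(df["is_dysarthria"].value_counts())              # expect 1000 / 1000
        print(df.groupby(["is_dysarthria", "gender"]).size())  # expect 500 per cell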

    Applications of the data:
    • Applying deep learning technology to classify dysarthria and non-dysarthria patients

    References: Dumane, P., Hungund, B., Chavan, S. (2021). Dysarthria Detection Using Convolutional Neural Network. In: Pawar, P.M., Balasubramaniam, R., Ronge, B.P., Salunkhe, S.B., Vibhute, A.S., Melinamath, B. (eds) Techno-Societal 2020. Springer, Cham. https://doi.org/10.1007/978-3-030-69921-5_45

  6. Spanish Speech Recognition Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2025
    + more versions
    Cite
    Unidata (2025). Spanish Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/spanish-speech-recognition-dataset
    Explore at:
    Available download formats: zip (93217 bytes)
    Dataset updated
    Jun 25, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Spanish Speech Dataset for recognition task

    The dataset comprises 488 hours of telephone dialogues in Spanish, collected from 600 native speakers across various topics and domains. It boasts an impressive 98% word accuracy rate, making it a valuable resource for advancing speech recognition technology.

    By utilizing this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR), audio transcription, and natural language processing (NLP).

    The dataset includes high-quality audio recordings with text transcriptions, making it ideal for training and evaluating speech recognition models.

    šŸ’µ Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset:
    • Audio files: high-quality recordings in WAV format
    • Text transcriptions: accurate and detailed transcripts for each audio segment
    • Speaker information: metadata on native speakers, including gender, etc.
    • Topics: diverse domains such as general conversations, business, etc.

    This dataset is a valuable resource for researchers and developers working on speech recognition, language models, and speech technology.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  7. EWA-DB – Early Warning of Alzheimer speech database

    • catalogue.elra.info
    • data.niaid.nih.gov
    • +1more
    Updated Oct 4, 2023
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2023). EWA-DB – Early Warning of Alzheimer speech database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0489/
    Explore at:
    Dataset updated
    Oct 4, 2023
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    EWA-DB is a speech database that contains data from 3 clinical groups (Alzheimer's disease, Parkinson's disease, and mild cognitive impairment) and a control group of healthy subjects. Speech samples of each clinical group were obtained using the EWA smartphone application, which contains 4 different language tasks: sustained vowel phonation, diadochokinesis, object and action naming (30 objects and 30 actions), and picture description (two single pictures and three complex pictures). The total number of speakers in the database is 1,649: 87 people with Alzheimer's disease, 175 people with Parkinson's disease, 62 people with mild cognitive impairment, 2 people with a mixed diagnosis of Alzheimer's + Parkinson's disease, and 1,323 healthy controls. For speakers who provided written consent (1,003 speakers in total), audio recordings are published in WAV format, together with a JSON file containing the ASR transcription, the manual annotation where available (965 speakers), and additional information about the speaker. For speakers who did not consent to publication of the recording, only the JSON file is provided. ASR transcription is provided for all 1,649 speakers. All 1,649 speakers gave their consent to the provider to process their audio recordings, so third-party researchers can also run experiments on the unpublished audio recordings in cooperation with the provider.
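    A sketch of how one of the per-speaker JSON files might be inspected in Python; the field names are assumptions for illustration, as this listing does not show the schema:

        import json

        # Field names here are assumptions; the EWA-DB documentation
        # defines the actual schema.
        with open("speaker_0001.json", encoding="utf-8") as f:
            record = json.load(f)

        print(record.get("asr_transcription"))   # provided for all 1,649 speakers
        print(record.get("manual_annotation"))   # present for 965 speakers, else None
        print(record.get("speaker_info"))        # demographics, diagnosis group, etc.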

  8. Dataset of British English speech recordings for psychoacoustics and speech...

    • salford.figshare.com
    • datasetcatalog.nlm.nih.gov
    application/x-gzip
    Updated Feb 3, 2025
    Cite
    Trevor John Cox; Simone Graetzer; Michael A Akeroyd; Jonathan Barker; John Culling; Graham Naylor; Eszter Porter; Rhoddy Viveros MuƱoz (2025). Dataset of British English speech recordings for psychoacoustics and speech processing research [Dataset]. http://doi.org/10.17866/rd.salford.16918180.v3
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    University of Salford
    Authors
    Trevor John Cox; Simone Graetzer; Michael A Akeroyd; Jonathan Barker; John Culling; Graham Naylor; Eszter Porter; Rhoddy Viveros MuƱoz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Clarity Speech Corpus is a forty-speaker British English speech dataset. The corpus was created for running listening tests to gauge speech intelligibility and quality in the Clarity Project, which has the goal of advancing speech signal processing by hearing aids through a series of challenges. The dataset is suitable for machine learning and other uses in speech and hearing technology, acoustics and psychoacoustics. The data comprise recordings of approximately 10,000 sentences drawn from the British National Corpus (BNC) with suitable length, words and grammatical construction for speech intelligibility testing. The collection process involved selecting a subset of BNC sentences, recording them as produced by 40 British English speakers, and processing the recordings to create individual sentence recordings with associated prompts and metadata. clarity_utterances.v1_2.tar.gz contains all the recordings as .wav files, with accompanying metadata such as text prompts in clarity_master.json; further details are given in the readme. Sample_clarity_utterances.zip contains a sample of 10 utterances. Please reference the following data paper, which details how the corpus was generated: Graetzer, S., Akeroyd, M.A., Barker, J., Cox, T.J., Culling, J.F., Naylor, G., Porter, E. and MuƱoz, R.V., 2022. Dataset of British English speech recordings for psychoacoustics and speech processing research: the Clarity Speech Corpus. Data in Brief, p.107951.
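    A sketch of how the corpus might be explored in Python once the archive is unpacked; the internal layout of clarity_master.json is an assumption (the bundled readme documents the real schema):

        import json
        import wave

        # Field names inside clarity_master.json are illustrative guesses.
        with open("clarity_master.json") as f:
            master = json.load(f)

        first = master[0] if isinstance(master, list) else next(iter(master.values()))
        print(first)  # e.g. prompt text, speaker ID and wav filename for one sentence

        # Inspect one extracted recording (hypothetical filename).
        with wave.open("utterance_0001.wav", "rb") as w:
            print(w.getframerate(), w.getnchannels(), w.getnframes())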

  9. sharif-emotional-speech-dataset

    • huggingface.co
    Updated Apr 27, 2022
    Cite
    Karami (2022). sharif-emotional-speech-dataset [Dataset]. https://huggingface.co/datasets/Mansooreh/sharif-emotional-speech-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 27, 2022
    Authors
    Karami
    Description

    ShEMO: a large-scale validated database for Persian speech emotion detection

      Abstract
    

    This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including anger, fear, happiness, sadness and… See the full description on the dataset page: https://huggingface.co/datasets/Mansooreh/sharif-emotional-speech-dataset.

  10. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 19, 2024
    + more versions
    Cite
    Steven R. Livingstone; Steven R. Livingstone; Frank A. Russo; Frank A. Russo (2024). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [Dataset]. http://doi.org/10.5281/zenodo.1188976
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Steven R. Livingstone; Steven R. Livingstone; Frank A. Russo; Frank A. Russo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The dataset contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.

    The RAVDESS was developed by Dr Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr Frank A. Russo who leads the SMART Lab.

    Citing the RAVDESS

    The RAVDESS is released under a Creative Commons Attribution license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS1 paper. Personal works, such as machine learning projects/blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS1 paper would also be appreciated.

    Academic paper citation

    Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

    Personal use citation

    Include a link to this Zenodo page - https://zenodo.org/record/1188976

    Commercial Licenses

    Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

    Contact Information

    If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

    Example Videos

    Watch a sample of the RAVDESS speech and song videos.

    Emotion Classification Users

    If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].

    Construction and Validation

    Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.

    The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.

    Contents

    Audio-only files

    Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):

    • Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.
    • Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.

    Audio-Visual and Video-only files

    Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:

    • Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.
    • Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.

    File Summary

    In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).

    File naming convention

    Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:

    Filename identifiers

    • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
    • Vocal channel (01 = speech, 02 = song).
    • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
    • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
    • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
    • Repetition (01 = 1st repetition, 02 = 2nd repetition).
    • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).


    Filename example: 02-01-06-01-02-01-12.mp4

    1. Video-only (02)
    2. Speech (01)
    3. Fearful (06)
    4. Normal intensity (01)
    5. Statement "dogs" (02)
    6. 1st Repetition (01)
    7. 12th Actor (12)
    8. Female, as the actor ID number is even.
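    Because the naming scheme is fully regular, the identifiers can be decoded mechanically. A minimal Python sketch whose lookup tables simply transcribe the legend above:

        MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
        CHANNEL = {"01": "speech", "02": "song"}
        EMOTION = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
                   "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}
        INTENSITY = {"01": "normal", "02": "strong"}
        STATEMENT = {"01": "Kids are talking by the door",
                     "02": "Dogs are sitting by the door"}

        def parse_ravdess(filename):
            # Decode the 7-part identifier, e.g. "02-01-06-01-02-01-12.mp4".
            stem = filename.rsplit(".", 1)[0]
            m, ch, emo, inten, stmt, rep, actor = stem.split("-")
            return {
                "modality": MODALITY[m],
                "vocal_channel": CHANNEL[ch],
                "emotion": EMOTION[emo],
                "intensity": INTENSITY[inten],
                "statement": STATEMENT[stmt],
                "repetition": int(rep),
                "actor": int(actor),
                "actor_gender": "female" if int(actor) % 2 == 0 else "male",
            }

        print(parse_ravdess("02-01-06-01-02-01-12.mp4"))
        # -> video-only, speech, fearful, normal intensity, "Dogs are sitting
        #    by the door", 1st repetition, actor 12 (female)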

    License information

    The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0

    Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.


  11. speech-noise-dataset

    • huggingface.co
    Updated Oct 31, 2025
    Cite
    haydar (2025). speech-noise-dataset [Dataset]. https://huggingface.co/datasets/haydarkadioglu/speech-noise-dataset
    Explore at:
    Dataset updated
    Oct 31, 2025
    Authors
    haydar
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Speech and Noise Dataset

      Overview
    

    This dataset contains three types of audio recordings:

    Clean Speech → recordings of only speech without noise.
    Noisy Speech → recordings of speech mixed with noise.
    Noise Only → recordings of only background/environmental noise.

    The dataset is designed for speech enhancement, noise reduction, and speech recognition research.

      Dataset Structure

    clean_speech/ → speech-only recordings
    noisy_speech/ → speech… See the full description on the dataset page: https://huggingface.co/datasets/haydarkadioglu/speech-noise-dataset.
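    As a Hugging Face dataset, it can presumably be loaded with the datasets library. A sketch, with the caveat that the actual split and feature names should be checked on the dataset page:

        from datasets import load_dataset

        # Split and feature names are assumptions; verify against the
        # dataset page before relying on them.
        ds = load_dataset("haydarkadioglu/speech-noise-dataset", split="train")
        print(ds)
        print(ds[0])  # expect an audio field plus a label for the three classes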

  12. Data from: The ParlSpeech V2 data set: Full-text corpora of 6.3 million...

    • dataverse.harvard.edu
    • berd-platform.de
    Updated Mar 13, 2020
    Cite
    Christian Rauh; Jan Schwalbach (2020). The ParlSpeech V2 data set: Full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies [Dataset]. http://doi.org/10.7910/DVN/L4OAKN
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 13, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Christian Rauh; Jan Schwalbach
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    ParlSpeech V2 contains complete full-text vectors of more than 6.3 million parliamentary speeches in the key legislative chambers of Austria, the Czech Republic, Germany, Denmark, the Netherlands, New Zealand, Spain, Sweden, and the United Kingdom, covering periods of between 21 and 32 years. Metadata include the date, speaker, party, and, in part, the agenda item under which a speech was held. The accompanying release note provides a more detailed guide to the data.

  13. Donald Trump Rev.com Speech Transcripts Dataset

    • kaggle.com
    zip
    Updated Sep 24, 2024
    Cite
    W.S. Tang (2024). Donald Trump Rev.com Speech Transcripts Dataset [Dataset]. https://www.kaggle.com/datasets/tangtaidje/donald-trump-rev-com-speech-transcripts-dataset
    Explore at:
    Available download formats: zip (12653416 bytes)
    Dataset updated
    Sep 24, 2024
    Authors
    W.S. Tang
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    This dataset contains over 500 Donald Trump speeches from the years 2015-2024. It was made public so that voters can have more context before casting their 2024 US Presidential Election votes. Historians, linguists and language analysts can also use this data in years to come for their research purposes. The data is unbiased, strictly Donald Trump's speech, and drawn from a diverse range of times, topics and contexts. Please make good use of this carefully and meticulously crafted dataset and don't forget to share your findings! One last thing… don't forget to vote in 2024!

  14. 762 Hours - Spanish(Latin America) Scripted Monologue Smartphone speech...

    • nexdata.ai
    Updated Jan 2, 2024
    + more versions
    Cite
    Nexdata (2024). 762 Hours - Spanish(Latin America) Scripted Monologue Smartphone speech dataset [Dataset]. https://www.nexdata.ai/datasets/speechrecog/970
    Explore at:
    Dataset updated
    Jan 2, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Latin America
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
    Description

    Spanish (Latin America) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering the generic domain, human-machine interaction, smart home commands, in-car commands, numbers, news and other domains. Transcribed with text content and other attributes. The dataset was collected from an extensive and geographically diverse pool of speakers (1,630 people in total, including Mexicans, Colombians, etc.), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; our datasets are all GDPR, CCPA, and PIPL compliant.

  15. multilingual-call-center-speech-dataset

    • huggingface.co
    Updated Oct 9, 2025
    + more versions
    Cite
    AxonLabs (2025). multilingual-call-center-speech-dataset [Dataset]. https://huggingface.co/datasets/AxonData/multilingual-call-center-speech-dataset
    Explore at:
    Dataset updated
    Oct 9, 2025
    Authors
    AxonLabs
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Multilingual Call Center Speech Recognition Dataset: 10,000 Hours

      Dataset Summary
    

    10,000 hours of real-world call center speech recordings in 7 languages with transcripts. Train speech recognition, sentiment analysis, and conversational AI models on authentic customer support audio. Covers support, sales, billing, finance, and pharma domains.

      Dataset Features

      šŸ“Š Scale & Quality

    10,000 hours of inbound & outbound calls. Real-world field… See the full description on the dataset page: https://huggingface.co/datasets/AxonData/multilingual-call-center-speech-dataset.

  16. Czech General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Czech General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-czech
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Czech General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Czech speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Czech communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Czech speech models that understand and respond to authentic Czech accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Czech. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    • Participant Diversity:
      • Speakers: 60 verified native Czech speakers from FutureBeeAI’s contributor community.
      • Regions: various provinces of the Czech Republic, ensuring dialectal diversity and demographic balance.
      • Demographics: a balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    • Recording Details:
      • Conversation Style: unscripted, spontaneous peer-to-peer dialogues.
      • Duration: each conversation ranges from 15 to 60 minutes.
      • Audio Format: stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
      • Environment: quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    • Sample Topics Include:
      • Family & Relationships
      • Food & Recipes
      • Education & Career
      • Healthcare Discussions
      • Social Issues
      • Technology & Gadgets
      • Travel & Local Culture
      • Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    • Transcription Highlights:
      • Speaker-segmented dialogues
      • Time-coded utterances
      • Non-speech elements (pauses, laughter, etc.)
      • High transcription accuracy, achieved through a double QA pass; average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
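    A sketch of how such a time-coded, speaker-segmented JSON transcription might be consumed in Python; the field names (segments, speaker, start, end, text) are assumptions, since the exact schema is not shown in this listing:

        import json

        # Field names below are assumptions for illustration only.
        with open("conversation_001.json", encoding="utf-8") as f:
            transcript = json.load(f)

        for seg in transcript.get("segments", []):
            print(f'{seg["speaker"]} [{seg["start"]:.2f}-{seg["end"]:.2f}s]: {seg["text"]}')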

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    • Speaker Metadata: age, gender, accent, dialect, state/province, and participant ID.
    • Recording Metadata: topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Czech speech and language AI applications:

    • ASR Development: train accurate speech-to-text systems for Czech.
    • Voice Assistants: build smart assistants capable of understanding natural Czech conversations.

  17. german-speech-recognition-dataset

    • huggingface.co
    Updated Mar 7, 2025
    + more versions
    Cite
    Unidata (2025). german-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/german-speech-recognition-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 7, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    German Speech Dataset for recognition task

    The dataset comprises 431 hours of telephone dialogues in German, collected from 590+ native speakers across various topics and domains, achieving an impressive 95% sentence accuracy rate. It is designed for research in automatic speech recognition (ASR) systems. By utilizing this dataset, researchers and developers can advance their understanding and capabilities in transcribing audio and natural language processing (NLP). … See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/german-speech-recognition-dataset.

  18. Japanese Kids Speech database (Upper Grade)

    • catalog.elra.info
    • live.european-language-grid.eu
    Updated Oct 8, 2020
    + more versions
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2020). Japanese Kids Speech database (Upper Grade) [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-S0412/
    Explore at:
    Dataset updated
    Oct 8, 2020
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    The Japanese Kids Speech database (Upper Grade) contains the recordings of 232 Japanese child speakers (104 male and 128 female), from 9 to 13 years old (fourth, fifth and sixth graders in elementary school), recorded in quiet rooms using smartphones. This database may be combined with the Japanese Kids Speech database (Lower Grade), also available in the ELRA Catalogue under reference ELRA-S0411.

    Numbers of speakers, utterances, duration and age are as follows:
    • Number of speakers: 232 (104 male / 128 female)
    • Number of utterances (average): 385 utterances per speaker
    • Total number of utterances: 89,454
    • Age: from 9 to 13 years old
    • Total hours of data: 145.4

    1,018 sentences were used. Recordings were made through smartphones, and the audio data is stored in .wav files as 16 kHz mono, 16-bit, linear PCM.

    Database contents:
    • Audio data: WAV format, 16 kHz, 16-bit, mono (recorded with smartphone)
    • Recording scripts: TSV format (tab-delimited), UTF-8 (without BOM)
    • Transcription data: TSV format (tab-delimited), UTF-8 (without BOM)
    • Size: 16.2 GB

    Number of speakers per age:
    • 9 years old: 56 (21 male, 35 female)
    • 10 years old: 71 (30 male, 41 female)
    • 11 years old: 65 (28 male, 37 female)
    • 12 years old: 38 (24 male, 14 female)
    • 13 years old: 2 (1 male, 1 female)

    Structure of the database:
    ā”œā”€ readme.txt
    ā”œā”€ Japanese Kids Speech Database.pdf (description document of the database)
    ā”œā”€ Transcription.tsv (transcription)
    ā”œā”€ scripts.tsv (script)
    └─ voices/ (directory of audio data)
       └─ high/ (directory of upper grade)
          └─ (speaker_ID)/ (directory of speaker ID, six digits)
             └─ (audio_file) (audio file: WAV format, 16 kHz, 16-bit, mono)

    File naming conventions of audio files are as follows:

    Field number | Contents | Description
    0 | Language ID | "JA" (fixed): Japanese
    1 | Speaker ID | six digits: 5XXXXX
    2 | Script ID | HXXXX (XXXX: four digits)
    3 | Age | two digits
    4 | Gender | M: male, F: female

    The field separator character is "_". For example, the audio file name "JA_500002_H0001_10_F.wav" means: JA: language ID (Japanese); 500002: speaker ID; H0001: script ID; 10: age (ten years old); F: gender (female).
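    Given the documented convention, the metadata can be recovered from a filename alone. A minimal Python sketch:

        def parse_kids_filename(filename):
            # Decode e.g. "JA_500002_H0001_10_F.wav" per the documented convention.
            stem = filename.rsplit(".", 1)[0]
            lang, speaker_id, script_id, age, gender = stem.split("_")
            return {
                "language": lang,            # "JA" (Japanese)
                "speaker_id": speaker_id,    # six digits, 5XXXXX
                "script_id": script_id,      # HXXXX
                "age": int(age),
                "gender": "male" if gender == "M" else "female",
            }

        print(parse_kids_filename("JA_500002_H0001_10_F.wav"))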

  19. MEDIA speech database for French

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Mar 27, 2008
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2008). MEDIA speech database for French [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/
    Explore at:
    Dataset updated
    Mar 27, 2008
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Area covered
    French
    Description

    The MEDIA speech database for French was produced by ELDA within the French national project MEDIA (automatic evaluation of man-machine dialogue systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). It contains 1,258 transcribed dialogues from 250 adult speakers. The method chosen for the corpus construction process is that of a 'Wizard of Oz' (WoZ) system, which consists of simulating a natural-language man-machine dialogue. The scenario was built in the domain of tourism and hotel reservation. The database is formatted following the SpeechDat conventions and includes the following items:
    • 1,258 recorded sessions, for a total of 70 hours of speech. The signals are stored in a stereo wave file format. Each of the two speech channels is recorded at 8 kHz with 16-bit quantization, least significant byte first ("lohi" or Intel format), as signed integers.
    • Manual transcription of each session in XML format. Label files were created with the free transcription tool Transcriber (TRS files).
    • Phonetic lexicon containing all the words spoken in the database. Column 1 contains the orthography of the French word, column 2 the frequency of the word, and column 3 the pronunciation in SAMPA format. A sample entry of the lexicon: 1) agitĆ©e | 3 | a Z i t e
    • Documentation and statistics are also provided with the database.
    The semantic annotation of the corpus is available in this catalogue, referenced ELRA-E0024 (MEDIA Evaluation Package).
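    A short Python sketch of how the deliverables described above might be read; the filenames, the lexicon's tab separator, and the text encoding are assumptions:

        import wave

        # Hypothetical session filename; sessions are stereo WAVs with two
        # 8 kHz, 16-bit channels ("lohi" = little-endian, the standard WAV
        # byte order).
        with wave.open("session_0001.wav", "rb") as w:
            print(w.getnchannels(), w.getframerate(), w.getsampwidth())  # expect 2, 8000, 2

        # Lexicon columns: orthography, frequency, SAMPA pronunciation.
        # A tab separator and UTF-8 encoding are assumed here.
        with open("lexicon.txt", encoding="utf-8") as f:
            for line in f:
                word, freq, sampa = line.rstrip("\n").split("\t")
                print(word, freq, sampa)
                break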

  20. ## SUPERSEDED: THIS DATASET HAS BEEN REPLACED. ## Noisy speech database for...

    • dtechtive.com
    • find.data.gov.scot
    txt, zip
    Updated Mar 22, 2016
    Cite
    University of Edinburgh. School of Informatics. Centre for Speech Technology Research (CSTR) (2016). ## SUPERSEDED: THIS DATASET HAS BEEN REPLACED. ## Noisy speech database for training speech enhancement algorithms and TTS models [Dataset]. http://doi.org/10.7488/ds/1356
    Explore at:
    Available download formats: zip (821.6 MB), zip (0.3533 MB), zip (5.934 MB), txt (0.0166 MB), zip (912.7 MB), zip (162.6 MB), zip (147.1 MB)
    Dataset updated
    Mar 22, 2016
    Dataset provided by
    University of Edinburgh. School of Informatics. Centre for Speech Technology Research (CSTR)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SUPERSEDED: THIS DATASET HAS BEEN REPLACED by the one which can be found at https://doi.org/10.7488/ds/2117. Clean and noisy parallel speech database. The database was designed to train and test speech enhancement methods that operate at 48 kHz. A more detailed description can be found in the paper associated with the database. Some of the noises were obtained from the DEMAND database, available here: http://parole.loria.fr/DEMAND/. The speech data were obtained from the Voice Banking Corpus, available here: http://homepages.inf.ed.ac.uk/jyamagis/release/VCTK-Corpus.tar.gz
