100+ datasets found
  1. Speech_Command|Application of Speech Recognition

    • kaggle.com
    zip
    Updated Mar 28, 2022
    Cite
    VK (2022). Speech_Command|Application of Speech Recognition [Dataset]. https://www.kaggle.com/datasets/venkatkumar001/speechcommands
    Explore at:
    Available download formats: zip (820205557 bytes)
    Dataset updated
    Mar 28, 2022
    Authors
    VK
    Description

    Google researchers published the Speech Commands dataset; I'm republishing 14 of its subcategories of voice data, each clip one second long.

    I preprocessed the audio, generated JSON files, and uploaded the result. Feel free to use it.

    Enjoy, and build your keyword-spotting application efficiently!

  2. Call Center Speech Recognition Dataset

    • kaggle.com
    zip
    Updated Oct 14, 2025
    + more versions
    Cite
    Axon Labs (2025). Call Center Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/axondata/call-center-speech-dataset
    Explore at:
    Available download formats: zip (12766164 bytes)
    Dataset updated
    Oct 14, 2025
    Authors
    Axon Labs
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Multilingual Call Center Speech Recognition Dataset: 10,000 Hours

    Dataset Summary

    10,000 hours of real-world call center speech recordings in 7 languages, with transcripts. Train speech recognition, sentiment analysis, and conversational AI models on authentic customer support audio. Covers the support, sales, billing, finance, and pharma domains.

    Dataset Features

    📊 Scale & Quality

    • 10,000 hours of inbound & outbound calls
    • Real-world field recordings - no synthetic audio
    • With transcripts and concise summaries

    🎙️ Audio Specifications

    • Format: Single-channel (mono) telephone speech
    • Sample rate: 8,000 Hz
    • Non-synthetic source audio

    🌍 Languages (7)

    English, Russian, Polish, French, German, Spanish, Portuguese
    • Non-English calls include an English translation
    • Additional languages available on request: Swedish, Dutch, Arabic, Japanese, etc.

    🏢 Domains

    Support, Billing/Account, Sales, Finance/Account Management, Pharma
    • Each call labeled by domain
    • Speaker roles annotated (Agent/Customer)

    The full version of the dataset is available for commercial use; leave a request on our website, Axon Labs, to purchase it. 💰

    Purpose and Usage Scenarios

    • Automatic Speech Recognition, punctuation restoration, and speaker diarization on telephone speech
    • Intent detection, topic classification, and sentiment analysis from customer-service dialogs
    • Post-call concise summaries for QA/quality monitoring and CRM automation
    • Cross-lingual pipelines (original → English) and multilingual support models
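
    The audio specifications above (mono, 8,000 Hz telephone speech) are worth sanity-checking before training. A minimal sketch using Python's standard wave module, assuming the archive unpacks to WAV files; the file name is hypothetical:

    import wave

    # Verify a clip against the stated specs: mono, 8 kHz telephone audio.
    # "call_0001.wav" is a hypothetical file name.
    with wave.open("call_0001.wav", "rb") as wav:
        assert wav.getnchannels() == 1, "expected single-channel (mono) audio"
        assert wav.getframerate() == 8000, "expected an 8,000 Hz sample rate"
        seconds = wav.getnframes() / wav.getframerate()
        print(f"{seconds:.1f} s, {8 * wav.getsampwidth()}-bit PCM")
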
  3. Speech Recognition Dataset [Customer Calls] – Transcribed support...

    • datarade.ai
    Cite
    WiserBrand.com, Speech Recognition Dataset [Customer Calls] – Transcribed support conversations for training voice AI systems [Dataset]. https://datarade.ai/data-products/speech-recognition-dataset-customer-calls-transcribed-sup-wiserbrand-com
    Explore at:
    Available download formats: .json, .csv, .xls, .txt
    Dataset provided by
    WiserBrand
    Area covered
    Moldova (Republic of), Portugal, Greece, Poland, Slovenia, United Kingdom, Denmark, Czech Republic, Croatia, Norway
    Description

    This dataset is designed for building and improving speech recognition systems. It features transcribed customer service calls from real interactions across 160+ industries, including retail, banking, telecom, logistics, healthcare, and entertainment. Calls are natural, unscripted, and emotion-rich — making the data especially valuable for training models that must interpret speech under real-world conditions.

    Each dataset entry includes:

    • Full call transcription (agent + customer dialogue)
    • Human-written call summary
    • Overall sentiment label: positive, neutral, or negative
    • Metadata: call duration, caller location (city, state, country), timestamp
    • Optional: company name and industry tag

    Use this dataset to:

    • Train speech-to-text models on real customer language patterns
    • Benchmark or evaluate speech recognition tools in support settings
    • Improve voice interfaces, chatbots, and IVR systems
    • Model tone, frustration cues, and escalation behaviors
    • Support LLM fine-tuning for tasks involving spoken input

    This dataset provides your speech recognition models with exposure to genuine customer conversations, helping you build tools that can listen, understand, and act in line with how people actually speak.

    The larger the volume you purchase, the lower the price will be.
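
    Since entries are delivered as .json/.csv with the fields listed above, records can be consumed directly. A minimal sketch; the key names are assumptions, as the listing names the contents but not the exact schema:

    import json

    # Key names below are assumptions; adjust to the delivered schema.
    record = json.loads("""
    {
      "transcription": "Agent: Hello! ... Customer: My order arrived damaged.",
      "summary": "Customer reports a damaged order; agent issues a replacement.",
      "sentiment": "negative",
      "duration_seconds": 312,
      "city": "Lisbon",
      "country": "Portugal",
      "timestamp": "2024-11-02T14:31:00Z"
    }
    """)

    # Example: route long negative calls to a QA review queue.
    if record["sentiment"] == "negative" and record["duration_seconds"] > 300:
        print("flag for review:", record["summary"])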

  4. Bengali Speech Recognition Dataset (BSRD)

    • kaggle.com
    zip
    Updated Jan 14, 2025
    Cite
    Shuvo Kumar Basak-4004 (2025). Bengali Speech Recognition Dataset (BSRD) [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/bengali-speech-recognition-dataset-bsrd
    Explore at:
    Available download formats: zip (300882482 bytes)
    Dataset updated
    Jan 14, 2025
    Authors
    Shuvo Kumar Basak-4004
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The BengaliSpeechRecognitionDataset (BSRD) is a comprehensive dataset designed for the development and evaluation of Bengali speech recognition and text-to-speech systems. It includes a collection of Bengali characters and their corresponding audio files, which are generated using speech synthesis models, and serves as an essential resource for researchers and developers working on automatic speech recognition (ASR) and text-to-speech (TTS) applications for the Bengali language.

    Key Features:
    • Bengali Characters: The dataset contains a wide range of Bengali characters, including consonants, vowels, and unique symbols used in the Bengali script, such as 'ক', 'খ', 'গ', and many more.
    • Corresponding Speech Data: For each Bengali character, an MP3 audio file provides the correct pronunciation of that character. The audio is generated by a Bengali text-to-speech model, ensuring clear and accurate pronunciation.
    • 1000 Audio Samples per Folder: Each character is associated with at least 1000 MP3 files. These multiple samples provide variations of the character's pronunciation, which is essential for training robust speech recognition systems.
    • Language and Phonetic Diversity: The dataset offers phonetic diversity across Bengali sounds, covering different tones and pronunciations commonly found in spoken Bengali, so that models trained on it can recognize diverse speech patterns.
    • Use Cases:
      o Automatic Speech Recognition (ASR): BSRD is ideal for training ASR systems, as it provides accurate audio samples linked to specific Bengali characters.
      o Text-to-Speech (TTS): Researchers can use this dataset to fine-tune TTS systems for generating natural Bengali speech from text.
      o Phonetic Analysis: The dataset can be used for phonetic analysis and for developing models that study the linguistic features of Bengali pronunciation.
    • Applications:
      o Voice Assistants: The dataset can be used to build and train voice recognition systems and personal assistants that understand Bengali.
      o Speech-to-Text Systems: BSRD can aid in developing accurate transcription systems for Bengali audio content.
      o Language Learning Tools: The dataset can help in creating educational tools aimed at teaching Bengali pronunciation.

    Note for Researchers Using the Dataset

    This dataset was created by Shuvo Kumar Basak. If you use it for research or academic purposes, please cite it appropriately, and if you publish research using it, please share a link to your paper. Good luck.
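
    Given the folder layout described above (one folder per character, at least 1000 MP3s each), a training index is a short directory walk. A sketch, assuming the unzipped root is named BSRD:

    from pathlib import Path

    # Build a (file path, character label) index from the folder layout.
    # "BSRD" is an assumed root directory name for the unzipped dataset.
    root = Path("BSRD")
    index = [(mp3, folder.name)
             for folder in sorted(root.iterdir()) if folder.is_dir()
             for mp3 in sorted(folder.glob("*.mp3"))]
    labels = {label for _, label in index}
    print(f"{len(index)} clips across {len(labels)} characters")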

  5. LibriSpeech

    • datasets.activeloop.ai
    • tensorflow.org
    • +2more
    deeplake
    Updated Dec 12, 2022
    Cite
    Google (2022). LibriSpeech [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/librispeech-dataset/
    Explore at:
    Available download formats: deeplake
    Dataset updated
    Dec 12, 2022
    Dataset authored and provided by
    Google
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    Carnegie Mellon University
    Description

    The LibriSpeech dataset is a corpus of read English speech derived from LibriVox audiobooks, prepared by Panayotov et al. and widely used for training and evaluating speech recognition models. It comprises roughly 1,000 hours of audio, of which 960 hours form the standard training sets, read by a wide variety of speakers and covering a wide range of topics. The dataset is available for free download.
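
    This listing points at the Activeloop and TensorFlow copies; as a sketch, the corpus can also be streamed through tensorflow_datasets (split and feature names follow the TFDS catalog entry; the full download is very large):

    import tensorflow_datasets as tfds

    # Stream a single test-set example rather than materializing the corpus.
    ds = tfds.load('librispeech', split='test_clean')
    for ex in ds.take(1):
        print(ex['text'])          # transcript
        print(ex['speech'].shape)  # raw waveform samples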

  6. British English Scripted Monologue Speech Data for Healthcare

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). British English Scripted Monologue Speech Data for Healthcare [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/healthcare-scripted-speech-monologues-english-uk
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United Kingdom
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Introducing the UK English Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of English language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.

    Speech Data

    This dataset includes over 6,000 high-quality scripted audio prompts recorded in UK English, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.

    Participant Diversity
    Speakers: 60 native UK English speakers.
    Regional Balance: Participants are sourced from multiple regions across the United Kingdom, reflecting diverse dialects and linguistic traits.
    Demographics: Includes a mix of male and female participants (60:40 ratio), aged between 18 and 70 years.
    Recording Specifications
    Nature of Recordings: Scripted monologues based on healthcare-related use cases.
    Duration: Each clip ranges from 5 to 30 seconds, offering short, context-rich speech samples.
    Audio Format: WAV files recorded in mono, with 16-bit depth and sample rates of 8 kHz and 16 kHz.
    Environment: Clean and echo-free spaces ensure clear and noise-free audio capture.

    Topic Coverage

    The prompts span a broad range of healthcare-specific interactions, such as:

    Patient check-in and follow-up communication
    Appointment booking and cancellation dialogues
    Insurance and regulatory support queries
    Medication, test results, and consultation discussions
    General health tips and wellness advice
    Emergency and urgent care communication
    Technical support for patient portals and apps
    Domain-specific scripted statements and FAQs

    Contextual Depth

    To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:

    Names: Gender- and region-appropriate United Kingdom names
    Addresses: Varied local address formats spoken naturally
    Dates & Times: References to appointment dates, times, follow-ups, and schedules
    Medical Terminology: Common medical procedures, symptoms, and treatment references
    Numbers & Measurements: Health data like dosages, vitals, and test result values
    Healthcare Institutions: Names of clinics, hospitals, and diagnostic centers

    These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.

    Transcription

    Every audio recording is accompanied by a verbatim, manually verified transcription.

    Content: The transcription mirrors the exact scripted prompt recorded by the speaker.
    Format: Files are delivered in plain text (.TXT) format with consistent naming conventions for seamless integration.
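
    Because each WAV has a same-named transcript file, pairing audio with text is a short walk. A minimal sketch; the root directory name is assumed, and the transcript suffix case (.txt vs .TXT) may need adjusting to the delivery:

    from pathlib import Path

    # Pair each clip with its same-named plain-text transcript.
    # "healthcare_monologues" is an assumed root directory name.
    root = Path("healthcare_monologues")
    pairs = [(wav, wav.with_suffix(".txt").read_text(encoding="utf-8").strip())
             for wav in sorted(root.rglob("*.wav"))
             if wav.with_suffix(".txt").exists()]
    print(f"{len(pairs)} audio/transcript pairs")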

  7. AURORA-5

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Aug 16, 2017
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). AURORA-5 [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-AURORA-CD0005/
    Explore at:
    Dataset updated
    Aug 16, 2017
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. The AURORA-5 database was developed mainly to investigate the influence of hands-free speech input in noisy room environments on the performance of automatic speech recognition. Two further test conditions are included to study the influence of transmitting the speech over a mobile communication system.

    The earlier three Aurora experiments focused on additive noise and the influence of some telephone frequency characteristics. AURORA-5 tries to cover all effects as they occur in realistic application scenarios, with a focus on two of them. The first is hands-free speech input in the noisy car environment, with the intention of controlling either devices in the car itself or retrieving information from a remote speech server over the telephone. The second covers hands-free speech input in an office or living room, e.g. to control a telephone device or some audio/video equipment.

    The AURORA-5 database contains the following data:
    • Artificially distorted versions of the recordings from adult speakers in the TI-Digits speech database, downsampled to a sampling frequency of 8000 Hz. The distortions consist of additive background noise, the simulation of hands-free speech input in rooms, and the simulation of transmitting speech over cellular telephone networks.
    • A subset of recordings from the meeting recorder project at the International Computer Science Institute. The recordings contain sequences of digits uttered by different speakers in hands-free mode in a meeting room.
    • A set of scripts for running recognition experiments on the above-mentioned speech data. The experiments are based on the freely available HTK software package; HTK itself is not part of this resource.

    Further information is available at: http://aurora.hsnr.de
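
    The core distortion described above, additive background noise at a controlled level, is a few lines of numpy. A sketch of the general technique, not ELRA's actual tooling:

    import numpy as np

    def mix_at_snr(clean, noise, snr_db):
        """Add background noise to a clean signal at a target SNR in dB."""
        # Loop the noise if it is shorter than the speech signal.
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)[:len(clean)]
        p_clean = np.mean(clean ** 2)
        p_noise = np.mean(noise ** 2)
        # Scale so that 10*log10(p_clean / p_scaled_noise) equals snr_db.
        scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
        return clean + scale * noise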

  8. speech_commands

    • tensorflow.org
    • datasets.activeloop.ai
    • +1more
    Updated Jan 13, 2023
    Cite
    (2023). speech_commands [Dataset]. http://identifiers.org/arxiv:1804.03209
    Explore at:
    Dataset updated
    Jan 13, 2023
    Description

    An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word is spoken, from a set of ten target words, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments: while in the test set the silence segments are regular 1-second files, in the training set they are provided as long recordings under the "background_noise" folder. Here we split this background noise into 1-second clips, and also keep one of the files for the validation set.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('speech_commands', split='train')
    for ex in ds.take(4):
        print(ex)


    See the guide for more information on tensorflow_datasets.
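
    The splitting of the long background_noise recordings into 1-second clips mentioned above is easy to reproduce. A sketch, assuming 16 kHz audio as in the original release:

    import numpy as np

    def split_into_clips(waveform, sample_rate=16000):
        # Chop a long background-noise recording into 1-second clips,
        # dropping any incomplete tail.
        clip_len = sample_rate  # one second of samples
        n_clips = len(waveform) // clip_len
        return [waveform[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]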

  9. m-ailabs_speech_dataset_fr

    • huggingface.co
    Updated Jun 1, 2022
    Cite
    Théo Gigant (2022). m-ailabs_speech_dataset_fr [Dataset]. https://huggingface.co/datasets/gigant/m-ailabs_speech_dataset_fr
    Explore at:
    Dataset updated
    Jun 1, 2022
    Authors
    Théo Gigant
    License

    https://choosealicense.com/licenses/cc/

    Description


    The M-AILABS Speech Dataset is the first large dataset that we are providing free-of-charge, freely usable as training data for speech recognition and speech synthesis.

    Most of the data is based on LibriVox and Project Gutenberg. The training data consist of nearly a thousand hours of audio and text files in a prepared format.

    A transcription is provided for each clip. Clips vary in length from 1 to 20 seconds; the approximate total length per language is given in the respective info.txt files.

    The texts were published between 1884 and 1964, and are in the public domain. The audio was recorded by the LibriVox project and is also in the public domain – except for Ukrainian.

    Ukrainian audio was kindly provided either by Nash Format or Gwara Media for machine learning purposes only (please check the data info.txt files for details).

  10. Podcast Database - Complete Podcast Metadata, All Countries & Languages

    • datarade.ai
    .json, .csv, .sql
    Updated May 27, 2025
    Cite
    Listen Notes (2025). Podcast Database - Complete Podcast Metadata, All Countries & Languages [Dataset]. https://datarade.ai/data-products/podcast-database-complete-podcast-metadata-all-countries-listen-notes
    Explore at:
    Available download formats: .json, .csv, .sql
    Dataset updated
    May 27, 2025
    Dataset authored and provided by
    Listen Notes
    Area covered
    Zambia, Colombia, Gibraltar, Turkey, Indonesia, Slovenia, Iran (Islamic Republic of), Anguilla, Bosnia and Herzegovina, Guinea-Bissau
    Description

    == Quick facts ==

    • The most up-to-date and comprehensive podcast database available
    • All languages & all countries
    • Includes over 3,600,000 podcasts
    • Features 35+ data fields, such as basic metadata, global rank, RSS feed (with audio URLs), Spotify links, and more
    • Delivered in SQLite format
    • Learn how we build a high-quality podcast database: https://www.listennotes.help/article/105-high-quality-podcast-database-from-listen-notes

    == Use Cases ==

    • AI training, including speech recognition, generative AI, voice cloning / synthesis, and news analysis
    • Alternative data for investment research, such as sentiment analysis of executive interviews, market research, and tracking investment themes
    • PR and marketing, including social monitoring, content research, outreach, and guest booking ...

    == Data Attributes ==

    See the full list of data attributes on this page: https://www.listennotes.com/podcast-datasets/fields/?filter=podcast_only

    How to access podcast audio files: Our dataset includes RSS feed URLs for all podcasts. You can retrieve audio for over 170 million episodes directly from these feeds. With access to the raw audio, you’ll have high-quality podcast speech data ideal for AI training and related applications.
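
    Since the database is delivered as SQLite, feed URLs can be pulled with Python's standard sqlite3 module. A sketch in which the file, table, and column names are all assumptions; consult the shipped schema for the real ones:

    import sqlite3

    # File, table, and column names below are assumptions.
    con = sqlite3.connect("podcasts.sqlite")
    rows = con.execute(
        "SELECT title, rss_url FROM podcasts WHERE language = ? LIMIT 5",
        ("English",),
    )
    for title, rss_url in rows:
        print(title, "->", rss_url)  # episode audio URLs live in the RSS feed
    con.close()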

    == Custom Offers ==

    We can provide custom datasets based on your needs, such as language-specific data, daily/weekly/monthly update frequency, or one-time purchases.

    We also provide a RESTful API at PodcastAPI.com

    Contact us: hello@listennotes.com

    == Need Help? ==

    If you have any questions about our products, feel free to reach out: hello@listennotes.com

    == About Listen Notes, Inc. ==

    Since 2017, Listen Notes, Inc. has provided the leading podcast search engine and podcast database.

  11. Bengali Speech Recognition - Bangla Real Number Audio Dataset

    • data.mendeley.com
    Updated Feb 4, 2018
    + more versions
    Cite
    Md Mahadi Hasan Nahid (2018). Bengali Speech Recognition - Bangla Real Number Audio Dataset [Dataset]. http://doi.org/10.17632/t33byr6cpt.3
    Explore at:
    Dataset updated
    Feb 4, 2018
    Authors
    Md Mahadi Hasan Nahid
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ========================================================

    This dataset was developed by:
    Md Ashraful Islam, SUST CSE'2010
    Md Mahadi Hasan Nahid, SUST CSE'2010 (nahid-cse@sust.edu)

    Department of Computer Science and Engineering (CSE) 
    Shahjalal University of Science and Technology (SUST), www.sust.edu 
    

    Special thanks to:
    Mohammad Al-Amin, SUST CSE'2011
    Md Mazharul Islam Midhat, SUST CSE'2010
    Md Mahedi Hasan Nayem, SUST CSE'2010
    Avro Keyboard, OmicronLab, https://www.omicronlab.com/index.html

    =========================================================

    It is an audio-text parallel corpus. This dataset contains recordings of Bangla real numbers and their corresponding text, specially designed for Bangla speech recognition.

    There are five speakers (alamin, ashraful, midhat, nahid, nayem) in this dataset.

    The vocabulary contains only Bangla real numbers (shunno-ekshoto, hazar, loksho, koti, doshomic, etc.).

    Total number of audio files: 175 (35 from each speaker)
    Age range of the speakers: 20-23

    Total Size: 32.4MB

    The TextData.txt file contains the text of the audio set. Each line starts and ends with a tag, and the corresponding audio file name is appended in parentheses, linking each sentence to its recorded audio. The text data was generated using Avro (free, open-source writing software).
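
    A line of TextData.txt can then be parsed with a small regular expression. A sketch in which both the tag names and the sample line are assumptions, since the listing does not show the exact tags:

    import re

    # Illustrative line only -- the tag names and file naming are assumptions.
    line = "<s> ekshoto </s> (nahid_07.wav)"
    m = re.match(r"<[^>]+>\s*(.*?)\s*</[^>]+>\s*\((.+)\)\s*$", line)
    if m:
        text, audio_file = m.groups()
        print(audio_file, "->", text)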

    ==========================================================

    For Full Data: please contact nahid-cse@sust.edu

  12. Speech and Noise Corpora for Pitch Estimation of Human Speech

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 30, 2020
    Cite
    Bastian Bechtold (2020). Speech and Noise Corpora for Pitch Estimation of Human Speech [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3920590
    Explore at:
    Dataset updated
    Jun 30, 2020
    Dataset provided by
    Jade Hochschule
    Authors
    Bastian Bechtold
    Description

    This dataset contains common speech and noise corpora for evaluating fundamental frequency estimation algorithms as convenient JBOF dataframes. Each corpus is available freely on its own, and allows redistribution:

    CMU-ARCTIC (BSD license) [1]

    FDA (free to download) [2]

    KEELE (free for noncommercial use) [3]

    MOCHA-TIMIT (free for noncommercial use) [4]

    PTDB-TUG (ODBL license) [5]

    NOISEX (free to download) [7]

    QUT-NOISE (CC-BY-SA license) [8]

    These files are published as part of my dissertation, "Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods", and in support of the Replication Dataset for Fundamental Frequency Estimation.

    References:

    John Kominek and Alan W Black. CMU ARCTIC database for speech synthesis, 2003.

    Paul C Bagshaw, Steven Hiller, and Mervyn A Jack. Enhanced Pitch Tracking and the Processing of F0 Contours for Computer Aided Intonation Teaching. In EUROSPEECH, 1993.

    F Plante, Georg F Meyer, and William A Ainsworth. A Pitch Extraction Reference Database. In Fourth European Conference on Speech Communication and Technology, pages 837–840, Madrid, Spain, 1995.

    Alan Wrench. MOCHA MultiCHannel Articulatory database: English, November 1999.

    Gregor Pirker, Michael Wohlmayr, Stefan Petrik, and Franz Pernkopf. A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario. page 4, 2011.

    John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren, and Victor Zue. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 1993.

    Andrew Varga and Herman J.M. Steeneken. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3):247–251, July 1993.

    David B. Dean, Sridha Sridharan, Robert J. Vogt, and Michael W. Mason. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. Proceedings of Interspeech 2010, 2010.

  13. Accented English Speech Dataset | Humam-to-Chatbot conversation | 1000+...

    • datarade.ai
    .mp3, .wav
    Updated Aug 5, 2025
    Cite
    FileMarket (2025). Accented English Speech Dataset | Humam-to-Chatbot conversation | 1000+ hours of recordings [Dataset]. https://datarade.ai/data-products/accented-english-speech-dataset-1-5k-recordings-scripted-filemarket
    Explore at:
    Available download formats: .mp3, .wav
    Dataset updated
    Aug 5, 2025
    Dataset authored and provided by
    FileMarket
    Area covered
    Russian Federation, Philippines, Rwanda, Bangladesh, Netherlands, Falkland Islands (Malvinas), Poland, Palestine, Curaçao, Grenada
    Description

    The Accented English Speech Dataset provides over 1,000 hours of authentic conversational recordings designed to strengthen ASR systems, conversational AI, and voice applications. Unlike synthetic or scripted datasets, this collection captures real human-to-human and chatbot-guided dialogues, reflecting natural speech flow, spontaneous phrasing, and diverse accents.

    Off-the-shelf recordings are available from:

    Mexico, Colombia, Guatemala, Costa Rica, El Salvador, Dominican Republic, South Africa

    This ensures exposure to Latin American, Caribbean, and African English accents, which are often missing from mainstream corpora.

    Beyond these, we support custom collection in any language and any accent worldwide, tailored to specific project requirements.

    Audio Specifications

    Format: WAV
    Sample rate: 48 kHz
    Sample size: 16-bit PCM
    Channel: Mono/Stereo
    Double-track recording: Available upon request (clean separation of speakers)

    Data Structure and Metadata

    Dual-track or single-channel audio depending on project need
    Metadata includes speaker ID, demographic attributes, accent/region, and context
    Dialogues include both structured (chatbot/task-based) and free-flow natural conversations

    Use Cases

    • ASR Training & Benchmarking – Improve transcription across accented English
    • Accent Adaptation – Build robust, inclusive systems that work in real-world scenarios
    • Multilingual Voice Interfaces – Expand IVR and assistants to support more voices
    • Conversational AI – Train chatbots on authentic, unstructured dialogue
    • Voice Biometrics – Support research in identity verification and speaker profiling
    • Model Fine-Tuning – Enrich foundation models with high-quality speech data

    Why It Matters

    Mainstream datasets disproportionately focus on U.S. and U.K. English. This dataset fills the gap with diverse accented English coverage, and the ability to collect any language or accent on demand, enabling the creation of fairer, more accurate, and globally deployable AI solutions.

    Key Highlights

    • 1,000+ hours of accented English speech
    • Ready-to-use coverage from Latin America, Caribbean, and Africa
    • Authentic dialogues: human-to-human and chatbot-guided
    • WAV, 48kHz, 16-bit PCM, mono/stereo, double-track option
    • Metadata-rich recordings for advanced AI research
    • Custom collection in any language and accent
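
    When the double-track option is delivered as a single stereo WAV, the two speaker channels can be separated by de-interleaving the samples. A minimal sketch, assuming 16-bit PCM as specified above; the file name is hypothetical:

    import wave
    import numpy as np

    # "dialogue_0001.wav" is a hypothetical file name.
    with wave.open("dialogue_0001.wav", "rb") as wav:
        assert wav.getnchannels() == 2 and wav.getsampwidth() == 2  # stereo, 16-bit
        frames = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    # Stereo WAV frames interleave left/right samples.
    speaker_a, speaker_b = frames[0::2], frames[1::2]
    print(len(speaker_a), "samples per channel at 48 kHz")
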
  14. Speech Dataset of Human and AI-Generated Voices

    • data.mendeley.com
    • kaggle.com
    Updated Sep 15, 2025
    Cite
    Huzain Azis (2025). Speech Dataset of Human and AI-Generated Voices [Dataset]. http://doi.org/10.17632/5czyx2vppv.2
    Explore at:
    Dataset updated
    Sep 15, 2025
    Authors
    Huzain Azis
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset consists of audio recordings in Indonesian language, categorized into two distinct classes: human voices (real) and synthetic voices generated using artificial intelligence (AI). Each class comprises 21 audio files, resulting in a total of 42 audio files. Each recording has a duration ranging from approximately 4 to 9 minutes, with an average length of around 6 minutes per file. All recordings are provided in WAV format and accompanied by a CSV file containing detailed duration metadata for each audio file.

    This dataset is suitable for research and applications in speech recognition, voice authenticity detection, audio analysis, and related fields. It enables comparative analysis between natural Indonesian speech and AI-generated synthetic speech.
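
    The shipped duration metadata can be cross-checked against the WAV headers with the standard library. A sketch with assumed CSV file and column names:

    import csv
    import wave

    # "durations.csv" and its column names are assumptions; adjust to the delivery.
    with open("durations.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            with wave.open(row["filename"], "rb") as wav:
                measured = wav.getnframes() / wav.getframerate()
            print(row["filename"], row["class"], f"{measured / 60:.1f} min")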

  15. ASR MTC Dataset

    • kaggle.com
    zip
    Updated Jun 29, 2024
    Cite
    Mohamed Motawie (2024). ASR MTC Dataset [Dataset]. https://www.kaggle.com/datasets/mohamedmotawie/amongus
    Explore at:
    Available download formats: zip (251790735 bytes)
    Dataset updated
    Jun 29, 2024
    Authors
    Mohamed Motawie
    Description

    This contains an Egyptian Arabic dataset for the MTC competition. Note that version 4 is used for training and validation, and version 5 for testing. The ASR code can be found in our notebooks; for any advice or material, feel free to contact us. Best of luck!

  16. Tamil General Domain Scripted Monologue Speech Data

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Tamil General Domain Scripted Monologue Speech Data [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/general-scripted-speech-monologues-tamil-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Tamil Scripted Monologue Speech Dataset for the General Domain is a carefully curated resource designed to support the development of Tamil language speech recognition systems. This dataset focuses on general-purpose conversational topics and is ideal for a wide range of AI applications requiring natural, domain-agnostic Tamil speech data.

    Speech Data

    This dataset features over 6,000 high-quality scripted monologue recordings in Tamil. The prompts span diverse real-life topics commonly encountered in general conversations and are intended to help train robust and accurate speech-enabled technologies.

    Participant Diversity
    Speakers: 60 native Tamil speakers
    Regions: Broad regional coverage ensures diverse accents and dialects
    Demographics: Participants aged 18 to 70, with a 60:40 male-to-female ratio
    Recording Specifications
    Recording Type: Scripted monologues and prompt-based recordings
    Audio Duration: 5 to 30 seconds per file
    Format: WAV, mono channel, 16-bit, 8 kHz & 16 kHz sample rates
    Environment: Clean, noise-free conditions to ensure clarity and usability

    Topic Coverage

    The dataset covers a wide variety of general conversation scenarios, including:

    Daily Conversations
    Topic-Specific Discussions
    General Knowledge and Advice
    Idioms and Sayings

    Contextual Features

    To enhance authenticity, the prompts include:

    Names: Male and female names specific to different Tamil Nadu regions
    Addresses: Commonly used address formats in daily Tamil speech
    Dates & Times: References used in general scheduling and time expressions
    Organization Names: Names of businesses, institutions, and other entities
    Numbers & Currencies: Mentions of quantities, prices, and monetary values

    Each prompt is designed to reflect everyday use cases, making it suitable for developing generalized NLP and ASR solutions.

    Transcription

    Every audio file in the dataset is accompanied by a verbatim text transcription, ensuring accurate training and evaluation of speech models.

    Content: Exact match to the spoken audio
    Format: Plain text (.TXT), named identically to the corresponding audio file
    Quality Control: All transcripts are validated by native Tamil transcribers

    Metadata

    Rich metadata is included for detailed filtering and analysis:

    Speaker Metadata: Unique speaker ID, age, gender, region, and dialect
    Audio Metadata: Prompt transcript, recording setup, device specs, sample rate, bit depth, and format

    Applications & Use Cases

    This dataset can power a variety of Tamil language AI technologies, including:

    Speech Recognition Training: ASR model development and fine-tuning

  17. Arabic Speech Commands Dataset

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Apr 5, 2021
    Cite
    Abdulkader Ghandoura (2021). Arabic Speech Commands Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4662480
    Explore at:
    Dataset updated
    Apr 5, 2021
    Dataset provided by
    Syrian Virtual University
    Authors
    Abdulkader Ghandoura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Arabic Speech Commands Dataset

    This dataset is designed to help train simple machine learning models that serve educational and research purposes in the speech recognition domain, mainly for keyword spotting tasks.

    Dataset Description

    Our dataset is a list of pairs (x, y), where x is the input speech signal and y is the corresponding keyword. The final dataset consists of 12,000 such pairs, comprising 40 keywords. Each audio file is one second in length, sampled at 16 kHz. We have 30 participants, each of whom recorded 10 utterances of each keyword; we therefore have 300 audio files per keyword in total (30 * 10 * 40 = 12000), and the total size of all the recorded keywords is ~384 MB. The dataset also contains several background-noise recordings obtained from various natural noise sources. These audio files are saved in a separate folder named background_noise, with a total size of ~49 MB.

    Dataset Structure

    There are 40 folders, each of which represents one keyword and contains 300 files. The first eight digits of each file name identify the contributor, while the last two digits identify the round number. For example, the file path rotate/00000021_NO_06.wav indicates that the contributor with the ID 00000021 pronounced the keyword rotate for the 6th time.

    Data Split

    We recommend using the provided CSV files in your experiments. We kept 60% of the dataset for training, 20% for validation, and the remaining 20% for testing. In our split method, we guarantee that all recordings of a certain contributor are within the same subset.
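
    That speaker-disjoint guarantee is easy to verify from the CSVs, since the first eight digits of each file name identify the contributor. A sketch with assumed CSV file and column names:

    import csv

    def contributors(csv_path):
        # Paths look like rotate/00000021_NO_06.wav; the first eight digits of
        # the file name identify the contributor. The "file" column name and
        # the CSV file names are assumptions; adjust to the provided files.
        with open(csv_path, newline="") as f:
            return {row["file"].rsplit("/", 1)[-1][:8] for row in csv.DictReader(f)}

    train, val, test = map(contributors, ("train.csv", "validation.csv", "test.csv"))
    assert train.isdisjoint(val) and train.isdisjoint(test) and val.isdisjoint(test)
    print("every contributor is confined to a single subset")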

    License

    This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. For more details, see the LICENSE file in this folder.

    Citations

    If you want to use the Arabic Speech Commands dataset in your work, please cite it as:

    @article{arabicspeechcommandsv1,
      author    = {Ghandoura, Abdulkader and Hjabo, Farouk and Al Dakkak, Oumayma},
      title     = {Building and Benchmarking an Arabic Speech Commands Dataset for Small-Footprint Keyword Spotting},
      journal   = {Engineering Applications of Artificial Intelligence},
      year      = {2021},
      publisher = {Elsevier}
    }

  18. Data from: Dysarthric speech database for universal access research

    • incluset.com
    Updated 2007
    Cite
    Heejin Kim; Mark Allan Hasegawa-Johnson; Adrienne Perlman; Jon Gunderson; Thomas S Huang; Kenneth Watkin; Simone Frame (2007). Dysarthric speech database for universal access research [Dataset]. https://incluset.com/datasets
    Explore at:
    Dataset updated
    2007
    Authors
    Heejin Kim; Mark Allan Hasegawa-Johnson; Adrienne Perlman; Jon Gunderson; Thomas S Huang; Kenneth Watkin; Simone Frame
    Measurement technique
    Participants read a variety of words: digits, letters, computer commands, common words, and uncommon words from Project Gutenberg novels.
    Description

    This dataset was collected to advance research into speech recognition systems for dysarthric speech.

  19. Data from: The TORGO database of acoustic and articulatory speech from...

    • incluset.com
    Updated 2011
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Frank Rudzicz; Aravind Kumar Namasivayam; Talya Wolff (2011). The TORGO database of acoustic and articulatory speech from speakers with dysarthria [Dataset]. https://incluset.com/datasets
    Explore at:
    Dataset updated
    2011
    Authors
    Frank Rudzicz; Aravind Kumar Namasivayam; Talya Wolff
    Measurement technique
    It consists of aligned acoustics and measured 3D articulatory features from speakers with either cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS).
    Description

    This database was originally created as a resource for developing advanced automatic speech recognition models better suited to the needs of people with dysarthria.

  20. Gujarati_OpenSLR

    • huggingface.co
    Updated Aug 22, 2023
    Cite
    Korat (2023). Gujarati_OpenSLR [Dataset]. https://huggingface.co/datasets/Pratik/Gujarati_OpenSLR
    Explore at:
    Dataset updated
    Aug 22, 2023
    Authors
    Korat
    Description

    OpenSLR is a site devoted to hosting speech and language resources, such as training corpora for speech recognition and software related to speech recognition. It aims to provide a central, hassle-free place for anyone to host the speech resources they have created, so that they can be downloaded publicly. See http://www.openslr.org/contributions.html

    Supported Task

    Automatic Speech Recognition

    Languages… See the full description on the dataset page: https://huggingface.co/datasets/Pratik/Gujarati_OpenSLR.
