100+ datasets found
  1. Spanish Speech Recognition Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2025
    Cite
    Unidata (2025). Spanish Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/spanish-speech-recognition-dataset
    Available download formats: zip (93,217 bytes)
    Dataset updated
    Jun 25, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Spanish Speech Dataset for recognition task

    The dataset comprises 488 hours of telephone dialogues in Spanish, collected from 600 native speakers across various topics and domains. It has a reported 98% word accuracy rate, which makes it a valuable resource for advancing speech recognition technology.
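Word accuracy rate is commonly computed as 1 minus the word error rate (WER), where WER is the number of word substitutions, deletions, and insertions in a minimum-edit (Levenshtein) alignment divided by the number of reference words. The listing does not say how Unidata computes its figure, so the sketch below assumes this common definition, whitespace tokenization, and hypothetical function names.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn `reference` into `hypothesis` (Levenshtein distance)."""
    # prev[j] = distance between the reference prefix seen so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def word_accuracy_rate(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - WER; it can be negative when insertions are numerous."""
    ref_words = reference.split()  # whitespace tokenization (an assumption)
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    wer = word_edit_distance(ref_words, hyp_words) / len(ref_words)
    return 1.0 - wer


if __name__ == "__main__":
    ref = "hola buenos días en qué puedo ayudarle"
    hyp = "hola buenos días en que puedo ayudarle"
    print(word_accuracy_rate(ref, hyp))  # ≈ 0.857 (one substitution in seven words)
```

Note that an accent difference ("qué" vs "que") counts as a full substitution here; real evaluations usually apply text normalization before scoring.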

    Researchers and developers can use this dataset to advance their work on automatic speech recognition (ASR) systems, audio transcription, and natural language processing (NLP).

    The dataset includes high-quality audio recordings with text transcriptions, making it ideal for training and evaluating speech recognition models.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset

    • Audio files: High-quality recordings in WAV format
    • Text transcriptions: Accurate and detailed transcripts for each audio segment
    • Speaker information: Metadata on the native speakers, such as gender
    • Topics: Diverse domains such as general conversations and business

    This dataset is a valuable resource for researchers and developers working on speech recognition, language models, and speech technology.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  2. Data from: SNABI database for continuous speech recognition 1.2

    • live.european-language-grid.eu
    binary format
    Updated Mar 1, 2002
    Cite
    (2002). SNABI database for continuous speech recognition 1.2 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/20237
    Available download formats: binary format
    Dataset updated
    Mar 1, 2002
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The SNABI speech database can be used to train continuous speech recognition systems for the Slovene language. The database comprises 1530 sentences, 150 words, and the alphabet. 132 speakers were recorded, each reading 200 sentences or more, resulting in more than 15,000 speech recordings in the database. The recordings were made in a studio (SNABI SI_SSQ) and over a telephone line (SNABI SI_SFN).

  3. Call Center Speech Recognition Dataset

    • kaggle.com
    zip
    Updated Oct 14, 2025
    Cite
    Axon Labs (2025). Call Center Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/axondata/call-center-speech-dataset
    Available download formats: zip (12,766,164 bytes)
    Dataset updated
    Oct 14, 2025
    Authors
    Axon Labs
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Multilingual Call Center Speech Recognition Dataset: 10,000 Hours

    Dataset Summary

    10,000 hours of real-world call center speech recordings in 7 languages, with transcripts. Train speech recognition, sentiment analysis, and conversational AI models on authentic customer support audio. The data covers support, sales, billing, finance, and pharma domains.

    Dataset Features

    📊 Scale & Quality

    • 10,000 hours of inbound & outbound calls
    • Real-world field recordings - no synthetic audio
    • With transcripts and concise summaries

    🎙️ Audio Specifications

    • Format: Single-channel (mono) telephone speech
    • Sample rate: 8,000 Hz
    • Non-synthetic source audio
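Since the audio is 8 kHz mono telephone speech and many ASR pipelines expect other rates (often 16 kHz), it can be worth checking file headers before training. A minimal sketch using Python's standard `wave` module, assuming the recordings are delivered as uncompressed PCM WAV files (the listing does not state the container format); `check_telephone_wav` is a hypothetical helper.

```python
import tempfile
import wave
from pathlib import Path


def check_telephone_wav(path: Path, expected_rate: int = 8000) -> list[str]:
    """Return a list of problems found in a WAV file's header (empty if OK)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        if wav.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems


if __name__ == "__main__":
    # Write one second of 16-bit silence at 8 kHz, then check it.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "call.wav"
        with wave.open(str(sample), "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(8000)
            out.writeframes(b"\x00\x00" * 8000)
        print(check_telephone_wav(sample))  # prints []
```

Resampling to a model's expected rate is a separate step that this sketch does not cover.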

    🌍 Languages (7)

    English, Russian, Polish, French, German, Spanish, Portuguese
    • Non-English calls include an English translation
    • Additional languages available on request: Swedish, Dutch, Arabic, Japanese, etc.

    🏢 Domains

    Support, Billing/Account, Sales, Finance/Account Management, Pharma
    • Each call is labeled by domain
    • Speaker roles are annotated (Agent/Customer)

    The full version of the dataset is available for commercial use; leave a request on the Axon Labs website to purchase it. 💰

    Purpose and Usage Scenarios

    • Automatic Speech Recognition, punctuation restoration, and speaker diarization on telephone speech
    • Intent detection, topic classification, and sentiment analysis from customer-service dialogs
    • Post-call concise summaries for QA/quality monitoring and CRM automation
    • Cross-lingual pipelines (original → English) and multilingual support models

  4. Italian Speech Recognition Dataset

    • unidata.pro
    a-law/u-law, pcm
    Cite
    Unidata L.L.C-FZ, Italian Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/italian-speech-recognition-dataset/
    Available download formats: A-law/μ-law, PCM
    Dataset authored and provided by
    Unidata L.L.C-FZ
    Description

    Unidata’s Italian Speech Recognition dataset is intended to help refine AI models for speech-to-text conversion and language comprehension.
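The A-law/μ-law formats listed for this dataset are the logarithmic companding schemes used in telephony (ITU-T G.711). As an illustration only, the sketch below implements the continuous μ-law curve with μ = 255 plus a simple 8-bit quantizer; real G.711 codecs use a segmented approximation of this curve, so these codes will not match a G.711 codec bit for bit. All function names are hypothetical.

```python
import math

MU = 255  # mu value used by mu-law telephony (North America, Japan)


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] to [-1, 1] with the continuous mu-law curve."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize_8bit(y: float) -> int:
    """Quantize a companded value in [-1, 1] to an integer code in 0..255."""
    return round((y + 1) / 2 * 255)


def dequantize_8bit(code: int) -> float:
    """Map an 8-bit code back to the companded range [-1, 1]."""
    return code / 255 * 2 - 1


if __name__ == "__main__":
    # Quiet samples keep a similar relative precision to loud ones.
    for x in (0.01, 0.1, 0.9):
        code = quantize_8bit(mu_law_compress(x))
        restored = mu_law_expand(dequantize_8bit(code))
        print(x, code, round(restored, 4))
```

The point of companding is visible in the loop: 8-bit codes keep relative error roughly constant across amplitudes, which suits speech better than linear 8-bit PCM.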

  5. british-english-speech-recognition-dataset

    • huggingface.co
    Updated Feb 21, 2025
    Cite
    Unidata (2025). british-english-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/british-english-speech-recognition-dataset
    Available formats: Croissant (a metadata format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 21, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    British English Speech Dataset for recognition task

    Dataset comprises 200 hours of high-quality audio recordings featuring 310 speakers, achieving an impressive 95% Sentence Accuracy Rate. This extensive collection of speech data is designed for NLP tasks such as speech recognition, dialogue systems, and language understanding. By utilizing this dataset, developers and researchers can advance their work in automatic speech recognition and improve recognition systems. - Get the data… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/british-english-speech-recognition-dataset.
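A sentence accuracy rate is typically the share of utterances whose transcript matches the reference exactly. The listing does not define the metric, so this sketch assumes exact matching after simple normalization (lowercasing and collapsing whitespace, both assumptions); the function names are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace (assumed normalization)."""
    return " ".join(text.lower().split())


def sentence_accuracy_rate(references: list[str], hypotheses: list[str]) -> float:
    """Fraction of utterances whose hypothesis matches the reference exactly."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("need at least one utterance")
    matches = sum(
        normalize(ref) == normalize(hyp) for ref, hyp in zip(references, hypotheses)
    )
    return matches / len(references)


if __name__ == "__main__":
    refs = ["Good morning", "How are you", "Cheers mate", "Fancy a cuppa"]
    hyps = ["good  morning", "how are you", "cheers mate", "fancy a copper"]
    print(sentence_accuracy_rate(refs, hyps))  # 0.75
```

Sentence accuracy is much stricter than word accuracy: a single wrong word ("copper") fails the whole utterance.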

  6. Slovenian Speech Recognition Dataset

    • unidata.pro
    mp3, wav
    Cite
    Unidata L.L.C-FZ, Slovenian Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/slovenian-speech-recognition/
    Available download formats: MP3, WAV
    Dataset authored and provided by
    Unidata L.L.C-FZ
    Description

    Explore our Slovenian Speech Dataset with 10+ hours of clean phone dialogues in MP3/WAV, fully annotated for ASR and language models.

  7. Japanese Speech Recognition Corpus (Mobile)

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Apr 7, 2020
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2020). Japanese Speech Recognition Corpus (Mobile) [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0228_70/
    Dataset updated
    Apr 7, 2020
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    This corpus comprises 16,792 entries uttered by 56 speakers (29 males and 27 females), recorded over the mobile telephone network. Speech is stored as 16-bit samples at 16 kHz, for a total of 19.4 hours of speech.
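As a rough sanity check on these figures, the mean duration per entry follows from the totals. This assumes the 19.4 hours are spread evenly across all 16,792 entries (the listing does not say whether the total includes leading or trailing silence), so the result is only approximate.

```python
TOTAL_HOURS = 19.4      # total speech duration stated in the listing
NUM_ENTRIES = 16_792    # number of entries stated in the listing

mean_seconds = TOTAL_HOURS * 3600 / NUM_ENTRIES
print(f"mean duration per entry: {mean_seconds:.2f} s")  # about 4.16 s
```

An average of roughly four seconds is typical of short prompted utterances rather than long dialogues.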

  8. Customer Support Audio Dataset [Frustration, Churn Signals, Emotional...

    • datarade.ai
    .wav
    Updated Dec 8, 2023
    Cite
    WiserBrand.com (2023). Customer Support Audio Dataset [Frustration, Churn Signals, Emotional Speech] [Dataset]. https://datarade.ai/data-products/customer-support-audio-dataset-frustration-churn-signals-e-wiserbrand-com
    Available download formats: .wav
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    WiserBrand
    Area covered
    United States of America
    Description

    Each audio file captures a real support conversation with emotional intensity, including refund requests and repeated unresolved issues.

    Dataset includes:
    • Customer-agent call recordings with labeled escalation triggers
    • Emotional audio cues (raised voice, interruptions, urgency tones)
    • Optional: timestamp metadata and industry classification
    • Dataset language: English (other languages on request)

    We provide custom datasets on demand:
    • Multi-language datasets
    • Calls from various countries
    • Calls to companies in specific industries (healthcare, banking, e-commerce, etc.)

    Use this dataset to:
    • Train voice AI models to detect escalation risks in real time
    • Build speech-based churn prediction engines
    • Fine-tune LLMs and bots with real emotional tone inputs
    • Study audio-based frustration markers across verticals
    • Improve support routing or triage models based on live voice cues
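One of the cues listed is a raised voice. As a crude, hypothetical illustration of how such a cue might be approximated from raw audio (not how WiserBrand labels its data), the sketch below computes frame-level RMS energy and flags frames much louder than the call's median frame. The frame length (400 samples, i.e. 50 ms at an assumed 8 kHz) and the 3x threshold are arbitrary assumptions.

```python
import math
from statistics import median


def frame_rms(samples: list[float], frame_len: int) -> list[float]:
    """Root-mean-square energy of consecutive non-overlapping frames."""
    frames = [
        samples[i : i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [math.sqrt(sum(s * s for s in f) / frame_len) for f in frames]


def loud_frames(samples: list[float], frame_len: int = 400, ratio: float = 3.0) -> list[int]:
    """Indices of frames whose RMS exceeds `ratio` times the median frame RMS."""
    energies = frame_rms(samples, frame_len)
    if not energies:
        return []
    baseline = median(energies)
    return [i for i, e in enumerate(energies) if e > ratio * baseline]


if __name__ == "__main__":
    # Quiet speech, a short loud burst, then quiet speech again.
    call = [0.1] * 4000 + [0.8] * 400 + [0.1] * 4000
    print(loud_frames(call))
```

Loudness alone is a weak signal (it also reacts to line noise and microphone distance), which is why labeled datasets like this one are used to train models on richer cues.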

    Perfect for voice AI teams, CX analysts, and researchers working on emotional speech, call scoring, and risk detection in real-time support environments.

    The more you purchase, the lower the price will be.

  9. Bengali Speech Recognition Dataset (BSRD)

    • kaggle.com
    zip
    Updated Jan 14, 2025
    Cite
    Shuvo Kumar Basak-4004 (2025). Bengali Speech Recognition Dataset (BSRD) [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/bengali-speech-recognition-dataset-bsrd
    Available download formats: zip (300,882,482 bytes)
    Dataset updated
    Jan 14, 2025
    Authors
    Shuvo Kumar Basak-4004
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Bengali Speech Recognition Dataset (BSRD) is designed for the development and evaluation of Bengali speech recognition and text-to-speech systems. It pairs Bengali characters with audio files generated by speech synthesis models, and serves as a resource for researchers and developers working on automatic speech recognition (ASR) and text-to-speech (TTS) applications for the Bengali language.

    Key features:
    • Bengali characters: A wide range of consonants, vowels, and other symbols of the Bengali script, including standard characters such as 'ক', 'খ', and 'গ'.
    • Corresponding speech data: For each character, MP3 audio files with the pronunciation of that character, generated by a Bengali text-to-speech model.
    • 1000 audio samples per folder: Each character is associated with at least 1000 MP3 files, providing variations of its pronunciation for training robust speech recognition systems.
    • Language and phonetic diversity: The audio covers different tones and pronunciations found in spoken Bengali, so models can be trained to recognize diverse speech patterns.
    • Use cases:
      o Automatic Speech Recognition (ASR): Training ASR systems on audio samples linked to specific Bengali characters.
      o Text-to-Speech (TTS): Fine-tuning TTS systems to generate natural Bengali speech from text.
      o Phonetic analysis: Developing models that study the linguistic features of Bengali pronunciation.
    • Applications:
      o Voice assistants: Building voice recognition systems and personal assistants that understand Bengali.
      o Speech-to-text systems: Developing transcription systems for Bengali audio content.
      o Language learning tools: Creating educational tools for teaching Bengali pronunciation.
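The description implies a folder-per-character layout with at least 1000 MP3 files per folder. Assuming each folder is named after its character (the listing does not give the exact directory layout), a label index for training could be built as below; `build_index` and `undersized_labels` are hypothetical helpers.

```python
from collections import Counter
from pathlib import Path


def build_index(root: Path, suffix: str = ".mp3") -> list[tuple[Path, str]]:
    """Return (audio_path, label) pairs, taking the label from the folder name."""
    pairs = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        for audio in sorted(folder.glob(f"*{suffix}")):
            pairs.append((audio, folder.name))
    return pairs


def undersized_labels(pairs: list[tuple[Path, str]], minimum: int = 1000) -> dict[str, int]:
    """Labels with fewer than `minimum` files, which would contradict the listing."""
    counts = Counter(label for _, label in pairs)
    return {label: n for label, n in counts.items() if n < minimum}
```

Because the audio is synthetic, a model trained only on this index may not transfer to natural speech; checking counts per folder is a cheap first step before any training run.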

    Note for researchers using the dataset

    This dataset was created by Shuvo Kumar Basak. If you use it for research or academic purposes, please cite it appropriately, and if you publish research that uses it, please share a link to your paper. Good luck.

  10. Lithuanian audio dataset for speech recognition 20 hours (4/5)

    • datarade.ai
    .json
    Updated Jul 11, 2024
    Cite
    StageZero (2024). Lithuanian audio dataset for speech recognition 20 hours (4/5) [Dataset]. https://datarade.ai/data-products/lithuanian-audio-dataset-for-speech-recognition-20-hours-4-5-stagezero
    Available download formats: .json
    Dataset updated
    Jul 11, 2024
    Dataset authored and provided by
    StageZero
    Area covered
    Lithuania
    Description

    Specifications:
    • Each user has a unique ID across the entire dataset.
    • Each person contributes at most four hours of speech.
    • Speech is recorded and transcribed on separate tracks.
    • High-quality transcriptions are provided in JSON format.
    • Noise-free, high-quality recordings with both male and female speakers.
    • Metadata includes gender, age, and location.
    • License terms: a one-time payment permits commercial use of the data in your products, but the data may not be resold.
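The specification caps each speaker at four hours of speech and gives every speaker a unique ID. The JSON schema is not published in the listing, so this sketch assumes a hypothetical array of segment records with `speaker_id` and `duration_s` fields and reports any speaker above the cap.

```python
import json
from collections import defaultdict

MAX_SECONDS_PER_SPEAKER = 4 * 3600  # "maximum four hours of speech per person"


def speakers_over_cap(segments_json: str) -> dict[str, float]:
    """Return speakers whose total duration exceeds the cap, in hours.

    `segments_json` is assumed to be a JSON array of objects with
    hypothetical `speaker_id` and `duration_s` keys.
    """
    totals: dict[str, float] = defaultdict(float)
    for seg in json.loads(segments_json):
        totals[seg["speaker_id"]] += float(seg["duration_s"])
    return {
        spk: secs / 3600
        for spk, secs in totals.items()
        if secs > MAX_SECONDS_PER_SPEAKER
    }
```

Adapting the two key names to the actual delivery format is the only change this check should need.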

  11. French Speech Recognition Dataset

    • kaggle.com
    Updated Jun 25, 2025
    Cite
    Unidata (2025). French Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/french-speech-recognition-dataset
    Available formats: Croissant (a metadata format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    French Speech Dataset for recognition task

    The dataset comprises 547 hours of telephone dialogues in French, collected from 964 native speakers across various topics and domains, with a reported 98% word accuracy rate. It is designed for speech recognition research, primarily to meet the requirements of automatic speech recognition (ASR) systems.

    Researchers and developers can use this dataset to advance their work in natural language processing (NLP), speech recognition, and machine learning.

    The dataset includes high-quality audio recordings with accurate transcriptions, making it ideal for training and evaluating speech recognition models.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset


    • Audio files: High-quality recordings in WAV format
    • Text transcriptions: Accurate and detailed transcripts for each audio segment
    • Speaker information: Metadata on the native speakers, such as gender
    • Topics: Diverse domains such as general conversations and business

    Because it covers native speakers and a range of topics and domains, the dataset is a useful resource for the research community, allowing researchers to study spoken languages, dialects, and language patterns.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  12. Arabic Speech Recognition Dataset

    • unidata.pro
    m4a, mp3, wav, aac
    Cite
    Unidata L.L.C-FZ, Arabic Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/arabic-speech-recognition/
    Available download formats: M4A, MP3, WAV, AAC
    Dataset authored and provided by
    Unidata L.L.C-FZ
    Description

    Discover our Arabic Speech Dataset with 10+ hours of UAE dialogues in M4A/MP3/WAV/AAC: clean, annotated audio for ASR training.

  13. Bulgarian audio dataset for speech recognition 20 hours (1/4)

    • datarade.ai
    .json
    Updated Jul 11, 2024
    Cite
    StageZero (2024). Bulgarian audio dataset for speech recognition 20 hours (1/4) [Dataset]. https://datarade.ai/data-products/bulgarian-audio-dataset-for-speech-recognition-20-hours-1-4-stagezero
    Available download formats: .json
    Dataset updated
    Jul 11, 2024
    Dataset authored and provided by
    StageZero
    Area covered
    Bulgaria
    Description

    Specifications:
    • Each user has a unique ID across the entire dataset.
    • Each person contributes at most four hours of speech.
    • Speech is recorded and transcribed on separate tracks.
    • High-quality transcriptions are provided in JSON format.
    • Noise-free, high-quality recordings with both male and female speakers.
    • Metadata includes gender, age, and location.
    • License terms: a one-time payment permits commercial use of the data in your products, but the data may not be resold.

  14. german-speech-recognition-dataset

    • huggingface.co
    Updated Mar 7, 2025
    Cite
    Unidata (2025). german-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/german-speech-recognition-dataset
    Available formats: Croissant (a metadata format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 7, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    German Speech Dataset for recognition task

    The dataset comprises 431 hours of telephone dialogues in German, collected from 590+ native speakers across various topics and domains, with a reported 95% sentence accuracy rate. It is designed for research in automatic speech recognition (ASR) systems. By utilizing this dataset, researchers and developers can advance their understanding and capabilities in audio transcription and natural language processing (NLP). - Get the data… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/german-speech-recognition-dataset.

  15. Chichewa Customer Speech Dataset

    • data.macgence.com
    • kaggle.com
    mp3
    Updated Apr 2, 2024
    Cite
    Macgence (2024). Chichewa Customer Speech Dataset [Dataset]. https://data.macgence.com/dataset/chichewa-customer-speech-dataset
    Available download formats: MP3
    Dataset updated
    Apr 2, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Discover the Chichewa Customer Speech Dataset, perfect for AI training, language processing, and speech analysis to develop advanced communication systems.

  16. spanish-speech-recognition-dataset

    • huggingface.co
    Updated Jul 30, 2025
    Cite
    Unidata NLP (2025). spanish-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/ud-nlp/spanish-speech-recognition-dataset
    Dataset updated
    Jul 30, 2025
    Authors
    Unidata NLP
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Spanish Telephone Dialogues Dataset - 488 Hours

    Dataset comprises 488 hours of high-quality telephone audio recordings in Spanish, featuring 600 native speakers and achieving a 95% sentence accuracy rate. Designed for advancing speech recognition models and language processing, this extensive speech data corpus covers diverse topics and domains, making it ideal for training robust automatic speech recognition (ASR) systems. - Get the data

      Dataset characteristics:… See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/spanish-speech-recognition-dataset.
    
  17. Arabic Speech Commands Dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 5, 2021
    Cite
    Abdulkader Ghandoura (2021). Arabic Speech Commands Dataset [Dataset]. http://doi.org/10.5281/zenodo.4662481
    Available download formats: zip
    Dataset updated
    Apr 5, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abdulkader Ghandoura
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Arabic Speech Commands Dataset

    This dataset is designed to help train simple machine learning models that serve educational and research purposes in the speech recognition domain, mainly for keyword spotting tasks.

    Dataset Description

    Our dataset is a list of pairs (x, y), where x is the input speech signal and y is the corresponding keyword. The final dataset consists of 12,000 such pairs covering 40 keywords. Each audio file is one second long and sampled at 16 kHz. We have 30 participants, each of whom recorded 10 utterances of each keyword. This gives 300 audio files per keyword (30 * 10) and 12,000 files in total (30 * 10 * 40); the total size of all the recorded keywords is ~384 MB. The dataset also contains several background noise recordings obtained from various natural sources of noise. These audio files are saved in a separate folder named background_noise, with a total size of ~49 MB.
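The stated ~384 MB is consistent with uncompressed audio. Assuming 16-bit mono PCM samples (the bit depth is not stated here), 12,000 one-second clips at 16 kHz come to exactly 384,000,000 bytes before the roughly 44-byte header of each canonical WAV file:

```python
NUM_FILES = 30 * 10 * 40   # participants x utterances x keywords = 12,000
SAMPLE_RATE = 16_000       # 16 kHz, as stated
BYTES_PER_SAMPLE = 2       # 16-bit PCM (assumption)
CLIP_SECONDS = 1           # each file is one second long

total_bytes = NUM_FILES * CLIP_SECONDS * SAMPLE_RATE * BYTES_PER_SAMPLE
print(total_bytes)  # 384000000, i.e. ~384 MB (decimal megabytes)
```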

    Dataset Structure

    There are 40 folders, each of which represents one keyword and contains 300 files. The first eight digits of each file name identify the contributor, while the last two digits identify the round number. For example, the file path rotate/00000021_NO_06.wav indicates that the contributor with the ID 00000021 pronounced the keyword rotate for the 6th time.
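The naming scheme can be parsed mechanically. This sketch assumes every path follows the `keyword/CCCCCCCC_XX_RR.wav` pattern of the example; the meaning of the middle token (`NO` in the example) is not explained, so it is kept as an opaque string. `parse_path` and `Utterance` are hypothetical names.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple

# 8-digit contributor ID, an unexplained middle token, a 2-digit round number
FILE_PATTERN = re.compile(r"^(\d{8})_([^_]+)_(\d{2})\.wav$")


class Utterance(NamedTuple):
    keyword: str
    contributor: str
    middle_token: str
    round_number: int


def parse_path(path: str) -> Utterance:
    """Split a `keyword/contributor_token_round.wav` path into its fields."""
    p = PurePosixPath(path)
    match = FILE_PATTERN.match(p.name)
    if match is None:
        raise ValueError(f"unexpected file name: {p.name}")
    contributor, token, round_str = match.groups()
    return Utterance(p.parent.name, contributor, token, int(round_str))


print(parse_path("rotate/00000021_NO_06.wav"))
# Utterance(keyword='rotate', contributor='00000021', middle_token='NO', round_number=6)
```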

    Data Split

    We recommend using the provided CSV files in your experiments. We kept 60% of the dataset for training, 20% for validation, and the remaining 20% for testing. In our split method, we guarantee that all recordings of a certain contributor are within the same subset.
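The provided CSV files should be preferred for comparable results. Purely to illustrate what a contributor-disjoint 60/20/20 split means, here is a sketch that assigns whole contributors to subsets; the seed and rounding are arbitrary choices, so it will not reproduce the official split, and `split_by_contributor` is a hypothetical helper.

```python
import random
from collections import defaultdict


def split_by_contributor(
    items: list[tuple[str, str]],  # (file_path, contributor_id) pairs
    fractions: tuple[float, float, float] = (0.6, 0.2, 0.2),
    seed: int = 0,
) -> dict[str, list[str]]:
    """Put every file of a contributor into the same subset."""
    by_contributor: dict[str, list[str]] = defaultdict(list)
    for path, contributor in items:
        by_contributor[contributor].append(path)

    # Shuffle contributors (not files) so no speaker spans two subsets.
    contributors = sorted(by_contributor)
    random.Random(seed).shuffle(contributors)

    n = len(contributors)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": contributors[:n_train],
        "validation": contributors[n_train : n_train + n_val],
        "test": contributors[n_train + n_val :],
    }
    return {
        name: [path for c in members for path in by_contributor[c]]
        for name, members in groups.items()
    }
```

Splitting by contributor rather than by file prevents a model from scoring well simply by recognizing voices it has already heard in training.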

    License

    This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. For more details, see the LICENSE file in this folder.

    Citations

    If you want to use the Arabic Speech Commands dataset in your work, please cite it as:

    @article{arabicspeechcommandsv1,
       author = {Ghandoura, Abdulkader and Hjabo, Farouk and Al Dakkak, Oumayma},
       title = {Building and Benchmarking an Arabic Speech Commands Dataset for Small-Footprint Keyword Spotting},
       journal = {Engineering Applications of Artificial Intelligence},
       year = {2021},
       publisher={Elsevier}
    }

  18. speech_commands

    • tensorflow.org
    • datasets.activeloop.ai
    Updated Jan 13, 2023
    Cite
    (2023). speech_commands [Dataset]. http://identifiers.org/arxiv:1804.03209
    Dataset updated
    Jan 13, 2023
    Description

    An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word is spoken, from a set of ten target words, with as few false positives as possible from background noise or unrelated speech. Note that in the training and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments: in the test set, silence segments are regular 1-second files, whereas in the training set they are provided as long segments in the "background_noise" folder. Here, we split these background noise recordings into 1-second clips and keep one of the files for the validation set.
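The background-noise handling described above amounts to cutting each long recording into consecutive one-second clips. A minimal sketch, assuming 16 kHz audio (the sample rate of the original Speech Commands release) held in a Python list, and assuming a trailing remainder shorter than one second is dropped (TFDS may treat the remainder differently); `split_into_clips` is a hypothetical helper.

```python
def split_into_clips(
    samples: list[int], sample_rate: int = 16_000, clip_seconds: float = 1.0
) -> list[list[int]]:
    """Cut a long recording into non-overlapping clips, dropping any short remainder."""
    clip_len = int(sample_rate * clip_seconds)
    return [
        samples[start : start + clip_len]
        for start in range(0, len(samples) - clip_len + 1, clip_len)
    ]


noise = [0] * (16_000 * 3 + 5_000)  # about 3.3 s of placeholder (silent) audio
clips = split_into_clips(noise)
print(len(clips))  # 3
```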

    To use this dataset:

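    import tensorflow_datasets as tfds

    # Load the training split as a tf.data.Dataset of feature dictionaries.
    ds = tfds.load('speech_commands', split='train')

    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)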
    

    See the tensorflow_datasets guide for more information.
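Because the "unknown" label dominates the training and validation sets, training code often reweights classes. One common scheme (an assumption here, not part of the dataset) is inverse-frequency weighting, weight = total / (num_classes * count), so that each class contributes equally to the loss on average:

```python
from collections import Counter


def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


labels = ["unknown"] * 8 + ["yes"] * 1 + ["no"] * 1
print(inverse_frequency_weights(labels))
```

The rare classes receive weights above 1 and "unknown" a weight below 1; these could then serve as per-class weights in a weighted cross-entropy loss. The same function works with integer label IDs, since any hashable label can be counted.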

  19. Swedish Speech Recognition Corpus (Mobile)

    • en.haitianruisheng.com
    Updated Sep 4, 2023
    Cite
    DataOceanAI (2023). Swedish Speech Recognition Corpus (Mobile) [Dataset]. en.haitianruisheng.com
    Dataset updated
    Sep 4, 2023
    Dataset provided by
    DataOceanAI
    Authors
    DataOceanAI
    Variables measured
    Product name, Recording duration, Recording language, Recording platform, Recording parameters, Recording environment, Product library number
    Description

    The recognition data was recorded in both quiet and noisy environments and collected from a total of 302 speakers (144 males and 158 females), all of whom were carefully screened for standard, clear pronunciation. The audio scripts cover topics such as news.

  20. Italian Speech Recognition Corpus (Desktop)

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Apr 7, 2020
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2020). Italian Speech Recognition Corpus (Desktop) [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0228_80/
    Dataset updated
    Apr 7, 2020
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    This corpus comprises 49,994 entries uttered by 50 speakers (23 males and 27 females), recorded on 2 channels (desktop, in a quiet office). Speech is stored as 16-bit samples at 48 kHz, for a total of 24.21 hours of speech per channel.

Selected result: Spanish Speech Recognition Dataset

Dataset comprises 488 hours of telephone dialogues in Spanish.

168 scholarly articles cite this dataset (according to Google Scholar).