100+ datasets found
  1. Spanish Speech Recognition Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2025
    Cite
    Unidata (2025). Spanish Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/spanish-speech-recognition-dataset
    Available download formats: zip (93217 bytes)
    Dataset updated
    Jun 25, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Spanish Speech Dataset for recognition task

    The dataset comprises 488 hours of telephone dialogues in Spanish, collected from 600 native speakers across various topics and domains. It has a stated word accuracy rate of 98%, making it a valuable resource for advancing speech recognition technology.

    By using this dataset, researchers and developers can advance their understanding of, and capabilities in, automatic speech recognition (ASR) systems, audio transcription, and natural language processing (NLP).

    The dataset includes high-quality audio recordings with text transcriptions, making it ideal for training and evaluating speech recognition models.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset

    • Audio files: High-quality recordings in WAV format
    • Text transcriptions: Accurate and detailed transcripts for each audio segment
    • Speaker information: Metadata on native speakers, including gender, etc.
    • Topics: Diverse domains such as general conversations, business, etc.

    This dataset is a valuable resource for researchers and developers working on speech recognition, language models, and speech technology.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  2. Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training...

    • datarade.ai
    Updated Dec 10, 2023
    Cite
    Nexdata (2023). Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-speech-synthesis-data-400-hours-a-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 10, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    China, Philippines, Singapore, Belgium, Malaysia, Austria, Hong Kong, Sweden, Colombia, Canada
    Description
    1. Specifications

    Format : 44.1 kHz/48 kHz, 16bit/24bit, uncompressed wav, mono channel.

    Recording environment : professional recording studio.

    Recording content : general narrative sentences, interrogative sentences, etc.

    Speaker : native speaker

    Annotation Feature : word transcription, part-of-speech, phoneme boundary, four-level accents, four-level prosodic boundary.

    Device : Microphone

    Language : American English, British English, Japanese, French, Dutch, Cantonese, Canadian French, Australian English, Italian, New Zealand English, Spanish, Mexican Spanish

    Application scenarios : speech synthesis

    Accuracy rate :
    • Word transcription: the sentence accuracy rate is not less than 99%.
    • Part-of-speech annotation: the sentence accuracy rate is not less than 98%.
    • Phoneme annotation: the sentence accuracy rate is not less than 98% (the error rate of voiced and swallowed phonemes is not included, because the labelling is more subjective).
    • Accent annotation: the word accuracy rate is not less than 95%.
    • Prosodic boundary annotation: the sentence accuracy rate is not less than 97%.
    • Phoneme boundary annotation: the phoneme accuracy rate is not less than 95% (the error range of the boundary is within 5%).

    2. About Nexdata

    Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 3 million hours of Audio Data and 800TB of Annotated Imagery Data. These ready-to-go AI & ML Training Data support instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/tts?source=Datarade

  3. 8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech...

    • datarade.ai
    Updated Dec 10, 2023
    Cite
    Nexdata (2023). 8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech Recognition Data| Multilingual Language Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-conversational-speech-data-8khz-tele-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 10, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Kazakhstan, Colombia, Uzbekistan, Jordan, Ukraine, Puerto Rico, United Republic of, Georgia, Bulgaria, Sri Lanka
    Description
    1. Specifications

    Format : 8kHz, 8bit, u-law/a-law pcm, mono channel;

    Environment : quiet indoor environment, without echo;

    Recording content : No preset linguistic data; dozens of topics are specified, and the speakers hold dialogues on those topics while being recorded;

    Demographics : Speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

    Annotation : annotating for the transcription text, speaker identification, gender and noise symbols;

    Device : Telephony recording system;

    Language : 100+ Languages;

    Application scenarios : speech recognition; voiceprint recognition;

    Accuracy rate : the word accuracy rate is not less than 98%
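
    The listing does not say how the 98% word accuracy figure is computed. A common convention, assumed here and not confirmed by the provider, is word accuracy = 1 - WER, where WER is the word-level edit distance between a reference transcript and a second transcript, divided by the reference length. The function names below are hypothetical and serve only to illustrate that convention.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # previous[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: 1 - (word errors / number of reference words)."""
    reference_length = len(reference.split())
    if reference_length == 0:
        raise ValueError("reference transcript must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / reference_length
```

    Under this assumed definition, a 98% rate corresponds to roughly two word errors per hundred reference words.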

    2. About Nexdata

    Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 3 million hours of Multilingual Language Data and 800TB of Computer Vision Data. These ready-to-go Machine Learning (ML) Data support instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade

  4. French Speech Recognition Dataset

    • kaggle.com
    Updated Jun 25, 2025
    Cite
    Unidata (2025). French Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/french-speech-recognition-dataset
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    French Speech Dataset for recognition task

    The dataset comprises 547 hours of telephone dialogues in French, collected from 964 native speakers across various topics and domains, with a stated word accuracy rate of 98%. It is designed for research in speech recognition across a range of recognition models, and is primarily aimed at meeting the requirements of automatic speech recognition (ASR) systems.

    By using this dataset, researchers and developers can advance their understanding of, and capabilities in, natural language processing (NLP), speech recognition, and machine learning technologies.

    The dataset includes high-quality audio recordings with accurate transcriptions, making it ideal for training and evaluating speech recognition models.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset


    • Audio files: High-quality recordings in WAV format
    • Text transcriptions: Accurate and detailed transcripts for each audio segment
    • Speaker information: Metadata on native speakers, including gender, etc.
    • Topics: Diverse domains such as general conversations, business, etc.

    The native speakers and the variety of topics and domains covered make the dataset an ideal resource for the research community, allowing researchers to study spoken languages, dialects, and language patterns.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  5. Bengali Speech Recognition Dataset (BSRD)

    • kaggle.com
    zip
    Updated Jan 14, 2025
    Cite
    Shuvo Kumar Basak-4004 (2025). Bengali Speech Recognition Dataset (BSRD) [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/bengali-speech-recognition-dataset-bsrd
    Available download formats: zip (300882482 bytes)
    Dataset updated
    Jan 14, 2025
    Authors
    Shuvo Kumar Basak-4004
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Bengali Speech Recognition Dataset (BSRD) is a comprehensive dataset designed for the development and evaluation of Bengali speech recognition and text-to-speech systems. This dataset includes a collection of Bengali characters and their corresponding audio files, which are generated using speech synthesis models. It serves as an essential resource for researchers and developers working on automatic speech recognition (ASR) and text-to-speech (TTS) applications for the Bengali language.

    Key Features:

    • Bengali Characters: The dataset contains a wide range of Bengali characters, including consonants, vowels, and unique symbols used in the Bengali script. This includes standard characters such as 'ক', 'খ', 'গ', and many more.
    • Corresponding Speech Data: For each Bengali character, an MP3 audio file is provided, which contains the correct pronunciation of that character. This audio is generated by a Bengali text-to-speech model, ensuring clear and accurate pronunciation.
    • 1000 Audio Samples per Folder: Each character is associated with at least 1000 MP3 files. These multiple samples provide variations of the character's pronunciation, which is essential for training robust speech recognition systems.
    • Language and Phonetic Diversity: The dataset offers a phonetic diversity of Bengali sounds, covering different tones and pronunciations commonly found in spoken Bengali. This ensures that the dataset can be used for training models capable of recognizing diverse speech patterns.
    • Use Cases:
      o Automatic Speech Recognition (ASR): BSRD is ideal for training ASR systems, as it provides accurate audio samples linked to specific Bengali characters.
      o Text-to-Speech (TTS): Researchers can use this dataset to fine-tune TTS systems for generating natural Bengali speech from text.
      o Phonetic Analysis: The dataset can be used for phonetic analysis and developing models that study the linguistic features of Bengali pronunciation.
    • Applications:
      o Voice Assistants: The dataset can be used to build and train voice recognition systems and personal assistants that understand Bengali.
      o Speech-to-Text Systems: BSRD can aid in developing accurate transcription systems for Bengali audio content.
      o Language Learning Tools: The dataset can help in creating educational tools aimed at teaching Bengali pronunciation.

    Note for Researchers Using the Dataset

    This dataset was created by Shuvo Kumar Basak. If you use this dataset for research or academic purposes, please cite it appropriately. If you have published research using this dataset, please share a link to your paper. Good luck.

  6. Hindi speech data

    • kaggle.com
    zip
    Updated Jun 20, 2025
    Cite
    Divya Taneja (2025). Hindi speech data [Dataset]. https://www.kaggle.com/datasets/divyataneja/hindi-speech-data
    Available download formats: zip (1317708621 bytes)
    Dataset updated
    Jun 20, 2025
    Authors
    Divya Taneja
    Description

    As part of my Ph.D. research under the supervision of Dr. Shobha Bhatt at Netaji Subhas University of Technology (NSUT), Delhi, I have developed a specialized Hindi Speech Recognition Dataset focused on Women’s Security Applications. The dataset comprises 2254 audio files, all recorded and curated personally to address the lack of context-specific Hindi ASR resources. The recordings include continuous sentences relevant to women’s safety scenarios. Each audio file was captured at a 16 kHz sampling rate in .wav format, manually transcribed in Devanagari script, and validated for phoneme diversity and acoustic clarity. This dataset serves as a critical resource for training and evaluating speech-enabled safety applications in low-resource Indian language settings and forms the foundation of my ongoing research in robust and real-time Hindi ASR systems for embedded devices.

  7. Arabic Speech Commands Dataset

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Apr 5, 2021
    Cite
    Abdulkader Ghandoura (2021). Arabic Speech Commands Dataset [Dataset]. http://doi.org/10.5281/zenodo.4662481
    Available download formats: zip
    Dataset updated
    Apr 5, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abdulkader Ghandoura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Arabic Speech Commands Dataset

    This dataset is designed to help train simple machine learning models that serve educational and research purposes in the speech recognition domain, mainly for keyword spotting tasks.

    Dataset Description

    Our dataset is a list of pairs (x, y), where x is the input speech signal and y is the corresponding keyword. The final dataset consists of 12000 such pairs, comprising 40 keywords. Each audio file is one second long and sampled at 16 kHz. We have 30 participants, each of whom recorded 10 utterances of each keyword. Therefore, we have 300 audio files per keyword (30 * 10 = 300) and 12000 files in total (30 * 10 * 40 = 12000), and the total size of all the recorded keywords is ~384 MB. The dataset also contains several background noise recordings that we obtained from various natural sources of noise. We saved these audio files in a separate folder named background_noise, with a total size of ~49 MB.

    Dataset Structure

    There are 40 folders, each of which represents one keyword and contains 300 files. The first eight digits of each file name identify the contributor, while the last two digits identify the round number. For example, the file path rotate/00000021_NO_06.wav indicates that the contributor with the ID 00000021 pronounced the keyword rotate for the 6th time.
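
    The naming scheme above can be read back programmatically. The following is a minimal sketch, assuming every file follows the layout of the single example path given (keyword folder, eight-digit contributor ID, a middle token such as NO whose meaning the description does not explain, and a two-digit round number). The names Recording and parse_recording_path are hypothetical and are not part of the dataset's tooling.

```python
import re
from typing import NamedTuple


class Recording(NamedTuple):
    keyword: str
    contributor_id: str
    round_number: int


# Layout inferred from the example "rotate/00000021_NO_06.wav".
# The middle token is matched generically because its meaning is not documented.
_PATH_PATTERN = re.compile(
    r"(?P<keyword>[^/]+)/(?P<contributor>\d{8})_[^/]*_(?P<round>\d{2})\.wav"
)


def parse_recording_path(path: str) -> Recording:
    """Split a 'keyword/file.wav' path into keyword, contributor ID and round."""
    match = _PATH_PATTERN.fullmatch(path)
    if match is None:
        raise ValueError(f"unexpected file path layout: {path!r}")
    return Recording(
        keyword=match["keyword"],
        contributor_id=match["contributor"],
        round_number=int(match["round"]),
    )


example = parse_recording_path("rotate/00000021_NO_06.wav")
```

    Keeping the contributor ID as a string preserves its leading zeros, which matters if it is later matched against IDs in the provided CSV files.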

    Data Split

    We recommend using the provided CSV files in your experiments. We kept 60% of the dataset for training, 20% for validation, and the remaining 20% for testing. In our split method, we guarantee that all recordings of a certain contributor are within the same subset.
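
    The provided CSV files remain the recommended split. For readers who want to see how a contributor-disjoint split of this kind can be built, here is a minimal sketch. It assumes the contributor ID is the first eight characters of the file name, as described above; the function name, the seed, and the rounding of subset sizes are hypothetical choices, and the result will not reproduce the authors' split.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_contributor(
    paths: Iterable[str],
    train_fraction: float = 0.6,
    validation_fraction: float = 0.2,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign whole contributors to train/validation/test so no speaker is shared."""
    files_by_contributor = defaultdict(list)
    for path in paths:
        file_name = path.split("/")[-1]
        files_by_contributor[file_name[:8]].append(path)

    # Shuffle contributors (not files) so each speaker lands in exactly one subset.
    contributors = sorted(files_by_contributor)
    random.Random(seed).shuffle(contributors)

    n_train = round(len(contributors) * train_fraction)
    n_validation = round(len(contributors) * validation_fraction)
    assignment = {
        "train": contributors[:n_train],
        "validation": contributors[n_train:n_train + n_validation],
        "test": contributors[n_train + n_validation:],
    }
    return {
        subset: [path for contributor in ids for path in files_by_contributor[contributor]]
        for subset, ids in assignment.items()
    }
```

    With 30 contributors, the default fractions place 18, 6 and 6 contributors in the training, validation and test subsets respectively.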

    License

    This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. For more details, see the LICENSE file in this folder.

    Citations

    If you want to use the Arabic Speech Commands dataset in your work, please cite it as:

    @article{arabicspeechcommandsv1,
       author = {Ghandoura, Abdulkader and Hjabo, Farouk and Al Dakkak, Oumayma},
       title = {Building and Benchmarking an Arabic Speech Commands Dataset for Small-Footprint Keyword Spotting},
       journal = {Engineering Applications of Artificial Intelligence},
       year = {2021},
       publisher={Elsevier}
    }

  8. Video Dataset for training AI/ML Models

    • data.macgence.com
    mp3
    Updated Jul 18, 2024
    Cite
    Macgence (2024). Video Dataset for training AI/ML Models [Dataset]. https://data.macgence.com/dataset/video-dataset-for-training-aiml-models
    Available download formats: mp3
    Dataset updated
    Jul 18, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Enhance AI/ML training with Macgence's diverse video dataset. High-quality visuals optimized for accuracy, reliability, and advanced model development!

  9. Data from: wav2vec: Unsupervised Pre-Training for Speech Recognition

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). wav2vec: Unsupervised Pre-Training for Speech Recognition [Dataset]. https://service.tib.eu/ldmservice/dataset/wav2vec--unsupervised-pre-training-for-speech-recognition
    Dataset updated
    Dec 16, 2024
    Description

    Unsupervised Pre-Training for Speech Recognition

  10. Arabic Speech Recognition Dataset

    • unidata.pro
    m4a, mp3, wav, aac
    Cite
    Unidata L.L.C-FZ, Arabic Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/arabic-speech-recognition/
    Available download formats: m4a, mp3, wav, aac
    Dataset authored and provided by
    Unidata L.L.C-FZ
    Description

    Discover our Arabic Speech Dataset with 10+ hours of UAE dialogues in M4A/MP3/WAV/AAC. Clean, annotated audio for ASR training

  11. Artificial Intelligence Training Dataset Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 3, 2025
    Cite
    Data Insights Market (2025). Artificial Intelligence Training Dataset Report [Dataset]. https://www.datainsightsmarket.com/reports/artificial-intelligence-training-dataset-1958994
    Available download formats: doc, ppt, pdf
    Dataset updated
    May 3, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Artificial Intelligence (AI) Training Dataset market is experiencing robust growth, driven by the increasing adoption of AI across diverse sectors. The market's expansion is fueled by the burgeoning need for high-quality data to train sophisticated AI algorithms capable of powering applications like smart campuses, autonomous vehicles, and personalized healthcare solutions. The demand for diverse dataset types, including image classification, voice recognition, natural language processing, and object detection datasets, is a key factor contributing to market growth.

    The exact market size in 2025 is unavailable. Assuming a conservative estimate of $10 billion for 2025, based on the growth trend and reported market sizes of related industries, and a projected compound annual growth rate (CAGR) of 25%, the market is poised for significant expansion in the coming years. Key players in this space are leveraging technological advancements and strategic partnerships to enhance data quality and expand their service offerings. The increasing availability of cloud-based data annotation and processing tools is also streamlining operations and making AI training datasets more accessible to businesses of all sizes.

    Growth is expected to be particularly strong in regions with burgeoning technological advancements and substantial digital infrastructure, such as North America and Asia Pacific. However, challenges such as data privacy concerns, the high cost of data annotation, and the scarcity of skilled professionals capable of handling complex datasets remain obstacles to broader market penetration. The ongoing evolution of AI technologies and the expanding applications of AI across multiple sectors will continue to shape the demand for AI training datasets, pushing this market toward higher growth trajectories in the coming years.

    The diversity of applications, from smart homes and medical diagnoses to advanced robotics and autonomous driving, creates significant opportunities for companies specializing in this market. Maintaining data quality, security, and ethical considerations will be crucial for future market leadership.
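
    The compounding implied by the description's own figures can be worked through as follows. This is only arithmetic on the report's stated estimate ($10 billion in 2025) and projected CAGR (25%) over the listed 2025 - 2033 period; the function name is hypothetical, and treating 2025 as year zero with annual compounding is an assumption, not a figure from the report.

```python
def project_market_size(base_size: float, cagr: float, years: int) -> float:
    """Compound a base value forward: base * (1 + CAGR) ** years."""
    return base_size * (1.0 + cagr) ** years


BASE_YEAR = 2025
BASE_SIZE_BILLION_USD = 10.0  # the report's stated conservative estimate
CAGR = 0.25                   # the report's projected growth rate

# One row per year of the listed 2025 - 2033 period.
projection = {
    year: project_market_size(BASE_SIZE_BILLION_USD, CAGR, year - BASE_YEAR)
    for year in range(BASE_YEAR, 2034)
}

for year, size in projection.items():
    print(f"{year}: ${size:.1f} billion")
```

    Because the base figure is itself an estimate, the projected values inherit its uncertainty and should be read as illustrative only.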

  12. german-speech-recognition-dataset

    • huggingface.co
    Updated Aug 2, 2025
    Cite
    Unidata NLP (2025). german-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/ud-nlp/german-speech-recognition-dataset
    Dataset updated
    Aug 2, 2025
    Authors
    Unidata NLP
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    German Telephone Dialogues Dataset - 431 Hours

    The dataset comprises 431 hours of high-quality audio recordings from 590+ native German speakers, featuring telephone dialogues across diverse topics and domains. With a 95% sentence accuracy rate, it is suited to training and evaluating German speech recognition systems.

    Dataset characteristics:

    Description: Audio of telephone dialogues in German for training…

    See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/german-speech-recognition-dataset.

  13. ShefCE: A Cantonese-English bilingual speech corpus -- speech recognition...

    • orda.shef.ac.uk
    application/gzip
    Updated May 31, 2023
    Cite
    Wai Man Ng; Alvin C.M. Kwan; Tan Lee; Thomas Hain (2023). ShefCE: A Cantonese-English bilingual speech corpus -- speech recognition model sets and recording transcripts [Dataset]. http://doi.org/10.15131/shef.data.4522925.v1
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    The University of Sheffield
    Authors
    Wai Man Ng; Alvin C.M. Kwan; Tan Lee; Thomas Hain
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This online repository contains the speech recognition model sets and the recording transcripts used in the phoneme/syllable recognition experiments reported in [1].

    Speech recognition model sets
    -----------------------------
    The speech recognition model sets are available as a tarball, named model.tar.gz, in this repository. The models were trained on Cantonese and English data. For each language, two model sets were trained, according to the background setting and the mixed-condition setting respectively. All models are DNN-HMM models, which are hybrid feed-forward neural network models with 6 hidden layers and 2048 neurons per layer. Details can be found in [1]. The Cantonese models include a bigram syllable language model. The English models include a bigram phoneme language model. All model sets are provided in the Kaldi format.

    1. The background-cantonese model was trained on CUSENT (68 speakers, 19.4 hours) of read Cantonese speech.
    2. The background-english model was trained on WSJ-SI84 (83 speakers, 15.2 hours) of read English speech.
    3. The mixed-condition-cantonese model was trained on background-cantonese data and ShefCE Cantonese training data (25 speakers, 9.7 hours).
    4. The mixed-condition-english model was trained on background-english data and ShefCE English training data (25 speakers, 2.3 hours).

    Recording transcripts
    ---------------------
    The recording transcripts are available as a tarball, named stms.tar.gz, in this repository. These transcripts cover the ShefCE portion of the training data and the ShefCE test data. Four files can be found in the stms.tar.gz archive.

    - ShefCE_RC.train.v*.stm contains the transcripts for ShefCE training set (Cantonese)
    - ShefCE_RE.train.v*.stm contains the transcripts for ShefCE training set (English)
    - ShefCE_RC.test.v*.stm contains the transcripts for ShefCE test set (Cantonese)
    - ShefCE_RE.test.v*.stm contains the transcripts for ShefCE test set (English)

    The ShefCE corpus data can be accessed online with DOI:10.15131/shef.data.4522907

    Please cite [1] for the use of ShefCE data, models or transcripts.

    [1] Raymond W. M. Ng, Alvin C.M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

  14. Scripted Monologues Speech Data | 65,000 Hours | GenAI Audio Data|...

    • datarade.ai
    Updated Dec 11, 2023
    Cite
    Nexdata (2023). Scripted Monologues Speech Data | 65,000 Hours | GenAI Audio Data| Text-to-Speech Data| Multilingual Language Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-read-speech-data-65-000-hours-aud-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 11, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Russian Federation, Japan, Lebanon, Cambodia, El Salvador, Brazil, Netherlands, Slovenia, Jordan, France
    Description
    1. Specifications

    Format : 16kHz, 16bit, uncompressed wav, mono channel

    Recording environment : quiet indoor environment, without echo

    Recording content (read speech) : economy, entertainment, news, oral language, numbers, letters

    Speaker : native speaker, gender balance

    Device : Android mobile phone, iPhone

    Language : 100+ languages

    Transcription content : text, time point of speech data, 5 noise symbols, 5 special identifiers

    Accuracy rate : 95% (the accuracy rate of noise symbols and other identifiers is not included)

    Application scenarios : speech recognition, voiceprint recognition

    2. About Nexdata

    Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 3 million hours of Multilingual Language Data and 800TB of Computer Vision Data. These ready-to-go Machine Learning (ML) Data support instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade

  15. Italian Speech Recognition Dataset

    • unidata.pro
    a-law/u-law, pcm
    Cite
    Unidata L.L.C-FZ, Italian Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/italian-speech-recognition-dataset/
    Available download formats: a-law/u-law, pcm
    Dataset authored and provided by
    Unidata L.L.C-FZ
    Description

    Unidata’s Italian Speech Recognition dataset refines AI models for better speech-to-text conversion and language comprehension

  16. Speech Words to Text

    • kaggle.com
    zip
    Updated Nov 24, 2020
    Cite
    Shriamrut V (2020). Speech Words to Text [Dataset]. https://www.kaggle.com/datasets/shriamrut/speech-words-to-text
    Available download formats: zip (1038579384 bytes)
    Dataset updated
    Nov 24, 2020
    Authors
    Shriamrut V
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Context

    A speech words to text model, where the model recognizes simple words and converts them to text.

    Content

    The model is trained on TensorFlow's speech recognition dataset. It recognizes words like left, right, up, down, one, two, three, four, five, six, seven, eight, nine, yes and no. The model achieved an accuracy of 0.9933 on the training dataset and 0.93 on the test or validation dataset. To find out how the model was trained, check out this repo: https://github.com/shriamrut/Speech-Words-to-Text.

    Inspiration

    How is audio understood by a computer? That question is where the inspiration came from.

  17. spanish-speech-recognition-dataset

    • huggingface.co
    Updated Jul 30, 2025
    Cite
    Unidata NLP (2025). spanish-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/ud-nlp/spanish-speech-recognition-dataset
    Dataset updated
    Jul 30, 2025
    Authors
    Unidata NLP
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Spanish Telephone Dialogues Dataset - 488 Hours

    The dataset comprises 488 hours of high-quality telephone audio recordings in Spanish, featuring 600 native speakers, with a 95% sentence accuracy rate. Designed for advancing speech recognition models and language processing, this speech data corpus covers diverse topics and domains, making it suitable for training robust automatic speech recognition (ASR) systems.

      Dataset characteristics:… See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/spanish-speech-recognition-dataset.
    
  18. french-speech-recognition-dataset

    • huggingface.co
    Updated Sep 29, 2025
    Cite
    Unidata NLP (2025). french-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/ud-nlp/french-speech-recognition-dataset
    Dataset updated
    Sep 29, 2025
    Authors
    Unidata NLP
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    French Telephone Dialogues Dataset - 547 Hours

    This speech recognition dataset comprises 547 hours of telephone dialogues in French from 964 native speakers. It provides audio recordings with detailed annotations (text, speaker ID, gender, age) to support speech recognition systems, natural language processing, and deep learning models, and is intended for training and evaluating automatic speech recognition technology.

      Dataset characteristics:
    

    Characteristic Data… See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/french-speech-recognition-dataset.

  19. american-speech-recognition-dataset

    • huggingface.co
    Updated Sep 29, 2025
    Cite
    Unidata NLP (2025). american-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/ud-nlp/american-speech-recognition-dataset
    Dataset updated
    Sep 29, 2025
    Authors
    Unidata NLP
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    American Telephone Dialogues Dataset - 1,136 Hours

    The dataset includes 1,136 hours of annotated telephone dialogues from 1,416 native speakers across the United States. Designed for advancing speech recognition models and language processing, this extensive speech data corpus covers diverse topics and domains, making it ideal for training robust automatic speech recognition (ASR) systems.

      Dataset characteristics:
    

    Characteristic Data… See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/american-speech-recognition-dataset.

  20. Speech Commands

    • datasets.activeloop.ai
    • tensorflow.org
    deeplake
    Updated Mar 24, 2022
    Cite
    P Warden (2022). Speech Commands [Dataset]. http://identifiers.org/arxiv:1804.03209
    Available download formats: deeplake
    Dataset updated
    Mar 24, 2022
    Authors
    P Warden
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Speech Commands Dataset is a dataset of 30,000 short (1-3 seconds) audio recordings of 30 different spoken words. It is a popular dataset for keyword spotting and speech recognition research. The dataset is split into a training set of 24,000 recordings and a test set of 6,000 recordings.

Unidata (2025). Spanish Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/spanish-speech-recognition-dataset

Spanish Speech Recognition Dataset

Dataset comprises 488 hours of telephone dialogues in Spanish

168 scholarly articles cite this dataset
Available download formats: zip (93217 bytes)
Dataset updated
Jun 25, 2025
Authors
Unidata
License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

Spanish Speech Dataset for recognition task

The dataset comprises 488 hours of telephone dialogues in Spanish, collected from 600 native speakers across various topics and domains. It reports a 98% word accuracy rate, making it a valuable resource for advancing speech recognition technology.

With this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR) systems, audio transcription, and natural language processing (NLP).

The dataset includes high-quality audio recordings with text transcriptions, making it ideal for training and evaluating speech recognition models.

💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

Metadata for the dataset

- Audio files: High-quality recordings in WAV format
- Text transcriptions: Accurate and detailed transcripts for each audio segment
- Speaker information: Metadata on native speakers, including gender, etc.
- Topics: Diverse domains such as general conversations, business, etc.
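
Since the audio is described as WAV, a first sanity check on any downloaded file could be reading its header. The following is a minimal sketch using only Python's standard `wave` module. The helper names `wav_summary` and `make_test_tone` are hypothetical, and the 8000 Hz mono 16-bit tone is an assumption used only so the example runs without dataset files; the listing does not state the dataset's actual sample rate or channel layout.

```python
import io
import math
import struct
import wave


def wav_summary(source):
    """Return basic header facts for a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def make_test_tone(seconds=1.0, rate=8000, freq=440.0):
    """Build an in-memory mono 16-bit WAV (assumed format, not the dataset's)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        samples = (
            int(12000 * math.sin(2 * math.pi * freq * n / rate))
            for n in range(int(seconds * rate))
        )
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


summary = wav_summary(make_test_tone())
print(summary)
```

To inspect a real recording, pass its file path to `wav_summary` instead of the synthetic buffer.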

This dataset is a valuable resource for researchers and developers working on speech recognition, language models, and speech technology.

🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects
