100+ datasets found

In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition...
datarade.ai
Updated Apr 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2024). In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition Data | Audio Data |Natural Language Processing (NLP) Data [Dataset]. https://datarade.ai/data-products/nexdata-in-car-speech-data-15-000-hours-audio-ai-ml-t-nexdata
Explore at:
.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
Dataset updated
Apr 23, 2024
Dataset authored and provided by
Nexdata
Area covered
Switzerland, Russian Federation, Netherlands, Germany, Austria, Poland, Turkey, Egypt, Argentina, Romania
Description
Specifications Format : Audio format: 48kHz, 16bit, uncompressed wav, mono channel; Vedio format: MP4

Recording Environment : In-car;1 quiet scene, 1 low noise scene, 3 medium noise scenes and 2 high noise scenes

Recording Content : It covers 5 fields: navigation field, multimedia field, telephone field, car control field and question and answer field; 500 sentences per people

Speaker : Speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

Device : High fidelity microphone; Binocular camera

Language : 20 languages

Transcription content : text

Accuracy rate : 98%

Application scenarios : speech recognition, Human-computer interaction; Natural language processing and text analysis; Visual content understanding, etc.

About Nexdata Nexdata owns off-the-shelf PB-level Large Language Model(LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. These ready-to-go Natural Language Processing (NLP) Data support instant delivery, quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
Scripted Monologues Speech Data | 65,000 Hours | Generative AI Audio Data|...
datarade.ai
Updated Dec 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2023). Scripted Monologues Speech Data | 65,000 Hours | Generative AI Audio Data| Speech Recognition Data | Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-read-speech-data-65-000-hours-aud-nexdata
Explore at:
.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
Dataset updated
Dec 11, 2023
Dataset authored and provided by
Nexdata
Area covered
Puerto Rico, Italy, Uruguay, Poland, France, Japan, Luxembourg, Taiwan, Pakistan, Chile
Description
Specifications Format : 16kHz, 16bit, uncompressed wav, mono channel

Recording environment : quiet indoor environment, without echo

Recording content (read speech) : economy, entertainment, news, oral language, numbers, letters

Speaker : native speaker, gender balance

Device : Android mobile phone, iPhone

Language : 100+ languages

Transcription content : text, time point of speech data, 5 noise symbols, 5 special identifiers

Accuracy rate : 95% (the accuracy rate of noise symbols and other identifiers is not included)

Application scenarios : speech recognition, voiceprint recognition

About Nexdata Nexdata owns off-the-shelf PB-level Large Language Model(LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. These ready-to-go Machine Learning (ML) Data support instant delivery, quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
Bengali Speech Recognition Dataset (BSRD)
kaggle.com
Updated Jan 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shuvo Kumar Basak-4004 (2025). Bengali Speech Recognition Dataset (BSRD) [Dataset]. http://doi.org/10.34740/kaggle/dsv/10465455
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/10465455
Dataset updated
Jan 14, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Shuvo Kumar Basak-4004
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
The BengaliSpeechRecognitionDataset (BSRD) is a comprehensive dataset designed for the development and evaluation of Bengali speech recognition and text-to-speech systems. This dataset includes a collection of Bengali characters and their corresponding audio files, which are generated using speech synthesis models. It serves as an essential resource for researchers and developers working on automatic speech recognition (ASR) and text-to-speech (TTS) applications for the Bengali language. Key Features: • Bengali Characters: The dataset contains a wide range of Bengali characters, including consonants, vowels, and unique symbols used in the Bengali script. This includes standard characters such as 'ক', 'খ', 'গ', and many more. • Corresponding Speech Data: For each Bengali character, an MP3 audio file is provided, which contains the correct pronunciation of that character. This audio is generated by a Bengali text-to-speech model, ensuring clear and accurate pronunciation. • 1000 Audio Samples per Folder: Each character is associated with at least 1000 MP3 files. These multiple samples provide variations of the character's pronunciation, which is essential for training robust speech recognition systems. • Language and Phonetic Diversity: The dataset offers a phonetic diversity of Bengali sounds, covering different tones and pronunciations commonly found in spoken Bengali. This ensures that the dataset can be used for training models capable of recognizing diverse speech patterns. • Use Cases: o Automatic Speech Recognition (ASR): BSRD is ideal for training ASR systems, as it provides accurate audio samples linked to specific Bengali characters. o Text-to-Speech (TTS): Researchers can use this dataset to fine-tune TTS systems for generating natural Bengali speech from text. o Phonetic Analysis: The dataset can be used for phonetic analysis and developing models that study the linguistic features of Bengali pronunciation. • Applications: o Voice Assistants: The dataset can be used to build and train voice recognition systems and personal assistants that understand Bengali. o Speech-to-Text Systems: BSRD can aid in developing accurate transcription systems for Bengali audio content. o Language Learning Tools: The dataset can help in creating educational tools aimed at teaching Bengali pronunciation.

…………………………………..Note for Researchers Using the dataset………………………………………………………………………

This dataset was created by Shuvo Kumar Basak. If you use this dataset for your research or academic purposes, please ensure to cite this dataset appropriately. If you have published your research using this dataset, please share a link to your paper. Good Luck.
Arabic Speech Commands Dataset
zenodo.org
explore.openaire.eu
+1more
zip
Updated Apr 5, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdulkader Ghandoura; Abdulkader Ghandoura (2021). Arabic Speech Commands Dataset [Dataset]. http://doi.org/10.5281/zenodo.4662481
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4662481
Dataset updated
Apr 5, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Abdulkader Ghandoura; Abdulkader Ghandoura
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Arabic Speech Commands Dataset

This dataset is designed to help train simple machine learning models that serve educational and research purposes in the speech recognition domain, mainly for keyword spotting tasks.

Dataset Description

Our dataset is a list of pairs (x, y), where x is the input speech signal, and y is the corresponding keyword. The final dataset consists of 12000 such pairs, comprising 40 keywords. Each audio file is one-second in length sampled at 16 kHz. We have 30 participants, each of them recorded 10 utterances for each keyword. Therefore, we have 300 audio files for each keyword in total (30 * 10 * 40 = 12000), and the total size of all the recorded keywords is ~384 MB. The dataset also contains several background noise recordings we obtained from various natural sources of noise. We saved these audio files in a separate folder with the name background_noise and a total size of ~49 MB.

Dataset Structure

There are 40 folders, each of which represents one keyword and contains 300 files. The first eight digits of each file name identify the contributor, while the last two digits identify the round number. For example, the file path rotate/00000021_NO_06.wav indicates that the contributor with the ID 00000021 pronounced the keyword rotate for the 6^th time.

Data Split

We recommend using the provided CSV files in your experiments. We kept 60% of the dataset for training, 20% for validation, and the remaining 20% for testing. In our split method, we guarantee that all recordings of a certain contributor are within the same subset.

License

This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. For more details, see the LICENSE file in this folder.

Citations

If you want to use the Arabic Speech Commands dataset in your work, please cite it as:

@article{arabicspeechcommandsv1, author = {Ghandoura, Abdulkader and Hjabo, Farouk and Al Dakkak, Oumayma}, title = {Building and Benchmarking an Arabic Speech Commands Dataset for Small-Footprint Keyword Spotting}, journal = {Engineering Applications of Artificial Intelligence}, year = {2021}, publisher={Elsevier} }
n
In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition...
data.nexdata.ai
Updated Aug 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2024). In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition Data | Audio Data |Natural Language Processing (NLP) Data [Dataset]. https://data.nexdata.ai/products/nexdata-in-car-speech-data-15-000-hours-audio-ai-ml-t-nexdata
Explore at:
Dataset updated
Aug 3, 2024
Dataset authored and provided by
Nexdata
Area covered
Sweden, Bangladesh, Tanzania, Thailand, Slovakia, Ireland, South Africa, Colombia, Australia, Peru
Description
The Natural Language Processing (NLP) Data of in-car speech covers 20+ languages, including read, wake-up word, commend word, code-swithing, multimodal and noise data.
8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech...
datarade.ai
Updated Dec 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2023). 8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech Recognition Data| Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-conversational-speech-data-8khz-tele-nexdata
Explore at:
.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
Dataset updated
Dec 10, 2023
Dataset authored and provided by
Nexdata
Area covered
United Arab Emirates, Argentina, Romania, Singapore, Netherlands, Vietnam, Czech Republic, Poland, Philippines, United States of America
Description
Specifications Format : 8kHz, 8bit, u-law/a-law pcm, mono channel;

Environment : quiet indoor environment, without echo;

Recording content : No preset linguistic data，dozens of topics are specified, and the speakers make dialogue under those topics while the recording is performed;

Demographics : Speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

Annotation : annotating for the transcription text, speaker identification, gender and noise symbols;

Device : Telephony recording system;

Language : 100+ Languages;

Application scenarios : speech recognition; voiceprint recognition;

Accuracy rate : the word accuracy rate is not less than 98%

About Nexdata Nexdata owns off-the-shelf PB-level Large Language Model(LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. These ready-to-go Machine Learning (ML) Data support instant delivery, quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
u
Japanese Speech Recognition Dataset
unidata.pro
u-law/a-law, wav
Updated Feb 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Unidata L.L.C-FZ (2025). Japanese Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/japanese-speech-recognition-dataset/
Explore at:
wav, u-law/a-lawAvailable download formats
Dataset updated
Feb 26, 2025
Dataset authored and provided by
Unidata L.L.C-FZ
Description
Train AI to understand Japanese with Unidata’s dataset, featuring diverse speech samples for better transcription accuracy
t
Data from: wav2vec: Unsupervised Pre-Training for Speech Recognition
service.tib.eu
Updated Dec 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). wav2vec: Unsupervised Pre-Training for Speech Recognition [Dataset]. https://service.tib.eu/ldmservice/dataset/wav2vec--unsupervised-pre-training-for-speech-recognition
Explore at:
Dataset updated
Dec 16, 2024
Description
Unsupervised Pre-Training for Speech Recognition
P
M-AILabs speech dataset Dataset
paperswithcode.com
Updated Aug 2, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). M-AILabs speech dataset Dataset [Dataset]. https://paperswithcode.com/dataset/m-ailabs-speech-dataset
Explore at:
Dataset updated
Aug 2, 2022
Description
The M-AILABS Speech Dataset is the first large dataset that we are providing free-of-charge, freely usable as training data for speech recognition and speech synthesis. Most of the data is based on LibriVox and Project Gutenberg. The training data consist of nearly thousand hours of audio and the text-files in prepared format. A transcription is provided for each clip. Clips vary in length from 1 to 20 seconds and have a total length of approximately shown in the list (and in the respective info.txt-files) below. The texts were published between 1884 and 1964, and are in the public domain. The audio was recorded by the LibriVox project and is also in the public domain
m
Video Dataset for training AI/ML Models
data.macgence.com
mp3
Updated Jul 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Macgence (2024). Video Dataset for training AI/ML Models [Dataset]. https://data.macgence.com/dataset/video-dataset-for-training-aiml-models
Explore at:
mp3Available download formats
Dataset updated
Jul 18, 2024
Dataset authored and provided by
Macgence
License
https://data.macgence.com/terms-and-conditionshttps://data.macgence.com/terms-and-conditions
Time period covered
2025
Area covered
Worldwide
Variables measured
Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
Description
Enhance AI/ML training with Macgence's diverse video dataset. High-quality visuals optimized for accuracy, reliability, and advanced model development!
s
Speech & Audio Datasets
sapien.io
Updated Apr 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sapien (2025). Speech & Audio Datasets [Dataset]. https://www.sapien.io/dataset-marketplace/speech-audio-datasets-for-ai-training
Explore at:
Dataset updated
Apr 22, 2025
Dataset authored and provided by
Sapien
License
https://www.sapien.io/termshttps://www.sapien.io/terms
Description
High-quality speech audio datasets designed for AI model training, supporting various applications like speech recognition, voice identification, and multilingual speech data.
F
English (UK) General Conversation Speech Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). English (UK) General Conversation Speech Dataset [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-uk
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United Kingdom
Dataset funded by
FutureBeeAI
Description
What’s Included
Welcome to the English Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of English language speech recognition models, with a particular focus on British accents and dialects.
With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the English language spoken in United Kingdom.
Speech Data:
This training dataset comprises 30 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 40 native English speakers from different states/provinces of United Kingdom. This collaborative effort guarantees a balanced representation of British accents, dialects, and demographics, reducing biases and promoting inclusivity.
Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.
Metadata:
In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Furthermore, additional metadata such as recording device detail, topic of recording, bit depth, and sample rate will be provided.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of English language speech recognition models.
Transcription:
This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format. The transcriptions capture speaker-wise transcription with time-coded segmentation along with non-speech labels and tags.
Our goal is to expedite the deployment of English language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.
Updates and Customization:
We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.
If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8kHz to 48kHz, allowing you to fine-tune your models for different audio recording setups. Additionally, we can also customize the transcription following your specific guidelines and requirements, to further support your ASR development process.
License:
This audio dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.
S
Speech Recognition Data Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jun 23, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Archive Market Research (2025). Speech Recognition Data Report [Dataset]. https://www.archivemarketresearch.com/reports/speech-recognition-data-563446
Explore at:
doc, ppt, pdfAvailable download formats
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global speech recognition market is experiencing robust growth, driven by the increasing adoption of voice assistants, the proliferation of smart devices, and advancements in artificial intelligence (AI). The market is projected to reach a substantial size, exhibiting a significant Compound Annual Growth Rate (CAGR). While precise figures for market size and CAGR aren't provided, considering the industry's rapid expansion and the involvement of major tech players like Google, Amazon, and Microsoft, a reasonable estimate would place the 2025 market size at approximately $15 billion, growing at a CAGR of 18% from 2025 to 2033. This substantial growth is fueled by several key factors. The rising demand for hands-free and voice-enabled interfaces in various applications, including automotive, healthcare, and customer service, is a major driver. Furthermore, continuous advancements in deep learning and natural language processing (NLP) technologies are leading to more accurate and efficient speech recognition systems. The increasing availability of large datasets for training these systems also contributes to improved performance and wider adoption. However, challenges remain. Data privacy concerns related to the collection and use of voice data pose a significant restraint. The need for robust security measures and transparent data handling practices is paramount to maintaining consumer trust and promoting wider market acceptance. Furthermore, achieving high accuracy in diverse acoustic environments and with varied accents continues to be an area of ongoing development. Despite these challenges, the long-term outlook for the speech recognition market remains highly positive, with continued innovation and expanding applications promising considerable growth throughout the forecast period. The market segmentation is expected to evolve, with specialized solutions for particular industries becoming increasingly prevalent.
u
Italian Speech Recognition Dataset
unidata.pro
a-law/u-law, pcm
Updated Feb 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Unidata L.L.C-FZ (2025). Italian Speech Recognition Dataset [Dataset]. https://unidata.pro/datasets/italian-speech-recognition-dataset/
Explore at:
a-law/u-law, pcmAvailable download formats
Dataset updated
Feb 26, 2025
Dataset authored and provided by
Unidata L.L.C-FZ
Description
Unidata’s Italian Speech Recognition dataset refines AI models for better speech-to-text conversion and language comprehension
F
Travel Scripted Monologue Speech Data: Japanese (Japan)
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Travel Scripted Monologue Speech Data: Japanese (Japan) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/travel-scripted-speech-monologues-japanese-japan
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Japan
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Japanese Scripted Monologue Speech Dataset for the Travel Domain. This meticulously curated dataset is designed to advance the development of Japanese language speech recognition models, particularly for the Travel industry.
Speech Data
This training dataset comprises over 6,000 high-quality scripted prompt recordings in Japanese. These recordings cover various topics and scenarios relevant to the Travel domain, designed to build robust and accurate customer service speech technology.
•Participant Diversity:
•
Speakers: 60 native Japanese speakers from different regions of Japan.

•
Regions: Ensures a balanced representation of Japanese accents, dialects, and demographics.

•
Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

•Recording Details:
•
Recording Nature: Audio recordings of scripted prompts/monologues.

•
Audio Duration: Average duration of 5 to 30 seconds per recording.

•
Formats: WAV format with mono channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.

•
Environment: Recordings are conducted in quiet settings without background noise and echo.

•
Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios to ensure comprehensive coverage of the Travel sector. Topics include:

•Customer Service Interactions
•Booking and Reservations
•Travel Inquiries
•Technical Support
•General Information and Advice
•Promotional and Sales Events
•Domain Specific Statements
•
Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in Travel interactions:

•
Names: Region-specific names of males and females in various formats.

•
Addresses: Region-specific addresses in different spoken formats.

•
Dates & Times: Inclusion of date and time in various travel contexts, such as booking dates, departure and arrival times.

•
Destinations: Specific names of cities, countries, and tourist attractions relevant to the travel sector.

•
Numbers & Prices: Various numbers and prices related to ticket costs, hotel rates, and transaction amounts.

•
Booking IDs and Confirmation Numbers: Inclusion of booking identification and confirmation details for realistic customer service scenarios.

Each scripted prompt is crafted to reflect real-life scenarios encountered in the Travel domain, ensuring applicability in training robust natural language processing and speech recognition models.
Transcription Data
In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.
•
Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.

•
Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.

<div style="margin-top:10px;
Speech Recognition Data Collection Services | 100+ Languages Resources...
data.nexdata.ai
Updated Aug 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2024). Speech Recognition Data Collection Services | 100+ Languages Resources |Audio Data | Speech Recognition Data | Machine Learning (ML) Data [Dataset]. https://data.nexdata.ai/products/nexdata-speech-recognition-data-collection-services-100-nexdata
Explore at:
Dataset updated
Aug 3, 2024
Dataset authored and provided by
Nexdata
Area covered
Luxembourg, Tunisia, Netherlands, Lebanon, Mongolia, Finland, New Zealand, Singapore, Jordan, Cambodia
Description
Nexdata is equipped with professional recording equipment and has resources pool of 70+ countries and regions, and provide various types of speech recognition data collection services for Machine Learning (ML) Data.
d
Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training...
datarade.ai
Updated Dec 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2023). Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-speech-synthesis-data-400-hours-a-nexdata
Explore at:
.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
Dataset updated
Dec 10, 2023
Dataset authored and provided by
Nexdata
Area covered
Canada, Austria, Philippines, Malaysia, Sweden, Colombia, Finland, Belgium, Singapore, Hong Kong
Description
Specifications Format : 44.1 kHz/48 kHz, 16bit/24bit, uncompressed wav, mono channel.

Recording environment : professional recording studio.

Recording content : general narrative sentences, interrogative sentences, etc.

Speaker : native speaker

Annotation Feature : word transcription, part-of-speech, phoneme boundary, four-level accents, four-level prosodic boundary.

Device : Microphone

Language : American English, British English, Japanese, French, Dutch, Catonese, Canadian French,Australian English, Italian, New Zealand English, Spanish, Mexican Spanish

Application scenarios : speech synthesis

Accuracy rate: Word transcription: the sentences accuracy rate is not less than 99%. Part-of-speech annotation: the sentences accuracy rate is not less than 98%. Phoneme annotation: the sentences accuracy rate is not less than 98% (the error rate of voiced and swallowed phonemes is not included, because the labelling is more subjective). Accent annotation: the word accuracy rate is not less than 95%. Prosodic boundary annotation: the sentences accuracy rate is not less than 97% Phoneme boundary annotation: the phoneme accuracy rate is not less than 95% (the error range of boundary is within 5%)

About Nexdata Nexdata owns off-the-shelf PB-level Large Language Model(LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. These ready-to-go AI & ML Training Data support instant delivery, quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/tts?source=Datarade
F
Healthcare Scripted Monologue Speech Data: English (Australia)
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Healthcare Scripted Monologue Speech Data: English (Australia) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/healthcare-scripted-speech-monologues-english-australia
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Australia
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Australian English Scripted Monologue Speech Dataset for the Healthcare Domain. This meticulously curated dataset is designed to advance the development of English language speech recognition models, particularly for the Healthcare industry.
Speech Data
This training dataset comprises over 6,000 high-quality scripted prompt recordings in Australian English. These recordings cover various topics and scenarios relevant to the Healthcare domain, designed to build robust and accurate customer service speech technology.
•Participant Diversity:
•
Speakers: 60 native English speakers from different regions of Australia.

•
Regions: Ensures a balanced representation of Australian English accents, dialects, and demographics.

•
Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

•Recording Details:
•
Recording Nature: Audio recordings of scripted prompts/monologues.

•
Audio Duration: Average duration of 5 to 30 seconds per recording.

•
Formats: WAV format with mono channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.

•
Environment: Recordings are conducted in quiet settings without background noise and echo.

•
Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios to ensure comprehensive coverage of the Healthcare sector. Topics include:

•Patient Interactions
•Medical Consultations
•Healthcare Services Inquiries
•Technical Support
•General Information and Advice
•Regulatory and Compliance Queries
•Emergency and Urgent Care
•Domain Specific Statements
•
Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in Healthcare interactions:

•
Names: Region-specific names of males and females in various formats.

•
Addresses: Region-specific addresses in different spoken formats.

•
Dates & Times: Inclusion of date and time in various healthcare contexts, such as appointment dates and medication schedules.

•
Medical Terms: Specific medical terminology relevant to diagnoses, treatments, and procedures.

•
Numbers & Measurements: Various numbers and measurements related to dosages, test results, and medical statistics.

•
Healthcare Facilities: Names of hospitals, clinics, and medical institutions relevant to the healthcare sector.

Each scripted prompt is crafted to reflect real-life scenarios encountered in the Healthcare domain, ensuring applicability in training robust natural language processing and speech recognition models.
Transcription Data
In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.
•
Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.

•
Format: <span style="font-weight:
F
Punjabi General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Punjabi General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-punjabi-india
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Punjabi General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Punjabi speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Punjabi communication.
Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Punjabi speech models that understand and respond to authentic Indian accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Punjabi. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•
Speakers: 60 verified native Punjabi speakers from FutureBeeAI’s contributor community.

•
Regions: Representing various provinces of Punjab to ensure dialectal diversity and demographic balance.

•
Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.

•Recording Details:
•
Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.

•
Duration: Each conversation ranges from 15 to 60 minutes.

•
Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.

•
Environment: Quiet, echo-free settings with no background noise.

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•
Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.

•
Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple Punjabi speech and language AI applications:
•
ASR Development: Train accurate speech-to-text systems for Punjabi.

•
Voice Assistants: Build smart assistants capable of understanding natural Indian conversations.

<span
Z
Automatic speech recognition datasets for Gronings, Nasal, and Besemah
data.niaid.nih.gov
zenodo.org
Updated May 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
McDonnell (2023). Automatic speech recognition datasets for Gronings, Nasal, and Besemah [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7946869
Explore at:
Dataset updated
May 18, 2023
Dataset provided by
Bartelds
McDonnell
Wieling
Jurafsky
San
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Automatic speech recognition datasets for Gronings, Nasal, and Besemah for experiments reported in Bartelds, San, McDonnell, Jurafsky and Wieling (2023). Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation. ACL 2023.

Model training code available at: https://github.com/Bartelds/asr-augmentation

Facebook

Twitter

Click to copy link

Link copied

Cite

In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition Data | Audio Data |Natural Language Processing (NLP) Data

Explore at:

.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats

Dataset updated

Apr 23, 2024

Dataset authored and provided by

Nexdata

Area covered

Switzerland, Russian Federation, Netherlands, Germany, Austria, Poland, Turkey, Egypt, Argentina, Romania

Description

Specifications Format : Audio format: 48kHz, 16bit, uncompressed wav, mono channel; Vedio format: MP4

Recording Environment : In-car;1 quiet scene, 1 low noise scene, 3 medium noise scenes and 2 high noise scenes

Recording Content : It covers 5 fields: navigation field, multimedia field, telephone field, car control field and question and answer field; 500 sentences per people

Speaker : Speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.

Device : High fidelity microphone; Binocular camera

Language : 20 languages

Transcription content : text

Accuracy rate : 98%

Application scenarios : speech recognition, Human-computer interaction; Natural language processing and text analysis; Visual content understanding, etc.

About Nexdata Nexdata owns off-the-shelf PB-level Large Language Model(LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. These ready-to-go Natural Language Processing (NLP) Data support instant delivery, quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade

Clear search

Close search

Google apps

Main menu

In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition...

Scripted Monologues Speech Data | 65,000 Hours | Generative AI Audio Data|...

Bengali Speech Recognition Dataset (BSRD)

Arabic Speech Commands Dataset

In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition...

8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech...

Japanese Speech Recognition Dataset

Data from: wav2vec: Unsupervised Pre-Training for Speech Recognition

M-AILabs speech dataset Dataset

Video Dataset for training AI/ML Models

Speech & Audio Datasets

English (UK) General Conversation Speech Dataset

What’s Included

Speech Recognition Data Report

Italian Speech Recognition Dataset

Travel Scripted Monologue Speech Data: Japanese (Japan)

Introduction

Speech Data

Transcription Data

Speech Recognition Data Collection Services | 100+ Languages Resources...

Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training...

Healthcare Scripted Monologue Speech Data: English (Australia)

Introduction

Speech Data

Transcription Data

Punjabi General Conversation Speech Dataset for ASR

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

Automatic speech recognition datasets for Gronings, Nasal, and Besemah

In-Cabin Speech Data | 15,000 Hours | AI Training Data | Speech Recognition Data | Audio Data |Natural Language Processing (NLP) Data