Specifications:
- Each user has a unique ID across the entire dataset.
- A maximum of four hours of speech per person.
- Speech is recorded and transcribed on separate tracks.
- High-quality transcriptions are provided with the data in JSON format.
- Noise-free, high-quality recordings with both male and female speakers.
- Metadata includes gender, age, and location.
- License terms: pay once and use the data commercially in your products; reselling the data is not permitted.
https://www.sapien.io/terms
High-quality speech audio datasets designed for AI model training, supporting applications such as speech recognition, voice identification, and multilingual speech processing.
Citation: DOI 10.1007/s10579-011-9145-0
Collection of audio recordings by the Department of Computer Science at the University of Toronto from speakers with and without dysarthria. Useful for tasks such as audio classification, disease detection, and speech processing.
Directory Structure:
F_Con : Audio samples of female speakers from the control group, i.e., female speakers without dysarthria. 'FC01' in the folder names and filenames refers to the first speaker, 'FC02' to the second, and so on.
F_Dys : Audio samples of female speakers with dysarthria. 'F01' refers to the first speaker, 'F03' to the second, and so on.
M_Con : Audio samples of male speakers from the control group, i.e., male speakers without dysarthria. 'MC01' refers to the first speaker, 'MC02' to the second, and so on.
M_Dys : Audio samples of male speakers with dysarthria. 'M01' refers to the first speaker, 'M03' to the second, and so on.
In every folder, 'S01' refers to the first recording session with a speaker, 'S02' to the second, and so on; 'arrayMic' indicates that the audio was recorded with an array microphone, whereas 'headMic' indicates a headpiece microphone.
Specifications:
- Each user has a unique ID across the entire dataset.
- A maximum of four hours of speech per person.
- Speech is recorded and transcribed on separate tracks.
- High-quality transcriptions are provided with the data in JSON format.
- Noise-free, high-quality recordings with both male and female speakers.
- Metadata includes gender, age, and location.
- License terms: pay once and use the data commercially in your products; reselling the data is not permitted.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the English Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of English language speech recognition models, with a particular focus on British accents and dialects.
With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and generative voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances of English as spoken in the United Kingdom.
Speech Data: This training dataset comprises 30 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 40 native English speakers from different regions of the United Kingdom. This collaborative effort guarantees a balanced representation of British accents, dialects, and demographics, reducing biases and promoting inclusivity.
Each audio recording captures a spontaneous, unscripted conversation between two individuals, with durations ranging from 15 to 60 minutes. The speech data is available in WAV format as stereo files with a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, with no background noise or echo.
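As a quick sanity check, the stated audio format can be verified with Python's standard wave module; a minimal sketch, where the file name is a placeholder of your own choosing:

# Minimal sketch: check a recording against the stated specs
# (stereo, 16-bit, 8 kHz). "conversation.wav" is a placeholder path.
import wave

with wave.open("conversation.wav", "rb") as wav:
    assert wav.getnchannels() == 2        # stereo
    assert wav.getsampwidth() == 2        # 16 bits = 2 bytes per sample
    assert wav.getframerate() == 8000     # 8 kHz sample rate
    minutes = wav.getnframes() / wav.getframerate() / 60
    print(f"OK, duration: {minutes:.1f} min")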
Metadata: In addition to the audio recordings, our dataset provides comprehensive metadata for each participant, including age, gender, country, state, and dialect. Additional metadata such as recording device details, topic of recording, bit depth, and sample rate is also provided.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of English language speech recognition models.
Transcription: This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format and capture speaker-wise text with time-coded segmentation, along with non-speech labels and tags.
Our goal is to expedite the deployment of English language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.
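To make the transcription format concrete, here is a minimal Python sketch for reading one such file; the field names (segments, speaker, start, end, text) are assumptions for illustration, since the actual JSON schema ships with the dataset:

# Minimal sketch: read a speaker-wise, time-coded transcription.
# Field names are assumed; consult the schema shipped with the dataset.
import json

with open("transcription.json", encoding="utf-8") as f:
    doc = json.load(f)

for seg in doc["segments"]:
    print(f"[{seg['start']:7.2f}-{seg['end']:7.2f}] {seg['speaker']}: {seg['text']}")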
Updates and Customization: We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.
If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8 kHz to 48 kHz, allowing you to fine-tune your models for different audio recording setups. We can also customize the transcriptions to follow your specific guidelines and requirements, further supporting your ASR development process.
License: This audio dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Japanese General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Japanese speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Japanese communication.
Curated by FutureBeeAI, this 40-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Japanese speech models that understand and respond to authentic Japanese accents and dialects.
The dataset comprises 40 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Japanese. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings.
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
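As an example of such filtering, here is a minimal Python sketch; the metadata file name and field names (speaker_gender, speaker_age, topic) are assumptions for illustration, since the exact schema ships with the dataset:

# Minimal sketch: use-case-specific filtering on recording metadata.
# File name and field names are assumed; adjust to the actual schema.
import json

with open("metadata.json", encoding="utf-8") as f:
    records = json.load(f)

# e.g., keep only recordings of female speakers under 30 discussing travel
subset = [
    r for r in records
    if r["speaker_gender"] == "female"
    and r["speaker_age"] < 30
    and r["topic"] == "travel"
]
print(f"{len(subset)} of {len(records)} recordings match")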
This dataset is a versatile resource for multiple Japanese speech and language AI applications.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The BengaliSpeechRecognitionDataset (BSRD) is a comprehensive dataset designed for the development and evaluation of Bengali speech recognition and text-to-speech systems. It includes a collection of Bengali characters and their corresponding audio files, generated using speech synthesis models, and serves as an essential resource for researchers and developers working on automatic speech recognition (ASR) and text-to-speech (TTS) applications for the Bengali language.
Key Features:
- Bengali characters: a wide range of Bengali characters, including consonants, vowels, and unique symbols used in the Bengali script, such as 'ক', 'খ', 'গ', and many more.
- Corresponding speech data: for each Bengali character, MP3 audio files contain the correct pronunciation of that character, generated by a Bengali text-to-speech model to ensure clear and accurate pronunciation.
- 1000 audio samples per folder: each character is associated with at least 1000 MP3 files. These multiple samples provide variations of the character's pronunciation, which is essential for training robust speech recognition systems.
- Language and phonetic diversity: the dataset covers different tones and pronunciations commonly found in spoken Bengali, so it can be used to train models that recognize diverse speech patterns.
Use Cases:
- Automatic speech recognition (ASR): BSRD is ideal for training ASR systems, as it provides accurate audio samples linked to specific Bengali characters.
- Text-to-speech (TTS): researchers can use this dataset to fine-tune TTS systems for generating natural Bengali speech from text.
- Phonetic analysis: the dataset can be used for phonetic analysis and for developing models that study the linguistic features of Bengali pronunciation.
Applications:
- Voice assistants: build and train voice recognition systems and personal assistants that understand Bengali.
- Speech-to-text systems: BSRD can aid in developing accurate transcription systems for Bengali audio content.
- Language learning tools: the dataset can help create educational tools for teaching Bengali pronunciation.
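A minimal Python sketch for indexing the corpus into (audio path, character label) pairs, assuming the one-folder-per-character layout described above ('BSRD' is a placeholder root path):

# Minimal sketch: build (audio path, character label) pairs from the layout.
from pathlib import Path

root = Path("BSRD")  # placeholder root directory
pairs = [(mp3, folder.name)                      # folder name = character label
         for folder in sorted(root.iterdir()) if folder.is_dir()
         for mp3 in sorted(folder.glob("*.mp3"))]
print(f"{len(pairs)} clips across {len({label for _, label in pairs})} characters")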
Note for Researchers Using the Dataset
This dataset was created by Shuvo Kumar Basak. If you use this dataset in your research or academic work, please cite it appropriately. If you have published research using this dataset, please share a link to your paper. Good luck.
https://data.macgence.com/terms-and-conditions
The audio dataset includes general conversations, featuring Shona speakers from Africa with detailed metadata.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Data Source: Kaggle Medical Speech, Transcription, and Intent
Context
8.5 hours of audio utterances paired with text for common medical symptoms.
Content
This data contains thousands of audio utterances for common medical symptoms like “knee pain” or “headache,” totaling more than 8 hours in aggregate. Each utterance was created by individual human contributors based on a given symptom. These audio snippets can be used to train conversational agents in the medical field. This Figure Eight… See the full description on the dataset page: https://huggingface.co/datasets/Hani89/medical_asr_recording_dataset.
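The corpus can be loaded directly from the Hugging Face page cited above; a minimal sketch using the datasets library:

# Minimal sketch: load the corpus from the Hugging Face Hub page above.
from datasets import load_dataset

ds = load_dataset("Hani89/medical_asr_recording_dataset")
print(ds)  # inspect available splits and column names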
Train AI to understand Japanese with Unidata’s dataset, featuring diverse speech samples for better transcription accuracy.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Speech Recognition Bias Reduction Project
Executive Summary
Welcome to the Speech Recognition Bias Reduction Project, which aims to create a more inclusive and representative dataset for improving automated speech recognition systems. The project addresses the challenges faced by speakers with non-native English accents, particularly when interacting with automated voice systems that struggle to interpret alphanumeric information such as names, phone numbers, and addresses. … See the full description on the dataset page: https://huggingface.co/datasets/sakshee05/alphanumeric-audio-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
========================================================
This dataset was developed by Md Ashraful Islam (SUST CSE'2010) and Md Mahadi Hasan Nahid (SUST CSE'2010, nahid-cse@sust.edu)
Department of Computer Science and Engineering (CSE)
Shahjalal University of Science and Technology (SUST), www.sust.edu
Special Thanks To
Mohammad Al-Amin, SUST CSE'2011
Md Mazharul Islam Midhat, SUST CSE'2010
Md Mahedi Hasan Nayem, SUST CSE'2010
Avro Keyboard, OmicronLab, https://www.omicronlab.com/index.html
=========================================================
This is an audio-text parallel corpus. The dataset contains recorded audio of Bangla real numbers and their corresponding text, designed specifically for Bangla speech recognition.
There are five speakers (alamin, ashraful, midhat, nahid, nayem) in this dataset.
The vocabulary contains only Bangla real numbers (shunno-ekshoto, hazar, loksho, koti, doshomic, etc.).
Total number of audio files: 175 (35 from each speaker)
Age range of the speakers: 20-23
The TextData.txt file contains the text of the audio set. Each line starts with an opening tag and ends with a closing tag, and the name of the corresponding audio file is appended in parentheses, linking each line of text to its recorded audio. The text data was generated using Avro (a free, open-source writing tool).
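A minimal Python sketch for parsing one line of TextData.txt; the <s>...</s> tag names and the sample line are assumptions for illustration, since the description above does not name the tags:

# Minimal sketch: parse one line into (text, audio file name).
# Tag names and the sample line are assumed, not taken from the dataset.
import re

line = "<s> ekshoto koti </s> (sample_01.wav)"   # illustrative line only
m = re.match(r"<s>\s*(.*?)\s*</s>\s*\((.+?)\)", line)
if m:
    text, audio_file = m.groups()
    print(text, "->", audio_file)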
==========================================================
For the full dataset, please contact nahid-cse@sust.edu
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Arabic Speech Commands Dataset
This dataset is designed to help train simple machine learning models that serve educational and research purposes in the speech recognition domain, mainly for keyword spotting tasks.
Dataset Description
Our dataset is a list of pairs (x, y), where x is the input speech signal and y is the corresponding keyword. The final dataset consists of 12,000 such pairs, comprising 40 keywords. Each audio file is one second long, sampled at 16 kHz. We have 30 participants, each of whom recorded 10 utterances per keyword; therefore, we have 300 audio files per keyword (30 × 10 × 40 = 12,000), and the total size of all recorded keywords is ~384 MB. The dataset also contains several background-noise recordings obtained from various natural noise sources, saved in a separate folder named background_noise with a total size of ~49 MB.
Dataset Structure
There are 40 folders, each of which represents one keyword and contains 300 files. The first eight digits of each file name identify the contributor, while the last two digits identify the round number. For example, the file path rotate/00000021_NO_06.wav indicates that the contributor with the ID 00000021 pronounced the keyword rotate for the 6th time.
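A minimal Python sketch that recovers the keyword, contributor ID, and round number from a file path, following the naming scheme described above:

# Minimal sketch: parse keyword/contributor/round from a dataset path.
from pathlib import Path

path = Path("rotate/00000021_NO_06.wav")
keyword = path.parent.name                      # folder name is the keyword
contributor, _, round_no = path.stem.split("_")
print(keyword, contributor, int(round_no))      # -> rotate 00000021 6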
Data Split
We recommend using the provided CSV files in your experiments. We kept 60% of the dataset for training, 20% for validation, and the remaining 20% for testing. In our split method, we guarantee that all recordings of a certain contributor are within the same subset.
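A minimal Python sketch that checks this contributor-disjoint property; the CSV file names and the "file" column header are assumptions to adjust against the files actually shipped with the dataset:

# Minimal sketch: verify that no contributor appears in more than one subset.
import csv

def contributors(csv_path):
    with open(csv_path, newline="") as f:
        # contributor ID = first eight digits of the audio file name
        return {row["file"].split("/")[-1][:8] for row in csv.DictReader(f)}

train, val, test = (contributors(p) for p in ("train.csv", "validation.csv", "test.csv"))
assert not (train & val or train & test or val & test), "contributor leakage!"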
License
This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. For more details, see the LICENSE file in this folder.
Citations
If you want to use the Arabic Speech Commands dataset in your work, please cite it as:
@article{arabicspeechcommandsv1,
  author    = {Ghandoura, Abdulkader and Hjabo, Farouk and Al Dakkak, Oumayma},
  title     = {Building and Benchmarking an Arabic Speech Commands Dataset for Small-Footprint Keyword Spotting},
  journal   = {Engineering Applications of Artificial Intelligence},
  year      = {2021},
  publisher = {Elsevier}
}
This work introduces Zambezi Voice, an open-source multilingual speech resource for Zambian languages. It contains two collections of datasets: unlabelled audio recordings of radio news and talk-show programs (160 hours) and labelled data (over 80 hours) consisting of read speech recorded from text sourced from publicly available literature books. The dataset is created for speech recognition but can be extended to multilingual speech processing research for both supervised and unsupervised learning approaches. To our knowledge, this is the first multilingual speech dataset created for Zambian languages. We exploit pretraining and cross-lingual transfer learning by fine-tuning the Wav2Vec2.0 large-scale multilingual pre-trained model to build end-to-end (E2E) speech recognition baseline models. The dataset is released publicly under a Creative Commons BY-NC-ND 4.0 license and can be accessed through the project repository.
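A minimal sketch of the cross-lingual transfer setup described above, assuming the XLSR-53 multilingual checkpoint available through Hugging Face transformers; the paper's exact checkpoint and vocabulary may differ:

# Minimal sketch: load a multilingual Wav2Vec2 model for CTC fine-tuning.
# The CTC head is newly initialized and trained on the labelled data;
# vocab_size is a placeholder for the target language's character set size.
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    vocab_size=40,
)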
https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for Common Voice Corpus 12.0
Dataset Summary
The Common Voice dataset consists of unique MP3 files and corresponding text files. Many of the 26,119 recorded hours in the dataset also include demographic metadata like age, sex, and accent that can help improve the accuracy of speech recognition engines. The dataset currently consists of 17,127 validated hours in 104 languages, but more voices and languages are always being added. Take a look at the Languages page to… See the full description on the dataset page: https://huggingface.co/datasets/mozilla-foundation/common_voice_12_0.
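A minimal sketch for streaming one language configuration with the datasets library; the "en" config is illustrative, and access requires accepting the dataset's terms on the Hugging Face Hub:

# Minimal sketch: stream one language config of Common Voice 12.0.
# May require authentication (huggingface-cli login) after accepting terms.
from datasets import load_dataset

cv = load_dataset("mozilla-foundation/common_voice_12_0", "en",
                  split="validation", streaming=True)
print(next(iter(cv))["sentence"])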
https://data.macgence.com/terms-and-conditions
The audio dataset includes general conversations, featuring Igbo speakers from Africa with detailed metadata.
https://data.macgence.com/terms-and-conditions
The audio dataset includes general conversations, featuring Arabic speakers from the UAE with detailed metadata.
https://data.macgence.com/terms-and-conditions
The audio dataset includes general conversations, featuring English speakers from the USA with detailed metadata.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Wolof Audio Dataset
The Wolof Audio Dataset is a collection of audio recordings and their corresponding transcriptions in Wolof. This dataset is designed to support the development of Automatic Speech Recognition (ASR) models for the Wolof language. It was created by combining three existing datasets:
- ALFFA: available at serge-wilson/wolof_speech_transcription
- FLEURS: available at vonewman/fleurs-wolof-dataset
- Urban Bus Wolof Speech Dataset: available at vonewman/urban-bus-wolof…
See the full description on the dataset page: https://huggingface.co/datasets/vonewman/wolof-audio-data.
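A minimal sketch for loading the combined corpus from the dataset page linked above:

# Minimal sketch: load the combined Wolof corpus via the datasets library.
from datasets import load_dataset

wolof = load_dataset("vonewman/wolof-audio-data")
print(wolof)  # inspect splits and the audio/transcription columns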