4 datasets found

ClArTTS
huggingface.co
Updated Apr 18, 2024
Cite
Mohamed Bin Zayed University of Artificial Intelligence (2024). ClArTTS [Dataset]. https://huggingface.co/datasets/MBZUAI/ClArTTS
Dataset updated
Apr 18, 2024
Dataset authored and provided by
Mohamed Bin Zayed University of Artificial Intelligence
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Summary

We present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic. The speech is extracted from a LibriVox audiobook, then processed, segmented, and manually transcribed and annotated. The final ClArTTS corpus contains about 12 hours of speech from a single male speaker, sampled at 40,100 Hz.
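
As a rough sanity check on the figures in this summary, the sketch below estimates how many samples and how much uncompressed storage about 12 hours of audio implies. It reads the stated sampling rate as 40,100 Hz and assumes mono 16-bit PCM; the bit depth and channel count are assumptions for illustration, not something the dataset card states, and `estimate_audio_footprint` is a hypothetical helper name.

```python
def estimate_audio_footprint(hours, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Return (samples_per_channel, total_bytes) for uncompressed PCM audio.

    bytes_per_sample=2 (16-bit) and channels=1 (mono) are assumed defaults,
    not values taken from the ClArTTS dataset card.
    """
    seconds = hours * 3600
    samples_per_channel = int(seconds * sample_rate_hz)
    total_bytes = samples_per_channel * bytes_per_sample * channels
    return samples_per_channel, total_bytes


# About 12 hours at 40,100 Hz, as described in the summary above.
samples, size_bytes = estimate_audio_footprint(12, 40_100)
print(f"{samples:,} samples, about {size_bytes / 1e9:.2f} GB uncompressed")
```

Under these assumptions the corpus comes to a few gigabytes of raw audio, which gives a sense of scale before downloading or resampling it.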

Dataset Structure

A typical data point comprises the name of the audio file, called… See the full description on the dataset page: https://huggingface.co/datasets/MBZUAI/ClArTTS.
arabic-tts-wav-24k
huggingface.co
Cite
Sharjeel Abid Butt, arabic-tts-wav-24k [Dataset]. https://huggingface.co/datasets/NeoBoy/arabic-tts-wav-24k
Authors
Sharjeel Abid Butt
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Arabic TTS WAV 24k Dataset

A high-quality, open-source dataset for Arabic Text-to-Speech (TTS) research, containing paired audio and text samples from both male and female speakers. All audio is provided in 24kHz WAV format, with rich metadata and phonetic transcriptions.

Dataset Summary

This dataset is designed for training and evaluating neural TTS systems in Modern Standard Arabic. It includes:

Audio: Clean, studio-quality WAV files at 24,000 Hz. Text: Original Arabic… See the full description on the dataset page: https://huggingface.co/datasets/NeoBoy/arabic-tts-wav-24k.
Arabic-Text-to-Speech
huggingface.co
Cite
Andrew Atef, Arabic-Text-to-Speech [Dataset]. https://huggingface.co/datasets/andrewatef/Arabic-Text-to-Speech
Authors
Andrew Atef
Description
andrewatef/Arabic-Text-to-Speech dataset hosted on Hugging Face and contributed by the HF Datasets community
Ar-ASR
huggingface.co
Updated Jun 1, 2025
Cite
Cairo University AI Students (2025). Ar-ASR [Dataset]. https://huggingface.co/datasets/CUAIStudents/Ar-ASR
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Cairo University AI Students
Description
Ar-ASR

Dataset Description

This dataset is designed for Automatic Speech Recognition (ASR), focusing on Arabic speech with precise transcriptions including tashkeel (diacritics). It contains 33,607 audio samples from multiple sources: Microsoft Edge TTS API, Common Voice (validated Arabic subset), individual contributions, and manually transcribed YouTube videos (we also added the dataset ClArTTS). The dataset is paired with aligned Arabic text transcriptions and is… See the full description on the dataset page: https://huggingface.co/datasets/CUAIStudents/Ar-ASR.
ClArTTS

18 scholarly articles cite this dataset (View in Google Scholar)
