Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Summary
We present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic. The speech was extracted from a LibriVox audiobook, then processed, segmented, and manually transcribed and annotated. The final ClArTTS corpus contains about 12 hours of speech from a single male speaker, sampled at 40,100 Hz (40.1 kHz).
Dataset Structure
A typical data point comprises the name of the audio file, called… See the full description on the dataset page: https://huggingface.co/datasets/MBZUAI/ClArTTS.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Arabic TTS WAV 24k Dataset
A high-quality, open-source dataset for Arabic Text-to-Speech (TTS) research, containing paired audio and text samples from both male and female speakers. All audio is provided in 24 kHz WAV format, with rich metadata and phonetic transcriptions.
Dataset Summary
This dataset is designed for training and evaluating neural TTS systems in Modern Standard Arabic. It includes:
Audio: Clean, studio-quality WAV files at 24,000 Hz.
Text: Original Arabic… See the full description on the dataset page: https://huggingface.co/datasets/NeoBoy/arabic-tts-wav-24k.
andrewatef/Arabic-Text-to-Speech dataset hosted on Hugging Face and contributed by the HF Datasets community
Ar-ASR
Dataset Description
This dataset is designed for Automatic Speech Recognition (ASR), focusing on Arabic speech with precise transcriptions that include tashkeel (diacritics). It contains 33,607 audio samples from multiple sources: the Microsoft Edge TTS API, Common Voice (validated Arabic subset), individual contributions, and manually transcribed YouTube videos; the ClArTTS dataset is also included. The dataset is paired with aligned Arabic text transcriptions and is… See the full description on the dataset page: https://huggingface.co/datasets/CUAIStudents/Ar-ASR.
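Because these transcriptions carry tashkeel, it can be useful to compare them against undiacritized text from other sources. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `strip_tashkeel` and `has_tashkeel` are hypothetical, and the character set is an assumption that covers the common Arabic harakat (U+064B to U+0652) plus the superscript alef (U+0670).

```python
import re

# Assumed tashkeel set: fathatan..sukun (U+064B-U+0652) and superscript alef (U+0670).
# This is an illustrative choice, not the dataset's documented normalization.
TASHKEEL = re.compile("[\u064B-\u0652\u0670]")


def strip_tashkeel(text: str) -> str:
    """Return the text with the assumed diacritic marks removed."""
    return TASHKEEL.sub("", text)


def has_tashkeel(text: str) -> bool:
    """Report whether the text contains any of the assumed diacritic marks."""
    return TASHKEEL.search(text) is not None


if __name__ == "__main__":
    # kaf + fatha, ta + fatha, ba + fatha ("kataba")
    diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
    print(has_tashkeel(diacritized))
    print(strip_tashkeel(diacritized))
```

A check like `has_tashkeel` could, for example, be used to flag transcriptions that lack diacritics before mixing sources; whether that matters depends on the downstream model.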