4 datasets found
  1. ClArTTS

    • huggingface.co
    Updated Jul 6, 2025
    Mohamed Bin Zayed University of Artificial Intelligence (2025). ClArTTS [Dataset]. https://huggingface.co/datasets/MBZUAI/ClArTTS
    Dataset authored and provided by
    Mohamed Bin Zayed University of Artificial Intelligence
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Summary

    We present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic. The speech is extracted from a LibriVox audiobook, then processed, segmented, and manually transcribed and annotated. The final ClArTTS corpus contains about 12 hours of speech from a single male speaker sampled at 40,100 Hz.

    Dataset Structure

    A typical data point comprises the name of the audio file, called… See the full description on the dataset page: https://huggingface.co/datasets/MBZUAI/ClArTTS.

  2. arabic-tts-wav-24k

    • huggingface.co
    Sharjeel Abid Butt, arabic-tts-wav-24k [Dataset]. https://huggingface.co/datasets/NeoBoy/arabic-tts-wav-24k
    Authors
    Sharjeel Abid Butt
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Arabic TTS WAV 24k Dataset

    A high-quality, open-source dataset for Arabic Text-to-Speech (TTS) research, containing paired audio and text samples from both male and female speakers. All audio is provided in 24kHz WAV format, with rich metadata and phonetic transcriptions.

    Dataset Summary

    This dataset is designed for training and evaluating neural TTS systems in Modern Standard Arabic. It includes:

    Audio: Clean, studio-quality WAV files at 24,000 Hz.
    Text: Original Arabic… See the full description on the dataset page: https://huggingface.co/datasets/NeoBoy/arabic-tts-wav-24k.

  3. Arabic-Text-to-Speech

    • huggingface.co
    Andrew Atef, Arabic-Text-to-Speech [Dataset]. https://huggingface.co/datasets/andrewatef/Arabic-Text-to-Speech
    Authors
    Andrew Atef
    Description

    The andrewatef/Arabic-Text-to-Speech dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  4. Ar-ASR

    • huggingface.co
    Updated Jun 1, 2025
    Cairo University AI Students (2025). Ar-ASR [Dataset]. https://huggingface.co/datasets/CUAIStudents/Ar-ASR
    Dataset authored and provided by
    Cairo University AI Students
    Description

    Ar-ASR

    Dataset Description

    This dataset is designed for Automatic Speech Recognition (ASR), focusing on Arabic speech with precise transcriptions that include tashkeel (diacritics). It contains 33,607 audio samples from multiple sources: the Microsoft Edge TTS API, Common Voice (validated Arabic subset), individual contributions, and manually transcribed YouTube videos; the ClArTTS dataset is also included. The dataset is paired with aligned Arabic text transcriptions and is… See the full description on the dataset page: https://huggingface.co/datasets/CUAIStudents/Ar-ASR.


ClArTTS

18 scholarly articles cite this dataset.
