9 datasets found
  1. IndicTTS Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Oct 15, 2016
    (2016). IndicTTS Dataset [Dataset]. https://paperswithcode.com/dataset/indictts
    Description

    A special corpus of Indian languages covering 13 major languages of India. It comprises 10,000+ spoken sentences/utterances each of monolingual and English speech, recorded by both male and female native speakers. Speech waveform files are available in .wav format along with the corresponding text. We hope that these recordings will be useful for researchers and speech technologists working on synthesis and recognition. You can request zip archives of the entire database here.

  2. indic_tts_ml

    • huggingface.co
    Updated Mar 1, 2023
    Thennal (2023). indic_tts_ml [Dataset]. https://huggingface.co/datasets/thennal/indic_tts_ml
    Authors
    Thennal
    License

    https://choosealicense.com/licenses/other/

    Description

    Indic TTS Malayalam Speech Corpus

    The Malayalam subset of Indic TTS Corpus, taken from this Kaggle database. The corpus contains one male and one female speaker, with a 2:1 ratio of samples due to missing files for the female speaker. The license is given in the repository.

  3. roots_indic-ta_wikiquote

    • huggingface.co
    Updated Aug 9, 2023
    BigScience Data (2023). roots_indic-ta_wikiquote [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-ta_wikiquote
    Dataset authored and provided by
    BigScience Data
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    ROOTS Subset: roots_indic-ta_wikiquote

      wikiquote_filtered
    

    Dataset uid: wikiquote_filtered

      Sizes

    0.0462 % of total, 0.1697 % of en, 0.0326 % of fr, 0.0216 % of ar, 0.0066 % of zh, 0.0833 % of pt, 0.0357 % of es, 0.0783 % of indic-ta, 0.0361 % of indic-hi, 0.0518 % of ca, 0.0405 % of vi, 0.0834 % of indic-ml, 0.0542 % of indic-te, 0.1172 % of indic-gu, 0.0634 % of… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-ta_wikiquote.

  4. roots_indic-te_wikipedia

    • huggingface.co
    Updated Aug 9, 2023
    BigScience Data (2023). roots_indic-te_wikipedia [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-te_wikipedia
    Dataset authored and provided by
    BigScience Data
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    ROOTS Subset: roots_indic-te_wikipedia

      wikipedia
    

    Dataset uid: wikipedia

      Sizes

    3.2299 % of total, 4.2071 % of en, 5.6773 % of ar, 3.3416 % of fr, 5.2815 % of es, 12.4852 % of ca, 0.4288 % of zh, 0.4286 % of zh, 5.4743 % of indic-bn, 8.9062 % of indic-ta, 21.3313 % of indic-te, 4.4845 % of pt, 4.0493 % of indic-hi, 11.3163 % of indic-ml, 22.5300 % of indic-ur, 4.4902 %… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-te_wikipedia.

  5. roots_indic-pa_wikibooks

    • huggingface.co
    Updated Jul 24, 2023
    BigScience Data (2023). roots_indic-pa_wikibooks [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-pa_wikibooks
    Dataset authored and provided by
    BigScience Data
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    ROOTS Subset: roots_indic-pa_wikibooks

      wikibooks_filtered
    

    Dataset uid: wikibooks_filtered

      Sizes

    0.0897 % of total, 0.2591 % of en, 0.0965 % of fr, 0.1691 % of es, 0.2834 % of indic-hi, 0.2172 % of pt, 0.0149 % of zh, 0.0279 % of ar, 0.1374 % of vi, 0.5025 % of id, 0.3694 % of indic-ur, 0.5744 % of eu, 0.0769 % of ca, 0.0519 % of indic-ta, 0.1470 % of indic-mr, 0.0751 %… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-pa_wikibooks.

  6. roots_indic-mr_mkb

    • huggingface.co
    Updated Nov 2, 2023
    BigScience Data (2023). roots_indic-mr_mkb [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-mr_mkb
    Dataset authored and provided by
    BigScience Data
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    ROOTS Subset: roots_indic-mr_mkb

      mkb
    

    Dataset uid: mkb

      Description
    

    The Prime Minister's speeches (Mann Ki Baat) on All India Radio, translated into many languages.

      Homepage
    

    https://huggingface.co/datasets/mkb
    http://preon.iiit.ac.in/~jerin/bhasha/

      Sizes

    0.0009 % of total, 0.0174 % of indic-ta, 0.0252 % of indic-ml, 0.0416 % of indic-mr, 0.0601 % of indic-gu, 0.0047 % of indic-bn… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-mr_mkb.

  7. IndicTTS_Telugu

    • huggingface.co
    Updated Mar 6, 2025
    SPRINGLab (2025). IndicTTS_Telugu [Dataset]. https://huggingface.co/datasets/SPRINGLab/IndicTTS_Telugu
    Dataset authored and provided by
    SPRINGLab
    Description

    Telugu Indic TTS Dataset

    This dataset is derived from the Indic TTS Database project, specifically using the Telugu monolingual recordings from both male and female speakers. The dataset contains high-quality speech recordings with corresponding text transcriptions, making it suitable for text-to-speech (TTS) research and development.

      Dataset Details
    

    Language: Telugu
    Total Duration: ~8.74 hours (Male: 4.47 hours, Female: 4.27 hours)
    Audio Format: WAV
    Sampling Rate:… See the full description on the dataset page: https://huggingface.co/datasets/SPRINGLab/IndicTTS_Telugu.

  8. IndicTTS_Manipuri

    • huggingface.co
    Updated Mar 6, 2025
    SPRINGLab (2025). IndicTTS_Manipuri [Dataset]. https://huggingface.co/datasets/SPRINGLab/IndicTTS_Manipuri
    Dataset authored and provided by
    SPRINGLab
    Area covered
    Manipur
    Description

    Manipuri Indic TTS Dataset

    This dataset is derived from the Indic TTS Database project, specifically using the Manipuri monolingual recordings from both male and female speakers. The dataset contains high-quality speech recordings with corresponding text transcriptions, making it suitable for text-to-speech (TTS) research and development.

      Dataset Details
    

    Language: Manipuri
    Total Duration: ~20.75 hours (Male: 10.61 hours, Female: 10.14 hours)
    Audio Format: WAV
    Sampling… See the full description on the dataset page: https://huggingface.co/datasets/SPRINGLab/IndicTTS_Manipuri.

  9. roots_indic-hi_wikiversity

    • huggingface.co
    Updated Sep 19, 2022
    BigScience Data (2022). roots_indic-hi_wikiversity [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-hi_wikiversity
    Dataset authored and provided by
    BigScience Data
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    ROOTS Subset: roots_indic-hi_wikiversity

      wikiversity_filtered
    

    Dataset uid: wikiversity_filtered

      Sizes

    0.0367 % of total, 0.1050 % of en, 0.1178 % of fr, 0.1231 % of pt, 0.0072 % of zh, 0.0393 % of es, 0.0076 % of ar, 0.0069 % of indic-hi

      BigScience processing steps

      Filters applied to: en

    filter_wiki_user_titles… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-hi_wikiversity.

IndicTTS Dataset: cited by 82 scholarly articles (per Google Scholar).