9 datasets found

IndicTTS Dataset
paperswithcode.com
opendatalab.com
Updated Oct 15, 2016
Cite
(2016). IndicTTS Dataset [Dataset]. https://paperswithcode.com/dataset/indictts
Description
A special corpus covering 13 major languages of India. It comprises more than 10,000 spoken sentences/utterances for each of the monolingual and English sets, recorded by both male and female native speakers. Speech waveform files are available in .wav format along with the corresponding text. We hope that these recordings will be useful for researchers and speech technologists working on synthesis and recognition. You can request zip archives of the entire database here.
indic_tts_ml
huggingface.co
Updated Mar 1, 2023
Cite
Thennal (2023). indic_tts_ml [Dataset]. https://huggingface.co/datasets/thennal/indic_tts_ml
Authors
Thennal
License
Other: https://choosealicense.com/licenses/other/
Description
Indic TTS Malayalam Speech Corpus

The Malayalam subset of the Indic TTS corpus, taken from this Kaggle database. The corpus contains one male and one female speaker; samples are in a 2:1 male-to-female ratio because of missing files for the female speaker. The license is given in the repository.
roots_indic-ta_wikiquote
huggingface.co
Updated Aug 9, 2023
Cite
BigScience Data (2023). roots_indic-ta_wikiquote [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-ta_wikiquote
Dataset authored and provided by
BigScience Data
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
ROOTS Subset: roots_indic-ta_wikiquote

wikiquote_filtered

Dataset uid: wikiquote_filtered

Sizes

0.0462 % of total 0.1697 % of en 0.0326 % of fr 0.0216 % of ar 0.0066 % of zh 0.0833 % of pt 0.0357 % of es 0.0783 % of indic-ta 0.0361 % of indic-hi 0.0518 % of ca 0.0405 % of vi 0.0834 % of indic-ml 0.0542 % of indic-te 0.1172 % of indic-gu 0.0634 % of… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-ta_wikiquote.
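In these ROOTS dataset cards, the Sizes line reports the subset's share as a run of `<percent> % of <label>` pairs, where `total` presumably refers to the whole ROOTS corpus and the other labels are language subsets. Below is a minimal Python sketch for turning such a line into a dictionary. `parse_roots_sizes` is a hypothetical helper, not part of any BigScience tooling, and it assumes each label is a single token such as `en` or `indic-ta`. Truncated trailing fragments like `0.0634 % of…` are skipped, and when a label repeats, the first value is kept.

```python
import re

# One "<percent> % of <label>" pair from a ROOTS "Sizes" line.
# Assumption: labels are single tokens made of letters and hyphens ("total", "en", "indic-ta").
_PAIR = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*of\s+([A-Za-z-]+)")


def parse_roots_sizes(line: str) -> dict[str, float]:
    """Return {label: percent} for every complete pair in `line`.

    A truncated tail such as "0.0634 % of…" has no label and is skipped.
    If a label appears more than once, the first value wins.
    """
    sizes: dict[str, float] = {}
    for value, label in _PAIR.findall(line):
        sizes.setdefault(label, float(value))
    return sizes
```

For example, `parse_roots_sizes("0.0462 % of total 0.1697 % of en 0.0634 % of…")` returns `{'total': 0.0462, 'en': 0.1697}`.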
roots_indic-te_wikipedia
huggingface.co
Updated Aug 9, 2023
Cite
BigScience Data (2023). roots_indic-te_wikipedia [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-te_wikipedia
Dataset authored and provided by
BigScience Data
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
ROOTS Subset: roots_indic-te_wikipedia

wikipedia

Dataset uid: wikipedia

Sizes

3.2299 % of total 4.2071 % of en 5.6773 % of ar 3.3416 % of fr 5.2815 % of es 12.4852 % of ca 0.4288 % of zh 0.4286 % of zh 5.4743 % of indic-bn 8.9062 % of indic-ta 21.3313 % of indic-te 4.4845 % of pt 4.0493 % of indic-hi 11.3163 % of indic-ml 22.5300 % of indic-ur 4.4902 %… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-te_wikipedia.
roots_indic-pa_wikibooks
huggingface.co
Updated Jul 24, 2023
Cite
BigScience Data (2023). roots_indic-pa_wikibooks [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-pa_wikibooks
Dataset authored and provided by
BigScience Data
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
ROOTS Subset: roots_indic-pa_wikibooks

wikibooks_filtered

Dataset uid: wikibooks_filtered

Sizes

0.0897 % of total 0.2591 % of en 0.0965 % of fr 0.1691 % of es 0.2834 % of indic-hi 0.2172 % of pt 0.0149 % of zh 0.0279 % of ar 0.1374 % of vi 0.5025 % of id 0.3694 % of indic-ur 0.5744 % of eu 0.0769 % of ca 0.0519 % of indic-ta 0.1470 % of indic-mr 0.0751 %… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-pa_wikibooks.
roots_indic-mr_mkb
huggingface.co
Updated Nov 2, 2023
Cite
BigScience Data (2023). roots_indic-mr_mkb [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-mr_mkb
Dataset authored and provided by
BigScience Data
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
ROOTS Subset: roots_indic-mr_mkb

mkb

Dataset uid: mkb

Description

The Prime Minister's speeches (Mann Ki Baat) on All India Radio, translated into many languages.

Homepage

https://huggingface.co/datasets/mkb
http://preon.iiit.ac.in/~jerin/bhasha/

Sizes

0.0009 % of total 0.0174 % of indic-ta 0.0252 % of indic-ml 0.0416 % of indic-mr 0.0601 % of indic-gu 0.0047 % of indic-bn… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-mr_mkb.
IndicTTS_Telugu
huggingface.co
Updated Mar 6, 2025
Cite
SPRINGLab (2025). IndicTTS_Telugu [Dataset]. https://huggingface.co/datasets/SPRINGLab/IndicTTS_Telugu
Dataset authored and provided by
SPRINGLab
Description
Telugu Indic TTS Dataset

This dataset is derived from the Indic TTS Database project, specifically using the Telugu monolingual recordings from both male and female speakers. The dataset contains high-quality speech recordings with corresponding text transcriptions, making it suitable for text-to-speech (TTS) research and development.

Dataset Details

Language: Telugu
Total Duration: ~8.74 hours (Male: 4.47 hours, Female: 4.27 hours)
Audio Format: WAV
Sampling Rate:… See the full description on the dataset page: https://huggingface.co/datasets/SPRINGLab/IndicTTS_Telugu.
IndicTTS_Manipuri
huggingface.co
Updated Mar 6, 2025
Cite
SPRINGLab (2025). IndicTTS_Manipuri [Dataset]. https://huggingface.co/datasets/SPRINGLab/IndicTTS_Manipuri
Dataset authored and provided by
SPRINGLab
Area covered
Manipur
Description
Manipuri Indic TTS Dataset

This dataset is derived from the Indic TTS Database project, specifically using the Manipuri monolingual recordings from both male and female speakers. The dataset contains high-quality speech recordings with corresponding text transcriptions, making it suitable for text-to-speech (TTS) research and development.

Dataset Details

Language: Manipuri
Total Duration: ~20.75 hours (Male: 10.61 hours, Female: 10.14 hours)
Audio Format: WAV
Sampling… See the full description on the dataset page: https://huggingface.co/datasets/SPRINGLab/IndicTTS_Manipuri.
roots_indic-hi_wikiversity
huggingface.co
Updated Sep 19, 2022
Cite
BigScience Data (2022). roots_indic-hi_wikiversity [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-hi_wikiversity
Dataset authored and provided by
BigScience Data
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
ROOTS Subset: roots_indic-hi_wikiversity

wikiversity_filtered

Dataset uid: wikiversity_filtered

Sizes

0.0367 % of total 0.1050 % of en 0.1178 % of fr 0.1231 % of pt 0.0072 % of zh 0.0393 % of es 0.0076 % of ar 0.0069 % of indic-hi

BigScience processing steps
Filters applied to: en

filter_wiki_user_titles… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-hi_wikiversity.

IndicTTS Dataset: 82 scholarly articles cite this dataset (per Google Scholar).