13 datasets found

i
Data from: MASC: Massive Arabic Speech Corpus
ieee-dataport.org
Updated Apr 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mohammad Apfetyani (2025). MASC: Massive Arabic Speech Corpus [Dataset]. https://ieee-dataport.org/open-access/masc-massive-arabic-speech-corpus
Explore at:
Dataset updated
Apr 28, 2025
Authors
Mohammad Apfetyani
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
multi-genre
h
masc_dev
huggingface.co
Updated Apr 27, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdulwahab Sahyoun (2022). masc_dev [Dataset]. https://huggingface.co/datasets/abdusah/masc_dev
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 27, 2022
Authors
Abdulwahab Sahyoun
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset Card for MASC: MASSIVE ARABIC SPEECH CORPUS

Dataset Summary

This corpus is a dataset that contains 1,000 hours of speech sampled at 16~kHz and crawled from over 700 YouTube channels. MASC is multi-regional, multi-genre, and multi-dialect dataset that is intended to advance the research and development of Arabic speech technology with the special emphasis on Arabic speech recognition

Supported Tasks and Leaderboards

[More Information Needed]… See the full description on the dataset page: https://huggingface.co/datasets/abdusah/masc_dev.
h
MASC
huggingface.co
Updated May 6, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hirundo (2025). MASC [Dataset]. https://huggingface.co/datasets/hirundo-io/MASC
Explore at:
Dataset updated
May 6, 2025
Dataset authored and provided by
Hirundo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is Massive Arabic Speech Corpus (MASC). MASC is a dataset that contains 1,000 hours of speech sampled at 16 kHz and crawled from over 700 YouTube channels. The dataset is multi-regional, multi-genre, and multi-dialect intended to advance the research and development of Arabic speech technology with a special emphasis on Arabic speech recognition. In addition to MASC, a pre-trained 3-gram language model and a pre-trained automatic speech recognition model are also developed and… See the full description on the dataset page: https://huggingface.co/datasets/hirundo-io/MASC.
h
levantine_test_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). levantine_test_set [Dataset]. https://huggingface.co/datasets/otozz/levantine_test_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Levantine test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
h
gulf_test_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). gulf_test_set [Dataset]. https://huggingface.co/datasets/otozz/gulf_test_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Gulf test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
h
iraqi_test_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). iraqi_test_set [Dataset]. https://huggingface.co/datasets/otozz/iraqi_test_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Iraqi test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
h
egyptian_test_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). egyptian_test_set [Dataset]. https://huggingface.co/datasets/otozz/egyptian_test_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Egyptian test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
h
maghrebi_test_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). maghrebi_test_set [Dataset]. https://huggingface.co/datasets/otozz/maghrebi_test_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Maghrebi test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
h
gulf_train_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). gulf_train_set [Dataset]. https://huggingface.co/datasets/otozz/gulf_train_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Gulf train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
h
egyptian_train_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). egyptian_train_set [Dataset]. https://huggingface.co/datasets/otozz/egyptian_train_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Egyptian train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
h
maghrebi_train_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). maghrebi_train_set [Dataset]. https://huggingface.co/datasets/otozz/maghrebi_train_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Maghrebi train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
h
iraqi_train_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). iraqi_train_set [Dataset]. https://huggingface.co/datasets/otozz/iraqi_train_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Iraqi train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
h
levantine_train_set
huggingface.co
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ömer (2025). levantine_train_set [Dataset]. https://huggingface.co/datasets/otozz/levantine_train_set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2025
Authors
Ömer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pre-processed Levantine train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Not seeing a result you expected?
Learn how you can add new datasets to our index.