Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
multi-genre
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for MASC: MASSIVE ARABIC SPEECH CORPUS
Dataset Summary
This corpus is a dataset that contains 1,000 hours of speech sampled at 16~kHz and crawled from over 700 YouTube channels. MASC is multi-regional, multi-genre, and multi-dialect dataset that is intended to advance the research and development of Arabic speech technology with the special emphasis on Arabic speech recognition
Supported Tasks and Leaderboards
[More Information Needed]… See the full description on the dataset page: https://huggingface.co/datasets/abdusah/masc_dev.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is Massive Arabic Speech Corpus (MASC). MASC is a dataset that contains 1,000 hours of speech sampled at 16 kHz and crawled from over 700 YouTube channels. The dataset is multi-regional, multi-genre, and multi-dialect intended to advance the research and development of Arabic speech technology with a special emphasis on Arabic speech recognition. In addition to MASC, a pre-trained 3-gram language model and a pre-trained automatic speech recognition model are also developed and… See the full description on the dataset page: https://huggingface.co/datasets/hirundo-io/MASC.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Levantine test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Gulf test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Iraqi test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Egyptian test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Maghrebi test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Gulf train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Egyptian train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Maghrebi train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Iraqi train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed Levantine train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
multi-genre