13 datasets found
  1. i

    Data from: MASC: Massive Arabic Speech Corpus

    • ieee-dataport.org
    Updated Apr 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mohammad Apfetyani (2025). MASC: Massive Arabic Speech Corpus [Dataset]. https://ieee-dataport.org/open-access/masc-massive-arabic-speech-corpus
    Explore at:
    Dataset updated
    Apr 28, 2025
    Authors
    Mohammad Apfetyani
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    multi-genre

  2. h

    masc_dev

    • huggingface.co
    Updated Apr 27, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdulwahab Sahyoun (2022). masc_dev [Dataset]. https://huggingface.co/datasets/abdusah/masc_dev
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 27, 2022
    Authors
    Abdulwahab Sahyoun
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for MASC: MASSIVE ARABIC SPEECH CORPUS

      Dataset Summary
    

    This corpus is a dataset that contains 1,000 hours of speech sampled at 16~kHz and crawled from over 700 YouTube channels. MASC is multi-regional, multi-genre, and multi-dialect dataset that is intended to advance the research and development of Arabic speech technology with the special emphasis on Arabic speech recognition

      Supported Tasks and Leaderboards
    

    [More Information Needed]… See the full description on the dataset page: https://huggingface.co/datasets/abdusah/masc_dev.

  3. h

    MASC

    • huggingface.co
    Updated May 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hirundo (2025). MASC [Dataset]. https://huggingface.co/datasets/hirundo-io/MASC
    Explore at:
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Hirundo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is Massive Arabic Speech Corpus (MASC). MASC is a dataset that contains 1,000 hours of speech sampled at 16 kHz and crawled from over 700 YouTube channels. The dataset is multi-regional, multi-genre, and multi-dialect intended to advance the research and development of Arabic speech technology with a special emphasis on Arabic speech recognition. In addition to MASC, a pre-trained 3-gram language model and a pre-trained automatic speech recognition model are also developed and… See the full description on the dataset page: https://huggingface.co/datasets/hirundo-io/MASC.

  4. h

    levantine_test_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). levantine_test_set [Dataset]. https://huggingface.co/datasets/otozz/levantine_test_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Levantine test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  5. h

    gulf_test_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). gulf_test_set [Dataset]. https://huggingface.co/datasets/otozz/gulf_test_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Gulf test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  6. h

    iraqi_test_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). iraqi_test_set [Dataset]. https://huggingface.co/datasets/otozz/iraqi_test_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Iraqi test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  7. h

    egyptian_test_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). egyptian_test_set [Dataset]. https://huggingface.co/datasets/otozz/egyptian_test_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Egyptian test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  8. h

    maghrebi_test_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). maghrebi_test_set [Dataset]. https://huggingface.co/datasets/otozz/maghrebi_test_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Maghrebi test partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  9. h

    gulf_train_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). gulf_train_set [Dataset]. https://huggingface.co/datasets/otozz/gulf_train_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Gulf train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  10. h

    egyptian_train_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). egyptian_train_set [Dataset]. https://huggingface.co/datasets/otozz/egyptian_train_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Egyptian train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  11. h

    maghrebi_train_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). maghrebi_train_set [Dataset]. https://huggingface.co/datasets/otozz/maghrebi_train_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Maghrebi train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  12. h

    iraqi_train_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). iraqi_train_set [Dataset]. https://huggingface.co/datasets/otozz/iraqi_train_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Iraqi train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  13. h

    levantine_train_set

    • huggingface.co
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ömer (2025). levantine_train_set [Dataset]. https://huggingface.co/datasets/otozz/levantine_train_set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2025
    Authors
    Ömer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pre-processed Levantine train partition from the MASC-dataset: Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith Abandah, Adham Alsharkawi, Maha Dawas, August 18, 2021, "MASC: Massive Arabic Speech Corpus", IEEE Dataport, doi: https://dx.doi.org/10.21227/e1qb-jv46.

  14. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Mohammad Apfetyani (2025). MASC: Massive Arabic Speech Corpus [Dataset]. https://ieee-dataport.org/open-access/masc-massive-arabic-speech-corpus

Data from: MASC: Massive Arabic Speech Corpus

Related Article
Explore at:
Dataset updated
Apr 28, 2025
Authors
Mohammad Apfetyani
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

multi-genre

Search
Clear search
Close search
Google apps
Main menu