Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Belebele Benchmark for Massively Multilingual NLU Evaluation
Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. This dataset enables the evaluation of mono- and multi-lingual models in high-, medium-, and low-resource languages. Each question has four multiple-choice answers and is linked to a short passage from the FLORES-200 dataset. The human annotation procedure was carefully curated to create questions that… See the full description on the dataset page: https://huggingface.co/datasets/facebook/belebele.
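To illustrate the structure described above, the sketch below loads one Belebele language variant with the Hugging Face datasets library. The config name ("eng_Latn"), the "test" split, and the field names are assumptions drawn from this description and should be checked against the dataset card.

```python
# A minimal sketch of loading and inspecting one Belebele language variant.
# The config name "eng_Latn", the "test" split, and the field names below are
# assumptions taken from this description; check the dataset card for the
# authoritative schema.
from datasets import load_dataset

belebele = load_dataset("facebook/belebele", "eng_Latn", split="test")

example = belebele[0]
passage = example["flores_passage"]                          # short FLORES-200 passage
question = example["question"]                               # the MRC question
choices = [example[f"mc_answer{i}"] for i in range(1, 5)]    # four answer options
gold = int(example["correct_answer_num"])                    # 1-indexed correct choice

print(passage)
print(question)
for i, choice in enumerate(choices, start=1):
    marker = "*" if i == gold else " "
    print(f"{marker} {i}. {choice}")
```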
GLUE-X is a benchmark dataset for evaluating the out-of-distribution (OOD) robustness of Natural Language Understanding (NLU) models. It was created to address the OOD generalization problem, which remains a challenge in many NLP tasks and limits the real-world deployment of these methods. GLUE-X collects 14 publicly available datasets as OOD test data, and evaluations are conducted on 8 classic NLP tasks with widely used models. The findings highlight the need for improved OOD accuracy in NLP: significant performance degradation was observed in all settings compared to in-distribution (ID) accuracy. The creators of GLUE-X hope that the dataset underscores the importance of OOD robustness and provides insights into how to measure and improve the robustness of a model.
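To make the ID-versus-OOD comparison concrete, the sketch below scores a single sentiment classifier on an in-distribution test set and an out-of-distribution one and reports the gap. The model and dataset choices (SST-2 as ID, IMDB as OOD) are illustrative assumptions, not the exact GLUE-X configuration or its 14 OOD datasets.

```python
# A minimal sketch of a GLUE-X style robustness check: score one sentiment
# classifier on an in-distribution (ID) test set and an out-of-distribution
# (OOD) one, then report the gap. The model and dataset choices here are
# illustrative assumptions, not the exact GLUE-X protocol.
from datasets import load_dataset
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def accuracy(dataset, text_key, n=200):
    """Fraction of the first n (shuffled) examples the pipeline labels correctly."""
    subset = dataset.shuffle(seed=0).select(range(n))
    texts = [ex[text_key][:512] for ex in subset]   # crude truncation for long reviews
    preds = classifier(texts)
    gold = ["POSITIVE" if ex["label"] == 1 else "NEGATIVE" for ex in subset]
    return sum(p["label"] == g for p, g in zip(preds, gold)) / n

id_data = load_dataset("glue", "sst2", split="validation")   # in-distribution
ood_data = load_dataset("imdb", split="test")                # out-of-distribution

id_acc = accuracy(id_data, "sentence")
ood_acc = accuracy(ood_data, "text")
print(f"ID accuracy:  {id_acc:.3f}")
print(f"OOD accuracy: {ood_acc:.3f}")
print(f"Degradation:  {id_acc - ood_acc:.3f}")
```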
https://microsoft.github.io/XGLUE/
"XGLUE is a new benchmark dataset to evaluate the performance of cross-lingual pre-trained models with respect to cross-lingual natural language understanding and generation.
For people who are working on XTREME, we recommend submitting results to XGLUE as well, especially for the XGLUE-unique tasks, because (1) XGLUE includes tasks from real-world scenarios (News Classification, Query-Ad Matching, Web Page Ranking and QA Matching), and all of these datasets are generated and labeled by humans rather than by MT engines; and (2) XGLUE provides two cross-lingual text generation tasks.
XGLUE is composed of 11 tasks spanning 19 languages. For each task, the training data is only available in English. This means that to succeed at XGLUE, a model must have strong zero-shot cross-lingual transfer capability: it must learn from the English data of a specific task and transfer what it learned to other languages. Compared to its concurrent work XTREME, XGLUE has two characteristics: first, it includes cross-lingual NLU and cross-lingual NLG tasks at the same time; second, besides including 5 existing cross-lingual tasks (i.e. NER, POS, MLQA, PAWS-X and XNLI), XGLUE selects 6 new tasks from Bing scenarios as well, including News Classification (NC), Query-Ad Matching (QADSM), Web Page Ranking (WPR), QA Matching (QAM), Question Generation (QG) and News Title Generation (NTG). Such diversity of languages, tasks and task origins provides a comprehensive benchmark for quantifying the quality of a pre-trained model on cross-lingual natural language understanding and generation."
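The zero-shot cross-lingual transfer protocol described above can be summarized in a few lines: fine-tune once on the English training data of a task, then evaluate that same checkpoint on every target language with no further training. The sketch below shows only the evaluation loop; the train/evaluate helpers and the language list are placeholders rather than part of the official XGLUE tooling.

```python
# A minimal sketch of the zero-shot cross-lingual transfer protocol: train once
# on English task data, then evaluate the same checkpoint on every target
# language. The helpers and language list are placeholders, not the official
# XGLUE tooling.
from typing import Callable, Dict, Iterable

def zero_shot_transfer_eval(
    train_on_english: Callable[[], object],
    evaluate: Callable[[object, str], float],
    target_languages: Iterable[str],
) -> Dict[str, float]:
    """Train a single model on English data only and score it per target language."""
    langs = list(target_languages)
    model = train_on_english()                          # English training data only
    scores = {lang: evaluate(model, lang) for lang in langs}
    scores["avg"] = sum(scores[lang] for lang in langs) / len(langs)
    return scores

if __name__ == "__main__":
    # Stub usage: replace the lambdas with real task-specific fine-tuning and
    # evaluation code (e.g. a multilingual encoder fine-tuned on English NC data).
    results = zero_shot_transfer_eval(
        train_on_english=lambda: "english-finetuned-checkpoint",
        evaluate=lambda model, lang: 0.0,               # stub metric
        target_languages=["de", "es", "fr", "ru", "zh"],
    )
    print(results)
```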