Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Belebele Benchmark for Massively Multilingual NLU Evaluation
Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. This dataset enables the evaluation of mono- and multi-lingual models in high-, medium-, and low-resource languages. Each question has four multiple-choice answers and is linked to a short passage from the FLORES-200 dataset. The human annotation procedure was carefully curated to create questions that… See the full description on the dataset page: https://huggingface.co/datasets/facebook/belebele.
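To illustrate the structure described above, the sketch below loads one Belebele language variant with the Hugging Face datasets library. The config name ("eng_Latn"), the "test" split, and the field names are assumptions drawn from this description and should be checked against the dataset card.

```python
# A minimal sketch of loading and inspecting one Belebele language variant.
# The config name "eng_Latn", the "test" split, and the field names below are
# assumptions taken from this description; check the dataset card for the
# authoritative schema.
from datasets import load_dataset

belebele = load_dataset("facebook/belebele", "eng_Latn", split="test")

example = belebele[0]
passage = example["flores_passage"]                          # short FLORES-200 passage
question = example["question"]                               # the MRC question
choices = [example[f"mc_answer{i}"] for i in range(1, 5)]    # four answer options
gold = int(example["correct_answer_num"])                    # 1-indexed correct choice

print(passage)
print(question)
for i, choice in enumerate(choices, start=1):
    marker = "*" if i == gold else " "
    print(f"{marker} {i}. {choice}")
```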
GLUE-X is a benchmark dataset for evaluating the out-of-distribution (OOD) robustness of Natural Language Understanding (NLU) models. It was created to address the OOD generalization problem, which remains a challenge in many NLP tasks and limits the real-world deployment of these methods. GLUE-X collects 14 publicly available datasets as OOD test data, and evaluations are conducted on 8 classic NLP tasks with widely used models. The findings highlight the need for improved OOD accuracy in NLP: significant performance degradation was observed in all settings compared to in-distribution (ID) accuracy. The creators of GLUE-X hope that the dataset underscores the importance of OOD robustness and provides insights into how to measure and improve the robustness of a model.
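To make the ID-versus-OOD comparison concrete, the sketch below scores a single sentiment classifier on an in-distribution test set and an out-of-distribution one and reports the gap. The model and dataset choices (SST-2 as ID, IMDB as OOD) are illustrative assumptions, not the exact GLUE-X configuration or its 14 OOD datasets.

```python
# A minimal sketch of a GLUE-X style robustness check: score one sentiment
# classifier on an in-distribution (ID) test set and an out-of-distribution
# (OOD) one, then report the gap. The model and dataset choices here are
# illustrative assumptions, not the exact GLUE-X protocol.
from datasets import load_dataset
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def accuracy(dataset, text_key, n=200):
    """Fraction of the first n (shuffled) examples the pipeline labels correctly."""
    subset = dataset.shuffle(seed=0).select(range(n))
    texts = [ex[text_key][:512] for ex in subset]   # crude truncation for long reviews
    preds = classifier(texts)
    gold = ["POSITIVE" if ex["label"] == 1 else "NEGATIVE" for ex in subset]
    return sum(p["label"] == g for p, g in zip(preds, gold)) / n

id_data = load_dataset("glue", "sst2", split="validation")   # in-distribution
ood_data = load_dataset("imdb", split="test")                # out-of-distribution

id_acc = accuracy(id_data, "sentence")
ood_acc = accuracy(ood_data, "text")
print(f"ID accuracy:  {id_acc:.3f}")
print(f"OOD accuracy: {ood_acc:.3f}")
print(f"Degradation:  {id_acc - ood_acc:.3f}")
```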
https://microsoft.github.io/XGLUE/
"XGLUE is a new benchmark dataset to evaluate the performance of cross-lingual pre-trained models with respect to cross-lingual natural language understanding and generation.
For people who are working on XTREME, we recommend submitting results to XGLUE as well, especially for the XGLUE-unique tasks, because (1) XGLUE includes tasks from real-world scenarios (News Classification, Query-Ad Matching, Web Page Ranking and QA Matching), and all of these datasets are generated and labeled by humans rather than by MT engines; and (2) XGLUE provides two cross-lingual text generation tasks.
XGLUE is composed of 11 tasks spanning 19 languages. For each task, the training data is only available in English. This means that to succeed at XGLUE, a model must have strong zero-shot cross-lingual transfer capability: it must learn from the English data of a specific task and transfer what it learned to other languages. Compared to its concurrent work XTREME, XGLUE has two characteristics: first, it includes cross-lingual NLU and cross-lingual NLG tasks at the same time; second, besides including 5 existing cross-lingual tasks (i.e. NER, POS, MLQA, PAWS-X and XNLI), XGLUE selects 6 new tasks from Bing scenarios as well, including News Classification (NC), Query-Ad Matching (QADSM), Web Page Ranking (WPR), QA Matching (QAM), Question Generation (QG) and News Title Generation (NTG). Such diversity of languages, tasks and task origins provides a comprehensive benchmark for quantifying the quality of a pre-trained model on cross-lingual natural language understanding and generation."
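The zero-shot cross-lingual transfer protocol described above can be summarized in a few lines: fine-tune once on the English training data of a task, then evaluate that same checkpoint on every target language with no further training. The sketch below shows only the evaluation loop; the train/evaluate helpers and the language list are placeholders rather than part of the official XGLUE tooling.

```python
# A minimal sketch of the zero-shot cross-lingual transfer protocol: train once
# on English task data, then evaluate the same checkpoint on every target
# language. The helpers and language list are placeholders, not the official
# XGLUE tooling.
from typing import Callable, Dict, Iterable

def zero_shot_transfer_eval(
    train_on_english: Callable[[], object],
    evaluate: Callable[[object, str], float],
    target_languages: Iterable[str],
) -> Dict[str, float]:
    """Train a single model on English data only and score it per target language."""
    langs = list(target_languages)
    model = train_on_english()                          # English training data only
    scores = {lang: evaluate(model, lang) for lang in langs}
    scores["avg"] = sum(scores[lang] for lang in langs) / len(langs)
    return scores

if __name__ == "__main__":
    # Stub usage: replace the lambdas with real task-specific fine-tuning and
    # evaluation code (e.g. a multilingual encoder fine-tuned on English NC data).
    results = zero_shot_transfer_eval(
        train_on_english=lambda: "english-finetuned-checkpoint",
        evaluate=lambda model, lang: 0.0,               # stub metric
        target_languages=["de", "es", "fr", "ru", "zh"],
    )
    print(results)
```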