License: https://choosealicense.com/licenses/openrail++/
Ukrainian Formality Dataset (translated)
We obtained the first-of-its-kind Ukrainian formality classification dataset by translating the English GYAFC data.
Dataset formation:
English data source: https://aclanthology.org/N18-1012/
Translation into Ukrainian using the model: https://huggingface.co/facebook/nllb-200-distilled-600M
Additionally, the dataset was balanced.
Labels: 0 - informal, 1 - formal.
Load dataset:
from datasets import load_dataset… See the full description on the dataset page: https://huggingface.co/datasets/ukr-detect/ukr-formality-dataset-translated-gyafc.
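A minimal sketch of loading the data and turning label ids into readable names. The repository id is taken from the dataset page URL above and the label scheme from the description. The `label` column name and the helper names (`load_formality_dataset`, `label_name`, `label_counts`) are assumptions made for illustration and should be checked against the dataset card.

```python
from collections import Counter

# Label scheme as stated in the dataset description.
ID2LABEL = {0: "informal", 1: "formal"}

# Repository id taken from the dataset page URL.
REPO_ID = "ukr-detect/ukr-formality-dataset-translated-gyafc"


def load_formality_dataset(repo_id: str = REPO_ID):
    """Download the dataset from the Hugging Face Hub (requires network access)."""
    # Imported lazily so the helpers below work without the `datasets` package.
    from datasets import load_dataset

    return load_dataset(repo_id)


def label_name(label_id: int) -> str:
    """Map a numeric label to its name (0 -> informal, 1 -> formal)."""
    return ID2LABEL[label_id]


def label_counts(rows, label_column: str = "label") -> dict:
    """Count examples per label name.

    `rows` is any iterable of dict-like records; `label_column` is an
    assumed column name, not confirmed by the description above.
    """
    counts = Counter(label_name(row[label_column]) for row in rows)
    return dict(counts)
```

Because the description says the dataset was balanced, calling `label_counts` on a loaded split is a quick way to check that the two classes are roughly equal in size.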
License: https://choosealicense.com/licenses/openrail++/
Ukrainian Formality (seminatural)
We obtained formality classification data for Ukrainian from various sources: Ukrainian legal acts, fiction, and news. The data can be used to tune and test Ukrainian formality classification models under conditions closer to real-life scenarios. Labels: 0 - informal, 1 - formal.
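As an illustration of the "test" use case, the sketch below scores a classifier's predictions against gold labels using the 0/1 scheme above. It is a generic evaluation helper, not the authors' evaluation code; the function name `formality_report` and the choice of `formal` (1) as the positive class are assumptions.

```python
def formality_report(gold, predicted, positive=1):
    """Accuracy, precision, recall and F1 for binary formality labels.

    `gold` and `predicted` are equal-length sequences of 0 (informal)
    and 1 (formal). `positive` selects the class treated as positive.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    # Guard against division by zero on empty or one-sided inputs.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `formality_report([1, 1, 0, 0], [1, 0, 0, 0])` treats one formal sentence as missed, so recall drops while precision stays perfect.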
Citation
@inproceedings{dementieva-etal-2025-cross, title = "Cross-lingual Text Classification Transfer: The Case of {U}krainian", author = "Dementieva, Daryna… See the full description on the dataset page: https://huggingface.co/datasets/ukr-detect/ukr-formality-dataset-seminatural.
License: https://choosealicense.com/licenses/openrail++/
UkrFormalityClassification
An MTEB (Massive Text Embedding Benchmark) dataset.
This is the Ukrainian formality classification dataset obtained by translating the English GYAFC data.
English data source: https://aclanthology.org/N18-1012/
Translation into Ukrainian using the model: https://huggingface.co/facebook/nllb-200-distilled-600M
Additionally, the dataset was balanced. Labels: 0 - informal, 1 - formal.
Task category t2c… See the full description on the dataset page: https://huggingface.co/datasets/mteb/UkrFormalityClassification.