3 datasets found
  1. ukr-formality-dataset-translated-gyafc

    • huggingface.co
    Updated Feb 3, 2025
    Cite
    Ukrainian Texts Classification (2025). ukr-formality-dataset-translated-gyafc [Dataset]. https://huggingface.co/datasets/ukr-detect/ukr-formality-dataset-translated-gyafc
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Ukrainian Texts Classification
    License

    https://choosealicense.com/licenses/openrail++/

    Description

    Ukrainian Formality Dataset (translated)

    We obtained the first-of-its-kind Ukrainian Formality Classification dataset by translating the English GYAFC data.

    Dataset formation:

    English data source: https://aclanthology.org/N18-1012/
    Translation into Ukrainian using the model: https://huggingface.co/facebook/nllb-200-distilled-600M
    Additionally, the dataset was balanced.

    Labels: 0 - informal, 1 - formal.
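
The description says only that the dataset was balanced, not how. One common approach is to randomly downsample the larger class until both labels have the same number of examples. The sketch below assumes that approach; the function name `balance_by_downsampling` and the (text, label) pair layout are illustrative and not taken from the dataset authors.

```python
import random
from collections import defaultdict


def balance_by_downsampling(examples, seed=0):
    """Return a class-balanced subset of (text, label) pairs.

    Assumption: balancing means randomly downsampling every class to the
    size of the smallest one. The source does not state the actual method.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    if not by_label:
        return []

    # Every class is cut down to the size of the rarest class.
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy placeholder data: three informal (0) examples and one formal (1) example.
toy = [("a", 0), ("b", 0), ("c", 0), ("d", 1)]
balanced_toy = balance_by_downsampling(toy)
```

Downsampling discards data from the larger class; upsampling the smaller class is an alternative when examples are scarce.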

    Load dataset:

    from datasets import load_dataset…

    See the full description on the dataset page: https://huggingface.co/datasets/ukr-detect/ukr-formality-dataset-translated-gyafc.
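
The loading snippet is truncated in this listing. A minimal sketch of how loading typically looks with the Hugging Face `datasets` library follows. The repository ID comes from the dataset URL and the label meanings from the description; the helper names are illustrative. Split and column names are not shown in this listing, so the sketch assumes none.

```python
# Repository ID taken from the dataset URL in this listing.
REPO_ID = "ukr-detect/ukr-formality-dataset-translated-gyafc"

# Label meanings as stated in the dataset description.
LABEL_NAMES = {0: "informal", 1: "formal"}


def label_name(label_id):
    """Map a numeric label (0 or 1) to its human-readable name."""
    return LABEL_NAMES[int(label_id)]


def load_formality_dataset(repo_id=REPO_ID):
    """Download the dataset from the Hugging Face Hub.

    The import is done lazily so that the label helper above works
    without the `datasets` package installed. Requires network access.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


# Example usage (needs network access and the `datasets` package):
#   ds = load_formality_dataset()
#   print(ds)  # shows the available splits and column names
```

Printing the returned object is the quickest way to check which splits and columns exist before writing any training code.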

  2. ukr-formality-dataset-seminatural

    • huggingface.co
    Updated Feb 3, 2025
    Cite
    Ukrainian Texts Classification (2025). ukr-formality-dataset-seminatural [Dataset]. https://huggingface.co/datasets/ukr-detect/ukr-formality-dataset-seminatural
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Ukrainian Texts Classification
    License

    https://choosealicense.com/licenses/openrail++/

    Description

    Ukrainian Formality (seminatural)

    We obtained formality classification data for Ukrainian from various sources: Ukrainian legal acts, fiction, and news. The data can be used to tune and test Ukrainian formality classification models under conditions closer to real-life scenarios. Labels: 0 - informal, 1 - formal.
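
When the data is used as a test set, model predictions can be scored against the 0/1 labels. The sketch below computes accuracy and per-class F1 in plain Python. The function name and the toy labels are illustrative, and these metrics are a common choice, not necessarily the ones used by the dataset authors.

```python
def binary_classification_report(gold, pred):
    """Compute accuracy and per-class F1 for labels 0 (informal) and 1 (formal)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one example is required")

    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)

    f1 = {}
    for cls in (0, 1):
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs.
        f1[cls] = 2 * tp / denom if denom else 0.0

    return {"accuracy": accuracy, "f1_informal": f1[0], "f1_formal": f1[1]}


# Toy labels: one informal example is misclassified as formal.
report = binary_classification_report([0, 0, 1, 1], [0, 1, 1, 1])
```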

    Citation

    @inproceedings{dementieva-etal-2025-cross, title = "Cross-lingual Text Classification Transfer: The Case of {U}krainian", author = "Dementieva, Daryna…

    See the full description on the dataset page: https://huggingface.co/datasets/ukr-detect/ukr-formality-dataset-seminatural.

  3. UkrFormalityClassification

    • huggingface.co
    Updated Feb 19, 2025
    Cite
    Massive Text Embedding Benchmark (2025). UkrFormalityClassification [Dataset]. https://huggingface.co/datasets/mteb/UkrFormalityClassification
    Dataset updated
    Feb 19, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    https://choosealicense.com/licenses/openrail++/

    Description

    UkrFormalityClassification: an MTEB (Massive Text Embedding Benchmark) dataset

    This dataset contains the Ukrainian Formality Classification dataset obtained by
    translating English GYAFC data.
    English data source: https://aclanthology.org/N18-1012/
    Translation into Ukrainian using the model: https://huggingface.co/facebook/nllb-200-distilled-600M
    Additionally, the dataset was balanced. Labels: 0 - informal, 1 - formal.
    

    Task category t2c…

    See the full description on the dataset page: https://huggingface.co/datasets/mteb/UkrFormalityClassification.


