9 datasets found
  1. IFEval

    • huggingface.co
    Updated Dec 22, 2023
    Google (2023). IFEval [Dataset]. https://huggingface.co/datasets/google/IFEval
    Dataset updated
    Dec 22, 2023
    Dataset authored and provided by
    Google (http://google.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for IFEval

      Dataset Summary
    

    This dataset contains the prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models. It contains around 500 "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times", which can be verified by heuristics. To load the dataset, run:

        from datasets import load_dataset

        ifeval = load_dataset("google/IFEval")

      Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/google/IFEval.
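The description above says that instructions such as "write in more than 400 words" and "mention the keyword of AI at least 3 times" can be verified by heuristics. A minimal sketch of what such checks might look like follows. The function names, the whitespace-based word count, and the case-sensitive whole-word keyword match are all assumptions made for illustration; this is not the benchmark's official checker.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a deliberately simple notion of 'word')."""
    return len(text.split())


def has_more_words_than(text: str, minimum: int) -> bool:
    """Hypothetical check for an instruction like 'write in more than N words'."""
    return count_words(text) > minimum


def mentions_keyword_at_least(text: str, keyword: str, times: int) -> bool:
    """Hypothetical check for 'mention the keyword K at least N times'.

    Assumption: matches are case-sensitive and must be whole words.
    """
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text)) >= times


response = "AI is everywhere. Many teams use AI daily, and AI tools keep improving."
print(has_more_words_than(response, 400))            # False: the response is far too short
print(mentions_keyword_at_least(response, "AI", 3))  # True: "AI" appears three times
```

Because each check is a plain function of the response text, a prompt carrying several instructions could be scored by running every relevant check and requiring all of them to pass.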
    
  2. IFEval-Ko

    • huggingface.co
    Updated Apr 19, 2025
    allganize (2025). IFEval-Ko [Dataset]. https://huggingface.co/datasets/allganize/IFEval-Ko
    Dataset updated
    Apr 19, 2025
    Dataset provided by
    Allganize, Inc.
    Authors
    allganize
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    IFEval-Ko: Korean Instruction-Following Benchmark for LLMs

    This dataset originates from the IFEval dataset.

    IFEval-Ko is a Korean adaptation of Google's open-source IFEval benchmark, intended for use with the lm-evaluation-harness framework. It enables evaluation of the instruction-following capabilities of large language models (LLMs) in Korean.

      Dataset Details
    

    Original Source: google/IFEval
    Adaptation Author: Allganize Inc. LLM TEAM |… See the full description on the dataset page: https://huggingface.co/datasets/allganize/IFEval-Ko.

  3. Multi-IF Dataset

    • paperswithcode.com
    Updated Oct 20, 2024
    Yun He; Di Jin; Chaoqi Wang; Chloe Bi; Karishma Mandyam; Hejia Zhang; Chen Zhu; Ning Li; Tengyu Xu; Hongjiang Lv; Shruti Bhosale; Chenguang Zhu; Karthik Abinav Sankararaman; Eryk Helenowski; Melanie Kambadur; Aditya Tayade; Hao Ma; Han Fang; Sinong Wang (2024). Multi-IF Dataset [Dataset]. https://paperswithcode.com/dataset/multi-if
    Dataset updated
    Oct 20, 2024
    Authors
    Yun He; Di Jin; Chaoqi Wang; Chloe Bi; Karishma Mandyam; Hejia Zhang; Chen Zhu; Ning Li; Tengyu Xu; Hongjiang Lv; Shruti Bhosale; Chenguang Zhu; Karthik Abinav Sankararaman; Eryk Helenowski; Melanie Kambadur; Aditya Tayade; Hao Ma; Han Fang; Sinong Wang
    Description

    We introduce Multi-IF, a new benchmark designed to assess LLMs' proficiency in following multi-turn and multilingual instructions. Multi-IF, which uses a hybrid framework combining LLM and human annotators, expands upon IFEval by incorporating multi-turn sequences and translating the English prompts into 7 additional languages, resulting in a dataset of 4501 multilingual conversations, each with three turns. Our evaluation of 14 state-of-the-art LLMs on Multi-IF reveals that it presents a significantly more challenging task than existing benchmarks. All the models tested showed a higher rate of failure in executing instructions correctly with each additional turn. For example, o1-preview drops from 0.877 at the first turn to 0.707 at the third turn in terms of average accuracy over all languages. Moreover, languages with non-Latin scripts (Hindi, Russian, and Chinese) generally exhibit higher error rates, suggesting potential limitations in the models’ multilingual capabilities.
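To put the quoted o1-preview figures in perspective, the short calculation below works out the absolute and relative drop in average accuracy between the first and third turn. Only the two accuracy values come from the description; the variable names are illustrative.

```python
# Average accuracies for o1-preview quoted in the Multi-IF description.
first_turn_accuracy = 0.877
third_turn_accuracy = 0.707

# Absolute drop in accuracy (in accuracy points on a 0-1 scale).
absolute_drop = first_turn_accuracy - third_turn_accuracy

# Relative drop: the share of first-turn accuracy lost by the third turn.
relative_drop = absolute_drop / first_turn_accuracy

print(f"absolute drop: {absolute_drop:.3f}")  # absolute drop: 0.170
print(f"relative drop: {relative_drop:.1%}")  # relative drop: 19.4%
```

In other words, the model loses roughly a fifth of its first-turn accuracy by the third turn.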

  4. persian-ifeval

    • huggingface.co
    Updated May 31, 2025
    MCINext (2025). persian-ifeval [Dataset]. https://huggingface.co/datasets/MCINext/persian-ifeval
    Dataset updated
    May 31, 2025
    Dataset authored and provided by
    MCINext
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    Persian IFEval

      Dataset Summary
    

    Persian IFEval is a localized and culturally adapted version of the IFEval benchmark, designed to evaluate the ability of language models to follow complex instructions in Persian. The dataset focuses on instruction-guided text generation, especially in cases that require adherence to specific constraints such as keyword inclusion, length limits, or structural properties. The dataset was translated from English using a combination of machine… See the full description on the dataset page: https://huggingface.co/datasets/MCINext/persian-ifeval.

  5. PPE-IFEval-Best-of-K

    • huggingface.co
    Updated Oct 25, 2024
    LMArena (2024). PPE-IFEval-Best-of-K [Dataset]. https://huggingface.co/datasets/lmarena-ai/PPE-IFEval-Best-of-K
    Dataset updated
    Oct 25, 2024
    Dataset authored and provided by
    LMArena
    Description

    Overview

    This dataset contains the IFEval correctness preference evaluation set for Preference Proxy Evaluations (PPE). The prompts are sampled from IFEval. The dataset is meant for benchmarking and evaluation, not for training.

      License
    

    User prompts are licensed under Apache-2.0, and model outputs are governed by the terms of use set by the respective model providers.

      Citation
    

    @misc{frick2024evaluaterewardmodelsrlhf, title={How to Evaluate Reward… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/PPE-IFEval-Best-of-K.

  6. ifeval_greek

    • huggingface.co
    Updated Jun 2, 2025
    Institute for Language and Speech Processing (2025). ifeval_greek [Dataset]. https://huggingface.co/datasets/ilsp/ifeval_greek
    Dataset updated
    Jun 2, 2025
    Dataset authored and provided by
    Institute for Language and Speech Processing (http://www.ilsp.gr/)
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Dataset Card for IFEval Greek

    The IFEval Greek dataset contains 541 prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models, manually translated into Greek. The dataset contains "verifiable instructions" such as "απάντησε με περισσότερες από 400 λέξεις" and "ανάφερε τη λέξη ΤΝ τουλάχιστον 3 φορές" which can be verified by heuristics.

      Dataset Details

      Dataset Description
    

    Curated by: ILSP/Athena RC

    Language(s) (NLP): el… See the full description on the dataset page: https://huggingface.co/datasets/ilsp/ifeval_greek.

  7. Arabic_IFEval

    • huggingface.co
    Updated Apr 5, 2025
    Inception (2025). Arabic_IFEval [Dataset]. https://huggingface.co/datasets/inceptionai/Arabic_IFEval
    Dataset updated
    Apr 5, 2025
    Dataset authored and provided by
    Inception
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    Arabic_IFEval is the first publicly available benchmark dataset specifically designed to evaluate Arabic large language models (LLMs) on their instruction-following capabilities in Arabic. The dataset includes 404 high-quality, manually verified samples covering various constraints such as linguistic patterns, punctuation rules, and formatting guidelines.

      Loading the Dataset
    

    To load this dataset in Python using the 🤗 Datasets library, run the following:

        from datasets import load_dataset

    ifeval… See the full description on the dataset page: https://huggingface.co/datasets/inceptionai/Arabic_IFEval.

  8. ifeval-ita

    • huggingface.co
    Updated Feb 5, 2025
    mii-llm (2025). ifeval-ita [Dataset]. https://huggingface.co/datasets/mii-llm/ifeval-ita
    Dataset updated
    Feb 5, 2025
    Dataset authored and provided by
    mii-llm
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for IFEval

      Dataset Summary
    

    This dataset contains the prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models. It contains around 500 "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times", which can be verified by heuristics. To load the dataset, run:

        from datasets import load_dataset

        ifeval = load_dataset("mii-llm/ifeval-ita")

      Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/mii-llm/ifeval-ita.
    
  9. IFEval_ca

    • huggingface.co
    Updated Dec 13, 2024
    Projecte Aina (2024). IFEval_ca [Dataset]. https://huggingface.co/datasets/projecte-aina/IFEval_ca
    Dataset updated
    Dec 13, 2024
    Dataset authored and provided by
    Projecte Aina
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset Card for IFEval_ca

    IFEval_ca is a prompt dataset in Catalan, professionally translated from the main version of the IFEval dataset in English.

      Dataset Details

      Dataset Description
    

    IFEval_ca (Instruction-Following Eval benchmark - Catalan) is designed to evaluate chat or instruction fine-tuned language models. The dataset comprises 541 "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times"… See the full description on the dataset page: https://huggingface.co/datasets/projecte-aina/IFEval_ca.

