Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for IFEval
Dataset Summary
This dataset contains the prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models. It contains around 500 "verifiable instructions", such as "write in more than 400 words" and "mention the keyword of AI at least 3 times", which can be verified by heuristics. To load the dataset, run:
from datasets import load_dataset
ifeval = load_dataset("google/IFEval")
Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/google/IFEval.
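To make "verified by heuristics" concrete, here is a minimal sketch of how two of the quoted constraints could be checked automatically. The split and field names ("train", "prompt") follow the dataset card, but the checker functions are simplified illustrations and are not the official IFEval verification code.

from datasets import load_dataset

# Simplified illustration of heuristic checks; not the official IFEval verifiers.
def more_than_n_words(text, n=400):
    # "write in more than 400 words"
    return len(text.split()) > n

def mentions_keyword_at_least(text, keyword="AI", times=3):
    # "mention the keyword of AI at least 3 times"
    return text.count(keyword) >= times

ifeval = load_dataset("google/IFEval", split="train")
print(ifeval[0]["prompt"])          # inspect one verifiable-instruction prompt

response = "..."                    # a model's response would go here
print(more_than_n_words(response), mentions_keyword_at_least(response))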
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
IFEval-Ko: Korean Instruction-Following Benchmark for LLMs
This dataset originates from the IFEval dataset.
A Korean-language README is also available. IFEval-Ko is a Korean adaptation of Google's open-source IFEval benchmark, used with the lm-evaluation-harness framework. It enables evaluation of large language models (LLMs) for their instruction-following capabilities in Korean.
Dataset Details
Original Source: google/IFEval
Adaptation Author: Allganize Inc. LLM TEAM |… See the full description on the dataset page: https://huggingface.co/datasets/allganize/IFEval-Ko.
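Since IFEval-Ko is intended to be run through lm-evaluation-harness, a hedged sketch of a Python-side invocation via the harness's simple_evaluate entry point is shown below. The task name "ifeval_ko" and the model checkpoint are illustrative assumptions; the dataset card is the authoritative reference for how the task is actually registered.

import lm_eval

# Hedged sketch: the task name "ifeval_ko" and the checkpoint are assumptions,
# not confirmed here; consult the dataset card for the real task setup.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct",
    tasks=["ifeval_ko"],
)
print(results["results"])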
We introduce Multi-IF, a new benchmark designed to assess LLMs' proficiency in following multi-turn and multilingual instructions. Multi-IF, which uses a hybrid framework combining LLM and human annotators, expands upon IFEval by incorporating multi-turn sequences and translating the English prompts into 7 other languages, resulting in a dataset of 4,501 multilingual conversations, each with three turns. Our evaluation of 14 state-of-the-art LLMs on Multi-IF reveals that it presents a significantly more challenging task than existing benchmarks. All of the models tested showed a higher rate of failure in executing instructions correctly with each additional turn. For example, o1-preview drops from 0.877 average accuracy across all languages at the first turn to 0.707 at the third turn. Moreover, languages with non-Latin scripts (Hindi, Russian, and Chinese) generally exhibit higher error rates, suggesting potential limitations in the models' multilingual capabilities.
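To make the per-turn degradation concrete, the sketch below aggregates average accuracy per turn from hypothetical per-conversation outcomes; the data structure is an illustrative assumption, not the Multi-IF release format.

from statistics import mean

# Hypothetical outcomes: results[turn] is a list of booleans, one per conversation,
# marking whether the instructions for that turn were followed. This structure is
# an illustrative assumption, not the Multi-IF release format.
results = {
    1: [True, True, False, True],
    2: [True, False, False, True],
    3: [False, False, False, True],
}

# Average accuracy per turn; in Multi-IF, accuracy drops as turns accumulate
# (e.g. o1-preview: 0.877 at turn 1 vs. 0.707 at turn 3, averaged over languages).
for turn in sorted(results):
    print(f"turn {turn}: accuracy = {mean(results[turn]):.3f}")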
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Persian IFEval
Dataset Summary
Persian IFEval is a localized and culturally adapted version of the IFEval benchmark, designed to evaluate the ability of language models to follow complex instructions in Persian. The dataset focuses on instruction-guided text generation, especially in cases that require adherence to specific constraints such as keyword inclusion, length limits, or structural properties. The dataset was translated from English using a combination of machine… See the full description on the dataset page: https://huggingface.co/datasets/MCINext/persian-ifeval.
Overview
This dataset contains the IFEval correctness preference evaluation set for Preference Proxy Evaluations (PPE). The prompts are sampled from IFEval. This dataset is meant for benchmarking and evaluation, not for training. Paper and code links are provided on the dataset page.
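As a rough illustration of the best-of-K protocol this set supports, the sketch below picks the highest-scoring of K candidate responses under a reward model and checks the pick against an IFEval-style verifier. The reward_model.score and verify_instructions callables are hypothetical placeholders, not part of the PPE release.

# Hedged sketch of best-of-K selection with a reward model. reward_model.score and
# verify_instructions are hypothetical placeholders, not part of the PPE release.
def best_of_k(prompt, candidates, reward_model):
    # Return the candidate the reward model scores highest for this prompt.
    return max(candidates, key=lambda c: reward_model.score(prompt, c))

def best_of_k_accuracy(examples, reward_model, verify_instructions):
    # Fraction of prompts where the reward model's pick passes the IFEval-style checks.
    hits = 0
    for ex in examples:
        chosen = best_of_k(ex["prompt"], ex["candidates"], reward_model)
        hits += bool(verify_instructions(ex, chosen))
    return hits / len(examples)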
License
User prompts are licensed under Apache-2.0, and model outputs are governed by the terms of use set by the respective model providers.
Citation
@misc{frick2024evaluaterewardmodelsrlhf, title={How to Evaluate Reward… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/PPE-IFEval-Best-of-K.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for IFEval Greek
The IFEval Greek dataset contains 541 prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models, manually translated into Greek. The dataset contains "verifiable instructions" such as "απάντησε με περισσότερες από 400 λέξεις" ("answer in more than 400 words") and "ανάφερε τη λέξη ΤΝ τουλάχιστον 3 φορές" ("mention the word AI at least 3 times"), which can be verified by heuristics.
Dataset Details
Dataset Description
Curated by: ILSP/Athena RC
Language(s) (NLP): el… See the full description on the dataset page: https://huggingface.co/datasets/ilsp/ifeval_greek.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Arabic IFEval is the first publicly available benchmark dataset specifically designed to evaluate large language models (LLMs) on instruction-following capabilities in Arabic. The dataset includes 404 high-quality, manually verified samples covering various constraints such as linguistic patterns, punctuation rules, and formatting guidelines.
Loading the Dataset
To load this dataset in Python using the 🤗 Datasets library, run the following:
from datasets import load_dataset
ifeval… See the full description on the dataset page: https://huggingface.co/datasets/inceptionai/Arabic_IFEval.
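Since the snippet above is cut off, here is a minimal loading sketch; the split name "train" is an assumption, so check the dataset page for the actual configuration.

from datasets import load_dataset

# Minimal loading sketch; the split name "train" is an assumption (see the dataset page).
arabic_ifeval = load_dataset("inceptionai/Arabic_IFEval", split="train")
print(arabic_ifeval[0])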
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for IFEval
Dataset Summary
This dataset contains the prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models. It contains around 500 "verifiable instructions", such as "write in more than 400 words" and "mention the keyword of AI at least 3 times", which can be verified by heuristics. To load the dataset, run:
from datasets import load_dataset
ifeval = load_dataset("mii-llm/ifeval-ita")
Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/mii-llm/ifeval-ita.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for IFEval_ca
IFEval_ca is a prompt dataset in Catalan, professionally translated from the original English version of the IFEval dataset.
Dataset Details
Dataset Description
IFEval_ca (Instruction-Following Eval benchmark, Catalan) is designed to evaluate chat- or instruction-fine-tuned language models. The dataset comprises 541 "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times"… See the full description on the dataset page: https://huggingface.co/datasets/projecte-aina/IFEval_ca.