67 datasets found
  1. reward-bench

    • huggingface.co
    Updated Mar 25, 2024
    Cite
    Ai2 (2024). reward-bench [Dataset]. http://doi.org/10.57967/hf/2457
    Explore at:
    268 scholarly articles cite this dataset (View in Google Scholar)
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 25, 2024
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    Code | Leaderboard | Prior Preference Sets | Results | Paper

      Reward Bench Evaluation Dataset Card
    

    The RewardBench evaluation dataset evaluates the capabilities of reward models over the following categories:

    Chat: Includes the easy chat subsets (alpacaeval-easy, alpacaeval-length, alpacaeval-hard, mt-bench-easy, mt-bench-medium)
    Chat Hard: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut… See the full description on the dataset page: https://huggingface.co/datasets/allenai/reward-bench.
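
    A minimal sketch of loading the benchmark with the Hugging Face datasets library; the "filtered" split and the "subset" field also appear in the generation snippet under entry 12 below, while the other column names in the comments are indicative:

    from datasets import load_dataset

    # Load the evaluation split used elsewhere in this listing ("filtered").
    ds = load_dataset("allenai/reward-bench", split="filtered")

    # Each row pairs a prompt with a chosen and a rejected completion, tagged by subset.
    print(ds.column_names)    # e.g. "prompt", "chosen", "rejected", "subset"
    print(ds[0]["subset"])    # e.g. "alpacaeval-easy"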

  2. RewardBench Dataset

    • paperswithcode.com
    Updated Apr 12, 2024
    Cite
    Nathan Lambert; Valentina Pyatkin; Jacob Morrison; LJ Miranda; Bill Yuchen Lin; Khyathi Chandu; Nouha Dziri; Sachin Kumar; Tom Zick; Yejin Choi; Noah A. Smith; Hannaneh Hajishirzi (2024). RewardBench Dataset [Dataset]. https://paperswithcode.com/dataset/rewardbench
    Explore at:
    Dataset updated
    Apr 12, 2024
    Authors
    Nathan Lambert; Valentina Pyatkin; Jacob Morrison; LJ Miranda; Bill Yuchen Lin; Khyathi Chandu; Nouha Dziri; Sachin Kumar; Tom Zick; Yejin Choi; Noah A. Smith; Hannaneh Hajishirzi
    Description

    RewardBench is a benchmark designed to evaluate the capabilities and safety of reward models, including those trained with Direct Preference Optimization (DPO). It serves as the first evaluation tool for reward models and provides valuable insights into their performance and reliability¹.

    Here are the key components of RewardBench:

    Common Inference Code: The repository includes common inference code for various reward models, such as Starling, PairRM, OpenAssistant, and more. These models can be evaluated using the provided tools¹.

    Dataset and Evaluation: The RewardBench dataset consists of prompt-win-lose trios spanning chat, reasoning, and safety scenarios. It allows benchmarking reward models on challenging, structured, and out-of-distribution queries. The goal is to enhance scientific understanding of reward models and their behavior².

    Scripts for Evaluation:

    scripts/run_rm.py: Used to evaluate individual reward models.
    scripts/run_dpo.py: Used to evaluate direct preference optimization (DPO) models.
    scripts/train_rm.py: A basic reward model training script built on TRL (Transformer Reinforcement Learning)¹.

    Installation and Usage:

    Install PyTorch on your system.
    Install the required dependencies using pip install -e .
    Set the environment variable HF_TOKEN with your token.
    To contribute your model to the leaderboard, open an issue on HuggingFace with the model name.
    For local model evaluation, follow the instructions in the repository¹.
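
    A minimal sketch of that local workflow in Python, assuming the repository is cloned and is the working directory; the script's --model argument is an assumption here, so check the repository README for the exact interface:

    import os
    import subprocess

    # Token for gated models and leaderboard tooling (replace with your own).
    os.environ["HF_TOKEN"] = "hf_your_token_here"

    # Editable install of the repository's dependencies, as described above.
    subprocess.run(["pip", "install", "-e", "."], check=True)

    # Evaluate one reward model locally (arguments are illustrative assumptions).
    subprocess.run(
        ["python", "scripts/run_rm.py", "--model=OpenAssistant/reward-model-deberta-v3-large-v2"],
        check=True,
    )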

    Remember that RewardBench provides a standardized way to assess reward models, ensuring transparency and comparability across different approaches. 🌟🔍

    (1) GitHub - allenai/reward-bench: RewardBench: the first evaluation tool .... https://github.com/allenai/reward-bench.
    (2) RewardBench: Evaluating Reward Models for Language Modeling. https://arxiv.org/abs/2403.13787.
    (3) RewardBench: Evaluating Reward Models for Language Modeling. https://paperswithcode.com/paper/rewardbench-evaluating-reward-models-for.

  3. reward-bench-2

    • huggingface.co
    Updated Jun 3, 2025
    Cite
    Ai2 (2025). reward-bench-2 [Dataset]. https://huggingface.co/datasets/allenai/reward-bench-2
    Explore at:
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    Code | Leaderboard | Results | Paper

      RewardBench 2 Evaluation Dataset Card
    

    The RewardBench 2 evaluation dataset is the new version of RewardBench, built from unseen human data and designed to be substantially more difficult. RewardBench 2 evaluates the capabilities of reward models over the following categories:

    Factuality (NEW!): Tests the ability of RMs to detect hallucinations and other basic errors in completions.
    Precise Instruction Following (NEW!): Tests the ability of RMs… See the full description on the dataset page: https://huggingface.co/datasets/allenai/reward-bench-2.

  4. reward-bench-results

    • huggingface.co
    Updated Apr 30, 2025
    Cite
    Ai2 (2025). reward-bench-results [Dataset]. https://huggingface.co/datasets/allenai/reward-bench-results
    Explore at:
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    Description

    Results for Holistic Evaluation of Reward Models (HERM) Benchmark

    Here, you'll find the raw scores for the HERM project. The repository is structured as follows.

    ├── best-of-n/                <- Nested directory for different completions on Best of N challenge
    |   ├── alpaca_eval/          <- results for each reward model
    |   |   ├── tulu-13b/{org}/{model}.json
    |   |   └── zephyr-7b/{org}/{model}.json
    |   └── mt_bench/
    |… See the full description on the dataset page: https://huggingface.co/datasets/allenai/reward-bench-results.
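
    A minimal sketch of fetching one raw score file with huggingface_hub; "some-org/some-model" is a hypothetical stand-in for the {org}/{model} placeholders above:

    import json
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="allenai/reward-bench-results",
        repo_type="dataset",
        filename="best-of-n/alpaca_eval/tulu-13b/some-org/some-model.json",  # stand-in path
    )
    with open(path) as f:
        scores = json.load(f)  # raw scores for that reward model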

  5. multilingual-reward-bench

    • huggingface.co
    Updated May 15, 2025
    Cite
    Cohere Labs Community (2025). multilingual-reward-bench [Dataset]. http://doi.org/10.57967/hf/3352
    Explore at:
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    Cohere Labs Community
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    Multilingual Reward Bench (v1.0)

    Reward models (RMs) have driven the development of state-of-the-art LLMs today, with unprecedented impact across the globe. However, their performance in multilingual settings remains understudied. To probe reward model behavior on multilingual data, we present M-RewardBench, a benchmark for 23 typologically diverse languages. M-RewardBench contains prompt-chosen-rejected preference triples obtained by curating and translating chat… See the full description on the dataset page: https://huggingface.co/datasets/CohereLabsCommunity/multilingual-reward-bench.
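
    A sketch of the usual pairwise metric over such preference triples: count how often a reward model scores the chosen response above the rejected one. The split, column names, and the trivial scorer below are assumptions rather than the benchmark's documented interface:

    from datasets import load_dataset

    def reward_fn(prompt: str, completion: str) -> float:
        # Trivial stand-in scorer; replace with a real reward model.
        return float(len(completion))

    # A language config and/or split may be required; see the dataset page.
    ds = load_dataset("CohereLabsCommunity/multilingual-reward-bench", split="train")

    wins = sum(
        reward_fn(row["prompt"], row["chosen"]) > reward_fn(row["prompt"], row["rejected"])
        for row in ds  # column names are assumptions
    )
    print(wins / len(ds))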

  6. reward-bench-2-results

    • huggingface.co
    Updated Jun 3, 2025
    Cite
    Ai2 (2025). reward-bench-2-results [Dataset]. https://huggingface.co/datasets/allenai/reward-bench-2-results
    Explore at:
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    Description

    allenai/reward-bench-2-results dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. agent-reward-bench

    • huggingface.co
    Updated Apr 15, 2025
    Cite
    McGill NLP Group (2025). agent-reward-bench [Dataset]. https://huggingface.co/datasets/McGill-NLP/agent-reward-bench
    Explore at:
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    McGill NLP Group
    Description

    AgentRewardBench

    💾Code 📄Paper 🌐Website

    🤗Dataset 💻Demo 🏆Leaderboard

    AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
    Xing Han Lù, Amirhossein Kazemnejad*, Nicholas Meade, Arkil Patel, Dongchan Shin, Alejandra Zambrano, Karolina Stańczak, Peter Shaw, Christopher J. Pal, Siva Reddy*
    *Core Contributor

      Loading dataset
    

    You can use the huggingface_hub library to load the dataset. The dataset is available on Huggingface Hub at… See the full description on the dataset page: https://huggingface.co/datasets/McGill-NLP/agent-reward-bench.
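
    Following the loading note above, a minimal sketch with the huggingface_hub library (this downloads the dataset repository; the file layout is described on the dataset page):

    from huggingface_hub import snapshot_download

    # Download a local copy of the dataset repository and print its location.
    local_dir = snapshot_download(repo_id="McGill-NLP/agent-reward-bench", repo_type="dataset")
    print(local_dir)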

  8. MM-RLHF

    • huggingface.co
    Updated Feb 17, 2025
    + more versions
    Cite
    Yi-Fan Zhang (2025). MM-RLHF [Dataset]. https://huggingface.co/datasets/yifanzhang114/MM-RLHF
    Explore at:
    Dataset updated
    Feb 17, 2025
    Authors
    Yi-Fan Zhang
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    [📖 arXiv Paper] [📊 Training Code] [📝 Homepage] [🏆 Reward Model] [🔮 MM-RewardBench] [🔮 MM-SafetyBench] [📈 Evaluation Suite]

      The Next Step Forward in Multimodal LLM Alignment
    

    [2025/02/10] 🔥 We are proud to open-source MM-RLHF, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:

    A high-quality MLLM alignment dataset.
    A strong Critique-Based MLLM reward model and its training algorithm.
    A novel… See the full description on the dataset page: https://huggingface.co/datasets/yifanzhang114/MM-RLHF.

  9. VL-RewardBench Dataset

    • paperswithcode.com
    Updated May 23, 2025
    Cite
    Lei LI; Yuancheng Wei; Zhihui Xie; Xuqing Yang; YiFan Song; Peiyi Wang; Chenxin An; Tianyu Liu; Sujian Li; Bill Yuchen Lin; Lingpeng Kong; Qi Liu (2025). VL-RewardBench Dataset [Dataset]. https://paperswithcode.com/dataset/vl-rewardbench
    Explore at:
    Dataset updated
    May 23, 2025
    Authors
    Lei LI; Yuancheng Wei; Zhihui Xie; Xuqing Yang; YiFan Song; Peiyi Wang; Chenxin An; Tianyu Liu; Sujian Li; Bill Yuchen Lin; Lingpeng Kong; Qi Liu
    Description

    Vision-language generative reward models (VL-GenRMs) play a crucial role in aligning and evaluating multimodal AI systems, yet their own evaluation remains under-explored. Current assessment methods primarily rely on AI-annotated preference labels from traditional VL tasks, which can introduce biases and often fail to effectively challenge state-of-the-art models. To address these limitations, we introduce VL-RewardBench, a comprehensive benchmark spanning general multimodal queries, visual hallucination detection, and complex reasoning tasks. Through our AI-assisted annotation pipeline combining sample selection with human verification, we curate 1,250 high-quality examples specifically designed to probe model limitations. Comprehensive evaluation across 16 leading large vision-language models demonstrates VL-RewardBench's effectiveness as a challenging testbed, where even GPT-4o achieves only 65.4% accuracy, and state-of-the-art open-source models such as Qwen2-VL-72B struggle to surpass random guessing. Importantly, performance on VL-RewardBench strongly correlates (Pearson's r > 0.9) with MMMU-Pro accuracy using Best-of-N sampling with VL-GenRMs. Analysis experiments uncover three critical insights for improving VL-GenRMs: (i) models predominantly fail at basic visual perception tasks rather than reasoning tasks; (ii) inference-time scaling benefits vary dramatically by model capacity; and (iii) training VL-GenRMs to learn to judge substantially boosts judgment capability (+14.7% accuracy for a 7B VL-GenRM). We believe VL-RewardBench, along with the experimental insights, will become a valuable resource for advancing VL-GenRMs.
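
    The Best-of-N correlation above relies on the standard selection rule: sample N candidate responses, score each with the reward model, and keep the top-scoring one. A minimal sketch with a stand-in scorer:

    from typing import Callable, List

    def best_of_n(prompt: str, candidates: List[str],
                  score: Callable[[str, str], float]) -> str:
        """Return the candidate the reward model scores highest for this prompt."""
        return max(candidates, key=lambda c: score(prompt, c))

    # Usage with a trivial length-based scorer (replace with a real VL-GenRM).
    picked = best_of_n("Describe the image.",
                       ["a cat", "a cat sitting on a mat"],
                       lambda p, c: float(len(c)))
    print(picked)  # -> "a cat sitting on a mat"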

  10. R3-eval-reward-bench

    • huggingface.co
    Updated May 21, 2025
    Cite
    rubricreward (2025). R3-eval-reward-bench [Dataset]. https://huggingface.co/datasets/rubricreward/R3-eval-reward-bench
    Explore at:
    Dataset updated
    May 21, 2025
    Dataset authored and provided by
    rubricreward
    Description

    rubricreward/R3-eval-reward-bench dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. fc-reward-bench

    • huggingface.co
    Cite
    IBM Research, fc-reward-bench [Dataset]. https://huggingface.co/datasets/ibm-research/fc-reward-bench
    Explore at:
    Dataset provided by
    IBM Research
    IBM (http://ibm.com/)
    Authors
    IBM Research
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    fc-reward-bench

    fc-reward-bench is a benchmark designed to evaluate reward model performance in function-calling tasks. It features 1,500 unique user inputs derived from the single-turn splits of the BFCL-v3 dataset. Each input is paired with both correct and incorrect function calls. Correct calls are sourced directly from BFCL, while incorrect calls are generated by 25 permissively licensed models.

      Dataset Structure
    

    Each entry in the dataset includes the following… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/fc-reward-bench.
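
    Because each input pairs a correct call with incorrect ones, the natural metric is pairwise accuracy: how often a reward model ranks the correct call higher. A sketch with assumed split and field names (the actual schema is on the dataset page):

    from datasets import load_dataset

    def reward_fn(user_input: str, function_call: str) -> float:
        # Trivial stand-in scorer; replace with a real function-calling reward model.
        return float(len(function_call))

    ds = load_dataset("ibm-research/fc-reward-bench", split="train")  # split name assumed

    correct = 0
    for row in ds:
        # "input", "correct_call", and "incorrect_call" are assumed field names.
        if reward_fn(row["input"], row["correct_call"]) > reward_fn(row["input"], row["incorrect_call"]):
            correct += 1
    print(correct / len(ds))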

  12. reward-bench-critique-alpacaeval-easy

    • huggingface.co
    Updated Apr 15, 2024
    Cite
    distilabel-internal-testing (2024). reward-bench-critique-alpacaeval-easy [Dataset]. https://huggingface.co/datasets/distilabel-internal-testing/reward-bench-critique-alpacaeval-easy
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 15, 2024
    Dataset authored and provided by
    distilabel-internal-testing
    Description

    This dataset is a small subset of allenai/reward-bench to test with our critique models. It was generated in the following way:

    from datasets import Dataset
    import pandas as pd
    from datasets import load_dataset

    ds = load_dataset("allenai/reward-bench", split="filtered")

    data = []
    for row in ds.filter(lambda x: x["subset"] == "alpacaeval-easy"):
        for response in ["chosen", "rejected"]:
            model, is_chosen = (row["chosen_model"], True) if response == "chosen"…

    See the full description on the dataset page: https://huggingface.co/datasets/distilabel-internal-testing/reward-bench-critique-alpacaeval-easy.

  13. reward-bench-2-converted

    • huggingface.co
    Updated Jun 19, 2025
    + more versions
    Cite
    john02171574 (2025). reward-bench-2-converted [Dataset]. https://huggingface.co/datasets/john02171574/reward-bench-2-converted
    Explore at:
    Dataset updated
    Jun 19, 2025
    Authors
    john02171574
    Description

    john02171574/reward-bench-2-converted dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. reward-bench-chat-rewritten

    • huggingface.co
    Updated Dec 2, 2024
    + more versions
    Cite
    Haoxiang Wang (2024). reward-bench-chat-rewritten [Dataset]. https://huggingface.co/datasets/Haoxiang-Wang/reward-bench-chat-rewritten
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 2, 2024
    Authors
    Haoxiang Wang
    Description

    Haoxiang-Wang/reward-bench-chat-rewritten dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. allenai-reward-bench

    • huggingface.co
    Updated Jun 12, 2025
    Cite
    Nour Guermazi (2025). allenai-reward-bench [Dataset]. https://huggingface.co/datasets/nourguermazi/allenai-reward-bench
    Explore at:
    Dataset updated
    Jun 12, 2025
    Authors
    Nour Guermazi
    Description

    nourguermazi/allenai-reward-bench dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. VL-RewardBench

    • huggingface.co
    Updated Nov 29, 2024
    Cite
    Multi-modal Multilingual Instruction (2024). VL-RewardBench [Dataset]. https://huggingface.co/datasets/MMInstruction/VL-RewardBench
    Explore at:
    Dataset updated
    Nov 29, 2024
    Dataset authored and provided by
    Multi-modal Multilingual Instruction
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for VLRewardBench

    Project Page: https://vl-rewardbench.github.io

      Dataset Summary
    

    VLRewardBench is a comprehensive benchmark designed to evaluate vision-language generative reward models (VL-GenRMs) across visual perception, hallucination detection, and reasoning tasks. The benchmark contains 1,250 high-quality examples specifically curated to probe model limitations.

      Dataset Structure
    

    Each instance consists of multimodal queries spanning three key… See the full description on the dataset page: https://huggingface.co/datasets/MMInstruction/VL-RewardBench.

  17. reward-bench-reasoning

    • huggingface.co
    Updated Jul 31, 2024
    Cite
    Sylvia Chen (2024). reward-bench-reasoning [Dataset]. https://huggingface.co/datasets/hsicat/reward-bench-reasoning
    Explore at:
    Dataset updated
    Jul 31, 2024
    Authors
    Sylvia Chen
    Description

    This evaluation dataset is the reasoning subset from allenai/reward-bench.

  18. reward-bench

    • huggingface.co
    Updated Jun 12, 2025
    Cite
    Jingxuan Sun (2025). reward-bench [Dataset]. https://huggingface.co/datasets/PirxTion/reward-bench
    Explore at:
    Dataset updated
    Jun 12, 2025
    Authors
    Jingxuan Sun
    Description

    PirxTion/reward-bench dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. reward-bench-hacking-rewards-harmless-train-normal

    • huggingface.co
    Updated Jan 20, 2025
    Cite
    Ayush Singh (2025). reward-bench-hacking-rewards-harmless-train-normal [Dataset]. https://huggingface.co/datasets/Ayush-Singh/reward-bench-hacking-rewards-harmless-train-normal
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 20, 2025
    Authors
    Ayush Singh
    Description

    Ayush-Singh/reward-bench-hacking-rewards-harmless-train-normal dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. reward-bench-pythia-1.4b-set3-scores

    • huggingface.co
    + more versions
    Cite
    Ayush Singh, reward-bench-pythia-1.4b-set3-scores [Dataset]. https://huggingface.co/datasets/Ayush-Singh/reward-bench-pythia-1.4b-set3-scores
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Ayush Singh
    Description

    Ayush-Singh/reward-bench-pythia-1.4b-set3-scores dataset hosted on Hugging Face and contributed by the HF Datasets community
