TL;DR Dataset for Preference Learning
Summary
The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for preference learning and Reinforcement Learning from Human Feedback (RLHF) tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training models to understand and generate concise… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr-preference.
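A minimal sketch of how such preference pairs might be inspected before training; the column names ("prompt", "chosen", "rejected") follow TRL's standard preference format and are an assumption here, not taken from the dataset card.

from datasets import load_dataset

# Column names below are assumed from TRL's standard preference format
# and may differ on the actual dataset card.
dataset = load_dataset("trl-lib/tldr-preference", split="train")
example = dataset[0]
print(example["prompt"][:200])   # the Reddit post used as the prompt
print(example["chosen"])         # preferred TL;DR summary
print(example["rejected"])       # dispreferred TL;DR summary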
TL;DR Dataset
Summary
The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for summarization tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training summarization models.
Data Structure
Format: Standard
Type: Prompt-completion
Columns:
"pompt": The unabridged Reddit… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr.
theblackcat102/llava-instruct-mix reformatted for VSFT with TRL's SFT Trainer. See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py.
TRL GRPO Completion logs
This dataset contains the completions generated during training using trl and GRPO. The completions are stored in parquet files, and each file contains the completions for a single step of training (depending on the logging_steps argument). Each file contains the following columns:
step: the step of training
prompt: the prompt used to generate the completion
completion: the completion generated by the model
reward: the reward given to the completion by all… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/grpo-completions-new.
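A hedged sketch of reading one of the per-step parquet logs with pandas; the file name below is hypothetical, since the actual naming depends on the training run.

import pandas as pd

# Hypothetical file name; each parquet file holds the completions logged at one step.
df = pd.read_parquet("completions_step_100.parquet")
print(df.columns.tolist())              # expected: step, prompt, completion, reward
print(df[["prompt", "reward"]].head())  # inspect prompts and their rewards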
RewardBench is a benchmark designed to evaluate the capabilities and safety of reward models, including those trained with Direct Preference Optimization (DPO). It serves as the first evaluation tool for reward models and provides valuable insights into their performance and reliability¹.
Here are the key components of RewardBench:
Common Inference Code: The repository includes common inference code for various reward models, such as Starling, PairRM, OpenAssistant, and more. These models can be evaluated using the provided tools¹.
Dataset and Evaluation: The RewardBench dataset consists of prompt-win-lose trios spanning chat, reasoning, and safety scenarios. It allows benchmarking reward models on challenging, structured, and out-of-distribution queries. The goal is to enhance scientific understanding of reward models and their behavior².
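A hedged sketch of pulling the RewardBench evaluation set for inspection with the datasets library; the split name ("filtered") and the column names are assumptions rather than details confirmed by the repository.

from datasets import load_dataset

# Split and column names are assumptions, not taken from the RewardBench docs.
bench = load_dataset("allenai/reward-bench", split="filtered")
row = bench[0]
print(row["subset"])    # e.g. a chat, reasoning, or safety category
print(row["prompt"])    # shared prompt
print(row["chosen"], row["rejected"], sep="\n---\n")  # win/lose responses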
Scripts for Evaluation:
scripts/run_rm.py: Used to evaluate individual reward models.
scripts/run_dpo.py: Used to evaluate direct preference optimization (DPO) models.
scripts/train_rm.py: A basic reward model training script built on TRL (Transformer Reinforcement Learning)¹.
Installation and Usage:
Install PyTorch on your system.
Install the required dependencies using pip install -e .
Set the environment variable HF_TOKEN with your token.
To contribute your model to the leaderboard, open an issue on HuggingFace with the model name.
For local model evaluation, follow the instructions in the repository¹.
Remember that RewardBench provides a standardized way to assess reward models, ensuring transparency and comparability across different approaches. 🌟🔍
(1) GitHub - allenai/reward-bench: RewardBench: the first evaluation tool… https://github.com/allenai/reward-bench
(2) RewardBench: Evaluating Reward Models for Language Modeling. https://arxiv.org/abs/2403.13787
(3) RewardBench: Evaluating Reward Models for Language Modeling. https://paperswithcode.com/paper/rewardbench-evaluating-reward-models-for
Final Data - LLaMA Fine-Tuning Dataset
This dataset is prepared for fine-tuning the meta-llama/Llama-2-7b-hf model using the TRL SFTTrainer.
Structure
train.json: Training examples in JSON format
validation.json: Validation examples
test.json: Optional test examples
Format
Each file contains a list of items with this format:

{ "text": "Your training sample here..." }

Load it with the datasets library:

from datasets import load_dataset
dataset = load_dataset("csenaeem/final_data")
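A minimal, hedged sketch of how this dataset might be passed to TRL's SFTTrainer, assuming a recent TRL version that accepts a model identifier string and reads the "text" column by default; the output directory is a placeholder, and the Llama 2 checkpoint is gated.

from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Each JSON file holds a list of {"text": ...} records, as described above.
dataset = load_dataset("csenaeem/final_data", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",            # gated checkpoint; access must be granted
    train_dataset=dataset,                        # SFTTrainer reads the "text" column by default
    args=SFTConfig(output_dir="llama2-7b-sft"),   # placeholder output directory
)
trainer.train()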
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is the Official Capybara dataset. Over 10,000 multi-turn examples.
Capybara is the culmination of insights derived from synthesis techniques like Evol-instruct (used for WizardLM), Alpaca, Orca, Vicuna, Lamini, FLASK and others. The single-turn seeds used to initiate the Amplify-Instruct synthesis of conversations are mostly based on datasets that I've personally vetted extensively, and are often highly regarded for their diversity and demonstration of logical robustness and… See the full description on the dataset page: https://huggingface.co/datasets/LDJnr/Capybara.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Details
This is a preference dataset compatible with TRL DPO training. The original data comes from csebuetnlp/xlsum. After adding a system prompt and a user prompt prefix, chenguang-wang/Qwen2.5-3B-Instruct-summary-sft-adapter is used to generate completions, and mistralai/Mixtral-8x22B-v0.1, together with the summary field from the source dataset, is used to annotate the preferences. This dataset is not guaranteed to be of high quality and is only used for testing… See the full description on the dataset page: https://huggingface.co/datasets/chenguang-wang/xlsum_pref_5k.
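A hedged sketch of pointing TRL's DPOTrainer at this dataset, assuming a recent TRL version and the standard prompt/chosen/rejected preference columns; the base model and output directory are placeholders, not the configuration the dataset author used.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-3B-Instruct"   # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

train_dataset = load_dataset("chenguang-wang/xlsum_pref_5k", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="xlsum-dpo-test"),  # placeholder output directory
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()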