TL;DR Dataset for Preference Learning
Summary
The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for preference learning and Reinforcement Learning from Human Feedback (RLHF) tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training models to understand and generate concise… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr-preference.
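A minimal sketch of how such preference pairs might be inspected before training; the column names ("prompt", "chosen", "rejected") follow TRL's standard preference format and are an assumption here, not taken from the dataset card.

from datasets import load_dataset

# Column names below are assumed from TRL's standard preference format
# and may differ on the actual dataset card.
dataset = load_dataset("trl-lib/tldr-preference", split="train")
example = dataset[0]
print(example["prompt"][:200])   # the Reddit post used as the prompt
print(example["chosen"])         # preferred TL;DR summary
print(example["rejected"])       # dispreferred TL;DR summary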
TL;DR Dataset
Summary
The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for summarization tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training summarization models.
Data Structure
Format: Standard
Type: Prompt-completion
Columns:
"pompt": The unabridged Reddit… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr.
theblackcat102/llava-instruct-mix reformatted for VSFT with TRL's SFT Trainer. See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py.
TRL GRPO Completion logs
This dataset contains the completions generated during training using trl and GRPO. The completions are stored in parquet files, and each file contains the completions for a single step of training (depending on the logging_steps argument). Each file contains the following columns:
step: the step of training
prompt: the prompt used to generate the completion
completion: the completion generated by the model
reward: the reward given to the completion by all… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/grpo-completions-new.
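A hedged sketch of reading one of the per-step parquet logs with pandas; the file name below is hypothetical, since the actual naming depends on the training run.

import pandas as pd

# Hypothetical file name; each parquet file holds the completions logged at one step.
df = pd.read_parquet("completions_step_100.parquet")
print(df.columns.tolist())              # expected: step, prompt, completion, reward
print(df[["prompt", "reward"]].head())  # inspect prompts and their rewards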
RewardBench is a benchmark designed to evaluate the capabilities and safety of reward models, including those trained with Direct Preference Optimization (DPO). It serves as the first evaluation tool for reward models and provides valuable insights into their performance and reliability¹.
Here are the key components of RewardBench:
Common Inference Code: The repository includes common inference code for various reward models, such as Starling, PairRM, OpenAssistant, and more. These models can be evaluated using the provided tools¹.
Dataset and Evaluation: The RewardBench dataset consists of prompt-win-lose trios spanning chat, reasoning, and safety scenarios. It allows benchmarking reward models on challenging, structured, and out-of-distribution queries. The goal is to enhance scientific understanding of reward models and their behavior².
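A hedged sketch of pulling the RewardBench evaluation set for inspection with the datasets library; the split name ("filtered") and the column names are assumptions rather than details confirmed by the repository.

from datasets import load_dataset

# Split and column names are assumptions, not taken from the RewardBench docs.
bench = load_dataset("allenai/reward-bench", split="filtered")
row = bench[0]
print(row["subset"])    # e.g. a chat, reasoning, or safety category
print(row["prompt"])    # shared prompt
print(row["chosen"], row["rejected"], sep="\n---\n")  # win/lose responses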
Scripts for Evaluation:
scripts/run_rm.py: Used to evaluate individual reward models.
scripts/run_dpo.py: Used to evaluate direct preference optimization (DPO) models.
scripts/train_rm.py: A basic reward model training script built on TRL (Transformer Reinforcement Learning)¹.
Installation and Usage:
Install PyTorch on your system.
Install the required dependencies using pip install -e .
Set the environment variable HF_TOKEN with your token.
To contribute your model to the leaderboard, open an issue on HuggingFace with the model name.
For local model evaluation, follow the instructions in the repository¹.
Remember that RewardBench provides a standardized way to assess reward models, ensuring transparency and comparability across different approaches. 🌟🔍
(1) GitHub - allenai/reward-bench: RewardBench: the first evaluation tool… https://github.com/allenai/reward-bench
(2) RewardBench: Evaluating Reward Models for Language Modeling. https://arxiv.org/abs/2403.13787
(3) RewardBench: Evaluating Reward Models for Language Modeling. https://paperswithcode.com/paper/rewardbench-evaluating-reward-models-for
Final Data - LLaMA Fine-Tuning Dataset
This dataset is prepared for fine-tuning the meta-llama/Llama-2-7b-hf model using the TRL SFTTrainer.
Structure
train.json: Training examples in JSON format
validation.json: Validation examples
test.json: Optional test examples
Format
Each file contains a list of items with this format:

{ "text": "Your training sample here..." }

Load it with the datasets library:

from datasets import load_dataset
dataset = load_dataset("csenaeem/final_data")
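A minimal, hedged sketch of how this dataset might be passed to TRL's SFTTrainer, assuming a recent TRL version that accepts a model identifier string and reads the "text" column by default; the output directory is a placeholder, and the Llama 2 checkpoint is gated.

from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Each JSON file holds a list of {"text": ...} records, as described above.
dataset = load_dataset("csenaeem/final_data", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",            # gated checkpoint; access must be granted
    train_dataset=dataset,                        # SFTTrainer reads the "text" column by default
    args=SFTConfig(output_dir="llama2-7b-sft"),   # placeholder output directory
)
trainer.train()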
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is the Official Capybara dataset. Over 10,000 multi-turn examples.
Capybara is the culmination of insights derived from synthesis techniques like Evol-instruct (used for WizardLM), Alpaca, Orca, Vicuna, Lamini, FLASK and others. The single-turn seeds used to initiate the Amplify-Instruct synthesis of conversations are mostly based on datasets that I've personally vetted extensively, and are often highly regarded for their diversity and demonstration of logical robustness and… See the full description on the dataset page: https://huggingface.co/datasets/LDJnr/Capybara.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Details
This is a preference dataset compatible with TRL DPO training. The original data comes from csebuetnlp/xlsum. After adding a system prompt and a user prompt prefix, chenguang-wang/Qwen2.5-3B-Instruct-summary-sft-adapter is used to generate completions, and mistralai/Mixtral-8x22B-v0.1, together with the summary field from the source dataset, is used to annotate the preferences. This dataset is not guaranteed to be of high quality and is only used for testing… See the full description on the dataset page: https://huggingface.co/datasets/chenguang-wang/xlsum_pref_5k.
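A hedged sketch of pointing TRL's DPOTrainer at this dataset, assuming a recent TRL version and the standard prompt/chosen/rejected preference columns; the base model and output directory are placeholders, not the configuration the dataset author used.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-3B-Instruct"   # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

train_dataset = load_dataset("chenguang-wang/xlsum_pref_5k", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="xlsum-dpo-test"),  # placeholder output directory
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()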