8 datasets found
  1. tldr-preference

    • huggingface.co
    8 scholarly articles cite this dataset
    Cite
    TRL, tldr-preference [Dataset]. https://huggingface.co/datasets/trl-lib/tldr-preference
    Dataset authored and provided by
    TRL
    Description

    TL;DR Dataset for Preference Learning

      Summary
    

    The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for preference learning and Reinforcement Learning from Human Feedback (RLHF) tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training models to understand and generate concise… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr-preference.
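    As a quick orientation, here is a minimal sketch of loading the dataset and inspecting one preference pair; the "prompt"/"chosen"/"rejected" column names are assumed from TRL's standard preference format and are not confirmed by this listing.

    # Minimal sketch, assuming TRL's standard preference columns.
    from datasets import load_dataset

    dataset = load_dataset("trl-lib/tldr-preference", split="train")

    example = dataset[0]
    print(example["prompt"])    # the original Reddit post
    print(example["chosen"])    # the preferred TL;DR summary
    print(example["rejected"])  # the dispreferred TL;DR summary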

  2. tldr

    • huggingface.co
    Updated Aug 22, 2024
    Cite
    TRL (2024). tldr [Dataset]. https://huggingface.co/datasets/trl-lib/tldr
    Dataset authored and provided by
    TRL
    Description

    TL;DR Dataset

      Summary
    

    The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for summarization tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training summarization models.

      Data Structure
    

    Format: Standard
    Type: Prompt-completion

    Columns:

    "pompt": The unabridged Reddit… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr.

  3. llava-instruct-mix-vsft

    • huggingface.co
    Updated Apr 11, 2024
    Cite
    Hugging Face H4 (2024). llava-instruct-mix-vsft [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/llava-instruct-mix-vsft
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face H4
    Description

    theblackcat102/llava-instruct-mix reformatted for VSFT with TRL's SFT Trainer. See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py.
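    A minimal sketch of loading the dataset for inspection; the "messages" and "images" column names are an assumption based on common VSFT chat layouts, not something this listing confirms.

    # Minimal sketch; column names are assumed, not confirmed by this listing.
    from datasets import load_dataset

    dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train")

    example = dataset[0]
    print(example["messages"][0])  # first turn of the chat-formatted conversation
    print(example["images"])       # the image(s) the conversation refers to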

  4. grpo-completions-new

    • huggingface.co
    Updated Mar 3, 2025
    Cite
    Daniel van Strien (2025). grpo-completions-new [Dataset]. https://huggingface.co/datasets/davanstrien/grpo-completions-new
    Authors
    Daniel van Strien
    Description

    TRL GRPO Completion logs

    This dataset contains the completions generated during training using trl and GRPO. The completions are stored in parquet files, and each file contains the completions for a single step of training (depending on the logging_steps argument). Each file contains the following columns:

    step: the step of training
    prompt: the prompt used to generate the completion
    completion: the completion generated by the model
    reward: the reward given to the completion by all… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/grpo-completions-new.
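    A minimal sketch of summarizing these logs, assuming the parquet files load directly with load_dataset and that "reward" holds scalars; both assumptions go beyond what this listing states.

    # Minimal sketch; assumes the logged parquet files load as one dataset.
    from datasets import load_dataset

    logs = load_dataset("davanstrien/grpo-completions-new", split="train")
    df = logs.to_pandas()

    # Mean reward per training step, assuming "reward" is a scalar column;
    # a rising curve suggests GRPO training is making progress.
    print(df.groupby("step")["reward"].mean())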

  5. RewardBench Dataset

    • paperswithcode.com
    Updated Jan 20, 2025
    Cite
    Nathan Lambert; Valentina Pyatkin; Jacob Morrison; LJ Miranda; Bill Yuchen Lin; Khyathi Chandu; Nouha Dziri; Sachin Kumar; Tom Zick; Yejin Choi; Noah A. Smith; Hannaneh Hajishirzi (2025). RewardBench Dataset [Dataset]. https://paperswithcode.com/dataset/rewardbench
    Authors
    Nathan Lambert; Valentina Pyatkin; Jacob Morrison; LJ Miranda; Bill Yuchen Lin; Khyathi Chandu; Nouha Dziri; Sachin Kumar; Tom Zick; Yejin Choi; Noah A. Smith; Hannaneh Hajishirzi
    Description

    RewardBench is a benchmark designed to evaluate the capabilities and safety of reward models, including those trained with Direct Preference Optimization (DPO). It serves as the first evaluation tool for reward models and provides valuable insights into their performance and reliability¹.

    Here are the key components of RewardBench:

    Common Inference Code: The repository includes common inference code for various reward models, such as Starling, PairRM, OpenAssistant, and more. These models can be evaluated using the provided tools¹.

    Dataset and Evaluation: The RewardBench dataset consists of prompt-win-lose trios spanning chat, reasoning, and safety scenarios. It allows benchmarking reward models on challenging, structured, and out-of-distribution queries. The goal is to enhance scientific understanding of reward models and their behavior².
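    A minimal sketch of pulling the prompt-win-lose trios for a quick look; the "allenai/reward-bench" dataset ID and the column names are assumptions based on the project's GitHub organization, not details given in this listing.

    # Minimal sketch; dataset ID, split handling, and columns are assumptions.
    from datasets import load_dataset

    bench = load_dataset("allenai/reward-bench")
    split = next(iter(bench.values()))  # take the first available split

    row = split[0]
    print(row["prompt"])    # the query posed to the model
    print(row["chosen"])    # the preferred ("win") response
    print(row["rejected"])  # the dispreferred ("lose") response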

    Scripts for Evaluation:

    scripts/run_rm.py: Used to evaluate individual reward models.
    scripts/run_dpo.py: Used to evaluate direct preference optimization (DPO) models.
    scripts/train_rm.py: A basic reward model training script built on TRL (Transformer Reinforcement Learning)¹.

    Installation and Usage:

    Install PyTorch on your system.
    Install the required dependencies using pip install -e .
    Set the environment variable HF_TOKEN with your token.
    To contribute your model to the leaderboard, open an issue on HuggingFace with the model name.
    For local model evaluation, follow the instructions in the repository¹.

    RewardBench provides a standardized way to assess reward models, ensuring transparency and comparability across different approaches.

    (1) GitHub - allenai/reward-bench: RewardBench: the first evaluation tool .... https://github.com/allenai/reward-bench
    (2) RewardBench: Evaluating Reward Models for Language Modeling. https://arxiv.org/abs/2403.13787
    (3) RewardBench: Evaluating Reward Models for Language Modeling. https://paperswithcode.com/paper/rewardbench-evaluating-reward-models-for

  6. final_data

    • huggingface.co
    Updated Jul 3, 2025
    Cite
    Najmul Islam Naeem (2025). final_data [Dataset]. https://huggingface.co/datasets/csenaeem/final_data
    Authors
    Najmul Islam Naeem
    Description

    Final Data - LLaMA Fine-Tuning Dataset

    This dataset is prepared for fine-tuning the meta-llama/Llama-2-7b-hf model using the TRL SFTTrainer.

      Structure
    

    train.json: Training examples in JSON format
    validation.json: Validation examples
    test.json: Optional test examples

      Format
    

    Each file contains a list of items with this format:

    { "text": "Your training sample here..." }

    from datasets import load_dataset

    dataset = load_dataset("csenaeem/final_data")

    … See the full description on the dataset page: https://huggingface.co/datasets/csenaeem/final_data.
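    A minimal sketch of the fine-tuning setup the card describes, wiring the "text" column into TRL's SFTTrainer with the named model; the output directory and other training arguments are illustrative assumptions, not values from the card.

    # Minimal sketch of the SFT setup named on this card; arguments are illustrative.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("csenaeem/final_data", split="train")

    trainer = SFTTrainer(
        model="meta-llama/Llama-2-7b-hf",  # model named on the dataset card
        train_dataset=dataset,
        args=SFTConfig(
            output_dir="final-data-sft",  # hypothetical output directory
            dataset_text_field="text",    # matches the JSON format above
        ),
    )
    trainer.train()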

  7. Capybara

    • huggingface.co
    Updated Dec 18, 2023
    Cite
    Luigi D (2023). Capybara [Dataset]. https://huggingface.co/datasets/LDJnr/Capybara
    Authors
    Luigi D
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is the Official Capybara dataset. Over 10,000 multi-turn examples.

    Capybara is the culmination of insights derived from synthesis techniques like Evol-Instruct (used for WizardLM), Alpaca, Orca, Vicuna, Lamini, FLASK and others. The single-turn seeds used to initiate the Amplify-Instruct synthesis of conversations are mostly based on datasets that I've personally vetted extensively, and are often highly regarded for their diversity and demonstration of logical robustness and… See the full description on the dataset page: https://huggingface.co/datasets/LDJnr/Capybara.

  8. xlsum_pref_5k

    • huggingface.co
    Updated May 11, 2025
    Cite
    Chenguang Wang (2025). xlsum_pref_5k [Dataset]. https://huggingface.co/datasets/chenguang-wang/xlsum_pref_5k
    Authors
    Chenguang Wang
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Details

    This is a preference dataset compatible with TRL DPO training. The original data comes from csebuetnlp/xlsum. After adding a system prompt and a user-prompt prefix, chenguang-wang/Qwen2.5-3B-Instruct-summary-sft-adapter is used to generate completions, and mistralai/Mixtral-8x22B-v0.1 is used together with the summary field in the source dataset to annotate the preferences. This dataset is not guaranteed to be of high quality and is only used for testing… See the full description on the dataset page: https://huggingface.co/datasets/chenguang-wang/xlsum_pref_5k.
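    A minimal sketch of DPO training on this dataset, assuming it follows TRL's standard prompt/chosen/rejected preference format; the small base model and the training arguments are illustrative assumptions, not values from the card.

    # Minimal sketch; model choice and arguments are illustrative assumptions.
    from datasets import load_dataset
    from trl import DPOConfig, DPOTrainer

    dataset = load_dataset("chenguang-wang/xlsum_pref_5k", split="train")

    trainer = DPOTrainer(
        model="Qwen/Qwen2.5-0.5B-Instruct",      # illustrative stand-in model
        train_dataset=dataset,
        args=DPOConfig(output_dir="xlsum-dpo"),  # hypothetical output directory
    )
    trainer.train()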
