2 datasets found
  1. tldr

    • huggingface.co
    Updated Aug 22, 2024
    TRL (2024). tldr [Dataset]. https://huggingface.co/datasets/trl-lib/tldr
    Dataset updated: Aug 22, 2024
    Dataset authored and provided by: TRL
    Description

    TL;DR Dataset

      Summary
    

    The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for summarization tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training summarization models.

      Data Structure
    

    Format: Standard
    Type: Prompt-completion

    Columns:

    "prompt": The unabridged Reddit… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr.

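To make the prompt-completion layout more concrete, here is a minimal Python sketch of how a Reddit post ending in a "TL;DR" could be split into such a pair. The helper `split_tldr`, the marker regex, and the `"completion"` key are illustrative assumptions (the column list in the description is truncated after its first entry). This is not the preprocessing TRL actually applied.

```python
import re
from typing import Optional

# Hypothetical marker pattern: matches "TL;DR", "tl;dr", or "TLDR",
# optionally followed by a colon.
_TLDR_MARKER = re.compile(r"tl;?dr:?", re.IGNORECASE)


def split_tldr(post: str) -> Optional[dict]:
    """Split a post at its last TL;DR marker into a prompt-completion pair.

    Returns None when the post has no marker or either side is empty.
    """
    matches = list(_TLDR_MARKER.finditer(post))
    if not matches:
        return None
    last = matches[-1]
    prompt = post[: last.start()].strip()
    completion = post[last.end():].strip()
    if not prompt or not completion:
        return None
    # "completion" is an assumed key name mirroring the "Prompt-completion" type.
    return {"prompt": prompt, "completion": completion}


example = split_tldr(
    "I spent three weeks debugging a flaky test before finding a race condition. "
    "TL;DR: the flaky test was a race condition."
)
```

Splitting at the last marker rather than the first is a design choice in this sketch: it treats an earlier, incidental "tl;dr" in the body as part of the post rather than the start of the summary.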
  2. tldr-preference

    • huggingface.co
    TRL, tldr-preference [Dataset]. https://huggingface.co/datasets/trl-lib/tldr-preference
    Dataset authored and provided by: TRL
    Description

    TL;DR Dataset for Preference Learning

      Summary
    

    The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for preference learning and Reinforcement Learning from Human Feedback (RLHF) tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training models to understand and generate concise… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr-preference.
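As a rough illustration of what a preference-learning record could look like, the sketch below pairs one post with a preferred and a non-preferred summary. The description above is truncated before it lists any columns, so the field names `prompt`, `chosen`, and `rejected`, the class `PreferenceExample`, and the helper `make_preference` are all assumptions made for illustration, not the dataset's confirmed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceExample:
    # Field names are assumptions; the dataset description is truncated.
    prompt: str
    chosen: str
    rejected: str


def make_preference(
    prompt: str, summary_a: str, summary_b: str, preferred: int
) -> PreferenceExample:
    """Build a record from two candidate summaries and a label (0 -> A, 1 -> B)."""
    if preferred not in (0, 1):
        raise ValueError("preferred must be 0 or 1")
    if preferred == 0:
        chosen, rejected = summary_a, summary_b
    else:
        chosen, rejected = summary_b, summary_a
    return PreferenceExample(prompt=prompt, chosen=chosen, rejected=rejected)


record = make_preference(
    prompt="Long post about a flaky test ...",
    summary_a="A test was flaky.",
    summary_b="The flaky test was caused by a race condition.",
    preferred=1,
)
```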


tldr

trl-lib/tldr

5 scholarly articles cite this dataset.