TL;DR Dataset
Summary
The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for summarization tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training summarization models.
Data Structure
Format: Standard
Type: Prompt-completion
Columns:
"prompt": The unabridged Reddit… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr.
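To illustrate how an appended "TL;DR" turns a single post into a prompt-completion pair, here is a minimal Python sketch. The marker regex, the helper name split_tldr, and the prompt layout (post body followed by "TL;DR:", with the summary as the completion) are assumptions made for illustration; the exact template used in trl-lib/tldr may differ.

```python
import re
from typing import Optional, TypedDict


class PromptCompletion(TypedDict):
    prompt: str
    completion: str


# Hypothetical marker pattern: matches "TL;DR", "tl;dr", or "TLDR",
# optionally followed by ":" or "-" and surrounding whitespace.
_TLDR = re.compile(r"\btl;?dr\b\s*[:\-]?\s*", re.IGNORECASE)


def split_tldr(post: str) -> Optional[PromptCompletion]:
    """Split a raw post at its last TL;DR marker into a prompt-completion pair."""
    matches = list(_TLDR.finditer(post))
    if not matches:
        return None  # no author-written summary, so no training pair
    last = matches[-1]
    body = post[: last.start()].strip()
    summary = post[last.end():].strip()
    if not body or not summary:
        return None  # a pair needs both a post body and a summary
    # Assumed layout: the prompt ends with the marker, and the completion
    # is the summary that follows it (with a leading space).
    return {"prompt": body + "\nTL;DR:", "completion": " " + summary}


if __name__ == "__main__":
    post = "My roommate keeps eating my leftovers and denies it.\n\ntl;dr: food thief roommate"
    print(split_tldr(post))
```

Taking the last marker rather than the first is a design choice: it keeps any earlier, incidental mention of "tl;dr" inside the prompt.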
TL;DR Dataset for Preference Learning
Summary
The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for preference learning and Reinforcement Learning from Human Feedback (RLHF) tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training models to understand and generate concise… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr-preference.
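The following Python sketch assumes the preference data follows TRL's standard preference layout of "prompt", "chosen", and "rejected" columns. The helper names to_preference and pairwise_loss, and the input shape (two candidate summaries plus the index an annotator preferred), are hypothetical. pairwise_loss is the Bradley-Terry style objective, -log(sigmoid(r_chosen - r_rejected)), commonly used to train reward models on such pairs; here it is computed on plain floats rather than model outputs.

```python
import math
from typing import Sequence, TypedDict


class PreferencePair(TypedDict):
    prompt: str
    chosen: str
    rejected: str


def to_preference(prompt: str, summaries: Sequence[str], choice: int) -> PreferencePair:
    """Turn one pairwise human comparison into a prompt/chosen/rejected record.

    `summaries` holds exactly two candidate summaries and `choice` (0 or 1)
    is the index the annotator preferred. Both are hypothetical inputs.
    """
    if len(summaries) != 2 or choice not in (0, 1):
        raise ValueError("expected two summaries and choice in {0, 1}")
    return {
        "prompt": prompt,
        "chosen": summaries[choice],
        "rejected": summaries[1 - choice],
    }


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Numerically stable -log(sigmoid(reward_chosen - reward_rejected))."""
    d = reward_chosen - reward_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))  # avoids overflow for large negative d


if __name__ == "__main__":
    pair = to_preference("POST: ...\nTL;DR:", [" summary A", " summary B"], choice=1)
    print(pair)
    print(pairwise_loss(1.5, -0.5))  # small loss: chosen already scores higher
```

The loss shrinks toward zero as the reward for the chosen summary rises above the rejected one, which is what pushes a reward model to rank preferred summaries higher.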