2 datasets found

tldr
huggingface.co
Updated Aug 22, 2024
Cite
TRL (2024). tldr [Dataset]. https://huggingface.co/datasets/trl-lib/tldr
Dataset updated: Aug 22, 2024
Dataset authored and provided by: TRL
Description
TL;DR Dataset

Summary

The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for summarization tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training summarization models.

Data Structure

Format: Standard
Type: Prompt-completion

Columns:

"prompt": The unabridged Reddit… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr.
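
As a rough illustration of the prompt-completion structure described above, the sketch below splits a post at its final "TL;DR" marker. It is not the preprocessing used to build the dataset, which the description does not show. The function name `split_post`, the marker pattern, and the "completion" column name are assumptions made for illustration; only the "prompt" column appears in the description.

```python
import re
from typing import Dict, Optional

# Hypothetical marker pattern: matches "TL;DR" or "TLDR" in any case,
# optionally followed by a colon. The real dataset may use different rules.
TLDR_MARKER = re.compile(r"tl;?dr:?", re.IGNORECASE)


def split_post(post: str) -> Optional[Dict[str, str]]:
    """Split a post at its last TL;DR marker into a prompt-completion record.

    Returns None when there is no marker or when either side is empty.
    """
    matches = list(TLDR_MARKER.finditer(post))
    if not matches:
        return None
    last = matches[-1]
    prompt = post[: last.start()].strip()
    completion = post[last.end():].strip()
    if not prompt or not completion:
        return None
    # "completion" is an assumed column name, inferred from the
    # "Prompt-completion" type named in the description.
    return {"prompt": prompt, "completion": completion}


example = split_post(
    "I spent three weeks debugging a flaky test. TL;DR: the clock was wrong."
)
print(example)
```

Posts without a marker return `None`, so a caller can filter them out before building a training set.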
tldr-preference
huggingface.co
Cite
TRL, tldr-preference [Dataset]. https://huggingface.co/datasets/trl-lib/tldr-preference
Dataset authored and provided by: TRL
Description
TL;DR Dataset for Preference Learning

Summary

The TL;DR dataset is a processed version of Reddit posts, specifically curated to train models using the TRL library for preference learning and Reinforcement Learning from Human Feedback (RLHF) tasks. It leverages the common practice on Reddit where users append "TL;DR" (Too Long; Didn't Read) summaries to lengthy posts, providing a rich source of paired text data for training models to understand and generate concise… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/tldr-preference.
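
The description above is truncated before it lists any columns, so the following is only a sketch of what a preference record could look like. It assumes the "prompt" / "chosen" / "rejected" field names that TRL uses for preference data in general; whether this dataset uses exactly those names is not confirmed by the text here. The helper `make_preference_record` and the sample strings are hypothetical.

```python
from typing import Dict


def make_preference_record(
    prompt: str, summary_a: str, summary_b: str, preferred: str
) -> Dict[str, str]:
    """Build one preference record from a post and two candidate summaries.

    `preferred` must be "a" or "b" and names the summary a labeler favored.
    Field names are assumed, not taken from the dataset card.
    """
    if preferred not in ("a", "b"):
        raise ValueError("preferred must be 'a' or 'b'")
    if preferred == "a":
        chosen, rejected = summary_a, summary_b
    else:
        chosen, rejected = summary_b, summary_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


record = make_preference_record(
    prompt="Long post about a flaky test...",
    summary_a="The test failed.",
    summary_b="A wrong system clock caused the flaky test.",
    preferred="b",
)
print(record["chosen"])
```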
5 scholarly articles cite the tldr dataset (trl-lib/tldr).
