3 datasets found

Webis Clickbait Spoiling Corpus 2022
zenodo.org
data.niaid.nih.gov
zip
Updated Jul 11, 2023
Cite
Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast (2023). Webis Clickbait Spoiling Corpus 2022 [Dataset]. http://doi.org/10.5281/zenodo.6362726
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6362726
Dataset updated
Jul 11, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
# Webis Clickbait Spoiling Corpus 2022

The Webis Clickbait Spoiling Corpus 2022 (Webis-Clickbait-22) contains 5,000 spoiled clickbait posts crawled from Facebook, Reddit, and Twitter.
This corpus supports the task of clickbait spoiling, which deals with generating a short text that satisfies the curiosity induced by a clickbait post.

This dataset contains the clickbait posts, manually cleaned versions of the linked documents, and the extracted spoilers for each clickbait post.
Additionally, the spoilers are categorized into three types: short phrase spoilers, longer passage spoilers, and multiple non-consecutive pieces of text.
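
As a rough illustration of the three spoiler types, the sketch below tallies how many posts fall into each category. The field name `tags` and the label strings `phrase`, `passage`, and `multi` are assumptions made for this example; the description above does not name them, so check them against the actual JSONL records before relying on this.

```python
from collections import Counter


def count_spoiler_types(posts, field="tags"):
    """Count posts per spoiler type.

    `field` is a hypothetical key name; it may hold a single label
    or a list of labels, and both shapes are accepted here.
    """
    counts = Counter()
    for post in posts:
        value = post.get(field) or []
        labels = [value] if isinstance(value, str) else value
        counts.update(labels)
    return counts


# Hand-written sample records, not taken from the corpus.
sample = [
    {"tags": ["phrase"]},
    {"tags": ["passage"]},
    {"tags": ["multi"]},
    {"tags": ["phrase"]},
]

print(count_spoiler_types(sample))
```

Accepting both a string and a list keeps the helper usable if the label turns out to be stored either way.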

We want to organize a shared task on clickbait spoiling. Hence, we omit the 1,000 test posts from this version of the dataset and will publish them later.

# Overview

The dataset comes with predefined train/validation/test splits:

- training.jsonl contains 3,200 posts for training
- validation.jsonl contains 800 posts for validation
- test.jsonl contains 1,000 posts for testing
  - The test set is omitted from this version of the dataset because we want to organize a shared task on clickbait spoiling and keep the test set private until the end of the shared task.
- clickbait-spoiling-21.jsonl contains the complete corpus with 5,000 clickbait posts
  - The clickbait-spoiling-21.jsonl file is also omitted from this version of the dataset, for the same reason: the test set stays private until the end of the shared task.
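
The split files listed above are in JSON Lines format, so each line should hold one JSON object. A minimal sketch for loading the two published splits and comparing their sizes with the numbers stated in the description follows. The helper names `read_jsonl` and `check_splits` are hypothetical, and the code assumes the files have been unpacked into a local directory; it makes no assumption about the fields inside each record.

```python
import json
from pathlib import Path

# Split sizes as stated in the dataset description (published splits only).
EXPECTED_SIZES = {"training.jsonl": 3200, "validation.jsonl": 800}


def read_jsonl(path):
    """Return a list with one parsed JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def check_splits(directory, expected=EXPECTED_SIZES):
    """Map each existing file name to (actual_count, expected_count)."""
    report = {}
    for name, size in expected.items():
        path = Path(directory) / name
        if path.exists():
            report[name] = (len(read_jsonl(path)), size)
    return report


if __name__ == "__main__":
    # Assumes the archive was unpacked into the current directory.
    for name, (actual, wanted) in check_splits(".").items():
        print(f"{name}: {actual} posts (description states {wanted})")
```

Together with the 1,000 withheld test posts, the two published splits account for the full corpus: 3,200 + 800 + 1,000 = 5,000.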
Webis-Clickbait-22
webis.de
anthology.aicmu.ac.cn
Updated 2022
Cite
Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast (2022). Webis-Clickbait-22 [Dataset]. https://webis.de/data/webis-clickbait-22.html
Dataset updated
2022
Dataset provided by
Friedrich Schiller University Jena
The Web Technology & Information Systems Network
University of Kassel, hessian.AI, and ScaDS.AI
Authors
Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis Clickbait Spoiling Corpus 2022 (Webis-Clickbait-22) contains 5,000 spoiled clickbait posts crawled from Facebook, Reddit, and Twitter. This corpus supports the task of clickbait spoiling, which deals with generating a short text that satisfies the curiosity induced by a clickbait post.
clickbait-spoiling-data-question
huggingface.co
Updated Sep 18, 2013
Cite
Pramit Sahoo (2013). clickbait-spoiling-data-question [Dataset]. https://huggingface.co/datasets/pramitsahoo/clickbait-spoiling-data-question
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 18, 2013
Authors
Pramit Sahoo
Description
Webis Clickbait Spoiling Corpus

The Webis Clickbait Spoiling Corpus 2022 (Webis-Clickbait-22) contains 5,000 spoiled clickbait posts crawled from Facebook, Reddit, and Twitter. This corpus supports the task of clickbait spoiling, which deals with generating a short text that satisfies the curiosity induced by a clickbait post. This dataset contains the clickbait posts and manually cleaned versions of the linked documents, and extracted spoilers for each clickbait post. Additionally… See the full description on the dataset page: https://huggingface.co/datasets/pramitsahoo/clickbait-spoiling-data-question.

Webis Clickbait Spoiling Corpus 2022: 10 scholarly articles cite this dataset (View in Google Scholar)
