2 datasets found
  1. Webis Clickbait Spoiling Corpus 2022

    • commons.datacite.org
    Updated Mar 16, 2022
    Cite
    Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast (2022). Webis Clickbait Spoiling Corpus 2022 [Dataset]. http://doi.org/10.5281/zenodo.8136637
    Dataset updated
    Mar 16, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    DataCite (https://www.datacite.org/)
    Authors
    Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis Clickbait Spoiling Corpus 2022 (Webis-Clickbait-22) contains 5,000 spoiled clickbait posts crawled from Facebook, Reddit, and Twitter.
    This corpus supports the task of clickbait spoiling, which deals with generating a short text that satisfies the curiosity induced by a clickbait post. The dataset contains the clickbait posts, manually cleaned versions of the linked documents, and extracted spoilers for each clickbait post.
    Additionally, the spoilers are categorized into three types: short phrase spoilers, longer passage spoilers, and multiple non-consecutive pieces of text.

    Overview

    The dataset comes with predefined train/validation/test splits:

    • training.jsonl: 3,200 posts for training
    • validation.jsonl: 800 posts for validation
    • test.jsonl: 1,000 posts for testing

    The test set was used for the SemEval-2023 clickbait spoiling task. This shared task was organized with TIRA.io, and participants submitted Docker software during the task. To re-execute or modify the submitted approaches, see the instructions and overview of approaches in TIRA.
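
The predefined splits described above are JSON Lines files, so a quick sanity check after downloading is to count the records in each file. The following is a minimal sketch, not part of the official corpus tooling: the file names and expected sizes come from the description, while the local directory name `data` and the helper names `read_jsonl` and `split_sizes` are assumptions made for illustration. No field names inside the records are assumed.

```python
import json
from pathlib import Path

# Split file names and sizes as stated in the dataset description.
EXPECTED_SIZES = {
    "training.jsonl": 3200,
    "validation.jsonl": 800,
    "test.jsonl": 1000,
}


def read_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def split_sizes(data_dir):
    """Return {file name: record count} for each split file present in data_dir."""
    data_dir = Path(data_dir)
    return {
        name: len(read_jsonl(data_dir / name))
        for name in EXPECTED_SIZES
        if (data_dir / name).exists()
    }


if __name__ == "__main__":
    # "data" is a hypothetical local directory holding the unpacked corpus.
    for name, count in split_sizes("data").items():
        status = "ok" if count == EXPECTED_SIZES[name] else "unexpected"
        print(f"{name}: {count} posts ({status}, expected {EXPECTED_SIZES[name]})")
```

Split files that are missing from the directory are skipped rather than reported as errors, which keeps the check usable when only some of the splits have been downloaded.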

  2. webis-clickbait-22

    • kaggle.com
    zip
    Updated Apr 7, 2024
    Cite
    Raghav Sakhuja (2024). webis-clickbait-22 [Dataset]. https://www.kaggle.com/datasets/raghavsakhuja/webis-clickbait-22
    Available download formats: zip (11382736 bytes)
    Dataset updated
    Apr 7, 2024
    Authors
    Raghav Sakhuja
    Description

    Dataset

    This dataset was created by Raghav Sakhuja



Webis Clickbait Spoiling Corpus 2022

9 scholarly articles cite this dataset.
