3 datasets found
  1. Webis Clickbait Spoiling Corpus 2022

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 11, 2023
    Cite
    Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast (2023). Webis Clickbait Spoiling Corpus 2022 [Dataset]. http://doi.org/10.5281/zenodo.6362726
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # Webis Clickbait Spoiling Corpus 2022

    The Webis Clickbait Spoiling Corpus 2022 (Webis-Clickbait-22) contains 5,000 spoiled clickbait posts crawled from Facebook, Reddit, and Twitter.
    This corpus supports the task of clickbait spoiling, which deals with generating a short text that satisfies the curiosity induced by a clickbait post.

    This dataset contains the clickbait posts and manually cleaned versions of the linked documents, and extracted spoilers for each clickbait post.
    Additionally, the spoilers are categorized into three types: short phrase spoilers, longer passage spoilers, and multiple non-consecutive pieces of text.

    We want to organize a shared task on clickbait spoiling. Hence, we omit the 1,000 test posts from this version of the dataset and will publish them later.

    # Overview

    The dataset comes with predefined train/validation/test splits:

    • training.jsonl contains 3,200 posts for training
    • validation.jsonl contains 800 posts for validation
    • test.jsonl contains 1,000 posts for testing
      • The test set is omitted from this version of the dataset because we want to organize a shared task on clickbait spoiling and keep the test set private until the shared task ends.
    • clickbait-spoiling-21.jsonl contains the complete corpus with 5,000 clickbait posts
      • The clickbait-spoiling-21.jsonl file is omitted from this version of the dataset because it includes the test posts, which we want to keep private until the shared task ends.
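
The split files listed above are in JSON Lines format, so each line should hold one post as a JSON object. The following is a minimal sketch, not part of the original dataset documentation, of how the per-split post counts could be checked after unzipping the archive. The helper names `read_jsonl` and `split_sizes` and the directory name `webis-clickbait-22` are assumptions made for illustration only.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def split_sizes(data_dir, split_files=("training.jsonl", "validation.jsonl")):
    """Count the posts in each split file that exists under data_dir."""
    sizes = {}
    for name in split_files:
        path = Path(data_dir) / name
        if path.exists():  # e.g. test.jsonl is not shipped in this version
            sizes[name] = len(read_jsonl(path))
    return sizes


if __name__ == "__main__":
    # "webis-clickbait-22" is a hypothetical directory holding the unzipped files.
    for name, count in split_sizes("webis-clickbait-22").items():
        print(f"{name}: {count} posts")
```

If the files match the description, the counts should come out as 3,200 for training.jsonl and 800 for validation.jsonl. Together with the 1,000 withheld test posts, that accounts for the 5,000 posts in the full corpus.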
  2. Webis-Clickbait-22

    • webis.de
    • anthology.aicmu.ac.cn
    Updated 2022
    Cite
    Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast (2022). Webis-Clickbait-22 [Dataset]. https://webis.de/data/webis-clickbait-22.html
    Explore at:
    Dataset updated
    2022
    Dataset provided by
    Friedrich Schiller University Jena
    The Web Technology & Information Systems Network
    University of Kassel, hessian.AI, and ScaDS.AI
    Authors
    Matthias Hagen; Maik Fröbe; Artur Jurk; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis Clickbait Spoiling Corpus 2022 (Webis-Clickbait-22) contains 5,000 spoiled clickbait posts crawled from Facebook, Reddit, and Twitter. This corpus supports the task of clickbait spoiling, which deals with generating a short text that satisfies the curiosity induced by a clickbait post.

  3. clickbait-spoiling-data-question

    • huggingface.co
    Updated Sep 18, 2013
    Cite
    Pramit Sahoo (2013). clickbait-spoiling-data-question [Dataset]. https://huggingface.co/datasets/pramitsahoo/clickbait-spoiling-data-question
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2013
    Authors
    Pramit Sahoo
    Description

    Webis Clickbait Spoiling Corpus

    The Webis Clickbait Spoiling Corpus 2022 (Webis-Clickbait-22) contains 5,000 spoiled clickbait posts crawled from Facebook, Reddit, and Twitter. This corpus supports the task of clickbait spoiling, which deals with generating a short text that satisfies the curiosity induced by a clickbait post. This dataset contains the clickbait posts and manually cleaned versions of the linked documents, and extracted spoilers for each clickbait post. Additionally… See the full description on the dataset page: https://huggingface.co/datasets/pramitsahoo/clickbait-spoiling-data-question.


Webis Clickbait Spoiling Corpus 2022

Explore at:
10 scholarly articles cite this dataset (View in Google Scholar)