    Webis Wikipedia-IPC

Cite
Marcel Gohsen; Matthias Hagen; Martin Potthast; Benno Stein (2023). Webis Wikipedia-IPC [Dataset]. http://doi.org/10.5281/zenodo.7621320

25 scholarly articles cite this dataset
Dataset updated
Feb 8, 2023
Authors
Marcel Gohsen; Matthias Hagen; Martin Potthast; Benno Stein
Description

When an image is reused on the Web, a new, original caption is often assigned to it. We hypothesize that different captions for the same image naturally form a set of mutual paraphrases. To demonstrate the suitability of this idea, we analyzed captions in the English Wikipedia, where editors frequently relabel the same image for different articles. The result is the Wikipedia-IPC (Image Caption Paraphrase Corpus) dataset, which consists of pairs of captions for the same image that are paraphrases of each other. It contains 30,237 gold, 229,877 silver, and 656,560 bronze quality paraphrase pairs. The bronze quality pairs will be released soon.
