2 datasets found
  1. PAN12 Originality: Source Retrieval

    • data.niaid.nih.gov
    Updated Jun 11, 2022
    Potthast, Martin; Gollub, Tim; Hagen, Matthias; Graßegger, Jan; Kiesel, Johannes; Michel, Maximilian; Oberländer, Arnd; Tippmann, Martin; Barrón-Cedeño, Alberto; Gupta, Parth; Rosso, Paolo; Stein, Benno (2022). PAN12 Originality: Source Retrieval [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3713287
    Dataset provided by
    Universität Leipzig
    Martin-Luther-Universität Halle-Wittenberg
    Bauhaus-Universität Weimar
    Authors
    Potthast, Martin; Gollub, Tim; Hagen, Matthias; Graßegger, Jan; Kiesel, Johannes; Michel, Maximilian; Oberländer, Arnd; Tippmann, Martin; Barrón-Cedeño, Alberto; Gupta, Parth; Rosso, Paolo; Stein, Benno
    Description

    We provide you with a training corpus that consists of suspicious documents. Each suspicious document is about a specific topic and may consist of plagiarized passages obtained from web pages on that topic found in the ClueWeb09 corpus.

  2. PAN 2026: Generative Plagiarism Detection Test Dataset

    • zenodo.org
    Updated Mar 16, 2026
    André Greiner-Petter; Yun-Ang Wu; Yuka Kitamura; Kon Woo Kim; Bela Gipp (2026). PAN 2026: Generative Plagiarism Detection Test Dataset [Dataset]. http://doi.org/10.5281/zenodo.19038846
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    André Greiner-Petter; Yun-Ang Wu; Yuka Kitamura; Kon Woo Kim; Bela Gipp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Given a suspicious document and a collection of potential source documents, your task is to retrieve the source documents in the collection that the suspicious document plagiarizes (similar to the Source Retrieval Task at PAN 2012 to 2015).

    A detailed description of the task is available at the homepage of the task.
