3 datasets found
  1. Wikipedia Text Reuse Corpus

    • webis.de
    Updated 2018
  2. Webis Wikipedia Text Reuse Corpus 2018 (Webis-Wikipedia-Text-Reuse-18)

    • zenodo.org
    gz, zip
    Updated Jul 5, 2018
  3. Webis Wikipedia Text Reuse Corpus 2018 (Webis-Wikipedia-Text-Reuse-18)

    • zenodo.org
    bz2, xz
    Updated Jul 5, 2018

Cite
Alshomary, Milad; Völske, Michael; Wachsmuth, Henning; Stein, Benno; Hagen, Matthias; Potthast, Martin (2018). Wikipedia Text Reuse Corpus [Dataset]. http://doi.org/10.5281/zenodo.3546193

Wikipedia Text Reuse Corpus

2 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated 2018
Dataset provided by
Martin-Luther-University Halle-Wittenberg (http://www.uni-halle.de/)
Leipzig University (http://www.uni-leipzig.de/)
Paderborn University (http://www.uni-paderborn.de/)
Bauhaus-Universität Weimar (http://www.uni-weimar.de/)
The Web Technology & Information Systems Network
Authors
Alshomary, Milad; Völske, Michael; Wachsmuth, Henning; Stein, Benno; Hagen, Matthias; Potthast, Martin
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A corpus of text reuse cases extracted from within Wikipedia and between Wikipedia and a sample of Common Crawl.
