  1. Webis-TRC-12

    • webis.de
    • figshare.com
    • +2 more
    Updated Sep 18, 2012

17 scholarly articles cite this dataset
  • Dataset updated Sep 18, 2012
Dataset provided by
Bauhaus University, Weimar (http://www.uni-weimar.de/)
The Web Technology & Information Systems Network
Völske, Michael; Potthast, Martin; Stein, Benno; Hagen, Matthias

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Available download formats from providers
1341602, html

The Webis Text Reuse Corpus 2012 (Webis-TRC-12) compiles manually written documents obtained from a completely controlled, yet representative environment that emulates the web. Each document in the corpus is about one of the 150 topics used at the TREC Web Tracks 2009–2011, thus forming a strong connection with existing evaluation efforts. Writers, hired via the crowdsourcing platform oDesk, had to retrieve sources for a given topic and reuse text from what they found. The corpus also includes detailed interaction logs that consistently cover both the search for sources and the creation of documents. These logs allow for in-depth analyses of how text is composed when a writer is at liberty to reuse text from third parties.
