2 datasets found

Webis-TRC-12
webis.de
anthology.aicmu.ac.cn
1341602
Updated 2012
Cite
Martin Potthast; Matthias Hagen; Michael Völske; Benno Stein (2012). Webis-TRC-12 [Dataset]. http://doi.org/10.5281/zenodo.1341602
Explore at:
Available download formats: 1341602
Unique identifier
https://doi.org/10.5281/zenodo.1341602
Dataset updated
2012
Dataset provided by
Friedrich Schiller University Jena
University of Kassel, hessian.AI, and ScaDS.AI
Artefact Germany, Bauhaus-Universität Weimar
The Web Technology & Information Systems Network
Bauhaus-Universität Weimar
Authors
Martin Potthast; Matthias Hagen; Michael Völske; Benno Stein
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis Text Reuse Corpus 2012 (Webis-TRC-12) compiles manually written documents obtained from a completely controlled yet representative environment that emulates the web. Each document in the corpus is about one of the 150 topics used at the TREC Web Tracks 2009–2011, thus forming a strong connection with existing evaluation efforts. Writers, hired via the crowdsourcing platform oDesk, had to retrieve sources for a given topic and reuse text from what they found. The corpus also includes detailed interaction logs that consistently cover both the search for sources and the creation of documents. These logs allow for in-depth analyses of how text is composed when a writer is at liberty to reuse texts from a third party.
Webis Text Reuse Corpus 2012
live.european-language-grid.eu
zenodo.org
html
Updated May 16, 2024
Cite
(2024). Webis Text Reuse Corpus 2012 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7425
Explore at:
Available download formats: html
Dataset updated
May 16, 2024
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The Webis Text Reuse Corpus 2012 (Webis-TRC-12) compiles manually written documents obtained from a completely controlled yet representative environment that emulates the web. Each document in the corpus is about one of the 150 topics used at the TREC Web Tracks 2009–2011, thus forming a strong connection with existing evaluation efforts. Writers, hired via the crowdsourcing platform oDesk, had to retrieve sources for a given topic and reuse text from what they found. The corpus also includes detailed interaction logs that consistently cover both the search for sources and the creation of documents. These logs allow for in-depth analyses of how text is composed when a writer is at liberty to reuse texts from a third party.

24 scholarly articles cite the Webis-TRC-12 dataset (https://doi.org/10.5281/zenodo.1341602).
