100+ datasets found
  1. Webis-TRC-12

    • webis.de
    • figshare.com
    • +2 more
    Updated Sep 18, 2012
  2. Webis-TLDR-17

    • webis.de
    Updated 2017
  3. Webis-CLS-10

    • webis.de
    Updated 2010
  4. Webis-KIQC-13

    • webis.de
    • zenodo.org
    Updated 2013
  5. webis-argument-framing-19

    • webis.de
    • figshare.com
    • +2 more
    csv
    Updated 2019
  6. Webis-QSeC-10

    • webis.de
    Updated 2010
  7. Data from: Webis-Web-Archive-17

    • webis.de
    • zenodo.org
    Updated 2017
  8. Webis-Clickbait-17

    • webis.de
    Updated 2017
  9. Webis-WebSeg-20

    • webis.de
    • zenodo.org
    • +1 more
    Updated 2020
  10. Webis-Editorials-16

    • webis.de
    • figshare.com
    • +1 more
    Updated 2016
  11. Webis-EditorialSum-20

    • webis.de
    Updated Sep 3, 2020
  12. Webis-QTM-19

    • webis.de
    Updated May 20, 2019
  13. Webis-Editorial-Quality-18

    • webis.de
    Updated 2018
  14. Webis-Ambient-15

    • webis.de
    • zenodo.org
    Updated 2015
  15. Webis-Clickbait-16

    • webis.de
    • zenodo.org
    Updated 2016
  16. Webis-PC-08

    • webis.de
    Updated 2008
  17. Webis-Debate-16

    • webis.de
    • figshare.com
    • +2 more
    Updated 2016
  18. Webis-QSpell-17

    • webis.de
    • zenodo.org
    Updated 2017
  19. Webis-ODP-10

    • webis.de
    Updated 2010
  20. Webis-Snippet-20

    • webis.de
  21. Webis-Tripad-14

    • webis.de
    Updated 2014
  22. Webis-Tripad-13-Sentiment

    • webis.de
    Updated 2013
  23. Webis-Bias-Flipper-18

    • webis.de
    • zenodo.org
    Updated 2018
  24. webis-comparative-web-search-questions-20

    • webis.de
    Updated 2020
  25. Webis-Revenue-10

    • webis.de
    • zenodo.org
    xml
  26. Webis-WikiDebate-18

    • webis.de
    • zenodo.org
    • +1 more
    tsv
    Updated 2018
  27. Webis-Mnemonics-17

    • webis.de
    • zenodo.org
    Updated 2017
  28. Webis-CPC-11

    • webis.de
    txt
    Updated 2011
  29. LFA-11

    • webis.de
    xmi
    Updated 2011
  30. Webis-SMC-12

    • webis.de
    • zenodo.org
    Updated 2012
  31. Webis-Sentences-17

    • webis.de
    Updated Feb 27, 2017
  32. Webis-SDMbridge-12

    • webis.de
    Updated 2012
  33. PAN-WVC-10

    • webis.de
    Updated 2010
  34. Fluxnet: Archived Website Including Site and Investigator Information

    • gis.csiss.gmu.edu
    • cmr.earthdata.nasa.gov
    • +3 more
    Updated Jan 1, 1991
  35. Webis-PRA-12

    • webis.de
    Updated 2012
  36. args.me corpus

    • webis.de
    • zenodo.org
    json
    Updated 2019
  37. PAN-WQF-12

    • webis.de
    Updated 2012
  38. *name of the dataset

    • assets.webis.de
  39. Wikipedia Text Reuse Corpus

    • webis.de
    Updated 2018
  40. Taxonomy for NIST Website

    • catalog.data.gov
    Updated Apr 24, 2020
  41. Education Statistics

    • datacatalog.worldbank.org
    csv zip, excel zip
    Updated Mar 13, 2020
  42. PAN-WVC-11

    • webis.de
    Updated 2011
  43. Webis-WikiDiscussions-18

    • zenodo.org
    • webis.de
    • +1 more
    gz
    Updated Jul 17, 2018
  44. Genre-KI-04

    • webis.de
    • zenodo.org
    Updated 2004
  45. Data from: AidFlows

    • datacatalog.worldbank.org
    Updated Aug 20, 2013
  46. Website updates

    • data.nasa.gov
    • catalog.data.gov
    application/rss+xml +5
    Updated Jun 26, 2018
  47. reddit

    • www.tensorflow.org
    • tensorflow.google.cn
    Updated Oct 9, 2020
  48. BuzzFeed-Webis Fake News Corpus 16

    • webis.de
    1181813
    Updated Feb 20, 2018
  49. Enterprises with an online catalogue on their website in France 2018, by...

    • www.statista.com
    Updated Mar 20, 2020
  50. Data from: Website Analytics

    • data.brla.gov
    • catalog.data.gov
    • +2 more
    tsv +5
    Updated Nov 28, 2020
  51. TS WEBSITE AND SOCIAL MEDIA ANALYTICS NOVEMBER 2016

    • www.data.va.gov
    • catalog.data.gov
    • +1 more
    tsv +5
    Updated Sep 12, 2019
  52. Urban Development

    • catalogue.data.govt.nz
    r, .csv
    Updated Oct 14, 2020
  53. Doing Business

    • datacatalog.worldbank.org
    csv zip, excel zip
    Updated Nov 22, 2019
  54. WorldPop Global Project Population Data: Estimated Residential Population...

    • developers.google.com
  55. Vehicle Inspection Website

    • catalogue.data.govt.nz
    Updated May 1, 2012
  56. Agency Website, opm.gov

    • catalog.data.gov
    • datadiscoverystudio.org
    • +2 more
    Updated Jun 17, 2020
  57. Products Catalog - from E-commerce Retail site NewChic.com

    • data.world
    zip, csv
    Updated Nov 24, 2020
  58. NOAA Shoreline Website

    • catalog.data.gov
    • datadiscoverystudio.org
    • +2 more
    Updated Jun 20, 2008
  59. HC Website

    • catalog.data.gov
    • data.wu.ac.at
    Updated Apr 28, 2015
  60. Office of Electricity website

    • catalog.data.gov
    Updated Oct 21, 2019
  61. Utah aeromagnetic and gravity maps and data, a website for distribution of...

    • catalog.data.gov
    Updated Oct 24, 2013
  62. Labor Market Panel Survey, ELMPS 2006 - Egypt

    • www.erfdataportal.com
    Updated May 2, 2018
  63. Labor Market Panel Survey, ELMPS 1998 - Egypt

    • www.erfdataportal.com
    Updated Oct 30, 2014
  64. NVSBE Website - Homepage

    • www.data.va.gov
    • catalog.data.gov
    • +2 more
    tsv +5
    Updated Sep 12, 2019
  65. Temporary Trade Barriers Database including the Global Antidumping Database

    • datacatalog.worldbank.org
    excel, stata +1
    Updated May 31, 2012
  66. Office of Nuclear Energy website

    • catalog-next.data.gov
    Updated Nov 10, 2020
  67. Global Financial Development

    • datacatalog.worldbank.org
    csv zip, excel zip
    Updated Oct 30, 2019
  68. Data from: Worldwide Governance Indicators

    • datacatalog.worldbank.org
    csv zip, stata +1
    Updated Sep 28, 2020
  69. Feds Hire Vets (FHV) Website

    • catalog.data.gov
    Updated Nov 20, 2018
  70. Maltese-English website parallel corpus (Processed)

    • catalogue.elra.info
    Updated Feb 27, 2020
  71. STEP Skills Measurement Household Survey 2012 (Wave 1) - Vietnam

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Mar 21, 2016
  72. Emergency Management Basemap

    • catalogue.data.govt.nz
    web map viewer +2
    Updated Jan 9, 2020
  73. VA FOIA Website

    • catalog-next.data.gov
    • www.data.va.gov
    • +1 more
    Updated Nov 10, 2020
  74. Other files used on Website

    • data.nasa.gov
    • catalog.data.gov
    xml, tsv +4
    Updated Jun 26, 2018
  75. Data from: Enterprise Surveys

    • datacatalog.worldbank.org
    • data.world
    Updated Jan 30, 2017
  76. MOD13Q1.006 Terra Vegetation Indices 16-Day Global 250m

    • developers.google.com
    Updated Mar 20, 2007
  77. STEP Skills Measurement Household Survey 2012 (Wave 1) - Colombia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Apr 8, 2016
  78. Data from: Archived SAFARI 2000 Project Website, October 2008

    • catalog-next.data.gov
    • cmr.earthdata.nasa.gov
    • +1 more
    Updated Nov 12, 2020
  79. Companies with a catalog on their website in France 2013, by sector

    • static1.statista.com
    Updated Jan 28, 2019
  80. Bilingual hr-en parallel corpus from Croatian Mine Action website...

    • catalogue.elra.info
    Updated Feb 27, 2020
  81. Code for South Atlantic LCC website project page

    • catalog-next.data.gov
    Updated Nov 12, 2020
  82. 2020 Open Data Plan: Website Data

    • catalog-next.data.gov
    • data.cityofnewyork.us
    Updated Nov 10, 2020
  83. Performance Metrics - Innovation & Technology - City Website Availability

    • data.cityofchicago.org
    • datadiscoverystudio.org
    • +3 more
    tsv +5
    Updated Sep 27, 2011
  84. Health Nutrition and Population Statistics

    • datacatalog.worldbank.org
    • knoema.com
    csv zip, excel zip
    Updated Oct 8, 2020
  85. Household Risk and Vulnerability Survey 2016, Wave 1 - Nepal

    • microdata.worldbank.org
    Updated Oct 5, 2017
  86. Education Outcomes National Panel Survey (NPS) 2002-2008 - Mozambique

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Sep 26, 2013
  87. Labor Market Panel Survey, JLMPS 2016 - Jordan

    • www.erfdataportal.com
    Updated May 3, 2018
  88. Wealth Accounting

    • datacatalog.worldbank.org
    excel, csv zip +1
    Updated Jan 30, 2018
  89. Integrated Site Information System(ISIS) Website - (Cleanup Site &...

    • data.wa.gov
    • catalog.data.gov
    xml, tsv +4
    Updated Mar 6, 2013
  90. Data from: Website Analytics

    • data.nola.gov
    • catalog-next.data.gov
    • +2 more
    application/rss+xml +5
    Updated Feb 2, 2017
  91. Global Economic Monitor

    • datacatalog.worldbank.org
    zip
    Updated Nov 14, 2019
  92. Bilingual hr-en parallel corpus from Croatian National Bank website...

    • catalogue.elra.info
    Updated Feb 27, 2020
  93. Schooling, Income, and Health Risk Impact Evaluation Household Survey...

    • microdata.worldbank.org
    Updated Sep 16, 2015
  94. The Canadian Climate Data and Scenarios (CCDS) website. An interface for...

    • open.canada.ca
    pdf
    Updated Jul 6, 2017
  95. ASIC - Company Dataset

    • data.gov.au
    pdf, csv
    Updated Nov 16, 2020
  96. Labor Market Panel Survey, TLMPS 2014 - Tunisia

    • www.erfdataportal.com
    Updated May 2, 2018
  97. City of West Hollywood Catalogue of Enterprise Systems

    • data.weho.org
    • data.world
    xml +5
    Updated Jan 13, 2020
  98. Performance Metrics - Innovation & Technology - Map Chicago Website...

    • data.cityofchicago.org
    • datadiscoverystudio.org
    • +3 more
    application/rss+xml +5
    Updated Sep 27, 2011
  99. Website Statistics - Data.NSW

    • data.nsw.gov.au
    • data.wu.ac.at
    website link
    Updated Sep 30, 2015
  100. Medical Examiner Case Archive

    • datacatalog.cookcountyil.gov
    • catalog-next.data.gov
    • +1 more
    xml +5
    Updated Nov 29, 2020
Cite
Potthast, Martin; Hagen, Matthias; Völske, Michael; Stein, Benno (2012) Webis-TRC-12. [Dataset] http://doi.org/10.5281/zenodo.1341602

Webis-TRC-12

23 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated Sep 18, 2012
Dataset provided by
Bauhaus-Universität Weimar (https://www.uni-weimar.de/)
The Web Technology & Information Systems Network
Authors
Potthast, Martin; Hagen, Matthias; Völske, Michael; Stein, Benno
License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

The Webis Text Reuse Corpus 2012 (Webis-TRC-12) compiles manually written documents obtained from a completely controlled, yet representative, environment that emulates the web. Each document in the corpus covers one of the 150 topics used at the TREC Web Tracks 2009–2011, which ties the corpus to existing evaluation efforts. Writers, hired through the crowdsourcing platform oDesk, had to retrieve sources for a given topic and reuse text from what they found. The corpus also includes detailed interaction logs that consistently cover both the search for sources and the creation of the documents. These logs allow for in-depth analyses of how text is composed when a writer is at liberty to reuse text from third parties.
