6 datasets found
  1. Webis-Web-Archive-17

    • webis.de
    • zenodo.org
    Updated 2017
  2. Webis-Web-Archive-17 Content Error Annotations

    • zenodo.org
    • figshare.com
    Updated Apr 15, 2019
  3. Webis-Web-Archive-17 Content Error Annotations

    • figshare.com
    • zenodo.org
    Updated Jan 4, 2020
  4. Webis-Web-Archive-17 Content Error Annotations

    • zenodo.org
    • search.datacite.org
    Updated Jan 25, 2019
  5. Webis-Clickbait-17

    • webis.de
    Updated 2017
  6. Webis Clickbait Corpus 2017 (Webis-Clickbait-17)

    • zenodo.org
    Updated Jun 11, 2018



  • Dataset updated 2017
Dataset provided by
Bauhaus University, Weimar (http://www.uni-weimar.de/)
The Web Technology & Information Systems Network
Kiesel, Johannes; Hagen, Matthias; Stein, Benno; Kneist, Florian; Potthast, Martin

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically


The Webis-Web-Archive-17 comprises a total of 10,000 web page archives from mid-2017. The original Webis-Web-Archive-17 dataset contains the web archive files, HTML DOM, and screenshots of each web page, as well as per-page annotations of how well the web page can be reproduced from the archive. The dataset was later extended with annotations of content errors.
