9 datasets found
  1. Data from: Webis-Web-Archive-17

    • zenodo.org
    png, txt, zip
    Updated Oct 4, 2017
  2. Webis-Web-Archive-17 Content Error Annotations

    • zenodo.org
    • search.datacite.org
    csv
    Updated Jan 25, 2019
  3. Webis-Web-Archive-17 Content Error Annotations

    • figshare.com
    • zenodo.org
    txt
    Updated Jan 4, 2020
  4. Webis-Web-Archive-17 Content Error Annotations

    • explore.openaire.eu
    Updated Mar 22, 2019
  5. Webis-Web-Archive-17 Content Error Annotations

    • figshare.com
    png
    Updated Feb 8, 2020
  6. Webis-Web-Errors-19

    • webis.de
    • zenodo.org
    Updated 2017
  7. Webis-WebSeg-20

    • webis.de
    • zenodo.org
    • +1 more
    Updated 2020
  8. Webis-Web-Segments-20

    • figshare.com
    • zenodo.org
    txt
    Updated Jun 9, 2020
  9. Webis Clickbait Corpus 2017 (Webis-Clickbait-17)

    • zenodo.org
    zip
    Updated Jun 11, 2018

Cite
Kiesel, Johannes; Potthast, Martin; Hagen, Matthias; Kneist, Florian; Stein, Benno (2017). Webis-Web-Archive-17 [Dataset]. http://doi.org/10.5281/zenodo.4040710

Data from: Webis-Web-Archive-17

Available download formats: zip, png, txt
Dataset updated Oct 4, 2017
Dataset provided by
Bauhaus-Universität Weimar (http://www.uni-weimar.de/)
Leipzig University (http://www.uni-leipzig.de/)
Martin-Luther-University Halle-Wittenberg (http://www.uni-halle.de/)
Ulm University (http://www.uni-ulm.de/)
Authors
Kiesel, Johannes; Potthast, Martin; Hagen, Matthias; Kneist, Florian; Stein, Benno
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

The Webis-Web-Archive-17 comprises 10,000 web page archives from mid-2017, carefully sampled from the Common Crawl to include a mixture of high-ranking and low-ranking web pages. The dataset contains the web archive files, HTML DOM, and screenshots of each web page, as well as per-page annotations of visual web archive quality. See this overview for all datasets that build upon this one. If you use this dataset in your research, please cite it using this paper.
