    Data from: Webis-Web-Archive-17

    • anthology.aicmu.ac.cn
    • webis.de
    Citation
    Johannes Kiesel; Florian Kneist; Matthias Hagen; Benno Stein (2017). Webis-Web-Archive-17 [Dataset]. http://doi.org/10.5281/zenodo.1002203
    Dataset updated
    2017
    Dataset provided by
    The Web Technology & Information Systems Network
    Leipzig University
    Bauhaus-Universität Weimar
    Friedrich Schiller University Jena
    Authors
    Johannes Kiesel; Florian Kneist; Matthias Hagen; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-Web-Archive-17 comprises 10,000 web page archives from mid-2017, carefully sampled from the Common Crawl to include a mixture of high-ranking and low-ranking web pages. For each web page, the dataset provides the web archive files, the HTML DOM, and screenshots, along with annotations of the visual quality of its archive.

3 scholarly articles cite this dataset (according to Google Scholar).