    Data from: Webis-Web-Archive-17

    • explore.openaire.eu
    • webis.de
    • +3 more
    Updated Oct 4, 2017
    Cite
    Johannes Kiesel; Martin Potthast; Matthias Hagen; Florian Kneist; Benno Stein (2017). Webis-Web-Archive-17 [Dataset]. http://doi.org/10.5281/zenodo.4040710
    Authors
    Johannes Kiesel; Martin Potthast; Matthias Hagen; Florian Kneist; Benno Stein
    Description

    The Webis-Web-Archive-17 comprises 10,000 web page archives from mid-2017 that were carefully sampled from the Common Crawl to include a mixture of high-ranking and low-ranking web pages. The dataset contains the web archive files, HTML DOM, and screenshots of each web page, as well as per-page annotations of visual web archive quality. See this overview for all datasets that build upon this one. If you use this dataset in your research, please cite it using this paper.

24 scholarly articles cite this dataset.