Data from: Webis-Web-Archive-17
anthology.aicmu.ac.cn
webis.de
Updated 2017
Cite
Johannes Kiesel; Florian Kneist; Matthias Hagen; Benno Stein (2017). Webis-Web-Archive-17 [Dataset]. http://doi.org/10.5281/zenodo.1002203
Unique identifier
https://doi.org/10.5281/zenodo.1002203
Dataset updated
2017
Dataset provided by
Bauhaus-Universität Weimar
Leipzig University
The Web Technology & Information Systems Network
Friedrich Schiller University Jena
Authors
Johannes Kiesel; Florian Kneist; Matthias Hagen; Benno Stein
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Webis-Web-Archive-17 comprises a total of 10,000 web page archives from mid-2017 that were carefully sampled from the Common Crawl to include a mixture of high-ranking and low-ranking web pages. For each web page, the dataset contains the web archive files, the HTML DOM, and screenshots, as well as per-page annotations of visual web archive quality.