8 datasets found
  1. Data from: Webis-Web-Archive-17

    • webis.de
    • data.niaid.nih.gov
    1002203
    Updated 2017
    Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein; Florian Kneist (2017). Webis-Web-Archive-17 [Dataset]. http://doi.org/10.5281/zenodo.1002203
    Dataset provided by
    GESIS - Leibniz Institute for the Social Sciences
    University of Kassel, hessian.AI, and ScaDS.AI
    Friedrich Schiller University Jena
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Authors
    Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein; Florian Kneist
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-Web-Archive-17 comprises a total of 10,000 web page archives from mid-2017 that were carefully sampled from the Common Crawl to involve a mixture of high-ranking and low-ranking web pages. The dataset contains the web archive files, HTML DOM, and screenshots of each web page, as well as per-page annotations of visual web archive quality.

  2. Webis-Web-Archive-Quality-22

    • webis.de
    6881334
    Updated 2022
    Theresa Elstner; Johannes Kiesel; Sebastian Schmidt; Martin Potthast; Benno Stein (2022). Webis-Web-Archive-Quality-22 [Dataset]. http://doi.org/10.5281/zenodo.6881334
    Dataset provided by
    Enginsight GmbH
    GESIS - Leibniz Institute for the Social Sciences
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    University of Kassel
    Bauhaus-Universität Weimar
    Authors
    Theresa Elstner; Johannes Kiesel; Sebastian Schmidt; Martin Potthast; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-Web-Archive-Quality-22 comprises a total of 6,500 pairs of screenshots of web pages, each pair showing a page as it was archived and as it was reproduced from that archive, along with archive quality annotations and information on the DOM elements in the screenshots.

  3. Webis-Web-Errors-19

    • webis.de
    • data.niaid.nih.gov
    2549837
    Updated 2019
    Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein; Florian Kneist (2019). Webis-Web-Errors-19 [Dataset]. http://doi.org/10.5281/zenodo.2549837
    Dataset provided by
    Friedrich Schiller University Jena
    GESIS - Leibniz Institute for the Social Sciences
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Authors
    Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein; Florian Kneist
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-Web-Errors-19 comprises various annotations for the 10,000 web page archives of the Webis-Web-Archive-17. The annotations indicate whether the page is (1) mostly advertisement, (2) cut off, (3) still loading, or (4) pornographic, and to what degree (not, a bit, or very) it shows (5) pop-ups, (6) CAPTCHAs, or (7) error messages.

  4. Webis-Web-Archive-17 Content Error Annotations

    • zenodo.org
    csv
    Updated Sep 21, 2020
    Johannes Kiesel; Fabienne Hubricht; Benno Stein; Martin Potthast (2020). Webis-Web-Archive-17 Content Error Annotations [Dataset]. http://doi.org/10.5281/zenodo.2549838
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Johannes Kiesel; Fabienne Hubricht; Benno Stein; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annotations of content errors in the Webis-Web-Archive-17.

    Described in more detail in an upcoming publication.

  5. Webis-WebSeg-20

    • webis.de
    3354902
    Updated 2020
    Johannes Kiesel; Lars Meyer; Benno Stein; Martin Potthast (2020). Webis-WebSeg-20 [Dataset]. http://doi.org/10.5281/zenodo.3354902
    Dataset provided by
    Enginsight GmbH
    GESIS - Leibniz Institute for the Social Sciences
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Authors
    Johannes Kiesel; Lars Meyer; Benno Stein; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-WebSeg-20 dataset comprises 42,450 crowdsourced segmentations for 8,490 web pages from the Webis-Web-Archive-17. Segmentations were fused from the segmentations of five crowd workers each.

  6. Webis-Web-Segments-20

    • data.niaid.nih.gov
    Updated Feb 16, 2023
    Kiesel, Johannes; Kneist, Florian; Meyer, Lars; Komlossy, Kristof; Stein, Benno; Potthast, Martin (2023). Webis-Web-Segments-20 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3354902
    Dataset provided by
    Leipzig University
    Bauhaus-Universität Weimar
    Authors
    Kiesel, Johannes; Kneist, Florian; Meyer, Lars; Komlossy, Kristof; Stein, Benno; Potthast, Martin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of crowdsourced annotations for web page segmentations.

    Web pages are taken from the Webis-Web-Archive-17.

  7. geohist.ca website files/fichiers du site web geohist.ca

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Fortin, Marcel (2023). geohist.ca website files/fichiers du site web geohist.ca [Dataset]. http://doi.org/10.5683/SP2/OWEBOJ
    Dataset provided by
    Borealis
    Authors
    Fortin, Marcel
    Description

    Archive of the Geohistory/Géohistoire website and related files. Captured July 17, 2020.

  8. bl-archive.net Website Traffic, Ranking, Analytics [November 2025]

    • semrush.ebundletools.com
    Updated Dec 13, 2025
    Semrush (2025). bl-archive.net Website Traffic, Ranking, Analytics [November 2025] [Dataset]. https://semrush.ebundletools.com/website/bl-archive.net/overview/
    Dataset authored and provided by
    Semrush, https://fr.semrush.com/
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Dec 13, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    bl-archive.net is ranked #1218 in JP with 2.74M Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!

