11 datasets found
  1. Website Statistics

    • data.wu.ac.at
    • data.europa.eu
    csv, pdf
    Updated Jun 11, 2018
    Cite
    Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Jun 11, 2018
    Dataset provided by
    Lincolnshire County Council (http://www.lincolnshire.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

    • Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

    • Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

    • Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

    • Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

      Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with JavaScript enabled, which excludes web crawlers and API calls.

    These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.

  2. Dataset Collected by JSObserver

    • zenodo.org
    zip
    Updated Jun 4, 2020
    Cite
    Mingxue Zhang; Wei Meng (2020). Dataset Collected by JSObserver [Dataset]. http://doi.org/10.5281/zenodo.3874944
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 4, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mingxue Zhang; Wei Meng
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is a sampled dataset collected by JSObserver on Alexa top 100K websites. We analyze the log files to identify JavaScript global identifier conflicts, i.e., variable value conflicts, variable type conflicts and function definition conflicts.

    We release the log files on websites where we detect the above conflicts, and split the whole dataset into 10 subsets, i.e., 1-50K-0.zip ~ 50K-100K-4.zip.

    The writes to a memory location in JavaScript are saved in [rank].[main/sub].[frame_cnt].asg (e.g., 1.main.0.asg) files.

    JavaScript global function definitions are saved in [rank].[main/sub].[frame_cnt].func (e.g., 1.main.0.func) files.

    The maps from script IDs to script URLs are saved in [rank].[main/sub].[frame_cnt].id2url (e.g., 1.main.0.id2url) files.

    The source code of scripts is saved in [rank].[main/sub].[frame_cnt].[script_ID].script (e.g., 1.main.0.17.script) files.

    We also sample 100 websites on which we did not detect any conflicts. The log files collected on those websites are available in sampled_no_conflict.zip.
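The file-naming scheme described above can be parsed mechanically. The following is a minimal sketch, not part of the JSObserver release: it assumes that rank, frame count, and script ID are always decimal integers (as in the quoted examples), and the names `LogFile` and `parse_log_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern follows the naming scheme quoted in the description:
#   [rank].[main/sub].[frame_cnt].(asg|func|id2url)
#   [rank].[main/sub].[frame_cnt].[script_ID].script
# Assumption: rank, frame_cnt and script_ID are decimal integers.
LOG_NAME = re.compile(
    r"^(?P<rank>\d+)\.(?P<frame_type>main|sub)\.(?P<frame_cnt>\d+)\."
    r"(?:(?P<kind>asg|func|id2url)|(?P<script_id>\d+)\.script)$"
)


class LogFile(NamedTuple):
    rank: int                 # site rank in the crawled list
    frame_type: str           # "main" or "sub"
    frame_cnt: int            # frame counter
    kind: str                 # "asg", "func", "id2url" or "script"
    script_id: Optional[int]  # only set for .script files


def parse_log_name(name: str) -> Optional[LogFile]:
    """Return the parsed components, or None if the name does not match."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    script_id = m.group("script_id")
    return LogFile(
        rank=int(m.group("rank")),
        frame_type=m.group("frame_type"),
        frame_cnt=int(m.group("frame_cnt")),
        kind=m.group("kind") or "script",
        script_id=int(script_id) if script_id is not None else None,
    )
```

Grouping the parsed records by `(rank, frame_type, frame_cnt)` would then bring together the assignment log, function log, ID-to-URL map, and script sources that belong to one frame.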

  3. The Klarna Product-Page Dataset

    • data.niaid.nih.gov
    • researchdata.se
    • +2 more
    Updated Nov 7, 2024
    Cite
    Risuleo, Riccardo Sven (2024). The Klarna Product-Page Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12605479
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset provided by
    Moradi, Aref
    Magureanu, Stefan
    Hotti, Alexandra
    Lagergren, Jens
    Risuleo, Riccardo Sven
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    The Klarna Product Page Dataset is a dataset of publicly available pages corresponding to products sold online on various e-commerce websites. The dataset contains offline snapshots of 51,701 product pages collected from 8,175 distinct merchants across 8 different markets (US, GB, SE, NL, FI, NO, DE, AT) between 2018 and 2019. On each page, analysts labelled 5 elements of interest: the price of the product, its image, its name and the add-to-cart and go-to-cart buttons (if found). These labels are present in the HTML code as an attribute called klarna-ai-label taking one of the values: Price, Name, Main picture, Add to cart and Cart.

    The snapshots are available in 3 formats: as MHTML files (~24GB), as WebTraversalLibrary (WTL) snapshots (~7.4GB), and as screenshots (~8.9GB). The MHTML format is less lossy: a browser can render these pages, though any JavaScript on the page is lost. The WTL snapshots are produced by loading the MHTML pages into a Chromium-based browser. To keep the WTL dataset compact, the screenshots of the rendered MHTML are provided separately; here we provide the HTML of the rendered DOM tree and additional page and element metadata with rendering information (bounding boxes of elements, font sizes etc.). The folder structure of the screenshot dataset is identical to that of the WTL dataset and can be used to complete the WTL snapshots with image information. For convenience, the datasets are provided with a train/test split in which no merchants in the test set are present in the training set.
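Because the labels are stored as a klarna-ai-label attribute in the HTML, they can be recovered with a standard HTML parser. The sketch below uses only the Python standard library and assumes the HTML part has already been extracted from the snapshot as a string; `LabelCollector` and `collect_labels` are hypothetical names, not part of the dataset tooling.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Attribute name as given in the dataset description.
LABEL_ATTR = "klarna-ai-label"


class LabelCollector(HTMLParser):
    """Collects (label value, tag name) pairs for every labelled element."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == LABEL_ATTR:
                self.labels.append((value, tag))


def collect_labels(html: str) -> List[Tuple[str, str]]:
    parser = LabelCollector()
    parser.feed(html)
    parser.close()
    return parser.labels


# Made-up page fragment for illustration only.
sample = (
    '<div><span klarna-ai-label="Price">$10</span>'
    '<img klarna-ai-label="Main picture" src="a.png">'
    '<button klarna-ai-label="Add to cart">Buy</button></div>'
)
labels = collect_labels(sample)
```

For the MHTML files, the HTML part would first need to be pulled out of the MIME container, which this sketch does not attempt.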

    Corresponding Publication

    For more information about the contents of the datasets (statistics etc.) please refer to the following TMLR paper.

    GitHub Repository

    The code needed to re-run the experiments in the publication accompanying the dataset can be accessed here.

    Citing

    If you found this dataset useful in your research, please cite the paper as follows:

    @article{hotti2024the,
      title={The Klarna Product Page Dataset: Web Element Nomination with Graph Neural Networks and Large Language Models},
      author={Alexandra Hotti and Riccardo Sven Risuleo and Stefan Magureanu and Aref Moradi and Jens Lagergren},
      journal={Transactions on Machine Learning Research},
      issn={2835-8856},
      year={2024},
      url={https://openreview.net/forum?id=zz6FesdDbB},
      note={}
    }

  4. Farm Service Agency News and Events Widget

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Farm Service Agency, Department of Agriculture (2025). Farm Service Agency News and Events Widget [Dataset]. https://catalog.data.gov/dataset/farm-service-agency-news-and-events-widget
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Farm Service Agency (https://www.fsa.usda.gov/)
    United States Department of Agriculture (http://usda.gov/)
    Description

    This widget provides access to all FSA National News releases. It may be embedded into your website or blog with the provided code, using either Flash or JavaScript.

  5. NBP 2202 data collection map

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 25, 2022
    Cite
    Rollo, Callum (2022). NBP 2202 data collection map [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6383011
    Explore at:
    Dataset updated
    Mar 25, 2022
    Dataset authored and provided by
    Rollo, Callum
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Full code and dataset for the NBP 2202 map website. Data were collected during Jan-Feb 2022 in the Amundsen Sea from the Nathaniel B. Palmer. This is a Python Flask app which displays data in a JavaScript Leaflet map. The contents of this dataset should be all you need to host the website yourself, for local viewing or to make publicly available.

    This upload is a copy of the GitHub repo taken on 24/03/22 with additional satellite data that was too large for git.

    The GitHub repo can be found at https://github.com/callumrollo/itgc-2022-map/

    The website is currently maintained at https://nbp2202map.com/

    All data are publicly available. Locations and information displayed in the map are for convenience purposes only and are not authoritative. Contact the PIs of the International Thwaites Glacier Collaboration (ITGC) for full datasets. This website is the author's personal work and does not reflect the views of the ITGC group. The author has no official affiliation with ITGC.

  6. COVID-19 Impact Dataset: Great British Intelligence Test, 2020 - Dataset -...

    • b2find.eudat.eu
    Updated Oct 20, 2023
    + more versions
    Cite
    (2023). COVID-19 Impact Dataset: Great British Intelligence Test, 2020 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/85806aa2-53a7-5728-8883-6be7d8f3496e
    Explore at:
    Dataset updated
    Oct 20, 2023
    Description

    There is an urgent need to understand the factors that mediate and mitigate the impact of the Covid-19 pandemic on behaviour and wellbeing. However, the onset of the outbreak was unexpected and the rate of acceleration so rapid as to preclude the planning of studies that can address these critical issues. Coincidentally, in January 2020, just prior to the outbreak in the UK, my team launched a study that collected detailed (~50 minute) cognitive and questionnaire assessments from >200,000 members of the UK public as part of a collaboration with the BBC. This placed us in a unique position to examine how aspects of mental health subsequently changed as the pandemic arrived in the UK. Therefore, we collected data from a further ~120,000 people in May, including additional detailed measures of self-perceived pandemic impact and free text descriptions of the main positives, negatives and pragmatic measures that people found helped them maintain their wellbeing.

    In this data archive, we include the survey data from January and May 2020 examining impact of Covid-19 on mood, wellbeing and behaviour in the UK population. This data is reported in a preprint article, where we apply a novel fusion of psychometric, multivariate and machine learning analyses to this unique dataset, in order to address some of the most pressing questions regarding wellbeing during the pandemic in a data-driven manner. The preprint is available at https://www.medrxiv.org/content/10.1101/2020.06.18.20134635v1

    Recruitment

    Starting from December 26th 2019, participants were recruited to the study website, where they completed cognitive tests and a detailed questionnaire. Articles describing the study were placed on the BBC2 Horizon, BBC Home page, BBC News Home page and circulated on mobile news meta-apps from January 1st 2020. To maximise representativeness of the sample there were no inclusion/exclusion criteria. Analyses here exclude data from participants under 16 years old, as they completed a briefer questionnaire, and those who responded to the questionnaire unfeasibly fast (<4 minutes). Cognitive test data will be reported separately. The study was approved by the Imperial College Research Ethics Committee (17IC4009).

    Data collection

    Data were collected via our custom server system, which produces study-specific websites (https://gbws.cognitron.co.uk) on Amazon EC2. Questionnaires and tests were programmed in JavaScript and HTML5. They were deliverable via personal computers, tablets and smartphones. The questionnaire included scales quantifying sociodemographic, lifestyle, online technology use, personality, and mental health (Supplement 1). Participants could enrol for longitudinal follow up, scheduled for 3, 6 and 12 months. People returning to the site outside of these timepoints were navigated to a different URL. On May 2nd 2020, the questionnaire was augmented - in light of the Covid-19 pandemic - with an extended mood scale, and an instrument comprising 47 items quantifying self-perceived effects on mood, behaviour and outlook (Pandemic General Impact Scale PD-GIS-11). Questions regarding pre-existing psychiatric and neurological conditions, lockdown context, having the virus, and free text fields were added. This coincided with further promotion via BBC2 Horizon and BBC Homepage.

  7. 115th U.S. Congress Member Website (Full JavaScript-enabled Scrape)...

    • zenodo.org
    application/gzip, bin +1
    Updated Jan 24, 2020
    Cite
    Bob Rudis (2020). 115th U.S. Congress Member Website (Full JavaScript-enabled Scrape) Collection [Dataset]. http://doi.org/10.5281/zenodo.1219056
    Explore at:
    Available download formats: txt, application/gzip, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bob Rudis
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    United States
    Description

    This data set represents a point-in-time full JavaScript-enabled scrape of all available 115th U.S. Congress member web sites. The data collection started and completed on 2018-04-13, and the results are in ndjson/jsonlines/streaming JSON format. File format information is in the enclosed README.md file.

    The data was used to evaluate the privacy profile of each U.S. Congress member's official (.gov hosted) website for the discussion in <https://rud.is/b/2018/04/13/does-congress-really-care-about-your-privacy/>.

    ScrapingHub's "Splash" platform (<https://github.com/scrapinghub/splash>) was used along with the "splashr" R package (<https://github.com/hrbrmstr/splashr>) to retrieve the content.
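The ndjson/jsonlines format mentioned above stores one JSON document per line, so it can be streamed record by record without loading the whole file. The following is a minimal reader sketch in Python; the record keys in the sample (`url`, `status`) are invented for illustration, and the actual fields are documented in the dataset's README.md.

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the stream."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc


# Hypothetical two-record sample; keys are NOT taken from the dataset.
sample = io.StringIO(
    '{"url": "https://example.gov/", "status": 200}\n'
    "\n"
    '{"url": "https://example.org/", "status": 404}\n'
)
records = list(iter_ndjson(sample))
```

For the gzip-compressed files, the same function should work on a text-mode handle such as the one returned by `gzip.open(path, "rt", encoding="utf-8")`.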

  8. js-free-sites

    • huggingface.co
    Cite
    maxine, js-free-sites [Dataset]. https://huggingface.co/datasets/crumb/js-free-sites
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    maxine
    Description

    It is what it says on the tin, 2k websites that don't run javascript (aka: probably really old, and simple!). Filtering needs to be done to take out the defunct sites and error messages. I don't expect this is a hard task.

  9. B2B Data | Global Technographic Data | Sourced from HTML, Java Scripts &...

    • datarade.ai
    .json
    Cite
    PredictLeads, B2B Data | Global Technographic Data | Sourced from HTML, Java Scripts & Jobs | 921M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-b2b-data-technographic-data-api-flat-file-predictleads
    Explore at:
    Available download formats: .json
    Dataset authored and provided by
    PredictLeads
    Area covered
    Ascension and Tristan da Cunha, Uzbekistan, Saudi Arabia, Cook Islands, Mongolia, Nauru, Hong Kong, Costa Rica, Lithuania, Vanuatu
    Description

    PredictLeads Global Technographic Dataset delivers in-depth insights into technology adoption across millions of companies worldwide. Our dataset, sourced from HTML, JavaScript, and job postings, enables B2B sales, marketing, and data enrichment teams to refine targeting, enhance lead scoring, and optimize outreach strategies. By tracking 25,000+ technologies across 92M+ websites, businesses can uncover market trends, assess competitor technology stacks, and personalize their approach.

    Use Cases:

    • Enhance CRM Data – Enrich company records with detailed real-time technology insights.
    • Targeted Sales Outreach – Identify prospects based on their tech stack and personalize outreach.
    • Competitor & Market Analysis – Gain insights into competitor technology adoption and industry trends.
    • Lead Scoring & Prioritization – Rank potential customers based on adopted technologies.
    • Personalized Marketing – Craft highly relevant campaigns based on technology adoption trends.

    API Attributes & Structure:

    • id (string, UUID) – Unique identifier for each technology detection.
    • first_seen_at (ISO 8601 date-time) – Timestamp when the technology was first detected on the company's website.
    • last_seen_at (ISO 8601 date-time) – Most recent timestamp when the technology was last observed.
    • behind_firewall (boolean) – Indicates whether the technology is protected behind a firewall.
    • score (float, 0–1) – Confidence score for the detection accuracy.
    • company (object) – The company using the detected technology, including:
      • id (UUID) – Unique company identifier.
      • domain (string) – Company website domain.
      • company_name (string) – Company's official name.
      • ticker (string, nullable) – Stock ticker (if publicly listed).
    • technology (object) – Information on the detected technology, including:
      • id (UUID) – Unique technology identifier.
      • name (string) – Technology name (e.g., Salesforce, HubSpot, AWS).
    • seen_on_job_openings (boolean) – True/False flag indicating if the technology is mentioned in job postings.
    • seen_on_subpages (array of objects) – List of company subpages where the technology was detected.
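The attribute list above maps naturally onto typed records. The sketch below is an illustration only: it assumes a flat JSON object whose keys mirror the listed attribute names, which may not match the actual API response envelope, and the sample values, class names, and `parse_detection` helper are all hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Company:
    id: str
    domain: str
    company_name: str
    ticker: Optional[str]  # nullable per the attribute list


@dataclass
class Technology:
    id: str
    name: str


@dataclass
class Detection:
    id: str
    first_seen_at: datetime
    last_seen_at: datetime
    behind_firewall: bool
    score: float  # documented range: 0 to 1
    company: Company
    technology: Technology
    seen_on_job_openings: bool
    seen_on_subpages: list  # array of objects; inner structure not specified


def _parse_time(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_detection(raw: dict) -> Detection:
    score = float(raw["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return Detection(
        id=raw["id"],
        first_seen_at=_parse_time(raw["first_seen_at"]),
        last_seen_at=_parse_time(raw["last_seen_at"]),
        behind_firewall=bool(raw["behind_firewall"]),
        score=score,
        company=Company(**raw["company"]),
        technology=Technology(**raw["technology"]),
        seen_on_job_openings=bool(raw["seen_on_job_openings"]),
        seen_on_subpages=list(raw["seen_on_subpages"]),
    )


# Made-up record for illustration; not real PredictLeads data.
SAMPLE = """{
  "id": "00000000-0000-0000-0000-000000000001",
  "first_seen_at": "2024-01-15T08:30:00Z",
  "last_seen_at": "2024-06-01T12:00:00Z",
  "behind_firewall": false,
  "score": 0.9,
  "company": {"id": "00000000-0000-0000-0000-000000000002",
              "domain": "example.com",
              "company_name": "Example Inc.",
              "ticker": null},
  "technology": {"id": "00000000-0000-0000-0000-000000000003",
                 "name": "Salesforce"},
  "seen_on_job_openings": true,
  "seen_on_subpages": []
}"""
detection = parse_detection(json.loads(SAMPLE))
```

Validating the score range at parse time surfaces malformed records early instead of letting them skew downstream lead scoring.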

    PredictLeads Technographic Data is trusted by enterprises and B2B professionals for accurate, real-time technology intelligence, enabling smarter prospecting, data-driven marketing, and competitive analysis.

    PredictLeads Technology Detections Dataset https://docs.predictleads.com/v3/guide/technology_detections_dataset

  10. Farm Service Agency Market News Widget

    • data.wu.ac.at
    • agdatacommons.nal.usda.gov
    • +2 more
    html
    Updated Feb 27, 2014
    Cite
    Department of Agriculture (2014). Farm Service Agency Market News Widget [Dataset]. https://data.wu.ac.at/schema/data_gov/Njg3ZmZmOGYtNDY4OS00YTM5LWFkM2MtMzFlZDU4NjVjZTM0
    Explore at:
    Available download formats: html
    Dataset updated
    Feb 27, 2014
    Dataset provided by
    Department of Agriculture
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This widget provides access to all FSA Daily Terminal Market Prices information releases. It may be embedded into your website or blog with the provided code, using either Flash or JavaScript.

  11. Data from: Unveiling the Impact of User-Agent Reduction and Client Hints: A...

    • data.ru.nl
    Updated Nov 27, 2023
    Cite
    Gunes Acar; Senol, A. (2023). Unveiling the Impact of User-Agent Reduction and Client Hints: A Measurement Study (WPES'23) [Dataset]. http://doi.org/10.34973/86ks-gf89
    Explore at:
    Available download: 21570613845 bytes
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Radboud University
    Authors
    Gunes Acar; Senol, A.
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    In recent years, browsers have reduced the identifying information in user-agent strings to enhance user privacy. However, Chrome has also introduced high-entropy user-agent client hints (UA-CH) and a new JavaScript API to provide access to specific browser details. The study assesses the impact of these changes on the top 100,000 websites by using an instrumented crawler to measure access to high-entropy browser features via UA-CH HTTP headers and the JavaScript API. It also investigates whether tracking, advertising, and browser fingerprinting scripts have started using these new client hints and the JavaScript API.

    By Asuman Senol and Gunes Acar. In Proceedings of the 22nd Workshop on Privacy in the Electronic Society.
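To make the HTTP side of UA-CH concrete: a server asks for client hints by listing them in an `Accept-CH` response header, and the three low-entropy hints (`Sec-CH-UA`, `Sec-CH-UA-Mobile`, `Sec-CH-UA-Platform`) are sent by default without being requested. The sketch below is not the authors' crawler; it simply treats every other `Sec-CH-UA-*` token in an `Accept-CH` value as a high-entropy request, which is a simplifying assumption, and `high_entropy_hints` is a hypothetical helper name.

```python
from typing import List

# Hints that Chromium-based browsers send by default (low entropy).
LOW_ENTROPY = {"sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform"}


def high_entropy_hints(accept_ch: str) -> List[str]:
    """Return the UA client hints in an Accept-CH value that are not low-entropy.

    Assumption: any Sec-CH-UA-* token outside LOW_ENTROPY counts as
    high-entropy. Non-UA hints (e.g. DPR) are ignored.
    """
    tokens = [t.strip().lower() for t in accept_ch.split(",")]
    return sorted(
        t for t in tokens
        if t.startswith("sec-ch-ua-") and t not in LOW_ENTROPY
    )


requested = high_entropy_hints(
    "Sec-CH-UA-Platform, Sec-CH-UA-Model, Sec-CH-UA-Full-Version-List, DPR"
)
```

On the JavaScript side, the corresponding call is `navigator.userAgentData.getHighEntropyValues(...)`, which a measurement crawler would need to instrument separately.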
