Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.
Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.
Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.
Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.
Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.
Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.
These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a sampled dataset collected by JSObserver on the Alexa top 100K websites. We analyze the log files to identify JavaScript global identifier conflicts, i.e., variable value conflicts, variable type conflicts and function definition conflicts.
We release the log files for websites where we detect the above conflicts, and split the whole dataset into 10 subsets, i.e., 1-50K-0.zip through 50K-100K-4.zip.
The writes to a memory location in JavaScript are saved in [rank].[main/sub].[frame_cnt].asg (e.g., 1.main.0.asg) files.
JavaScript global function definitions are saved in [rank].[main/sub].[frame_cnt].func (e.g., 1.main.0.func) files.
The maps from script IDs to script URLs are saved in [rank].[main/sub].[frame_cnt].id2url (e.g., 1.main.0.id2url) files.
The source code of each script is saved in [rank].[main/sub].[frame_cnt].[script_ID].script (e.g., 1.main.0.17.script) files.
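The naming scheme above can be turned into a small filename parser. The following is only a sketch: the regular expression is inferred from the four example names, and it assumes that rank, frame_cnt and script_ID are always decimal integers. The names LOG_NAME, LogFile and parse_log_name are hypothetical and not part of the dataset.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the naming scheme described above (an assumption):
#   [rank].[main/sub].[frame_cnt].asg | .func | .id2url
#   [rank].[main/sub].[frame_cnt].[script_ID].script
LOG_NAME = re.compile(
    r"^(?P<rank>\d+)\.(?P<frame_type>main|sub)\.(?P<frame_cnt>\d+)"
    r"(?:\.(?P<kind>asg|func|id2url)|\.(?P<script_id>\d+)\.(?P<script>script))$"
)


class LogFile(NamedTuple):
    rank: int                  # website rank
    frame_type: str            # "main" or "sub"
    frame_cnt: int             # frame counter
    kind: str                  # "asg", "func", "id2url" or "script"
    script_id: Optional[int]   # only set for .script files


def parse_log_name(name: str) -> Optional[LogFile]:
    """Return the parsed fields of a log file name, or None if it does not match."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    kind = m.group("kind") or m.group("script")
    script_id = m.group("script_id")
    return LogFile(
        rank=int(m.group("rank")),
        frame_type=m.group("frame_type"),
        frame_cnt=int(m.group("frame_cnt")),
        kind=kind,
        script_id=int(script_id) if script_id is not None else None,
    )


print(parse_log_name("1.main.0.asg"))
print(parse_log_name("1.main.0.17.script"))
```

Grouping the parsed results by (rank, frame_type, frame_cnt) would bring together the .asg, .func, .id2url and .script files that belong to the same frame.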
We also sample 100 websites on which we did not detect any conflicts. The log files collected on those websites are available in sampled_no_conflict.zip
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The Klarna Product Page Dataset is a dataset of publicly available pages corresponding to products sold online on various e-commerce websites. The dataset contains offline snapshots of 51,701 product pages collected from 8,175 distinct merchants across 8 different markets (US, GB, SE, NL, FI, NO, DE, AT) between 2018 and 2019. On each page, analysts labelled 5 elements of interest: the price of the product, its image, its name and the add-to-cart and go-to-cart buttons (if found). These labels are present in the HTML code as an attribute called klarna-ai-label taking one of the values: Price, Name, Main picture, Add to cart and Cart.
The snapshots are available in 3 formats: as MHTML files (~24GB), as WebTraversalLibrary (WTL) snapshots (~7.4GB), and as screenshots (~8.9GB). The MHTML format is less lossy: a browser can render these pages, though any JavaScript on the page is lost. The WTL snapshots are produced by loading the MHTML pages into a Chromium-based browser. To keep the WTL dataset compact, the screenshots of the rendered MHTML are provided separately; here we provide the HTML of the rendered DOM tree and additional page and element metadata with rendering information (bounding boxes of elements, font sizes etc.). The folder structure of the screenshot dataset is identical to that of the WTL dataset and can be used to complete the WTL snapshots with image information. For convenience, the datasets are provided with a train/test split in which no merchants in the test set are present in the training set.
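As an illustration of how the klarna-ai-label attribute described above might be read, here is a minimal sketch using only the Python standard library. It assumes you already have a page's HTML as a string (for example from a WTL snapshot, or after extracting the HTML part of an MHTML file, which is not shown here). The class and function names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LabelFinder(HTMLParser):
    """Collects every element that carries a klarna-ai-label attribute."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (label value, tag name, attribute dict)

    def handle_starttag(self, tag, attrs):
        attrs_dict = dict(attrs)
        label = attrs_dict.get("klarna-ai-label")
        if label is not None:
            self.found.append((label, tag, attrs_dict))


def find_labels(html):
    """Return the labelled elements of an HTML string in document order."""
    finder = LabelFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Made-up sample markup, only to exercise the parser.
sample = (
    '<div><h1 klarna-ai-label="Name">Mug</h1>'
    '<span klarna-ai-label="Price">$10</span>'
    '<img klarna-ai-label="Main picture" src="mug.png"></div>'
)
for label, tag, _ in find_labels(sample):
    print(label, tag)
```

A full HTML library would be needed to recover the text or bounding box of each labelled element; this sketch only locates the labelled tags.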
Corresponding Publication
For more information about the contents of the datasets (statistics etc.) please refer to the following TMLR paper.
GitHub Repository
The code needed to re-run the experiments in the publication accompanying the dataset can be accessed here.
Citing
If you found this dataset useful in your research, please cite the paper as follows:
@article{hotti2024the,
  title={The Klarna Product Page Dataset: Web Element Nomination with Graph Neural Networks and Large Language Models},
  author={Alexandra Hotti and Riccardo Sven Risuleo and Stefan Magureanu and Aref Moradi and Jens Lagergren},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2024},
  url={https://openreview.net/forum?id=zz6FesdDbB},
  note={}
}
This Widget provides access to all FSA National News releases. The widget may be embedded into your website or blog with code provided using either Flash or Javascript.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Full code and dataset for the NBP 2202 map website. Data were collected during Jan-Feb 2022 in the Amundsen Sea from the Nathaniel B. Palmer. This is a Python Flask app which displays data in a JavaScript Leaflet map. The contents of this dataset should be all you need to host the website yourself, for local viewing or to make it publicly available.
This upload is a copy of the GitHub repo taken on 24/03/22 with additional satellite data that was too large for git.
The GitHub repo can be found at https://github.com/callumrollo/itgc-2022-map/
The website is currently maintained at https://nbp2202map.com/
All data are publicly available. Locations and information displayed in the map are for convenience purposes only and are not authoritative. Contact the PIs of the International Thwaites Glacier Collaboration (ITGC) for full datasets. This website is the author's personal work and does not reflect the views of the ITGC group. The author has no official affiliation with ITGC.
There is an urgent need to understand the factors that mediate and mitigate the impact of the Covid-19 pandemic on behaviour and wellbeing. However, the onset of the outbreak was unexpected and the rate of acceleration so rapid as to preclude the planning of studies that can address these critical issues. Coincidentally, in January 2020, just prior to the outbreak in the UK, my team launched a study that collected detailed (~50 minute) cognitive and questionnaire assessments from >200,000 members of the UK public as part of a collaboration with the BBC. This placed us in a unique position to examine how aspects of mental health subsequently changed as the pandemic arrived in the UK. Therefore, we collected data from a further ~120,000 people in May, including additional detailed measures of self-perceived pandemic impact and free text descriptions of the main positives, negatives and pragmatic measures that people found helped them maintain their wellbeing.
In this data archive, we include the survey data from January and May 2020 examining the impact of Covid-19 on mood, wellbeing and behaviour in the UK population. These data are reported in a preprint article, where we apply a novel fusion of psychometric, multivariate and machine learning analyses to this unique dataset, in order to address some of the most pressing questions regarding wellbeing during the pandemic in a data-driven manner. The preprint is available at https://www.medrxiv.org/content/10.1101/2020.06.18.20134635v1
Recruitment
Starting from December 26th 2019, participants were recruited to the study website, where they completed cognitive tests and a detailed questionnaire. Articles describing the study were placed on the BBC2 Horizon, BBC Home page, BBC News Home page and circulated on mobile news meta-apps from January 1st 2020. To maximise representativeness of the sample there were no inclusion/exclusion criteria. Analyses here exclude data from participants under 16 years old, as they completed a briefer questionnaire, and those who responded to the questionnaire unfeasibly fast (<4 minutes). Cognitive test data will be reported separately. The study was approved by the Imperial College Research Ethics Committee (17IC4009).
Data collection
Data were collected via our custom server system, which produces study-specific websites (https://gbws.cognitron.co.uk) on Amazon EC2. Questionnaires and tests were programmed in JavaScript and HTML5. They were deliverable via personal computers, tablets and smartphones. The questionnaire included scales quantifying sociodemographic, lifestyle, online technology use, personality, and mental health (Supplement 1). Participants could enrol for longitudinal follow up, scheduled for 3, 6 and 12 months. People returning to the site outside of these timepoints were navigated to a different URL. On May 2nd 2020, the questionnaire was augmented - in light of the Covid-19 pandemic - with an extended mood scale, and an instrument comprising 47 items quantifying self-perceived effects on mood, behaviour and outlook (Pandemic General Impact Scale PD-GIS-11). Questions regarding pre-existing psychiatric and neurological conditions, lockdown context, having the virus, and free text fields were added. This coincided with further promotion via BBC2 Horizon and BBC Homepage.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set represents a point-in-time, full JavaScript-enabled scrape of all available 115th U.S. Congress member web sites. The data collection began and completed on 2018-04-13 and the results are in ndjson/jsonlines/streaming JSON format. File format information is in the enclosed README.md file.
The data was used to evaluate the privacy profile of each U.S. Congress member's official (.gov hosted) website for the discussion in <https://rud.is/b/2018/04/13/does-congress-really-care-about-your-privacy/>.
ScrapingHub's "Splash" platform (<https://github.com/scrapinghub/splash>) was used along with the "splashr" R package (<https://github.com/hrbrmstr/splashr>) to retrieve the content.
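Since the results are distributed as ndjson (one JSON document per line), a reader along the following lines may be a convenient starting point. This is a generic sketch, not the tooling used to build the dataset: the field name "url" in the sample is hypothetical, and the real record layout is documented in the enclosed README.md.

```python
import io
import json


def iter_ndjson(lines):
    """Yield one parsed JSON value per non-empty line of an ndjson stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# In-memory stand-in for a file opened with open(path, encoding="utf-8").
# The "url" key is made up for this illustration.
sample = io.StringIO(
    '{"url": "https://example.gov"}\n'
    "\n"
    '{"url": "https://example2.gov"}\n'
)
records = list(iter_ndjson(sample))
print(len(records))
```

Because the generator reads one line at a time, it does not need to hold the whole scrape in memory.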
It is what it says on the tin, 2k websites that don't run javascript (aka: probably really old, and simple!). Filtering needs to be done to take out the defunct sites and error messages. I don't expect this is a hard task.
PredictLeads Global Technographic Dataset delivers in-depth insights into technology adoption across millions of companies worldwide. Our dataset, sourced from HTML, JavaScript, and job postings, enables B2B sales, marketing, and data enrichment teams to refine targeting, enhance lead scoring, and optimize outreach strategies. By tracking 25,000+ technologies across 92M+ websites, businesses can uncover market trends, assess competitor technology stacks, and personalize their approach.
Use Cases:
Enhance CRM Data: Enrich company records with detailed real-time technology insights.
Targeted Sales Outreach: Identify prospects based on their tech stack and personalize outreach.
Competitor & Market Analysis: Gain insights into competitor technology adoption and industry trends.
Lead Scoring & Prioritization: Rank potential customers based on adopted technologies.
Personalized Marketing: Craft highly relevant campaigns based on technology adoption trends.
API Attributes & Structure:
PredictLeads Technographic Data is trusted by enterprises and B2B professionals for accurate, real-time technology intelligence, enabling smarter prospecting, data-driven marketing, and competitive analysis.
PredictLeads Technology Detections Dataset https://docs.predictleads.com/v3/guide/technology_detections_dataset
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This Widget provides access to all FSA Daily Terminal Market Prices information releases. The widget may be embedded into your website or blog with code provided using either Flash or Javascript.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In recent years, browsers have reduced the identifying information in user-agent strings to enhance user privacy. However, Chrome has also introduced high-entropy user-agent client hints (UA-CH) and a new JavaScript API to provide access to specific browser details. The study assesses the impact of these changes on the top 100,000 websites by using an instrumented crawler to measure access to high-entropy browser features via UA-CH HTTP headers and the JavaScript API. It also investigates whether tracking, advertising, and browser fingerprinting scripts have started using these new client hints and the JavaScript API.
By Asuman Senol and Gunes Acar. In Proceedings of the 22nd Workshop on Privacy in the Electronic Society.
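To make the idea of requesting high-entropy hints over HTTP concrete, the sketch below inspects the value of an Accept-CH response header and reports which requested hints are high-entropy. It is an illustration only and not the study's instrumented crawler. The hint set is an assumption based on the commonly documented UA-CH header names and may be incomplete, and the function name is hypothetical.

```python
from typing import List

# High-entropy UA-CH request header names (assumed list, lower-cased).
# Low-entropy hints such as Sec-CH-UA, Sec-CH-UA-Mobile and
# Sec-CH-UA-Platform are deliberately left out.
HIGH_ENTROPY_HINTS = {
    "sec-ch-ua-arch",
    "sec-ch-ua-bitness",
    "sec-ch-ua-full-version",
    "sec-ch-ua-full-version-list",
    "sec-ch-ua-model",
    "sec-ch-ua-platform-version",
    "sec-ch-ua-wow64",
}


def requested_high_entropy_hints(accept_ch: str) -> List[str]:
    """Return the high-entropy hints named in an Accept-CH header value, sorted."""
    tokens = [token.strip().lower() for token in accept_ch.split(",")]
    return sorted(token for token in tokens if token in HIGH_ENTROPY_HINTS)


print(requested_high_entropy_hints(
    "Sec-CH-UA-Platform, Sec-CH-UA-Model, Sec-CH-UA-Arch"
))
```

Access through the JavaScript API (navigator.userAgentData.getHighEntropyValues) cannot be observed from response headers and would need in-browser instrumentation, which this sketch does not attempt.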