11 datasets found

Website Statistics
data.wu.ac.at
data.europa.eu
csv, pdf
Updated Jun 11, 2018
Cite
Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
Explore at:
Available download formats: csv, pdf
Dataset updated
Jun 11, 2018
Dataset provided by
Lincolnshire County Council (http://www.lincolnshire.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with JavaScript enabled, which excludes web crawlers and API calls.

These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
Dataset Collected by JSObserver
zenodo.org
zip
Updated Jun 4, 2020
Cite
Mingxue Zhang; Wei Meng (2020). Dataset Collected by JSObserver [Dataset]. http://doi.org/10.5281/zenodo.3874944
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3874944
Dataset updated
Jun 4, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mingxue Zhang; Wei Meng
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is a sampled dataset collected by JSObserver on Alexa top 100K websites. We analyze the log files to identify JavaScript global identifier conflicts, i.e., variable value conflicts, variable type conflicts and function definition conflicts.

We release the log files on websites where we detect the above conflicts, and split the whole dataset into 10 subsets, i.e., 1-50K-0.zip ~ 50K-100K-4.zip.

The writes to a memory location in JavaScript are saved in [rank].[main/sub].[frame_cnt].asg (e.g., 1.main.0.asg) files.

JavaScript global function definitions are saved in [rank].[main/sub].[frame_cnt].func (e.g., 1.main.0.func) files.

The maps from script IDs to script URLs are saved in [rank].[main/sub].[frame_cnt].id2url (e.g., 1.main.0.id2url) files.

The source code of each script is saved in [rank].[main/sub].[frame_cnt].[script_ID].script (e.g., 1.main.0.17.script) files.
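
The naming scheme above can be decoded mechanically. The following is a minimal sketch, not part of the JSObserver release: the `LogFile` type and `parse_log_name` helper are hypothetical names, and the pattern assumes that the rank, frame count, and script ID are always decimal integers, as in the examples given.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, derived only from the file names described above:
#   [rank].[main/sub].[frame_cnt].asg | .func | .id2url
#   [rank].[main/sub].[frame_cnt].[script_ID].script
_LOG_NAME = re.compile(
    r"^(?P<rank>\d+)\.(?P<frame_kind>main|sub)\.(?P<frame_cnt>\d+)\."
    r"(?:(?P<kind>asg|func|id2url)|(?P<script_id>\d+)\.script)$"
)


class LogFile(NamedTuple):
    """Hypothetical record for one decoded log file name."""
    rank: int                  # site rank in the crawled list
    frame_kind: str            # "main" or "sub"
    frame_cnt: int             # frame counter
    kind: str                  # "asg", "func", "id2url", or "script"
    script_id: Optional[int]   # set only for .script files


def parse_log_name(name: str) -> Optional[LogFile]:
    """Return the decoded parts of a log file name, or None if it does not match."""
    match = _LOG_NAME.match(name)
    if match is None:
        return None
    script_id = match.group("script_id")
    return LogFile(
        rank=int(match.group("rank")),
        frame_kind=match.group("frame_kind"),
        frame_cnt=int(match.group("frame_cnt")),
        kind=match.group("kind") or "script",
        script_id=int(script_id) if script_id is not None else None,
    )
```

Grouping the parsed names by `(rank, frame_kind, frame_cnt)` would then collect the assignment log, function log, ID map, and script sources that belong to one frame.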

We also sample 100 websites on which we did not detect any conflicts. The log files collected on those websites are available in sampled_no_conflict.zip.
The Klarna Product-Page Dataset
data.niaid.nih.gov
researchdata.se
Updated Nov 7, 2024
Cite
Risuleo, Riccardo Sven (2024). The Klarna Product-Page Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12605479
Explore at:
Dataset updated
Nov 7, 2024
Dataset provided by
Moradi, Aref
Magureanu, Stefan
Hotti, Alexandra
Lagergren, Jens
Risuleo, Riccardo Sven
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
The Klarna Product Page Dataset is a dataset of publicly available pages corresponding to products sold online on various e-commerce websites. The dataset contains offline snapshots of 51,701 product pages collected from 8,175 distinct merchants across 8 different markets (US, GB, SE, NL, FI, NO, DE, AT) between 2018 and 2019. On each page, analysts labelled 5 elements of interest: the price of the product, its image, its name and the add-to-cart and go-to-cart buttons (if found). These labels are present in the HTML code as an attribute called klarna-ai-label taking one of the values: Price, Name, Main picture, Add to cart and Cart.
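
As an illustration of how the klarna-ai-label attribute could be read back out of a page, here is a minimal sketch using only the Python standard library. It assumes the HTML has already been extracted as a string (MHTML files are MIME containers and would need decoding first); `LabelCollector`, `find_labels`, and the sample markup are hypothetical and not taken from the dataset's own tooling.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Attribute name as stated in the dataset description.
LABEL_ATTR = "klarna-ai-label"


class LabelCollector(HTMLParser):
    """Collects (label value, tag name) pairs for every labelled element."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == LABEL_ATTR and value is not None:
                self.labels.append((value, tag))


def find_labels(html: str) -> List[Tuple[str, str]]:
    """Return the labelled elements of one page in document order."""
    collector = LabelCollector()
    collector.feed(html)
    collector.close()
    return collector.labels


# Made-up markup for illustration only.
sample = (
    '<div><h1 klarna-ai-label="Name">Mug</h1>'
    '<span klarna-ai-label="Price">$10</span>'
    '<button klarna-ai-label="Add to cart">Buy</button></div>'
)
labels = find_labels(sample)
```

A standard-library parser keeps the sketch dependency-free; a DOM library would be a better fit if the element text or position is needed as well.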

The snapshots are available in 3 formats: as MHTML files (~24GB), as WebTraversalLibrary (WTL) snapshots (~7.4GB), and as screenshots (~8.9GB). The MHTML format is less lossy: a browser can render these pages, though any JavaScript on the page is lost. The WTL snapshots are produced by loading the MHTML pages into a Chromium-based browser. To keep the WTL dataset compact, the screenshots of the rendered MHTML are provided separately; here we provide the HTML of the rendered DOM tree and additional page and element metadata with rendering information (bounding boxes of elements, font sizes etc.). The folder structure of the screenshot dataset is identical to that of the WTL dataset and can be used to complete the WTL snapshots with image information. For convenience, the datasets are provided with a train/test split in which no merchants in the test set are present in the training set.

Corresponding Publication

For more information about the contents of the datasets (statistics etc.) please refer to the accompanying TMLR paper, cited under "Citing" below.

GitHub Repository

The code needed to re-run the experiments in the publication accompanying the dataset can be accessed here.

Citing

If you found this dataset useful in your research, please cite the paper as follows:

@article{hotti2024the,
  title={The Klarna Product Page Dataset: Web Element Nomination with Graph Neural Networks and Large Language Models},
  author={Alexandra Hotti and Riccardo Sven Risuleo and Stefan Magureanu and Aref Moradi and Jens Lagergren},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2024},
  url={https://openreview.net/forum?id=zz6FesdDbB},
  note={}
}
Farm Service Agency News and Events Widget
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Farm Service Agency, Department of Agriculture (2025). Farm Service Agency News and Events Widget [Dataset]. https://catalog.data.gov/dataset/farm-service-agency-news-and-events-widget
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Farm Service Agency (https://www.fsa.usda.gov/)
United States Department of Agriculture (http://usda.gov/)
Description
This widget provides access to all FSA National News releases. It may be embedded into your website or blog with the code provided, using either Flash or JavaScript.
NBP 2202 data collection map
data.niaid.nih.gov
zenodo.org
Updated Mar 25, 2022
Cite
Rollo, Callum (2022). NBP 2202 data collection map [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6383011
Explore at:
Dataset updated
Mar 25, 2022
Dataset authored and provided by
Rollo, Callum
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Full code and dataset for the NBP 2202 map website. Data were collected during Jan-Feb 2022 in the Amundsen Sea from the Nathaniel B. Palmer. This is a Python Flask app which displays data in a JavaScript Leaflet map. The contents of this dataset should be all you need to host the website yourself, for local viewing or to make publicly available.

This upload is a copy of the GitHub repo taken on 24/03/22 with additional satellite data that was too large for git.

The GitHub repo can be found at https://github.com/callumrollo/itgc-2022-map/

The website is currently maintained at https://nbp2202map.com/

All data are publicly available. Locations and information displayed in the map are for convenience purposes only and are not authoritative. Contact the PIs of the International Thwaites Glacier Collaboration (ITGC) for full datasets. This website is the author's personal work and does not reflect the views of the ITGC group. The author has no official affiliation with ITGC.
COVID-19 Impact Dataset: Great British Intelligence Test, 2020 - Dataset -...
b2find.eudat.eu
Updated Oct 20, 2023
Cite
(2023). COVID-19 Impact Dataset: Great British Intelligence Test, 2020 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/85806aa2-53a7-5728-8883-6be7d8f3496e
Explore at:
Dataset updated
Oct 20, 2023
Description
There is an urgent need to understand the factors that mediate and mitigate the impact of the Covid-19 pandemic on behaviour and wellbeing. However, the onset of the outbreak was unexpected and the rate of acceleration so rapid as to preclude the planning of studies that can address these critical issues. Coincidentally, in January 2020, just prior to the outbreak in the UK, my team launched a study that collected detailed (~50 minute) cognitive and questionnaire assessments from >200,000 members of the UK public as part of a collaboration with the BBC. This placed us in a unique position to examine how aspects of mental health subsequently changed as the pandemic arrived in the UK. Therefore, we collected data from a further ~120,000 people in May, including additional detailed measures of self-perceived pandemic impact and free text descriptions of the main positives, negatives and pragmatic measures that people found helped them maintain their wellbeing.

In this data archive, we include the survey data from January and May 2020 examining the impact of Covid-19 on mood, wellbeing and behaviour in the UK population. These data are reported in a preprint article, where we apply a novel fusion of psychometric, multivariate and machine learning analyses to this unique dataset, in order to address some of the most pressing questions regarding wellbeing during the pandemic in a data-driven manner. The preprint is available at https://www.medrxiv.org/content/10.1101/2020.06.18.20134635v1

Recruitment

Starting from December 26th 2019, participants were recruited to the study website, where they completed cognitive tests and a detailed questionnaire. Articles describing the study were placed on the BBC2 Horizon, BBC Home page, BBC News Home page and circulated on mobile news meta-apps from January 1st 2020. To maximise representativeness of the sample there were no inclusion/exclusion criteria. Analyses here exclude data from participants under 16 years old, as they completed a briefer questionnaire, and those who responded to the questionnaire unfeasibly fast (<4 minutes). Cognitive test data will be reported separately. The study was approved by the Imperial College Research Ethics Committee (17IC4009).

Data collection

Data were collected via our custom server system, which produces study-specific websites (https://gbws.cognitron.co.uk) on Amazon EC2. Questionnaires and tests were programmed in JavaScript and HTML5. They were deliverable via personal computers, tablets and smartphones. The questionnaire included scales quantifying sociodemographic, lifestyle, online technology use, personality, and mental health (Supplement 1). Participants could enrol for longitudinal follow up, scheduled for 3, 6 and 12 months. People returning to the site outside of these timepoints were navigated to a different URL. On May 2nd 2020, the questionnaire was augmented, in light of the Covid-19 pandemic, with an extended mood scale, and an instrument comprising 47 items quantifying self-perceived effects on mood, behaviour and outlook (Pandemic General Impact Scale PD-GIS-11). Questions regarding pre-existing psychiatric and neurological conditions, lockdown context, having the virus, and free text fields were added. This coincided with further promotion via BBC2 Horizon and BBC Homepage.
115th U.S. Congress Member Website (Full JavaScript-enabled Scrape)...
zenodo.org
txt, application/gzip, bin
Updated Jan 24, 2020
Cite
Bob Rudis (2020). 115th U.S. Congress Member Website (Full JavaScript-enabled Scrape) Collection [Dataset]. http://doi.org/10.5281/zenodo.1219056
Explore at:
Available download formats: txt, application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.1219056
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Bob Rudis
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
United States
Description
This data set represents a point-in-time full JavaScript-enabled scrape of all available 115th U.S. Congress member web sites. The data collection started and completed on 2018-04-13 and the results are in ndjson/jsonlines/streaming JSON format. File format information is in the enclosed README.md file.
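
For readers unfamiliar with the ndjson/jsonlines layout mentioned above, the following minimal sketch shows the general reading technique: one JSON document per line, parsed independently. The `iter_ndjson` helper is a hypothetical name, and the sample keys are made up; the actual record fields are documented in the dataset's README.md and are not assumed here.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Made-up sample lines; the keys are placeholders, not the dataset's schema.
sample_lines = [
    '{"site": "a.example", "ok": true}\n',
    "\n",
    '{"site": "b.example", "ok": false}\n',
]
records = list(iter_ndjson(sample_lines))
```

Because the helper accepts any iterable of strings, it can be fed an open text file directly. If a download turns out to be gzip-compressed, which is an assumption based on the listed formats, `gzip.open(path, "rt", encoding="utf-8")` from the standard library returns a suitable file object.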

The data was used to evaluate the privacy profile of each U.S. Congress member's official (.gov hosted) website for the discussion in <https://rud.is/b/2018/04/13/does-congress-really-care-about-your-privacy/>.

ScrapingHub's "Splash" platform (<https://github.com/scrapinghub/splash>) was used along with the "splashr" R package (<https://github.com/hrbrmstr/splashr>) to retrieve the content.
js-free-sites
huggingface.co
Cite
maxine, js-free-sites [Dataset]. https://huggingface.co/datasets/crumb/js-free-sites
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Authors
maxine
Description
It is what it says on the tin: 2k websites that don't run JavaScript (aka: probably really old, and simple!). Filtering needs to be done to take out the defunct sites and error messages. I don't expect this to be a hard task.
B2B Data | Global Technographic Data | Sourced from HTML, Java Scripts &...
datarade.ai
.json
Cite
PredictLeads, B2B Data | Global Technographic Data | Sourced from HTML, Java Scripts & Jobs | 921M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-b2b-data-technographic-data-api-flat-file-predictleads
Explore at:
Available download formats: .json
Dataset authored and provided by
PredictLeads
Area covered
Ascension and Tristan da Cunha, Uzbekistan, Saudi Arabia, Cook Islands, Mongolia, Nauru, Hong Kong, Costa Rica, Lithuania, Vanuatu
Description
PredictLeads Global Technographic Dataset delivers in-depth insights into technology adoption across millions of companies worldwide. Our dataset, sourced from HTML, JavaScript, and job postings, enables B2B sales, marketing, and data enrichment teams to refine targeting, enhance lead scoring, and optimize outreach strategies. By tracking 25,000+ technologies across 92M+ websites, businesses can uncover market trends, assess competitor technology stacks, and personalize their approach.

Use Cases:

✅ Enhance CRM Data – Enrich company records with detailed real-time technology insights.
✅ Targeted Sales Outreach – Identify prospects based on their tech stack and personalize outreach.
✅ Competitor & Market Analysis – Gain insights into competitor technology adoption and industry trends.
✅ Lead Scoring & Prioritization – Rank potential customers based on adopted technologies.
✅ Personalized Marketing – Craft highly relevant campaigns based on technology adoption trends.

API Attributes & Structure:

id (string, UUID) – Unique identifier for each technology detection.

first_seen_at (ISO 8601 date-time) – Timestamp when the technology was first detected on the company's website.

last_seen_at (ISO 8601 date-time) – Most recent timestamp when the technology was last observed.

behind_firewall (boolean) – Indicates whether the technology is protected behind a firewall.

score (float, 0–1) – Confidence score for the detection accuracy.

company (object) – The company using the detected technology, including:

- id (UUID) – Unique company identifier.

- domain (string) – Company website domain.

- company_name (string) – Company's official name.

- ticker (string, nullable) – Stock ticker (if publicly listed).

technology (object) – Information on the detected technology, including:

- id (UUID) – Unique technology identifier.

- name (string) – Technology name (e.g., Salesforce, HubSpot, AWS).

seen_on_job_openings (boolean) – True/False flag indicating if the technology is mentioned in job postings.

seen_on_subpages (array of objects) – List of company subpages where the technology was detected.
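
The attribute list above can be mirrored in typed records. This is a minimal sketch under a loud assumption: it treats each detection as a flat JSON object whose keys are exactly the attribute names listed, which may not match the envelope the real API returns. The class names, `parse_detection`, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class Company:
    id: str
    domain: str
    company_name: str
    ticker: Optional[str]  # nullable per the attribute list


@dataclass
class Technology:
    id: str
    name: str


@dataclass
class Detection:
    id: str
    first_seen_at: datetime
    last_seen_at: datetime
    behind_firewall: bool
    score: float
    company: Company
    technology: Technology
    seen_on_job_openings: bool
    seen_on_subpages: List[Dict[str, Any]]


def _parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so normalise it first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def parse_detection(raw: Dict[str, Any]) -> Detection:
    """Build a Detection from a dict shaped like the attribute list (assumed layout)."""
    score = float(raw["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    company = raw["company"]
    technology = raw["technology"]
    return Detection(
        id=raw["id"],
        first_seen_at=_parse_ts(raw["first_seen_at"]),
        last_seen_at=_parse_ts(raw["last_seen_at"]),
        behind_firewall=bool(raw["behind_firewall"]),
        score=score,
        company=Company(
            id=company["id"],
            domain=company["domain"],
            company_name=company["company_name"],
            ticker=company.get("ticker"),
        ),
        technology=Technology(id=technology["id"], name=technology["name"]),
        seen_on_job_openings=bool(raw["seen_on_job_openings"]),
        seen_on_subpages=list(raw.get("seen_on_subpages", [])),
    )


# Made-up record for illustration only.
sample = {
    "id": "00000000-0000-0000-0000-000000000001",
    "first_seen_at": "2024-01-02T03:04:05Z",
    "last_seen_at": "2024-02-03T04:05:06+00:00",
    "behind_firewall": False,
    "score": 0.9,
    "company": {"id": "c-1", "domain": "example.com",
                "company_name": "Example Ltd", "ticker": None},
    "technology": {"id": "t-1", "name": "Salesforce"},
    "seen_on_job_openings": True,
    "seen_on_subpages": [],
}
detection = parse_detection(sample)
```

Validating the score range at parse time surfaces malformed records early; whether to reject or clamp such values is a design choice left to the consumer.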

📌 PredictLeads Technographic Data is trusted by enterprises and B2B professionals for accurate, real-time technology intelligence, enabling smarter prospecting, data-driven marketing, and competitive analysis.

PredictLeads Technology Detections Dataset https://docs.predictleads.com/v3/guide/technology_detections_dataset
Farm Service Agency Market News Widget
data.wu.ac.at
agdatacommons.nal.usda.gov
html
Updated Feb 27, 2014
Cite
Department of Agriculture (2014). Farm Service Agency Market News Widget [Dataset]. https://data.wu.ac.at/schema/data_gov/Njg3ZmZmOGYtNDY4OS00YTM5LWFkM2MtMzFlZDU4NjVjZTM0
Explore at:
Available download formats: html
Dataset updated
Feb 27, 2014
Dataset provided by
Department of Agriculture
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This widget provides access to all FSA Daily Terminal Market Prices information releases. It may be embedded into your website or blog with the code provided, using either Flash or JavaScript.
Data from: Unveiling the Impact of User-Agent Reduction and Client Hints: A...
data.ru.nl
Updated Nov 27, 2023
Cite
Gunes Acar; Senol, A. (2023). Unveiling the Impact of User-Agent Reduction and Client Hints: A Measurement Study (WPES'23) [Dataset]. http://doi.org/10.34973/86ks-gf89
Explore at:
Available download formats (21570613845 bytes)
Unique identifier
https://doi.org/10.34973/86ks-gf89
Dataset updated
Nov 27, 2023
Dataset provided by
Radboud University
Authors
Gunes Acar; Senol, A.
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
In recent years, browsers have reduced the identifying information in user-agent strings to enhance user privacy. However, Chrome has also introduced high-entropy user-agent client hints (UA-CH) and a new JavaScript API to provide access to specific browser details. The study assesses the impact of these changes on the top 100,000 websites by using an instrumented crawler to measure access to high-entropy browser features via UA-CH HTTP headers and the JavaScript API. It also investigates whether tracking, advertising, and browser fingerprinting scripts have started using these new client hints and the JavaScript API.

By Asuman Senol and Gunes Acar. In Proceedings of the 22nd Workshop on Privacy in the Electronic Society.