91 datasets found

Website Traffic
kaggle.com
zip
Updated Aug 5, 2024
Cite
AnthonyTherrien (2024). Website Traffic [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/website-traffic
Explore at:
Available download formats: zip (65228 bytes)
Dataset updated
Aug 5, 2024
Authors
AnthonyTherrien
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Overview

This dataset provides detailed information on website traffic, including page views, session duration, bounce rate, traffic source, time spent on page, previous visits, and conversion rate.

Dataset Description

Page Views: The number of pages viewed during a session.

Session Duration: The total duration of the session in minutes.

Bounce Rate: The percentage of visitors who navigate away from the site after viewing only one page.

Traffic Source: The origin of the traffic (e.g., Organic, Social, Paid).

Time on Page: The amount of time spent on the specific page.

Previous Visits: The number of previous visits by the same visitor.

Conversion Rate: The percentage of visitors who completed a desired action (e.g., making a purchase).

Data Summary

Total Records: 2000

Total Features: 7

Key Features

Page Views: This feature indicates the engagement level of the visitors by showing how many pages they visit during their session.

Session Duration: This feature measures the length of time a visitor stays on the website, which can indicate the quality of the content.

Bounce Rate: A critical metric for understanding user behavior. A high bounce rate may indicate that visitors are not finding what they are looking for.

Traffic Source: Understanding where your traffic comes from can help in optimizing marketing strategies.

Time on Page: This helps in analyzing which pages are retaining visitors' attention the most.

Previous Visits: This can be used to analyze the loyalty of visitors and the effectiveness of retention strategies.

Conversion Rate: The ultimate metric for measuring the effectiveness of the website in achieving its goals.

Usage

This dataset can be used for various analyses such as:

Identifying key drivers of engagement and conversion.

Analyzing the effectiveness of different traffic sources.

Understanding user behavior patterns and optimizing the website accordingly.

Improving marketing strategies based on traffic source performance.

Enhancing user experience by analyzing time spent on different pages.
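
As a minimal sketch of the traffic-source analysis mentioned above, the following groups rows by traffic source and averages a metric. It assumes the CSV headers match the feature names in the description (the actual file may name them differently), and the sample rows are invented for illustration only.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only: these values are invented, not taken from the dataset.
# The header names are an assumption based on the feature names listed above.
SAMPLE_CSV = """Page Views,Session Duration,Bounce Rate,Traffic Source,Time on Page,Previous Visits,Conversion Rate
5,11.0,0.25,Organic,3.5,3,1.0
4,3.0,0.50,Social,2.0,0,0.5
6,8.0,0.25,Organic,4.0,1,0.5
2,1.0,0.75,Paid,1.0,0,0.25
"""


def mean_by_source(rows, metric):
    """Return {traffic source: mean of `metric`} for an iterable of dict rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        source = row["Traffic Source"]
        totals[source] += float(row[metric])
        counts[source] += 1
    return {source: totals[source] / counts[source] for source in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
conversion_by_source = mean_by_source(rows, "Conversion Rate")
bounce_by_source = mean_by_source(rows, "Bounce Rate")

# Print one line per source: name, mean conversion rate, mean bounce rate.
for source in sorted(conversion_by_source):
    print(source, conversion_by_source[source], bounce_by_source[source])
```

To run this against the downloaded file, replace the `io.StringIO(SAMPLE_CSV)` argument with an open file handle and adjust the header names if they differ.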

Acknowledgments

This dataset was generated for educational purposes and is not from a real website. It serves as a tool for learning data analysis and machine learning techniques.
SimilarWeb Top Websites [April 2024]
kaggle.com
zip
Updated Sep 21, 2024
Cite
Mohammed Kamal Alsyd (2024). SimilarWeb Top Websites [April 2024] [Dataset]. https://www.kaggle.com/datasets/mohammedkamalalsyd/similarweb-top-websites
Explore at:
Available download formats: zip (480522 bytes)
Dataset updated
Sep 21, 2024
Authors
Mohammed Kamal Alsyd
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset provides detailed insights into website traffic metrics and user engagement statistics, collected from SimilarWeb. The data includes information on various websites, such as rank, category, average visit duration, pages per visit, and bounce rate. This data aims to facilitate an understanding of online behavior and performance trends across different sectors, making it a valuable resource for researchers, marketers, and data analysts. The dataset is ideal for exploring patterns in web traffic and user interaction and conducting comparative analyses across various website categories.

Important Warning: Running this code within Kaggle may result in a ban, as scraping activities are prohibited on the platform. There is no guarantee that any ban will be lifted, as Kaggle staff may interpret scraping as a denial-of-service attack. Although I have implemented measures to reduce server load, such as adding sleep intervals, it is advisable to run this code locally to ensure compliance with Kaggle's policies.
Data from: Website Traffic Analysis
kaggle.com
zip
Updated Sep 1, 2024
Cite
Bhanupratap Biswas (2024). Website Traffic Analysis [Dataset]. https://www.kaggle.com/datasets/bhanupratapbiswas/website-traffic-analysis
Explore at:
Available download formats: zip (5409593 bytes)
Dataset updated
Sep 1, 2024
Authors
Bhanupratap Biswas
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Website Traffic Analysis

Website traffic analysis is the process of monitoring and evaluating the visitors to a website. It provides insights into how users are interacting with the site, where they are coming from, which pages they visit most often, and how long they stay. By analyzing this data, businesses can understand user behavior, improve site performance, and optimize content to increase engagement and conversions.

Key metrics include the number of visitors, page views, bounce rate, traffic sources (organic, referral, direct), and geographic location. Website traffic analysis is essential for enhancing SEO, refining marketing strategies, and boosting overall user experience.
Website Statistics
data.wu.ac.at
data.lincolnshire.gov.uk
+1more
csv, pdf
Updated Jun 11, 2018
Cite
Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
Explore at:
Available download formats: csv, pdf
Dataset updated
Jun 11, 2018
Dataset provided by
Lincolnshire County Council: http://www.lincolnshire.gov.uk/
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with JavaScript enabled, which excludes web crawlers and API calls.

These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
website_visit_webalizer
kaggle.com
zip
Updated Mar 24, 2024
Cite
Erin ÇOBAN (2024). website_visit_webalizer [Dataset]. https://www.kaggle.com/datasets/erinoban/website-visit-webalizer
Explore at:
Available download formats: zip (1082 bytes)
Dataset updated
Mar 24, 2024
Authors
Erin ÇOBAN
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset was obtained from real website visit data. It contains monthly visit information for the tr-metaverse.com website, which is hosted on Linux. The data covers 30 days of visits, from the beginning of the month to the end of the month.

The dataset consists of the fields Day, Hit, Hit%, Files, Files%, Pages, Pages%, Visit, Visit%, Sites, Sites%, Kbytes, and Kbytes%. Values with a % sign next to them are percentages.

Day: Day index number (which day of the month).

Hit: The overall amount of access.

Hit%: The overall amount of access, as a percentage.

Files: The number of accesses counted as files.

Files%: The percentage for files.

Pages, Pages%

Visit: Number of unique visitors.

Visit%: Unique visitor rate.

Sites, Sites%

Kbytes: The amount of data downloaded.

Kbytes%: The percentage for data downloaded.
Website Metrics
gimi9.com
catalog.data.gov
Updated Apr 1, 2025
Cite
(2025). Website Metrics [Dataset]. https://gimi9.com/dataset/data-gov_website-metrics/
Explore at:
Dataset updated
Apr 1, 2025
Description
Per the Federal Digital Government Strategy, the Department of Homeland Security Metrics Plan, and the Open FEMA Initiative, FEMA is providing the following web performance metrics with regards to FEMA.gov.

Information in this dataset includes total visits, avg visit duration, pageviews, unique visitors, avg pages/visit, avg time/page, bounce rate, visits by source, visits by Social Media Platform, and metrics on new vs returning visitors.

External Affairs strives to make all communications accessible. If you have any challenges accessing this information, please contact FEMAWebTeam@fema.dhs.gov.
A web tracking data set of online browsing behavior of 2,148 users
zenodo.org
data-staging.niaid.nih.gov
+1more
application/gzip, txt +1
Updated Oct 9, 2025
Cite
Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner (2025). A web tracking data set of online browsing behavior of 2,148 users [Dataset]. http://doi.org/10.5281/zenodo.4757574
Explore at:
Available download formats: zip, txt, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.4757574
Dataset updated
Oct 9, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This anonymized data set consists of one month (October 2018) of web tracking data from 2,148 German users. For each user, the data contains the anonymized URL of each webpage the user visited, the domain of the webpage, and the category of the domain (there are 41 distinct categories). In total, these 2,148 users made 9,151,243 URL visits, spanning 49,918 unique domains. For each user in our data set, we have self-reported information (collected via a survey) about their gender and age.

We acknowledge the support of Respondi AG, which provided the web tracking and survey data free of charge for research purposes, with special thanks to François Erner and Luc Kalaora at Respondi for their insights and help with data extraction.

The data set is analyzed in the following paper:

Kulshrestha, J., Oliveira, M., Karacalik, O., Bonnay, D., Wagner, C. "Web Routineness and Limits of Predictability: Investigating Demographic and Behavioral Differences Using Web Tracking Data." Proceedings of the International AAAI Conference on Web and Social Media. 2021. https://arxiv.org/abs/2012.15112.

The code used to analyze the data is also available at https://github.com/gesiscss/web_tracking.

If you use data or code from this repository, please cite the paper above and the Zenodo link.

Users are advised that some domains in this data set may link to potentially questionable or inappropriate content. The domains have not been individually reviewed, as content verification was not the primary objective of this data set. Therefore, user discretion is strongly recommended when accessing or scraping any content from these domains.
Coho Abundance - Linear Features [ds183]
data-cdfw.opendata.arcgis.com
data.ca.gov
+7more
Updated Oct 1, 2014
Cite
California Department of Fish and Wildlife (2014). Coho Abundance - Linear Features [ds183] [Dataset]. https://data-cdfw.opendata.arcgis.com/datasets/CDFW::coho-abundance-linear-features-ds183
Explore at:
Dataset updated
Oct 1, 2014
Dataset authored and provided by
California Department of Fish and Wildlife: https://wildlife.ca.gov/
Area covered

Description
The CalFish Abundance Database contains a comprehensive collection of anadromous fisheries abundance information. Beginning in 1998, the Pacific States Marine Fisheries Commission, the California Department of Fish and Game, and the National Marine Fisheries Service began a cooperative project aimed at collecting, archiving, and entering into standardized electronic formats the wealth of information generated by fisheries resource management agencies and tribes throughout California.

Extensive data are currently available for chinook, coho, and steelhead. Major data categories include adult abundance population estimates, actual fish and/or carcass counts, counts of fish collected at dams, weirs, or traps, and redd counts. Harvest data has been compiled for many streams, and hatchery return data has been compiled for the state's mitigation facilities. A draft format has been developed for juvenile abundance and awaits final approval. This CalFish Abundance Database shapefile was generated from fully routed 1:100,000 hydrography. In a few cases streams had to be added to the hydrography dataset in order to provide a means to create shapefiles to represent abundance data associated with them. Streams added were digitized at no more than 1:24,000 scale based on stream line images portrayed in 1:24,000 Digital Raster Graphics (DRG).

These features generally represent abundance counts resulting from stream surveys. The linear features in this layer typically represent the location for which abundance data records apply. This would be the reach or length of stream surveyed, or the stream sections for which a given population estimate applies. In some cases the actual stream section surveyed was not specified and linear features represent the entire stream. In many cases there are multiple datasets associated with the same length of stream, and so linear features overlap. Please view the associated datasets for detail regarding specific features. In CalFish these are accessed through the "link" that is visible when performing an identify or query operation. A URL string is provided with each feature in the downloadable data which can also be used to access the underlying datasets.

The coho data that is available via the CalFish website is actually linked directly to the StreamNet website where the database's tabular data is currently stored. Additional information about StreamNet may be downloaded at http://www.streamnet.org. Complete documentation for the StreamNet database may be accessed at http://http://www.streamnet.org/def.html
Data & Analytics Stats LinkedIn Company Page
kaggle.com
zip
Updated Aug 13, 2024
Cite
Mirko Peters (2024). Data & Analytics Stats LinkedIn Company Page [Dataset]. https://www.kaggle.com/datasets/mirkopeters/data-and-analytics-stats-linkedin-company-page
Explore at:
Available download formats: zip (689754 bytes)
Dataset updated
Aug 13, 2024
Authors
Mirko Peters
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
LinkedIn Company Page Data - The Data Analytics Academy

Dataset Overview

This dataset contains detailed insights from The Data Analytics Academy's LinkedIn Company Page, including information on content performance, followers, and visitors. The data is sourced directly from our LinkedIn analytics and has been organized into CSV files for ease of use.

Files Included

Content Data: Performance metrics for posts and updates shared on our LinkedIn page.

Followers Data: Demographics and growth metrics of our LinkedIn page followers.

Visitors Data: Insights on page visitors, including demographics and engagement levels.

Use Cases

Social Media Analytics: Analyze the performance of content and its reach among different demographics.

Market Research: Understand audience demographics and how they engage with our page.

Data Science Projects: Apply machine learning algorithms to predict content performance or audience growth.

Acknowledgments

This data is free to use for any purpose, including commercial use. However, if you use this dataset, please give credit to The Data Analytics Academy by mentioning us or linking to our LinkedIn page: The Data Analytics Academy.

Inspiration

This dataset can be used to explore various aspects of LinkedIn analytics, such as identifying trends in audience engagement, understanding content performance, and predicting follower growth.
ROADWork Data
kilthub.cmu.edu
bin
Updated Jul 18, 2024
Cite
ILIMLab Admin; Anurag Ghosh; Robert Tamburo; Srinivasa Narasimhan; Shen Zheng; Juan Alvarez Padilla; Michael Cardei; Nicholas Dunn; Hailiang Zhu (2024). ROADWork Data [Dataset]. http://doi.org/10.1184/R1/26093197.v2
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.1184/R1/26093197.v2
Dataset updated
Jul 18, 2024
Dataset provided by
Carnegie Mellon University
Authors
ILIMLab Admin; Anurag Ghosh; Robert Tamburo; Srinivasa Narasimhan; Shen Zheng; Juan Alvarez Padilla; Michael Cardei; Nicholas Dunn; Hailiang Zhu
License
http://rightsstatements.org/vocab/InC/1.0/
Description
Brief Description of Dataset Files

images.zip -- contains all the ROADWork images that have been manually annotated.

sem_seg_labels.zip -- contains semantic segmentation labels for images in images.zip in the Cityscapes format.

annotations.zip -- contains instance segmentations, sign information, scene descriptions and other labels for images in images.zip in a COCO-like format. It contains multiple splits, suited for different tasks. Please see Usage for more information.

discovered_images.zip -- contains discovered images with roadwork scenes from the BDD100K and Mapillary datasets (less than 1000 images in total). These images are provided for ease of access ONLY. See below for specific license information for these external datasets.

traj_images.zip -- contains images associated with pathways. These images were manually filtered to contain ground truth pathways obtained from COLMAP. The split is described in Usage, to avoid data contamination from models trained on images.zip.

traj_annotations.zip -- contains pathway annotations corresponding to images in traj_images.zip.

traj_images_dense.zip -- contains the dense set of images with associated pathways. These are similar to traj_images.zip, but they are not subsampled.

traj_annotations_dense.zip -- contains pathway annotations corresponding to images in traj_images_dense.zip.

videos_compressed.zip -- contains video snippets from the Roadbotics Open Dataset that we used to compute pathways using COLMAP.

This repository contains all the data from the ROADWork Dataset. Please visit our project webpage for more information on the dataset: www.cs.cmu.edu/~ILIM/roadwork_dataset/

Usage

Please go to our Github repository: https://github.com/anuragxel/roadwork-dataset/

License

ROADWork dataset images collected by us and all the annotations are licensed under the Open Data Commons Attribution License v1.0. All images from the Roadbotics Dataset are provided for ease of access, and they are licensed under the Open Data Commons Attribution License v1.0. Any other data from other datasets (e.g. data in discovered_images.zip) is distributed with its own licenses and terms.

License of Discovered Images

A small sample of Mapillary Vistas dataset images (in the mappilary/ subdirectory in discovered_images.zip) is provided for ease of use. These images are licensed under the CC BY-NC-SA license and the Mapillary Terms of Use. You agree to these terms if you use these images in any form. Please visit the following link for more information about the Mapillary Vistas dataset: https://www.mapillary.com/dataset/vistas

A small sample of BDD100K dataset images (in the bdd100k/ subdirectory in discovered_images.zip) is provided for ease of use. These images are licensed according to https://doc.bdd100k.com/license.html, which allows us to distribute these images with attribution. You agree to their license agreement if you use these images in any form. Please visit the following link for more information about the BDD100K dataset: http://bdd-data.berkeley.edu/
HireDevice Trip - Dataset - data.govt.nz - discover and use data
catalogue.data.govt.nz
Cite
HireDevice Trip - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/hiredevice-trip
Explore at:
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistics of trips taken on HireDevice in Hamilton City. To get data for this dataset, please call the API directly talking to the HCC Data Warehouse: https://api.hcc.govt.nz/OpenData/get_hiredevice_trip?Page=1&Start_Date=2020-10-01&End_Date=2020-10-02. For this API, there are three mandatory parameters: Page, Start_Date, End_Date. Sample values for these parameters are in the link above. When calling the API for the first time, please always start with Page 1. Then from the returned JSON, you can see more information such as the total page count and page size. For help on using the API in your preferred data analysis software, please contact dale.townsend@hcc.govt.nz. NOTE: Anomalies and missing data may be present in the dataset.

Column_Info

Trip_Id, varchar : Unique identifier of the trip

Trip_Duration, int : Duration of the trip in seconds

Trip_Distance, int : Distance of the trip in metres

Device_Id, varchar : Unique identifier of the GPS device on the scooter

Vehicle_Id, varchar : Unique identifier of the scooter

Start_Time, datetime : Date and time that the trip started

End_Time, datetime : Date and time that the trip ended

Relationship

This table is referenced by HireDevice_Route

Analytics

For convenience Hamilton City Council has also built a Quick Analytics Dashboard over this dataset that you can access here.

Disclaimer

Hamilton City Council does not make any representation or give any warranty as to the accuracy or exhaustiveness of the data released for public download. Levels, locations and dimensions of works depicted in the data may not be accurate due to circumstances not notified to Council. A physical check should be made on all levels, locations and dimensions before starting design or works. Hamilton City Council shall not be liable for any loss, damage, cost or expense (whether direct or indirect) arising from reliance upon or use of any data provided, or Council's failure to provide this data.

While you are free to crop, export and re-purpose the data, we ask that you attribute the Hamilton City Council and clearly state that your work is a derivative and not the authoritative data source. Please include the following statement when distributing any work derived from this data: 'This work is derived entirely or in part from Hamilton City Council data; the provided information may be updated at any time, and may at times be out of date, inaccurate, and/or incomplete.'
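
The paging behaviour described for this API (three mandatory parameters, always start at Page 1, then read the total page count from the returned JSON) could be sketched roughly as follows. The description does not give the JSON field names, so `total_pages` and `rows` here are hypothetical placeholders, and `fake_fetch` is a stand-in for a real HTTP call so the sketch runs offline.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the sample link in the dataset description.
BASE_URL = "https://api.hcc.govt.nz/OpenData/get_hiredevice_trip"


def build_url(page, start_date, end_date, base_url=BASE_URL):
    """Build a request URL with the three mandatory parameters."""
    params = {"Page": page, "Start_Date": start_date, "End_Date": end_date}
    return f"{base_url}?{urlencode(params)}"


def iter_pages(fetch_json, start_date, end_date, page_count_key="total_pages"):
    """Yield each decoded JSON page, starting with Page 1.

    fetch_json is any callable mapping a URL to a decoded JSON dict.
    page_count_key is a HYPOTHETICAL field name: inspect the real
    response to find where the total page count is reported.
    """
    page = 1
    while True:
        payload = fetch_json(build_url(page, start_date, end_date))
        yield payload
        if page >= int(payload.get(page_count_key, 1)):
            break
        page += 1


def fake_fetch(url):
    """Offline stand-in for an HTTP GET; returns an invented three-page response."""
    page = int(parse_qs(urlsplit(url).query)["Page"][0])
    return {"total_pages": 3, "rows": [f"trip-{page}"]}


pages = list(iter_pages(fake_fetch, "2020-10-01", "2020-10-02"))
print(len(pages))
```

For real requests, `fake_fetch` would be replaced by a function that performs the HTTP GET and decodes the JSON body, with the key names adjusted to match the actual response.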
NYC STEW-MAP Staten Island organizations' website hyperlink webscrape
catalog.data.gov
Updated Nov 21, 2022
Cite
U.S. EPA Office of Research and Development (ORD) (2022). NYC STEW-MAP Staten Island organizations' website hyperlink webscrape [Dataset]. https://catalog.data.gov/dataset/nyc-stew-map-staten-island-organizations-website-hyperlink-webscrape
Explore at:
Dataset updated
Nov 21, 2022
Dataset provided by
United States Environmental Protection Agency: http://www.epa.gov/
Area covered
New York, Staten Island
Description
The data represent web-scraping of hyperlinks from a selection of environmental stewardship organizations that were identified in the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017). There are two data sets: 1) the original scrape containing all hyperlinks within the websites and associated attribute values (see "README" file); 2) a cleaned and reduced dataset formatted for network analysis.

For dataset 1: Organizations were selected from the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017), a publicly available, spatial data set about environmental stewardship organizations working in New York City, USA (N = 719). To create a smaller and more manageable sample to analyze, all organizations that intersected (i.e., worked entirely within or overlapped) the NYC borough of Staten Island were selected for a geographically bounded sample. Only organizations with working websites that the web scraper could access were retained for the study (n = 78). The websites were scraped between 09 and 17 June 2020 to a maximum search depth of ten using the snaWeb package (version 1.0.1, Stockton 2020) in the R computational language environment (R Core Team 2020).

For dataset 2: The complete scrape results were cleaned, reduced, and formatted as a standard edge-array (node1, node2, edge attribute) for network analysis. See "README" file for further details.

References:

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. Version 4.0.3.

Stockton, T. (2020). snaWeb Package: An R package for finding and building social networks for a website, version 1.0.1.

USDA Forest Service. (2017). Stewardship Mapping and Assessment Project (STEW-MAP). New York City Data Set. Available online at https://www.nrs.fs.fed.us/STEW-MAP/data/.

This dataset is associated with the following publication: Sayles, J., R. Furey, and M. Ten Brink. How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations. Applied Network Science. Springer Nature, New York, NY, 7: 36, (2022).
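
To illustrate the edge-array layout (node1, node2, edge attribute) mentioned for dataset 2, here is a small sketch that counts outgoing and incoming links per node. The rows are invented, and treating the links as directed with a numeric count as the edge attribute is an assumption; the README defines the actual attribute.

```python
from collections import Counter

# Invented example rows in (node1, node2, edge attribute) form.
# The attribute is assumed here to be a hyperlink count (hypothetical).
edges = [
    ("org-a.example", "org-b.example", 3),
    ("org-a.example", "org-c.example", 1),
    ("org-b.example", "org-c.example", 2),
    ("org-c.example", "org-a.example", 1),
]


def degree_counts(edge_array):
    """Return (out_degree, in_degree) Counters for a directed edge array."""
    out_degree = Counter()
    in_degree = Counter()
    for source, target, _attribute in edge_array:
        out_degree[source] += 1
        in_degree[target] += 1
    return out_degree, in_degree


out_degree, in_degree = degree_counts(edges)

# Print one line per node: name, outgoing links, incoming links.
for node in sorted(set(out_degree) | set(in_degree)):
    print(node, out_degree[node], in_degree[node])
```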
Traffic Link Stats - Dataset - data.govt.nz - discover and use data
catalogue.data.govt.nz
Cite
Traffic Link Stats - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/traffic-link-stats
Explore at:
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Vehicle travel time and delay data on sections of road in Hamilton City, based on Bluetooth sensor records. To get data for this dataset, please call the API directly talking to the HCC Data Warehouse: https://api.hcc.govt.nz/OpenData/get_traffic_link_stats?Page=1&Start_Date=2021-06-02&End_Date=2021-06-03. For this API, there are three mandatory parameters: Page, Start_Date, End_Date. Sample values for these parameters are in the link above. When calling the API for the first time, please always start with Page 1. Then from the returned JSON, you can see more information such as the total page count and page size. For help on using the API in your preferred data analysis software, please contact dale.townsend@hcc.govt.nz. NOTE: Anomalies and missing data may be present in the dataset.

Column_Info

Link_Id, int : Unique link identifier

Travel_Time, int : Average travel time in seconds to travel along the link

Average_Delay, int : Average travel delay in seconds, calculated as the difference between the free flow travel time and observed travel time

Date, varchar : Starting date and time for the recorded delay and travel time, in 15 minute periods

Relationship

This table references the table Traffic_Link

Analytics

For convenience Hamilton City Council has also built a Quick Analytics Dashboard over this dataset that you can access here.

Disclaimer

Hamilton City Council does not make any representation or give any warranty as to the accuracy or exhaustiveness of the data released for public download. Levels, locations and dimensions of works depicted in the data may not be accurate due to circumstances not notified to Council. A physical check should be made on all levels, locations and dimensions before starting design or works. Hamilton City Council shall not be liable for any loss, damage, cost or expense (whether direct or indirect) arising from reliance upon or use of any data provided, or Council's failure to provide this data.

While you are free to crop, export and re-purpose the data, we ask that you attribute the Hamilton City Council and clearly state that your work is a derivative and not the authoritative data source. Please include the following statement when distributing any work derived from this data: 'This work is derived entirely or in part from Hamilton City Council data; the provided information may be updated at any time, and may at times be out of date, inaccurate, and/or incomplete.'
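
As a small worked example of the delay definition above, the sketch below recovers the implied free flow travel time from a record. It assumes the delay is the observed travel time minus the free flow travel time (so a positive delay means slower than free flow); the records and the date format shown are invented for illustration.

```python
def implied_free_flow(travel_time_s, average_delay_s):
    """Free flow travel time in seconds, assuming delay = observed - free flow."""
    return travel_time_s - average_delay_s


# Invented records using the column names from Column_Info.
# The Date string format is an assumption, not taken from the API.
records = [
    {"Link_Id": 1, "Travel_Time": 95, "Average_Delay": 20, "Date": "2021-06-02 08:00"},
    {"Link_Id": 1, "Travel_Time": 80, "Average_Delay": 5, "Date": "2021-06-02 08:15"},
]

free_flow = [
    implied_free_flow(record["Travel_Time"], record["Average_Delay"])
    for record in records
]
print(free_flow)
```

If the two 15 minute periods for the same link imply different free flow times, that may point to one of the anomalies the dataset note warns about.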
Home Depot products dataset
crawlfeeds.com
csv, zip
Updated Mar 5, 2026
Cite
Crawl Feeds (2026). Home Depot products dataset [Dataset]. https://crawlfeeds.com/datasets/home-depot-products-dataset
Explore at:
Available download formats: zip, csv
Dataset updated
Mar 5, 2026
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Unlock valuable insights with our comprehensive Home Depot product dataset. This dataset is meticulously curated, offering detailed information on a wide range of products available at Home Depot.

Available Home Depot datasets:

We offer a wide range of categories, including furniture, home décor, painting, plumbing, and many more. Explore all available options here.

Whether you're conducting market research, enhancing your e-commerce platform, or analyzing retail trends, this dataset is an invaluable resource. It includes product names, descriptions, prices, categories, and more. Optimize your projects with high-quality, structured data from one of the largest home improvement retailers in the world.

Stay ahead in the competitive market with accurate and up-to-date product information.

The latest Home Depot products dataset has around 2 million records. Get in touch with Crawl Feeds to request any updates to the dataset.

For a closer look at the product-level data we’ve extracted from Home Depot, including pricing, stock status, and detailed specifications, visit the Home Depot dataset page. You can explore sample records and submit a request for tailored extracts directly from there.
Variable Message Signs - Dataset - York Open Data
data.yorkopendata.org
ckan.york.staging.datopian.com
Updated May 9, 2017
Cite
(2017). Variable Message Signs - Dataset - York Open Data [Dataset]. https://data.yorkopendata.org/dataset/variable-message-signs
Explore at:
Dataset updated
May 9, 2017
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Area covered
York
Description
Variable Message Signs (VMS) in York. For further information about traffic management please visit the City of York Council website. *Please note that the data published within this dataset is a live API link to CYC's GIS server. Any changes made to the master copy of the data will be immediately reflected in the resources of this dataset.The date shown in the "Last Updated" field of each GIS resource reflects when the data was first published.
PhishingWebsites
openml.org
Updated Feb 16, 2016
Cite
Rami Mustafa A Mohammad ( University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk ) Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae) (2016). PhishingWebsites [Dataset]. https://www.openml.org/d/4534
Explore at:
Dataset updated
Feb 16, 2016
Authors
Rami Mustafa A Mohammad ( University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk ) Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae)
Description
Author: Rami Mustafa A Mohammad (University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com), Lee McCluskey (University of Huddersfield, t.l.mccluskey '@' hud.ac.uk), Fadi Thabtah (Canadian University of Dubai, fadi '@' cud.ac.ae)
Source: UCI
Please cite: Please refer to the Machine Learning Repository's citation policy

Source:

Rami Mustafa A Mohammad ( University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield,t.l.mccluskey '@' hud.ac.uk ) Fadi Thabtah (Canadian University of Dubai,fadi '@' cud.ac.ae)

Data Set Information:

One of the challenges faced by our research was the unavailability of reliable training datasets. In fact, this challenge faces any researcher in the field. Although plenty of articles about predicting phishing websites have been disseminated these days, no reliable training dataset has been published publicly, maybe because there is no agreement in the literature on the definitive features that characterize phishing webpages; hence it is difficult to shape a dataset that covers all possible features. In this dataset, we shed light on the important features that have proved to be sound and effective in predicting phishing websites. In addition, we propose some new features.

Attribute Information:

For further information about the features, see the features file in the data folder of UCI.

Relevant Papers:

Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi (2012) An Assessment of Features Related to Phishing Websites using an Automated Technique. In: International Conference For Internet Technology And Secured Transactions. ICITST 2012. IEEE, London, UK, pp. 492-497. ISBN 978-1-4673-5325-0

Mohammad, Rami, Thabtah, Fadi Abdeljaber and McCluskey, T.L. (2014) Predicting phishing websites based on self-structuring neural network. Neural Computing and Applications, 25 (2). pp. 443-458. ISSN 0941-0643

Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi Abdeljaber (2014) Intelligent Rule based Phishing Websites Classification. IET Information Security, 8 (3). pp. 153-160. ISSN 1751-8709

Citation Request:

Please refer to the Machine Learning Repository's citation policy
CURVAS dataset
zenodo.org
zip
Updated Jul 9, 2024
Cite
Meritxell Riera-Marín; Joy-Marie Kleiß; Anton Aubanell; Andreu Antolín (2024). CURVAS dataset [Dataset]. http://doi.org/10.5281/zenodo.12687192
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.12687192
Dataset updated
Jul 9, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Meritxell Riera-Marín; Joy-Marie Kleiß; Anton Aubanell; Andreu Antolín
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Clinical Problem

In medical imaging, DL models are often tasked with delineating structures or abnormalities within complex anatomical structures, such as tumors, blood vessels, or organs. Uncertainty arises from the inherent complexity and variability of these structures, leading to challenges in precisely defining their boundaries. This uncertainty is further compounded by interrater variability, as different medical experts may have varying opinions on where the true boundaries lie. DL models must grapple with these discrepancies, leading to inconsistencies in segmentation results across different annotators and potentially impacting diagnosis and treatment decisions. Addressing interrater variability in DL for medical segmentation involves the development of robust algorithms capable of capturing and quantifying uncertainty, as well as standardizing annotation practices and promoting collaboration among medical experts to reduce variability and improve the reliability of DL-based medical image analysis. Interrater variability poses significant challenges in the field of DL for medical image segmentation.

Furthermore, achieving model calibration, a fundamental aspect of reliable predictions, becomes notably challenging when dealing with multiple classes and raters. Calibration is pivotal for ensuring that predicted probabilities align with the true likelihood of events, enhancing the model's reliability. It must also be considered that having multiple classes introduces uncertainties arising from their interactions, even if these are not always obvious. Moreover, incorporating annotations from multiple raters adds another layer of complexity, as differing expert opinions may contribute to a broader spectrum of variability and computational complexity.
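
As a rough illustration of what "predicted probabilities align with the true likelihood of events" means in practice, the sketch below computes the expected calibration error (ECE) with equal-width confidence bins. ECE is a common calibration measure chosen here for illustration only; the description does not say which calibration metric the challenge uses, and the function name, bin count and example values are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error using equal-width confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     booleans, True where the prediction matched the label
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Clamp the index so a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# Illustrative values: four predictions at 90% confidence, three of them right.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A value of zero means that, within every bin, the average confidence equals the observed accuracy; larger values indicate over- or under-confidence.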

Consequently, the development of robust algorithms capable of effectively capturing and quantifying variability and uncertainty, while also accommodating the nuances of multi-class and multi-rater scenarios, becomes imperative. Striking a balance between model calibration, accurate segmentation and handling variability in medical annotations is crucial for the success and reliability of DL-based medical image analysis.

CURVAS Challenge Goal

Due to all the previously stated reasons, we have created a challenge that considers all of the above. In this challenge, we will work with abdominal CT scans. Each of them will have three different annotations obtained from different experts and each of the annotations will have three classes: pancreas, kidney and liver.

The main idea is to be able to evaluate the results considering the multi rater information. There will be three separate evaluations: firstly, a classical dice score evaluation together with an uncertainty study will be performed; secondly, a volumetric assessment to give relevant clinical information will take place; finally, a study on whether the model is calibrated or not will take place. All of these evaluations will be performed considering all three different annotations.
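
The Dice part of the evaluation can be sketched as follows. This is a minimal sketch assuming flattened label arrays that use the class codes listed under Dataset Cohort (0 background, 1 pancreas, 2 kidney, 3 liver). Averaging the per-rater scores is an assumption made for illustration, since the description does not state how the three annotations are combined, and the function names and toy voxels are hypothetical.

```python
def dice_score(pred, ref, label):
    """Dice coefficient for one class between two flat label sequences."""
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must have the same length")
    pred_count = sum(1 for p in pred if p == label)
    ref_count = sum(1 for r in ref if r == label)
    overlap = sum(1 for p, r in zip(pred, ref) if p == label and r == label)
    if pred_count + ref_count == 0:
        # Convention assumed here: a class absent from both counts as agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + ref_count)


def mean_dice_over_raters(pred, annotations, label):
    """Average the Dice score of one prediction against every rater."""
    scores = [dice_score(pred, ann, label) for ann in annotations]
    return sum(scores) / len(scores)


# Toy example: six voxels, one prediction, three raters.
prediction = [0, 1, 1, 2, 3, 3]
rater_1 = [0, 1, 1, 2, 3, 3]
rater_2 = [0, 1, 0, 2, 3, 3]
rater_3 = [0, 0, 1, 2, 2, 3]

print(mean_dice_over_raters(prediction, [rater_1, rater_2, rater_3], label=1))
```

Keeping the per-rater scores separate before averaging also makes it possible to report their spread, which is one simple way to expose interrater variability.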

For more information about the challenge, or to join CURVAS (Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation), visit our website. The challenge will be held at MICCAI 2024.

Dataset Cohort

The challenge cohort consists of 90 CT images prospectively gathered at the University Hospital Erlangen between August 2023 and October 2023. Each CT has four classes: background (0), pancreas (1), kidney (2) and liver (3). In addition, each CT has three annotations from three different experts, each containing the four classes specified above.

Training Phase cohort:

20 CT scans belonging to group A will be provided with their respective annotations. Participants are encouraged to leverage publicly available external data annotated by multiple raters. Providing a small training set while allowing public datasets is intended to make the challenge more inclusive, since a method can then be developed using data that is available to anyone. Furthermore, training on this data and evaluating on other data should make methods more robust to shifts and other sources of variability between datasets.

Validation Phase cohort:

5 CT scans belonging to group A will be used for this phase.

Test Phase cohort:

65 CT scans will be used for evaluation. 20 CTs belonging to group A, 22 CTs belonging to group B and 23 CTs belonging to group C.

Neither the validation nor the test cohort will be published until the end of the challenge. Likewise, the group to which each CT scan belongs will not be revealed until after the challenge.

Clinical Specifications

Inclusion criteria were a maximum of 10 cysts with a diameter of less than 2.0 cm. Furthermore, CT scans with major artifacts (e.g. breathing artifacts) or incomplete registrations were excluded.

Participants were required to be over 18 years old and provide both verbal and written consent for the use of their CT images in the Challenge. Both study-specific and broad consent were obtained. Among the 90 patients, there were 51 males and 39 females, aged between 37 and 94 years, with an average age of 65.7 years. All patients received treatment at the University Hospital Erlangen in Bavaria, Germany. No additional selection criteria were set, to ensure a representative sample of a typical patient cohort.

Our overall data consists of 90 CTs split into three different groups:

Group A: cases with 2 cysts or less with no contour altering pathologies - 45 CTs

Group B: cases with 3-5 cysts with no contour altering pathologies - 22 CTs

Group C: cases with 6-10 cysts with some pathologies included (liver metastases, hydronephrosis, adrenal gland metastases, missing kidney) - 23 CTs
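
The phase sizes given under Dataset Cohort and the group sizes listed above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers from the description; the variable and function names are illustrative.

```python
# Scans per phase and group, as stated in the dataset description.
phases = {
    "training": {"A": 20},
    "validation": {"A": 5},
    "test": {"A": 20, "B": 22, "C": 23},
}

# Group sizes, as stated in the clinical specifications.
group_sizes = {"A": 45, "B": 22, "C": 23}


def totals_per_phase(phases):
    """Number of CT scans in each phase."""
    return {phase: sum(counts.values()) for phase, counts in phases.items()}


def totals_per_group(phases):
    """Number of CT scans in each group, summed over all phases."""
    totals = {}
    for counts in phases.values():
        for group, n in counts.items():
            totals[group] = totals.get(group, 0) + n
    return totals


print(totals_per_phase(phases))   # training 20, validation 5, test 65
print(totals_per_group(phases))   # A 45, B 22, C 23
print(sum(group_sizes.values()))  # 90 CTs overall
```

The two views agree: the 45 group A scans are spread over training (20), validation (5) and test (20), while groups B and C appear only in the test phase.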

However, in any case, the participants will not know which case belongs to which group. This information will be released after the challenge, together with the whole dataset.

Annotation Protocol

The first step in obtaining the labels was to use TotalSegmentator [1] [2] to get rough annotations. The labels were then sent to three radiologists (R1, R2, R3), who corrected the automatic annotations and added any missing organs. One of the three labeling radiologists, the MD PhD candidate, had previously defined both the dataset cohort and the criteria for what belongs to the parenchyma and what does not; these criteria were given to the other two labeling radiologists so that all three followed the same criteria and remained coherent with each other [3]. Separately, two other clinicians (C1, C2) supervised the cohort criteria defined by the MD PhD candidate but had no involvement in the labeling itself; hence, there is no bias between the annotations of the different radiologists.

Each labeled class for this challenge has specific instructions. They are listed below per organ.

Liver:
Generally speaking, we define the liver 'as the entire liver tissue including all internal structures like vessel systems, tumors etc.' [4] Thus, the portal vein itself is excluded from contouring. The two main branches of the portal vein are excluded from the segmentation. Any branch of the following generations is included. 'In case of partial enclosure (occurring where large vessels as Vena Cava and portal vein enter or leave the liver), the parts enclosed by liver tissue are included in the segmentation, thus forming the convex hull of the liver shape.' [4] Any fatty tissue that pulls into the liver is excluded. The gallbladder should not be marked. Wide and especially pathologically widened bile ducts are included in the segmentation of the liver.

Kidney:
The right and left kidney will be segmented. Included in the segmentation will be the kidney parenchyma including the renal medulla. Excluded is the renal pelvis [5] and the ureter as a urinary stasis could alter the original volume.

Pancreas:
When segmenting the pancreas, we will not differentiate between head, body and tail. Moreover, neither the splenic vein nor the mesenteric vein will be included in the segmentation [6]. However, it is important that the whole pancreas is tracked and marked along its course.

Technical Specifications

The CTs used needed to be contrast-enhanced CT scans in a portal venous phase with the acquisition of thin slices ranging from 0.6 to 1 mm. Thoracic-abdominal CT images were taken during the patients' hospital stay, motivated by various medical needs. Given the focus on abdominal organs, the Br40 soft kernel was employed. CT examinations were conducted using SIEMENS CT scanners at the University Hospital Erlangen, with rotation speeds of 0.25 or 0.5 sec. Detector collimation varied from 128x0.6 mm single source to 98x0.6x2 and 144x0.4x2 dual source configurations. Spiral pitch factors ranged from 0.3 to 1.3. The mean reference tube current was set at 200 mAs, adjustable to 120 mAs. Automated tube voltage adaptation and tube current modulation were implemented in all instances. Contrast agent administration was standard practice, with an injection rate of 3-4 mL/s and a body weight-adjusted dosage of 400 mg(iodine)/kg (equivalent to 1.14 ml/kg Iomeprol 350 mg/ml). All images underwent reconstruction using soft convolution kernels and iterative techniques.
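
The stated equivalence between the iodine dose and the contrast agent volume follows from dividing the dose by the concentration. The sketch below reproduces that arithmetic; the function names are illustrative and the 70 kg body weight is a hypothetical value used only to show the calculation.

```python
def contrast_volume_ml_per_kg(iodine_dose_mg_per_kg, concentration_mg_per_ml):
    """Contrast agent volume per kg of body weight for a given iodine dose."""
    return iodine_dose_mg_per_kg / concentration_mg_per_ml


def contrast_volume_ml(body_weight_kg, iodine_dose_mg_per_kg, concentration_mg_per_ml):
    """Total contrast agent volume for one patient."""
    return body_weight_kg * iodine_dose_mg_per_kg / concentration_mg_per_ml


# 400 mg iodine per kg at 350 mg iodine per ml, as given in the description.
per_kg = contrast_volume_ml_per_kg(400, 350)
print(round(per_kg, 2))  # 1.14 ml/kg, matching the stated value

# Hypothetical 70 kg patient (not a figure from the dataset).
print(round(contrast_volume_ml(70, 400, 350), 1))  # 80.0 ml
```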

Ethical Approval and Data Usage
Toxic Release Inventory 2018
gis.data.ca.gov
data.ca.gov
Updated Sep 26, 2019
Cite
DTSC_Admin (2019). Toxic Release Inventory 2018 [Dataset]. https://gis.data.ca.gov/datasets/DTSC::toxic-release-inventory-2018
Explore at:
Dataset updated
Sep 26, 2019
Dataset authored and provided by
DTSC_Admin
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The 2018 TRI preliminary dataset consists of TRI data for 2018. Users should note that while these preliminary data have undergone the basic data quality checks included in the online TRI reporting software, they have not undergone the complete TRI data quality process. In addition, EPA does not aggregate or summarize these data, or offer any analysis or interpretation of them.

You can use the TRI preliminary dataset to:

Identify how many TRI facilities operate in a certain geographic area (for example, a ZIP code);
Identify which chemicals are being managed by TRI facilities and in what quantities; and
Find out if a particular facility initiated any pollution prevention activities in the most recent calendar year.

The agency will update the dataset several times in August and September based on information from facilities. EPA plans to publish the complete, quality-checked 2018 dataset in October 2019, followed by the 2018 TRI National Analysis in January 2020.
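
The first two uses listed above amount to simple group-by operations. The sketch below shows them on a handful of made-up records; the field names (`facility`, `zip`, `chemical`, `quantity`) and all values are hypothetical and do not reflect the actual TRI schema, which should be checked against the published data before adapting this.

```python
from collections import defaultdict

# Hypothetical records for illustration only; not real TRI data.
records = [
    {"facility": "F1", "zip": "95814", "chemical": "Toluene", "quantity": 120.0},
    {"facility": "F1", "zip": "95814", "chemical": "Ammonia", "quantity": 40.0},
    {"facility": "F2", "zip": "95814", "chemical": "Toluene", "quantity": 60.0},
    {"facility": "F3", "zip": "90001", "chemical": "Ammonia", "quantity": 10.0},
]


def facilities_per_zip(rows):
    """Count distinct facilities in each ZIP code."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["zip"]].add(row["facility"])
    return {zip_code: len(facilities) for zip_code, facilities in seen.items()}


def quantity_per_chemical(rows):
    """Sum the reported quantity for each chemical."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["chemical"]] += row["quantity"]
    return dict(totals)


print(facilities_per_zip(records))
print(quantity_per_chemical(records))
```

Counting distinct facilities through a set avoids double-counting a facility that reports more than one chemical.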
Amazon_employee_access
openml.org
Updated Apr 30, 2025
Cite
See original data source. (2025). Amazon_employee_access [Dataset]. https://www.openml.org/d/46905
Explore at:
Dataset updated
Apr 30, 2025
Authors
See original data source.
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
This dataset was curated for TabArena by the TabArena team as part of the TabArena Tabular ML IID Study. For more details on the study, see our paper.

Dataset Focus: This dataset shall be used for evaluating predictive machine learning models for independent and identically distributed tabular data. The intended task is classification.

Dataset Metadata

Licence: Public Domain

Original Data Source: https://www.kaggle.com/c/amazon-employee-access-challenge

Reference (please cite): Ben Hamner, kenmonta, and Will Cukierski. Amazon.com - Employee Access Challenge. https://kaggle.com/competitions/amazon-employee-access-challenge, 2013. Kaggle.

Dataset Year: 2010

Dataset Description: see the reference and the original data source for details.

Curation comments by the TabArena team (for code see the page of the study):

We only use the training data from Kaggle.

Anomaly: the data might contain sub-groups related to managers and resources.

Anomaly: likely, similar to the test data, each sample represents a unique employee.
Coho Abundance - Point Features [ds182]
data-cdfw.opendata.arcgis.com
data.ca.gov
Updated Oct 1, 2014
Cite
California Department of Fish and Wildlife (2014). Coho Abundance - Point Features [ds182] [Dataset]. https://data-cdfw.opendata.arcgis.com/datasets/CDFW::coho-abundance-point-features-ds182
Explore at:
Dataset updated
Oct 1, 2014
Dataset authored and provided by
California Department of Fish and Wildlife (https://wildlife.ca.gov/)
Description
The CalFish Abundance Database contains a comprehensive collection of anadromous fisheries abundance information. Beginning in 1998, the Pacific States Marine Fisheries Commission, the California Department of Fish and Game, and the National Marine Fisheries Service began a cooperative project aimed at collecting, archiving, and entering into standardized electronic formats the wealth of information generated by fisheries resource management agencies and tribes throughout California.

Extensive data are currently available for chinook, coho, and steelhead. Major data categories include adult abundance population estimates, actual fish and/or carcass counts, counts of fish collected at dams, weirs, or traps, and redd counts. Harvest data has also been compiled for many streams.

This CalFish Abundance Database shapefile was generated from fully routed 1:100,000 hydrography. In a few cases streams had to be added to the hydrography dataset in order to provide a means to create shapefiles to represent abundance data associated with them. Streams added were digitized at no more than 1:24,000 scale based on stream line images portrayed in 1:24,000 Digital Raster Graphics (DRG).

These features represent abundance information resulting from counts at weirs, fish ladders, or other point-type monitoring protocols such as beach seining. The point features in this layer typically represent the location for which abundance data records apply. In many cases there are multiple datasets associated with the same point location, and so point features overlap. Please view the associated datasets for detail regarding specific features. In CalFish these are accessed through the "link" field that is visible when performing an identify or query operation. A URL string is provided with each feature in the downloadable data which can also be used to access the underlying datasets.

The coho data that is available via the CalFish website is actually linked directly to the StreamNet website where the database's tabular data is currently stored. Additional information about StreamNet may be downloaded at http://www.streamnet.org. Complete documentation for the StreamNet database may be accessed at http://www.streamnet.org/def.html

Cite

AnthonyTherrien (2024). Website Traffic [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/website-traffic

Website Traffic

Website Traffic and User Engagement Metrics

Explore at:

Available download formats: zip (65228 bytes)

Dataset updated

Aug 5, 2024

Authors

AnthonyTherrien

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically

Description

Dataset Overview

This dataset provides detailed information on website traffic, including page views, session duration, bounce rate, traffic source, time spent on page, previous visits, and conversion rate.

Dataset Description

Page Views: The number of pages viewed during a session.
Session Duration: The total duration of the session in minutes.
Bounce Rate: The percentage of visitors who navigate away from the site after viewing only one page.
Traffic Source: The origin of the traffic (e.g., Organic, Social, Paid).
Time on Page: The amount of time spent on the specific page.
Previous Visits: The number of previous visits by the same visitor.
Conversion Rate: The percentage of visitors who completed a desired action (e.g., making a purchase).

Data Summary

Total Records: 2000
Total Features: 7

Key Features

Page Views: This feature indicates the engagement level of the visitors by showing how many pages they visit during their session.
Session Duration: This feature measures the length of time a visitor stays on the website, which can indicate the quality of the content.
Bounce Rate: A critical metric for understanding user behavior. A high bounce rate may indicate that visitors are not finding what they are looking for.
Traffic Source: Understanding where your traffic comes from can help in optimizing marketing strategies.
Time on Page: This helps in analyzing which pages are retaining visitors' attention the most.
Previous Visits: This can be used to analyze the loyalty of visitors and the effectiveness of retention strategies.
Conversion Rate: The ultimate metric for measuring the effectiveness of the website in achieving its goals.

Usage

This dataset can be used for various analyses such as:

Identifying key drivers of engagement and conversion.
Analyzing the effectiveness of different traffic sources.
Understanding user behavior patterns and optimizing the website accordingly.
Improving marketing strategies based on traffic source performance.
Enhancing user experience by analyzing time spent on different pages.
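
As an illustration of the second use listed above, the sketch below averages a metric per traffic source using only the Python standard library. The column names follow the feature names in the description but are assumptions about the actual file header, and the rows are made-up values rather than records from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration; column names are assumed from the description.
rows = [
    {"Traffic Source": "Organic", "Page Views": 5, "Conversion Rate": 0.8},
    {"Traffic Source": "Organic", "Page Views": 7, "Conversion Rate": 1.0},
    {"Traffic Source": "Social", "Page Views": 3, "Conversion Rate": 0.5},
    {"Traffic Source": "Social", "Page Views": 4, "Conversion Rate": 0.7},
    {"Traffic Source": "Paid", "Page Views": 6, "Conversion Rate": 0.9},
]


def mean_by_source(records, metric):
    """Average one numeric column for each traffic source."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["Traffic Source"]].append(float(record[metric]))
    return {source: mean(values) for source, values in grouped.items()}


# Compare sources on conversion and on engagement.
print(mean_by_source(rows, "Conversion Rate"))
print(mean_by_source(rows, "Page Views"))
```

With the real file, the same function could be fed rows from `csv.DictReader`, provided the header matches the assumed column names.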

Acknowledgments

This dataset was generated for educational purposes and is not from a real website. It serves as a tool for learning data analysis and machine learning techniques.
