Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore our detailed website traffic dataset featuring key metrics like page views, session duration, bounce rate, traffic source, and conversion rates.
Web traffic statistics for the several City-Parish websites, brla.gov, city.brla.gov, Red Stick Ready, GIS, Open Data etc. Information provided by Google Analytics.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General data recollected for the studio " Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union".
Four research questions are posed: what percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is said percentage higher or lower than that provided through direct traffic and through the use of search engines via SEO positioning? Which social networks have a greater impact? And is there any degree of relationship between the specific weight of social networks in the web traffic of a cybermedia and circumstances such as the average duration of the user's visit, the number of page views or the bounce rate understood in its formal aspect of not performing any kind of interaction on the visited page beyond reading its content?
To answer these questions, we have first proceeded to a selection of the cybermedia with the highest web traffic of the 27 countries that are currently part of the European Union after the United Kingdom left on December 31, 2020. In each nation we have selected five media using a combination of the global web traffic metrics provided by the tools Alexa (https://www.alexa.com/), which ceased to be operational on May 1, 2022, and SimilarWeb (https:// www.similarweb.com/). We have not used local metrics by country since the results obtained with these first two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by nation but to examine the relevance of social networks in their web traffic.
In all cases, cybermedia whose property corresponds to a journalistic company have been selected, ruling out those belonging to telecommunications portals or service providers; in some cases they correspond to classic information companies (both newspapers and televisions) while in others they refer to digital natives, without this circumstance affecting the nature of the research proposed.
Below we have proceeded to examine the web traffic data of said cybermedia. The period corresponding to the months of October, November and December 2021 and January, February and March 2022 has been selected. We believe that this six-month stretch allows possible one-time variations to be overcome for a month, reinforcing the precision of the data obtained.
To secure this data, we have used the SimilarWeb tool, currently the most precise tool that exists when examining the web traffic of a portal, although it is limited to that coming from desktops and laptops, without taking into account those that come from mobile devices, currently impossible to determine with existing measurement tools on the market.
It includes:
Web traffic general data: average visit duration, pages per visit and bounce rate Web traffic origin by country Percentage of traffic generated from social media over total web traffic Distribution of web traffic generated from social networks Comparison of web traffic generated from social netwoks with direct and search procedures
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145063 daily time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2017-09-10.
The original dataset contains missing values. They have been simply replaced by zeros.
Top 25 Daily Page Views for the main website of Los Angeles
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Digital technology and Internet use, website traffic strategies, by North American Industry Classification System (NAICS) and size of enterprise for Canada from 2012 to 2013.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here was obtained in a Kali Machine from University of Cincinnati,Cincinnati,OHIO by carrying out packet captures for 1 hour during the evening on Oct 9th,2023 using Wireshark.This dataset consists of 394137 instances were obtained and stored in a CSV (Comma Separated Values) file.This large dataset could be used utilised for different machine learning applications for instance classification of Network traffic,Network performance monitoring,Network Security Management , Network Traffic Management ,network intrusion detection and anomaly detection.
The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.
Content :
This network traffic dataset consists of 7 features.Each instance contains the information of source and destination IP addresses, The majority of the properties are numeric in nature, however there are also nominal and date kinds due to the Timestamp.
The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
No : Number of Instance. Timestamp : Timestamp of instance of network traffic Source IP: IP address of Source Destination IP: IP address of Destination Portocol: Protocol used by the instance Length: Length of Instance Info: Information of Traffic Instance
Acknowledgements :
I would like thank University of Cincinnati for giving the infrastructure for generation of network traffic data set.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration : This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP,ARP,RARP) that an IP flow contains. Instead, it generates machine learning models that can identify specific applications (like Tiktok,Wikipedia,Instagram,Youtube,Websites,Blogs etc.) from IP flow statistics (there are currently 25 applications in total).
**Dataset License: ** CC0: Public Domain
Dataset Usages : This dataset can be used for different machine learning applications in the field of cybersecurity such as classification of Network traffic,Network performance monitoring,Network Security Management , Network Traffic Management ,network intrusion detection and anomaly detection.
ML techniques benefits from this Dataset :
This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on a public,private and Enterprise networks.Also,the dataset consists of very important features that can be used for most of the applications of Machine learning in cybersecurity.Here are few of the potential machine learning applications that could be benefited from this dataset are :
Network Performance Monitoring : This large network traffic data set can be utilised for analysing the network traffic to identifying the network patterns in the network .This help in designing the network security algorithms for minimise the network probelms.
Anamoly Detection : Large network traffic dataset can be utilised training the machine learning models for finding the irregularitues in the traffic which could help identify the cyber attacks.
3.Network Intrusion Detection : This large dataset could be utilised for machine algorithms training and designing the models for detection of the traffic issues,Malicious traffic network attacks and DOS attacks as well.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of Tor cell file extracted from browsing simulation using Tor Browser. The simulations cover both desktop and mobile webpages. The data collection process was using WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the neccessary configuration to perform the simulation as detailed in the tool repository.The webpage URL is selected by using the first 100 website based on: https://dataforseo.com/free-seo-stats/top-1000-websites.Each webpage URL is visited 90 times for each deskop and mobile browsing mode.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network traffic datasets created by Single Flow Time Series Analysis
Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:
J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.
This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf
In the following table is a description of each dataset file:
File name | Detection problem | Citation of original raw dataset |
botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
doh_cic.csv | Binary detection of DoH |
Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020 |
doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022 |
dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020 |
ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org /datasets-iot23 |
ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
A dataset comparing features, pricing, and ratings of the top 4 traffic bots in 2025: SparkTraffic (4.5/5), TrafficBot.co (2.5/5), Traffic-Bot.com (3.0/5), and EpicTrafficBot (3.0/5).
Click Web Traffic Combined with Transaction Data: A New Dimension of Shopper Insights
Consumer Edge is a leader in alternative consumer data for public and private investors and corporate clients. Click enhances the unparalleled accuracy of CE Transact by allowing investors to delve deeper and browse further into global online web traffic for CE Transact companies and more. Leverage the unique fusion of web traffic and transaction datasets to understand the addressable market and understand spending behavior on consumer and B2B websites. See the impact of changes in marketing spend, search engine algorithms, and social media awareness on visits to a merchant’s website, and discover the extent to which product mix and pricing drive or hinder visits and dwell time. Plus, Click uncovers a more global view of traffic trends in geographies not covered by Transact. Doubleclick into better forecasting, with Click.
Consumer Edge’s Click is available in machine-readable file delivery and enables: • Comprehensive Global Coverage: Insights across 620+ brands and 59 countries, including key markets in the US, Europe, Asia, and Latin America. • Integrated Data Ecosystem: Click seamlessly maps web traffic data to CE entities and stock tickers, enabling a unified view across various business intelligence tools. • Near Real-Time Insights: Daily data delivery with a 5-day lag ensures timely, actionable insights for agile decision-making. • Enhanced Forecasting Capabilities: Combining web traffic indicators with transaction data helps identify patterns and predict revenue performance.
Use Case: Analyze Year Over Year Growth Rate by Region
Problem A public investor wants to understand how a company’s year-over-year growth differs by region.
Solution The firm leveraged Consumer Edge Click data to: • Gain visibility into key metrics like views, bounce rate, visits, and addressable spend • Analyze year-over-year growth rates for a time period • Breakout data by geographic region to see growth trends
Metrics Include: • Spend • Items • Volume • Transactions • Price Per Volume
Inquire about a Click subscription to perform more complex, near real-time analyses on public tickers and private brands as well as for industries beyond CPG like: • Monitor web traffic as a leading indicator of stock performance and consumer demand • Analyze customer interest and sentiment at the brand and sub-brand levels
Consumer Edge offers a variety of datasets covering the US, Europe (UK, Austria, France, Germany, Italy, Spain), and across the globe, with subscription options serving a wide range of business needs.
Consumer Edge is the Leader in Data-Driven Insights Focused on the Global Consumer
This data about nola.gov provides a window into how people are interacting with the the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.
The dataset contains traffic collected for 96 websites located in
https://www.paradoxintelligence.com/termshttps://www.paradoxintelligence.com/terms
Website visitor analytics, engagement metrics, and competitive intelligence across public companies
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the aggregated version of the daily dataset used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145063 time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2017-09-05, after aggregating them into weekly.
The original dataset contains missing values. They have been simply replaced by zeros before aggregation.
https://www.paradoxintelligence.com/termshttps://www.paradoxintelligence.com/terms
Real-time website visitor analytics and digital engagement metrics across global markets for institutional investment research and competitive intelligence.
This dataset was created by Suruthi S
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Fork this kernel to get started.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SDCC Traffic Congestion Saturation Flow Data for January to June 2023. Traffic volumes, traffic saturation, and congestion data for sites across South Dublin County. Used by traffic management to control stage timings on junctions. It is recommended that this dataset is read in conjunction with the ‘Traffic Data Site Names SDCC’ dataset.A detailed description of each column heading can be referenced below;scn: Site Serial numberregion: A group of Nodes that are operated under SCOOT control at the same common cycle time. Normally these will be nodes between which co-ordination is desirable. Some of the nodes may be double cycling at half of the region cycle time.system: SCOOT STC UTC (UTC-MX)locn: Locationssite: Site numbersday: Days of the week Monday to Sunday. Abbreviations; MO,TU,WE,TH,FR,SA,SU.date: Reflects correct actual Date of when data was collected.start_time: NOTE - Please ignore the date displayed in this column. The actual data collection date is correctly displayed in the 'date' column. The date displayed here is the date of when report was run and extracted from the system, but correctly reflects start time of 15 minute intervals. end_time: End time of 15 minute intervals.flow: A representation of demand (flow) for each link built up over several minutes by the SCOOT model. SCOOT has two profiles:(1) Short – Raw data representing the actual values over the previous few minutes(2) Long – A smoothed average of values over a longer periodSCOOT will choose to use the appropriate profile depending on a number of factors.flow_pc: Same as above ref PC SCOOTcong: Congestion is directly measured from the detector. If the detector is placed beyond the normal end of queue in the street it is rarely covered by stationary traffic, except of course when congestion occurs. If any detector shows standing traffic for the whole of an interval this is recorded. The number of intervals of congestion in any cycle is also recorded.The percentage congestion is calculated from:No of congested intervals x 4 x 100 cycle time in seconds.This percentage of congestion is available to view and more importantly for the optimisers to take into account.cong_pc: Same as above ref PC SCOOTdsat: The ratio of the demand flow to the maximum possible discharge flow, i.e. it is the ratio of the demand to the discharge rate (Saturation Occupancy) multiplied by the duration of the effective green time. The Split optimiser will try to minimise the maximum degree of saturation on links approaching the node.
Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly