Web traffic statistics for several City-Parish websites, including brla.gov, city.brla.gov, Red Stick Ready, GIS, and Open Data. Information provided by Google Analytics.
As of the last quarter of 2023, ***** percent of web traffic in the United States originated from mobile devices, down from ***** percent in the fourth quarter of 2022. In comparison, over half of web traffic worldwide was generated via mobile in the last examined period.
https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by capturing packets with Wireshark for 1 hour during the evening of Oct 9th, 2023. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file.
The dataset can be used for a variety of machine learning tasks, such as network traffic classification, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
Content :
This network traffic dataset consists of 7 features. Each instance contains, among other fields, the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal types and a date type (the Timestamp).
The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
No: Number of the instance
Timestamp: Timestamp of the network traffic instance
Source IP: IP address of the source
Destination IP: IP address of the destination
Protocol: Protocol used by the instance
Length: Length of the instance
Info: Information about the traffic instance
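As a rough illustration of working with these columns, the sketch below counts packets and bytes per protocol using only the Python standard library. It assumes the CSV header uses the Wireshark export names listed above (No., Time, Source, Destination, Protocol, Length, Info); the sample rows and the function name `protocol_summary` are made up for the example and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Illustrative rows only; these values are NOT taken from the dataset.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.5","10.0.0.1","DNS","74","Standard query A example.org"
"2","0.012345","10.0.0.1","10.0.0.5","DNS","90","Standard query response A example.org"
"3","0.020000","10.0.0.5","93.184.216.34","TCP","66","49152 > 443 [SYN]"
'''


def protocol_summary(csv_file):
    """Return (packet count per protocol, total bytes per protocol)."""
    counts = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(csv_file):
        protocol = row["Protocol"]  # assumed header name
        counts[protocol] += 1
        total_bytes[protocol] += int(row["Length"])  # assumed header name
    return counts, total_bytes


counts, total_bytes = protocol_summary(io.StringIO(SAMPLE_CSV))
print(dict(counts))
print(dict(total_bytes))
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, and the header names adjusted if the file uses different ones.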
Acknowledgements :
I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it supports training machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
Dataset License: CC0: Public Domain
Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as network traffic classification, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
ML techniques that benefit from this dataset:
This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains features that are relevant to many machine learning applications in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:
1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection: This large network traffic dataset can be used to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection: This large dataset can be used to train machine learning algorithms and design models that detect traffic issues, malicious traffic, network attacks, and DoS attacks.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Digital technology and Internet use, website traffic strategies, by North American Industry Classification System (NAICS) and size of enterprise for Canada from 2012 to 2013.
In June 2025, DoorDash's website, doordash.com, had just under 72 million visitors globally, recording a bounce rate of approximately 34.2 percent. For comparison, web traffic figures of UberEats show lower monthly visits.
In March 2024, search platform Google.com generated approximately 85.5 billion visits, down from 87 billion platform visits in October 2023. Google is a global search platform and one of the biggest online companies worldwide.
Web traffic statistics for the top 2000 most visited pages on nyc.gov by month.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code:
Packet_Features_Generator.py & Features.py
To run this code:
pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j
-h, --help    show this help message and exit
-i TXTFILE    input text file
-x X          Add first X number of total packets as features.
-y Y          Add first Y number of negative packets as features.
-z Z          Add first Z number of positive packets as features.
-ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S          Generate samples using size s.
-j
Purpose:
Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.
Uses Features.py to calculate the features.
startMachineLearning.sh & machineLearning.py
To run this code:
bash startMachineLearning.sh
This script then runs machineLearning.py in a tmux session with the necessary file paths and flags.
Options (to be edited within this file):
--evaluate-only to test 5 fold cross validation accuracy
--test-scaling-normalization to test 6 different combinations of scalers and normalizers
Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use
--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'
Purpose:
Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
Data
Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, running different Google search queries (collected in the form of their autocomplete results and their results page), and performing different actions on a Virtual Reality headset.
Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:
First number is a classification number to denote what website, query, or vr action is taking place.
The remaining numbers in each line denote:
The size of a packet,
and the direction it is traveling.
negative numbers denote incoming packets
positive numbers denote outgoing packets
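The line format described above can be read with a few lines of Python. This is a minimal sketch, not the project's own parser: it assumes the numbers on a line are separated by whitespace or commas, and the names `Capture` and `parse_capture` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Capture:
    label: int            # classification number (website, query, or VR action)
    incoming: List[int]   # sizes of incoming packets, stored as positive values
    outgoing: List[int]   # sizes of outgoing packets


def parse_capture(line: str) -> Capture:
    """Split one line into its label and its signed packet sizes.

    Assumption: values are separated by whitespace and/or commas.
    """
    values = [int(token) for token in line.replace(",", " ").split()]
    label, packets = values[0], values[1:]
    return Capture(
        label=label,
        incoming=[-p for p in packets if p < 0],  # negative = incoming
        outgoing=[p for p in packets if p > 0],   # positive = outgoing
    )


example = parse_capture("3 512 -1460 -1460 74")
print(example)
```

Keeping the two directions in separate lists makes it straightforward to take the first Y negative or first Z positive packets as features, as the -y and -z options describe.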
Figure 4 Data
This data uses specific lines from the Virtual Reality.txt file.
The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.
The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.
The .xlsx and .csv files are identical.
Each file includes (from right to left):
The original packet data,
each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,
and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
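The sorting, mean, standard deviation, and CDF steps listed above could be reproduced along the following lines. This is a sketch under assumptions: it uses an empirical CDF (the fraction of packets at or below each size) and the population standard deviation, neither of which is confirmed by the description, and the packet sizes are made-up values.

```python
import statistics


def empirical_cdf(packet_sizes):
    """Return (size, cumulative fraction) pairs, sorted from smallest to largest."""
    ordered = sorted(packet_sizes)
    n = len(ordered)
    return [(size, (i + 1) / n) for i, size in enumerate(ordered)]


# Made-up packet sizes for one capture (absolute values, direction ignored).
sizes = [1460, 74, 512, 74]

mean_size = statistics.mean(sizes)
std_size = statistics.pstdev(sizes)  # assumption: population standard deviation
cdf = empirical_cdf(sizes)

print(mean_size, std_size)
for size, fraction in cdf:
    print(size, fraction)
```

Repeated sizes produce repeated x-values with increasing fractions, which plots as a vertical step in the CDF curve.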
https://techkv.com/privacy-policy/
It’s not really surprising to know that most of the internet traffic comes from mobile devices. Yet, I wouldn’t have believed this 10 or 15 years back. Sure, mobile devices were becoming popular, but the adoption rates had a sudden jump in the past decade. A quick analysis of statistics...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. The data was collected using the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the necessary configuration to perform the simulation is detailed in the tool repository. The webpage URLs are the first 100 websites listed at https://dataforseo.com/free-seo-stats/top-1000-websites. Each webpage URL is visited 90 times in each of the desktop and mobile browsing modes.
Between December 2022 and January 2024, ******** was the online learning platform reporting the highest traffic, with a peak of *** million visits to its websites in December 2023. ******** ranked second, with the platform reaching a peak of ** million visits in the examined period. The website ******* (which stands for technology, entertainment, design) saw a peak of over ** million visits in March 2023.
As of July 2025, mobile phones accounted for **** percent of web page views in Saudi Arabia. The United Arab Emirates ranked second, with mobile devices generating approximately ***** percent of web traffic. Poland, Portugal, and Malaysia saw less than ** percent of their national internet traffic coming from mobile devices. Additionally, Russia ranked last for mobile internet traffic as of the middle of 2025, as ***** percent of the total internet traffic in the country came from smartphones and internet connected mobile devices.
Comprehensive analysis of Amazon's daily website traffic including visitor counts, traffic sources, mobile vs desktop usage, and seasonal patterns based on May 2025 data.
Mobile accounts for approximately half of web traffic worldwide. In the last quarter of 2024, mobile devices (excluding tablets) generated 62.54 percent of global website traffic. Mobiles and smartphones consistently hovered around the 50 percent mark since the beginning of 2017, before surpassing it in 2020.
Mobile traffic
Due to limited infrastructure and financial constraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States.
Mobile usage
The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.
In January 2025, mobile devices excluding tablets accounted for over ** percent of web page views worldwide. Meanwhile, over ** percent of webpage views in Africa were generated via mobile. In contrast, just over half of web traffic in North America still took place via desktop connections, with mobile only accounting for **** percent of total web traffic. While regional infrastructure remains an important factor in broadband vs. mobile coverage, most of the world has had its eyes on the recent 5G rollout across the globe, spearheaded by tech leaders China and the United States. The number of mobile 5G subscriptions worldwide is forecast to reach more than ***** billion by 2028.
Social media: room for growth in Africa and southern Asia
Overall, more than ** percent of the world’s mobile internet subscribers are also active on social media. A fast-growing market, with newcomers such as TikTok taking the world by storm, marketers have been cashing in on social media’s reach. Overall, social media penetration is highest in Europe and America while in Africa and southern Asia, there is still room for growth. As of 2021, Facebook and Google-owned YouTube are the most popular social media platforms worldwide.
Facebook and Instagram are most effective
With nearly ***** billion users, it is no wonder that Facebook remains the social media avenue of choice for the majority of marketers across the world. Instagram, meanwhile, was the second most popular outlet. Both platforms are low-cost and support short-form content, which is known for its universal consumer appeal and delivers the most important benefits of using these kinds of platforms for business and advertising purposes.
Through its Employment and Financial Services (EFS) division, Seniors, Community and Social Services’ (SCSS) programs form a strong foundation of support to help many Albertans find and keep jobs. The ministry provides financial support, employment services, career resources, referrals, information on job fairs and workshops, and local labor market information. The goal is to help individuals and families gain independence by providing opportunities to enhance their skills to get jobs. The alis.alberta.ca website provides employment resources to help Albertans enhance their employability, plan for education and training, make informed career choices, and connect to and be successful in the labour market. This dataset provides information on web traffic statistics for the alis website, including information on pageviews and web sessions, demographic information for web sessions, and traffic information for the alis YouTube channel (https://www.youtube.com/user/ALISwebsite).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed, interconnected banking transaction records, capturing sender and receiver relationships, transaction metadata, and anomaly flags. Designed for network analytics, it enables advanced anti-money laundering (AML) detection, fraud analysis, and financial behavior modeling by representing transactions as a directed graph. The flat structure ensures easy integration with machine learning and graph analytics tools.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We recommend using the CESNET DataZoo python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.
The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.
Data description
The dataset consists of network flows describing encrypted TLS communications. Flows are extended with packet sequences, histograms, and fields extracted from the TLS ClientHello message, which is transmitted in the first packet of the TLS connection handshake. The most important extracted handshake field is the SNI domain, which is used for ground-truth labeling.
Packet Sequences
Sequences of packet sizes, directions, and inter-packet times are standard data input for traffic analysis. For packet sizes, we consider the payload size after transport headers (TCP headers for the TLS case). We omit packets with no TCP payload, for example ACKs, because zero-payload packets are related to the transport layer internals rather than services’ behavior. Packet directions are encoded as ±1, where +1 means a packet sent from client to server, and -1 is a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate a response. Packet sequences have a maximum length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction; in other words, each client request and server response pair counts as one roundtrip.
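The roundtrip definition above leaves some room for interpretation. The sketch below shows one possible reading, in which only a change from client-to-server (+1) to server-to-client (-1) is counted, so that each request and response pair counts once. This is an assumption for illustration, not necessarily how the flow exporter computes PPI_ROUNDTRIPS, and the function name `count_roundtrips` is hypothetical.

```python
def count_roundtrips(directions):
    """Count request/response pairs in a sequence of packet directions.

    directions: list of +1 (client to server) and -1 (server to client).
    Assumption: a roundtrip is a switch from +1 to -1; a switch back
    from -1 to +1 starts the next request and is not counted on its own.
    """
    roundtrips = 0
    previous = None
    for direction in directions:
        if previous == 1 and direction == -1:
            roundtrips += 1
        previous = direction
    return roundtrips


# Two requests, each answered by the server.
print(count_roundtrips([1, 1, -1, -1, 1, -1]))  # prints 2
```

Counting every direction change instead would give 3 for the same sequence, so it is worth checking a few flows against the provided PPI_ROUNDTRIPS values before relying on either reading.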
Flow statistics
Each data record also includes standard flow statistics, representing aggregated information about the entire bidirectional connection. The fields are the number of transmitted bytes and packets in both directions, the duration of the flow, and packet histograms. The packet histograms include binned counts (not limited to the first 30 packets) of packet sizes and inter-packet times in both directions. There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes (more information in the PHISTS plugin documentation). Moreover, each flow has its end reason: either it ended with the TCP connection termination (FIN packets), was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons.
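To make the eight logarithmic bins concrete, the following sketch maps a value (bytes or milliseconds) to its bin index and builds a histogram from a list of values. It follows the interval boundaries listed above and assumes non-negative values; the names `BIN_UPPER_BOUNDS`, `bin_index`, and `histogram` are hypothetical and this is not the PHISTS plugin code.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first seven bins; the eighth bin is >1024.
BIN_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]


def bin_index(value):
    """Return the index (0 to 7) of the bin that a non-negative value falls into."""
    return bisect_left(BIN_UPPER_BOUNDS, value)


def histogram(values):
    """Count how many values fall into each of the eight bins."""
    counts = [0] * (len(BIN_UPPER_BOUNDS) + 1)
    for value in values:
        counts[bin_index(value)] += 1
    return counts


print(histogram([0, 15, 16, 100, 1024, 1025, 1500]))
# prints [2, 1, 0, 1, 0, 0, 1, 2]
```

`bisect_left` returns the position of the first upper bound that is greater than or equal to the value, which matches the inclusive upper ends of the intervals (for example, 1024 lands in the 512-1024 bin and 1025 in the last one).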
Dataset structure
The dataset is organized by week and by individual day. The flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the total number of saved flows and the number of flows per service. There are also files aggregating flow counts for each week (stats-week.json) and for the entire dataset (stats-dataset.json). The following list describes flow data fields in CSV files:
ID: Unique identifier
SRC_IP: Source IP address
DST_IP: Destination IP address
DST_ASN: Destination Autonomous System number
SRC_PORT: Source port
DST_PORT: Destination port
PROTOCOL: Transport protocol
FLAG_CWR: Presence of the CWR flag
FLAG_CWR_REV: Presence of the CWR flag in the reverse direction
FLAG_ECE: Presence of the ECE flag
FLAG_ECE_REV: Presence of the ECE flag in the reverse direction
FLAG_URG: Presence of the URG flag
FLAG_URG_REV: Presence of the URG flag in the reverse direction
FLAG_ACK: Presence of the ACK flag
FLAG_ACK_REV: Presence of the ACK flag in the reverse direction
FLAG_PSH: Presence of the PSH flag
FLAG_PSH_REV: Presence of the PSH flag in the reverse direction
FLAG_RST: Presence of the RST flag
FLAG_RST_REV: Presence of the RST flag in the reverse direction
FLAG_SYN: Presence of the SYN flag
FLAG_SYN_REV: Presence of the SYN flag in the reverse direction
FLAG_FIN: Presence of the FIN flag
FLAG_FIN_REV: Presence of the FIN flag in the reverse direction
TLS_SNI: Server Name Indication domain
TLS_JA3: JA3 fingerprint of TLS client
TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
DURATION: Duration of the flow in seconds
BYTES: Number of transmitted bytes from client to server
BYTES_REV: Number of transmitted bytes from server to client
PACKETS: Number of packets transmitted from client to server
PACKETS_REV: Number of packets transmitted from server to client
PPI: Packet sequence in the format: [[inter-packet times], [packet directions], [packet sizes], [push flags]]
PPI_LEN: Number of packets in the PPI sequence
PPI_DURATION: Duration of the PPI sequence in seconds
PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
PHIST_SRC_SIZES: Histogram of packet sizes from client to server
PHIST_DST_SIZES: Histogram of packet sizes from server to client
PHIST_SRC_IPT: Histogram of inter-packet times from client to server
PHIST_DST_IPT: Histogram of inter-packet times from server to client
APP: Web service label
CATEGORY: Service category
FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
FLOW_ENDREASON_END: Flow ended with the TCP connection termination
FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
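As an example of handling one of these fields, the sketch below unpacks a PPI value into its four sub-sequences. It assumes the CSV stores PPI as a bracketed list literal in the order given above (inter-packet times, directions, sizes, push flags); the sample string is made up and the function name `parse_ppi` is hypothetical. The CESNET DataZoo library mentioned earlier is the recommended way to work with the dataset, so this only illustrates the field layout.

```python
import ast


def parse_ppi(ppi_text):
    """Parse a PPI string into a dict of four equally long lists."""
    ipt, directions, sizes, push_flags = ast.literal_eval(ppi_text)
    if not (len(ipt) == len(directions) == len(sizes) == len(push_flags)):
        raise ValueError("PPI sub-lists have different lengths")
    return {
        "ipt": list(ipt),                # inter-packet times
        "directions": list(directions),  # +1 client to server, -1 server to client
        "sizes": list(sizes),            # payload sizes in bytes
        "push_flags": list(push_flags),  # presence of the PSH flag per packet
    }


# Made-up PPI value with three packets.
sample = "[[0, 12, 3], [1, -1, 1], [517, 1400, 64], [1, 0, 1]]"
ppi = parse_ppi(sample)
print(len(ppi["sizes"]), ppi["directions"])
```

The length of any sub-list can then be compared with the PPI_LEN column as a sanity check when reading rows.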
https://sqmagazine.co.uk/privacy-policy/
Mobile browsers account for over 62% of global internet traffic, and mobile-first habits now shape how consumers engage with web content. In industries like e-commerce and digital publishing, this shift is transforming how brands optimize UX and prioritize performance. Automotive brands, for instance, build mobile‑friendly configurators, while media outlets tailor...