Web traffic statistics for the several City-Parish websites, brla.gov, city.brla.gov, Red Stick Ready, GIS, Open Data etc. Information provided by Google Analytics.
As of the last quarter of 2023, 31.57 percent of web traffic in the United States originated from mobile devices, down from 49.51 percent in the fourth quarter of 2022. In comparison, over half of web traffic worldwide was generated via mobile in the last examined period.
In 2024, most of the global website traffic was still generated by humans, but bot traffic is constantly growing. Fraudulent traffic through bad bot actors accounted for 37 percent of global web traffic in the most recently measured period, representing an increase of 12 percent from the previous year. Sophistication of Bad Bots on the rise The complexity of malicious bot activity has dramatically increased in recent years. Advanced bad bots have doubled in prevalence over the past 2 years, indicating a surge in the sophistication of cyber threats. Simultaneously, the share of simple bad bots drastically increased over the last years, suggesting a shift in the landscape of automated threats. Meanwhile, areas like food and groceries, sports, gambling, and entertainment faced the highest amount of advanced bad bots, with more than 70 percent of their bot traffic affected by evasive applications. Good and bad bots across industries The impact of bot traffic varies across different sectors. Bad bots accounted for over 50 percent of the telecom and ISPs, community and society, and computing and IT segments web traffic. However, not all bot traffic is considered bad. Some of these applications help index websites for search engines or monitor website performance, assisting users throughout their online search. Therefore, areas like entertainment, food and groceries, and even areas targeted by bad bots themselves experienced notable levels of good bot traffic, demonstrating the diverse applications of benign automated systems across different sectors.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code:
Packet_Features_Generator.py & Features.py
To run this code:
pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j
-h, --help show this help message and exit -i TXTFILE input text file -x X Add first X number of total packets as features. -y Y Add first Y number of negative packets as features. -z Z Add first Z number of positive packets as features. -ml Output to text file all websites in the format of websiteNumber1,feature1,feature2,... -s S Generate samples using size s. -j
Purpose:
Turns a text file containing lists of incomeing and outgoing network packet sizes into separate website objects with associative features.
Uses Features.py to calcualte the features.
startMachineLearning.sh & machineLearning.py
To run this code:
bash startMachineLearning.sh
This code then runs machineLearning.py in a tmux session with the nessisary file paths and flags
Options (to be edited within this file):
--evaluate-only to test 5 fold cross validation accuracy
--test-scaling-normalization to test 6 different combinations of scalers and normalizers
Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use
--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'
Purpose:
Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normailzation options for each data set as well as the best grid search hyperparameters based on the provided ranges.
Data
Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queres (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality head set.
Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:
First number is a classification number to denote what website, query, or vr action is taking place.
The remaining numbers in each line denote:
The size of a packet,
and the direction it is traveling.
negative numbers denote incoming packets
positive numbers denote outgoing packets
Figure 4 Data
This data uses specific lines from the Virtual Reality.txt file.
The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.
The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.
The .xlsx and .csv file are identical
Each file includes (from right to left):
The origional packet data,
each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,
and the final Cumulative Distrubution Function (CDF) caluclation that generated the Figure 4 Graph.
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Digital technology and Internet use, website traffic strategies, by North American Industry Classification System (NAICS) and size of enterprise for Canada from 2012 to 2013.
https://electroiq.com/privacy-policyhttps://electroiq.com/privacy-policy
Tor Statistics: Tor is a free network that helps people stay anonymous online. It works using open-source software and depends on more than 7,000 volunteer-run relays across the world. Tor, known as “The Onion Router,†is a free tool that helps protect your identity and activity online.
It helps in hiding location and internet use by passing your data through many different servers, called relays, run by volunteers around the globe. Tor is built on open-source software and is widely used by journalists, activists, and everyday users who value their privacy. It was developed by the Tor Project and initially released on September 20, 2002.
This article includes several statistical analyses from different sources covering the overall market trend, features, types, user bases, demographics, countries, traffic shares, and many other factors.
Web traffic statistics for the top 2000 most visited pages on nyc.gov by month.
In June 2025, DoorDash's website, doordash.com, had just under 72 million visitors globally, recording a bounce rate of approximately 34.2 percent. For comparison, web traffic figures of UberEats show lower monthly visits.
30
Here we deposit the datasets we have extracted for ten states in the US. In each zip file, we include each state's accident records, road networks, and network features. For further information about using the dataset and how we extracted the data, check out our GitHub repository for instructions.
As of January 2024, mobile phones accounted for ** percent of web page views in Nigeria. Vietnam ranked second, with mobile devices generating approximately **** percent of web traffic. Belgium, Portugal, and Canada, saw less than ** percent of their national internet traffic coming from mobile devices. Additionally, Japan ranked last for mobile internet traffic as of the beginning of 2024, **** percent of the total internet traffic in the country came from smartphones and internet connected mobile devices.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888. We recommend using the CESNET DataZoo python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo. The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies. Data capture The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done for four weeks between 31.10.2022 and 27.11.2022. The following list provides per-week flow count, capture period, and uncompressed size:
W-2022-44
Uncompressed Size: 19 GB Capture Period: 31.10.2022 - 6.11.2022 Number of flows: 32.6M W-2022-45
Uncompressed Size: 25 GB Capture Period: 7.11.2022 - 13.11.2022 Number of flows: 42.6M W-2022-46
Uncompressed Size: 20 GB Capture Period: 14.11.2022 - 20.11.2022 Number of flows: 33.7M W-2022-47
Uncompressed Size: 25 GB Capture Period: 21.11.2022 - 27.11.2022 Number of flows: 44.1M CESNET-QUIC22
Uncompressed Size: 89 GB Capture Period: 31.10.2022 - 27.11.2022 Number of flows: 153M
Data description The dataset consists of network flows describing encrypted QUIC communications. Flows were created using ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and with fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications. Packet Sequences Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1, +1 meaning a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip. Flow statistics Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason - either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections. Dataset structure The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service and total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts of a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes SNI domains used for ground truth labeling. The following list describes flow data fields in CSV files:
ID: Unique identifier SRC_IP: Source IP address DST_IP: Destination IP address DST_ASN: Destination Autonomous System number SRC_PORT: Source port DST_PORT: Destination port PROTOCOL: Transport protocol QUIC_VERSION QUIC: protocol version QUIC_SNI: Server Name Indication domain QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff DURATION: Duration of the flow in seconds BYTES: Number of transmitted bytes from client to server BYTES_REV: Number of transmitted bytes from server to client PACKETS: Number of packets transmitted from client to server PACKETS_REV: Number of packets transmitted from server to client PPI: Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]] PPI_LEN: Number of packets in the PPI sequence PPI_DURATION: Duration of the PPI sequence in seconds PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence PHIST_SRC_SIZES: Histogram of packet sizes from client to server PHIST_DST_SIZES: Histogram of packet sizes from server to client PHIST_SRC_IPT: Histogram of inter-packet times from client to server PHIST_DST_IPT: Histogram of inter-packet times from server to client APP: Web service label CATEGORY: Service category FLOW_ENDREASON_IDLE: Flow was terminated because it was idle FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
Link to other CESNET datasets
https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/ https://github.com/CESNET/cesnet-datazoo Please cite the original data article:
@article{CESNETQUIC22, author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška}, title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines}, journal = {Data in Brief}, pages = {108888}, year = {2023}, issn = {2352-3409}, doi = {https://doi.org/10.1016/j.dib.2023.108888}, url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069} }
Context There's a story behind every dataset and here's your opportunity to share yours.
Content What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.
Acknowledgements We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.
Inspiration Your data will be in front of the world's largest data science community. What questions do you want to see answered?
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is collected by the inductive loop detectors deployed on freeways in Seattle area. The freeways contain I-5, I-405, I-90, and SR-520. This data set contains spatiotemporal speed information of the freeway system. At each milepost, the speed information collected from main lane loop detectors in the same direction are averaged and integrated into 5 minutes interval speed data. The raw data is provided by Washington Start Department of Transportation (WSDOT) and processed by the STAR Lab in the University of Washington according to data quality control and data imputation procedures [1][2].
The data file is a pickle file that can be easily read using the read_pickle() function in the Pandas package. The data forms as a matrix and each cell of the matrix is speed value for the specific milepost and time period. The horizontal header of the data set denotes the milepost and the vertical header indicates the timestamps. For more information on the definition of milepost, please refer to this website.
This data set been used for traffic prediction tasks in several research studies [3][4]. For more detailed information about the data set, you can also refer to this link.
References:
[1]. Henrickson, K., Zou, Y., & Wang, Y. (2015). Flexible and robust method for missing loop detector data imputation. Transportation Research Record, 2527(1), 29-36.
[2]. Wang, Y., Zhang, W., Henrickson, K., Ke, R., & Cui, Z. (2016). Digital roadway interactive visualization and evaluation network applications to WSDOT operational data usage (No. WA-RD 854.1). Washington (State). Dept. of Transportation.
[3]. Cui, Z., Ke, R., & Wang, Y. (2018). Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. arXiv preprint arXiv:1801.02143.
[4]. Cui, Z., Henrickson, K., Ke, R., & Wang, Y. (2018). Traffic Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting. arXiv preprint arXiv:1802.07007.
Comprehensive analysis of Amazon's daily website traffic including visitor counts, traffic sources, mobile vs desktop usage, and seasonal patterns based on May 2025 data.
Mobile accounts for approximately half of web traffic worldwide. In the last quarter of 2024, mobile devices (excluding tablets) generated 62.54 percent of global website traffic. Mobiles and smartphones consistently hoovered around the 50 percent mark since the beginning of 2017, before surpassing it in 2020. Mobile traffic Due to low infrastructure and financial restraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States. Mobile usage The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The file contains a dump of the neo4j instance of the road network of the city of Modena. The database contains both the primal graph and the dual graph and integrates traffic volume data and Points Of Interest (POI). The database is generated from Open Street Map data exploiting the code in this git repository. The git repository shows how to employ the graph database for analysis, routing, and the simulation of road closure scenarios.
In 2024, the single country with the highest share of online traffic to shein.com was the United States. Brazil followed, ranking second with about ** percent of shein.com web traffic coming from the country, followed by France with *** percent.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Puff Bar, a disposable electronic nicotine delivery system (ENDS), was the ENDS brand most commonly used by U.S. youth in 2021. We explored whether Puff Bar’s rise in marketplace prominence was detectable through advertising, retail sales, social media, and web traffic data sources. We retrospectively documented potential signals of interest in and uptake of Puff Bar in the United States using metrics based on advertising (Numerator and Comperemedia), retail sales (NielsenIQ), social media (Twitter, via Sprinklr), and web traffic (Similarweb) data from January 2019 to June 2022. We selected metrics based on (1) data availability, (2) potential to graph metric longitudinally, and (3) variability in metric. We graphed metrics and assessed data patterns compared to data for Vuse, a comparator product, and in the context of regulatory events significant to Puff Bar. The number of Twitter posts that contained a Puff Bar term (social media), Puff Bar product sales measured in dollars (sales), and the number of visits to the Puff Bar website (web traffic) exhibited potential for surveilling Puff Bar due to ease of calculation, comprehensibility, and responsiveness to events. Advertising tracked through Numerator and Comperemedia did not appear to capture marketing from Puff Bar’s manufacturer or drive change in marketplace prominence. This study demonstrates how quantitative changes in metrics developed using advertising, retail sales, social media, and web traffic data sources detected changes in Puff Bar’s marketplace prominence. We conclude that low-effort, scalable, rapid signal detection capabilities can be an important part of a multi-component tobacco surveillance program.
https://electroiq.com/privacy-policyhttps://electroiq.com/privacy-policy
Buffer vs Hootsuite Statistics: Buffer and Hootsuite are working against each other for supremacy in scheduling, analytics, collaboration, and affordability. Buffer offers simple interfaces and transparent pricing for creators and small and medium enterprises. Hootsuite markets to larger enterprises for deep analytics, bulk management tools, and integrations comprising its internal operations.
This comprehensive Buffer vs Hootsuite statistics focuses on user growth, satisfaction, pricing, features, and ease of use, all backed by numbers and insights from various sources. So, by the end, you will have an informative and data-driven sense of what platform will suit your needs better.
Web traffic statistics for the several City-Parish websites, brla.gov, city.brla.gov, Red Stick Ready, GIS, Open Data etc. Information provided by Google Analytics.