https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here was obtained on a Kali Linux machine at the University of Cincinnati, Cincinnati, Ohio, by capturing packets with Wireshark for 1 hour during the evening of October 9, 2023. The dataset consists of 394,137 instances stored in a CSV (Comma-Separated Values) file. This large dataset can be used for different machine learning applications, for instance network traffic classification, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.
Content:
This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. Most of the features are numeric in nature; however, there are also nominal features and a date/time feature (Timestamp).
The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
- No: instance number
- Timestamp: timestamp of the network traffic instance
- Source IP: IP address of the source
- Destination IP: IP address of the destination
- Protocol: protocol used by the instance
- Length: length of the instance (frame length in bytes)
- Info: information about the traffic instance
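As a starting point, the sketch below reads a Wireshark CSV export with Python's standard `csv` module and totals packets and bytes per protocol. It assumes the file uses Wireshark's default export header (`No.`, `Time`, `Source`, `Destination`, `Protocol`, `Length`, `Info`); the actual header names in this dataset may differ (for example `Timestamp` or `Source IP`), and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up sample rows using Wireshark's default CSV export header (an assumption).
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.5","142.250.72.14","TCP","66","51514 > 443 [ACK]"
"2","0.000812","10.0.0.5","8.8.8.8","DNS","74","Standard query A example.com"
"3","0.013390","142.250.72.14","10.0.0.5","TLSv1.3","1434","Application Data"
'''


def protocol_summary(lines):
    """Return {protocol: (packet_count, total_bytes)} from a Wireshark CSV export."""
    packets = Counter()
    size = defaultdict(int)
    for row in csv.DictReader(lines):
        proto = row["Protocol"]
        packets[proto] += 1
        size[proto] += int(row["Length"])  # frame length in bytes
    return {p: (packets[p], size[p]) for p in packets}


summary = protocol_summary(io.StringIO(SAMPLE))
for proto, (count, total) in sorted(summary.items()):
    print(f"{proto:8s} {count:6d} packets {total:10d} bytes")
```

For the real file, pass an open file handle instead, e.g. `with open("traffic.csv", newline="") as f: protocol_summary(f)`, where `traffic.csv` is a hypothetical filename. Because `csv.DictReader` streams row by row, this also works for files too large to open in a spreadsheet.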
Acknowledgements:
I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration: This dataset goes beyond most network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it can be used to build machine learning models that identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
**Dataset License:** CC0: Public Domain
Dataset Usages: This dataset can be used for different machine learning applications in cybersecurity, such as network traffic classification, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
How ML techniques can benefit from this dataset:
This dataset is highly useful because it consists of 394,137 instances of network traffic data obtained by using 25 applications on public, private, and enterprise networks. The dataset also contains important features that can be used for most machine learning applications in cybersecurity. Here are a few of the potential machine learning applications that could benefit from this dataset:
1. Network Performance Monitoring: This large network traffic dataset can be utilised to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection: The dataset can be utilised to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection: This large dataset could be utilised to train machine learning algorithms and design models that detect traffic issues, malicious network traffic, and DoS attacks.
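To make the anomaly-detection idea concrete, here is a deliberately simple baseline, not a method proposed by the dataset authors: bin packets into one-second windows using the capture's relative time column and flag windows whose packet count lies more than `threshold` standard deviations above the mean. The function name and the 3.0 threshold are illustrative choices; a real detector would be trained and validated on labelled traffic.

```python
import statistics
from collections import Counter


def flag_bursts(timestamps, threshold=3.0):
    """Return the one-second windows whose packet count has a z-score above `threshold`.

    `timestamps` are seconds relative to the start of the capture, as in
    Wireshark's default "Time" column.
    """
    per_second = Counter(int(t) for t in timestamps)
    if not per_second:
        return []
    # Include empty seconds (Counter returns 0) so quiet periods count towards the baseline.
    window = range(min(per_second), max(per_second) + 1)
    counts = [per_second[s] for s in window]
    mean = statistics.fmean(counts)
    stdev = statistics.pstdev(counts)
    if stdev == 0:
        return []  # perfectly uniform traffic: nothing stands out
    return [s for s, c in zip(window, counts) if (c - mean) / stdev > threshold]


# Synthetic example: 10 packets per second for 60 s, with a burst of 200 at second 30.
times = [s + i / 1000 for s in range(60) for i in range(200 if s == 30 else 10)]
print(flag_bursts(times))  # prints [30]
```

A z-score on counts only catches volume spikes; slow scans or low-rate attacks need richer features, such as per-source counts or protocol mixes.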
This data set contains internet traffic data captured by an Internet Service Provider (ISP) using a Mikrotik SDN Controller and packet sniffer tools. The data set includes traffic from over 2000 customers who use Fibre to the Home (FTTH) and GPON internet connections. The data was collected over a period of several months and contains all traffic in its original format, with headers and packets.
The data set contains information on inbound and outbound traffic, including web browsing, email, file transfers, and more. The data set can be used for research in areas such as network security, traffic analysis, and machine learning.
**Data Collection Method:** The data was captured using Mikrotik SDN Controller and packet sniffer tools. These tools capture traffic data by monitoring network traffic in real-time. The data set contains all traffic data in its original format, including headers and packets.
**Data Set Content:** The data set is provided in a CSV format and includes the following fields:
MAC protocol examples:
- 802.2: 802.2 Frames (0x0004)
- arp: Address Resolution Protocol (0x0806)
- homeplug-av: HomePlug AV MME (0x88E1)
- ip: Internet Protocol version 4 (0x0800)
- ipv6: Internet Protocol Version 6 (0x86DD)
- ipx: Internetwork Packet Exchange (0x8137)
- lldp: Link Layer Discovery Protocol (0x88CC)
- loop-protect: Loop Protect Protocol (0x9003)
- mpls-multicast: MPLS multicast (0x8848)
- mpls-unicast: MPLS unicast (0x8847)
- packing-compr: Encapsulated packets with compressed IP packing (0x9001)
- packing-simple: Encapsulated packets with simple IP packing (0x9000)
- pppoe: PPPoE Session Stage (0x8864)
- pppoe-discovery: PPPoE Discovery Stage (0x8863)
- rarp: Reverse Address Resolution Protocol (0x8035)
- service-vlan: Provider Bridging (IEEE 802.1ad) & Shortest Path Bridging IEEE 802.1aq (0x88A8)
- vlan: VLAN-tagged frame (IEEE 802.1Q) and Shortest Path Bridging IEEE 802.1aq with NNI compatibility (0x8100)
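When working with raw frames from this capture, the MAC protocol list above can be turned into a lookup table. The sketch below reads the 2-byte type/length field at offset 12 of an Ethernet frame. Values of 1500 or less are IEEE 802.3 length fields rather than EtherTypes, so such frames are mapped to `802.2`; the 0x0004 code in the list is a pseudo-value (it matches Linux's `ETH_P_802_2`), not a type field that appears on the wire. The function name is our own.

```python
# EtherType values copied from the MAC protocol list above (802.2 handled separately).
ETHERTYPES = {
    0x0806: "arp",
    0x88E1: "homeplug-av",
    0x0800: "ip",
    0x86DD: "ipv6",
    0x8137: "ipx",
    0x88CC: "lldp",
    0x9003: "loop-protect",
    0x8848: "mpls-multicast",
    0x8847: "mpls-unicast",
    0x9001: "packing-compr",
    0x9000: "packing-simple",
    0x8864: "pppoe",
    0x8863: "pppoe-discovery",
    0x8035: "rarp",
    0x88A8: "service-vlan",
    0x8100: "vlan",
}


def frame_protocol(frame: bytes) -> str:
    """Name the MAC protocol of an Ethernet frame from its 2-byte type/length field."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header is 14 bytes long")
    field = int.from_bytes(frame[12:14], "big")
    if field <= 1500:
        # IEEE 802.3 length field: the payload usually starts with an 802.2 LLC header.
        return "802.2"
    return ETHERTYPES.get(field, f"unknown (0x{field:04X})")


# Destination MAC, source MAC, then EtherType 0x0806 (ARP), followed by a zeroed payload.
arp_frame = bytes.fromhex("ffffffffffff" "001122334455" "0806") + bytes(28)
print(frame_protocol(arp_frame))  # prints arp
```

For VLAN-tagged frames this returns the outer type (`vlan` or `service-vlan`); reading the inner protocol would require skipping the 4-byte tag first.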
**Data Usage:** The data set can be used for research in areas such as network security, traffic analysis, and machine learning. Researchers can use the data to develop new algorithms for detecting and preventing cyber attacks, analyzing internet traffic patterns, and more.
**Data Availability:** If you are interested in using this data set for research purposes, please contact us at asfandyar250@gmail.com for more information and references. The data set is available for download on Kaggle and can be accessed by researchers who have obtained permission from the ISP.
We hope this data set will be useful for researchers in the field of network security and traffic analysis. If you have any questions or need further information, please do not hesitate to contact us.
You can use Wireshark or other software to view the files.
By 2030, the average mobile data connection was forecast to generate almost ** gigabytes of traffic per month in the Middle East and North Africa (MENA), increasing from *** gigabytes in 2023. The monthly mobile data traffic per subscriber has experienced a considerable growth from *** gigabytes in 2018.
Based on its user traffic value of **** billion yuan, WeChat ranked first among all Chinese mobile applications as of March 2024. In contrast, QQ, Tencent's other instant message app, generated a user traffic value of less than ** billion yuan.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into 5 different activities (Video, Bulk, Idle, Web, and Interactive), and the label is shown in the filename. There is also a file (mapping.csv) that maps the host's IP address, the csv/pcap filename, and the activity label.
Activities:
- Interactive: applications that perform real-time interactions in order to provide a suitable user experience, such as editing a file in Google Docs and remote CLI sessions over SSH.
- Bulk data transfer: applications that transfer large volumes of data over the network. Some examples are SCP/FTP applications and direct downloads of large files from web servers like Mediafire, Dropbox, or the university repository, among others.
- Web browsing: all the traffic generated while searching and consuming different web pages. Examples of those pages are several blogs and news sites and the university's Moodle.
- Video playback: traffic from applications that consume video in streaming or pseudo-streaming. The most common servers used are Twitch and YouTube, but the university online classroom has also been used.
- Idle behaviour: the background traffic generated by the user's computer when the user is idle. This traffic has been captured with every application closed and with some pages open, like Google Docs, YouTube, and several web pages, but always without user interaction.
The capture is performed by a network probe attached to the router that forwards the user's network traffic, using a SPAN port. The traffic is stored in pcap format with the full packet payload. In the csv files, every non-TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): timestamp, protocol, payload size, source and destination IP addresses, and source and destination TCP/UDP ports. The fields are also included as a header in every csv file.
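The per-packet csv layout lends itself to flow-level features. The following sketch, which is not the authors' code, applies the same filter described above (TCP/UDP packets with a non-empty payload), groups packets into bidirectional flows by protocol, addresses, and ports, and computes packet count, payload bytes, and duration per flow. The `Packet` field names and the `"TCP"`/`"UDP"` protocol strings are assumptions; check the header of each csv file and adapt.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Mirrors the per-packet csv fields described above; the names are illustrative.
    timestamp: float
    protocol: str       # assumed to be "TCP" or "UDP"
    payload_size: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def flow_key(p):
    """Order the two endpoints so both directions of a conversation share one key."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    return (p.protocol,) + (a + b if a <= b else b + a)


def flow_features(packets):
    """Return {flow key: {"packets", "bytes", "duration"}} for TCP/UDP packets with payload."""
    flows = defaultdict(list)
    for p in packets:
        if p.protocol not in ("TCP", "UDP") or p.payload_size == 0:
            continue  # same filter as the csv files
        flows[flow_key(p)].append(p)
    features = {}
    for key, pkts in flows.items():
        times = [p.timestamp for p in pkts]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(p.payload_size for p in pkts),
            "duration": max(times) - min(times),
        }
    return features


pkts = [
    Packet(0.00, "TCP", 517, "10.0.0.1", "10.0.0.2", 50000, 443),
    Packet(0.05, "TCP", 1400, "10.0.0.2", "10.0.0.1", 443, 50000),
    Packet(0.06, "TCP", 0, "10.0.0.1", "10.0.0.2", 50000, 443),  # no payload: dropped
    Packet(0.10, "ICMP", 64, "10.0.0.1", "10.0.0.3", 0, 0),       # not TCP/UDP: dropped
]
for key, feats in flow_features(pkts).items():
    print(key, feats)
```

Flows here never time out; for long traces you would typically split a flow after an idle gap, which is a design choice the dataset description does not specify.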
The amount of data is as follows:

| Activity | Traces | Total duration (s) | pcap size (MBytes) |
|---|---|---|---|
| Bulk | 19 | 3599 | 8704 |
| Video | 23 | 4496 | 1405 |
| Web | 23 | 4203 | 148 |
| Interactive | 42 | 8934 | 30.5 |
| Idle | 52 | 6341 | 0.69 |
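A quick calculation from these figures shows how unbalanced the activities are in volume, which matters when training a classifier. The sketch divides pcap size by duration to get an approximate average rate per activity. It assumes 1 MByte = 10^6 bytes and ignores that pcap files also store per-packet record headers, so the rates slightly overstate the captured traffic.

```python
# (traces, total duration in s, pcap size in MBytes), copied from the figures above
TRACES = {
    "Bulk": (19, 3599, 8704),
    "Video": (23, 4496, 1405),
    "Web": (23, 4203, 148),
    "Interactive": (42, 8934, 30.5),
    "Idle": (52, 6341, 0.69),
}


def summarize(traces):
    """Return {activity: (mean seconds per trace, mean rate in Mbit/s)}."""
    return {
        activity: (seconds / count, mbytes * 8 / seconds)
        for activity, (count, seconds, mbytes) in traces.items()
    }


for activity, (secs, mbps) in summarize(TRACES).items():
    print(f"{activity:<12} {secs:7.1f} s/trace {mbps:10.4f} Mbit/s")
```

Bulk comes out at roughly 19 Mbit/s and Idle at under 0.001 Mbit/s, a spread of more than four orders of magnitude, so byte-based features alone separate some activities almost trivially.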
The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.
In March 2024, the video platform YouTube reported around 32.5 billion visits from global users. Meta-owned Facebook.com reported around 16.1 billion visits from global users, followed by Instagram.com and Twitter.com with 7 billion and 6.1 billion visits, respectively, from users worldwide during the examined month. Wikipedia.org, which hosts user-generated encyclopedic entries, recorded around 4.4 billion visits, while news aggregator and community platform Reddit.com saw approximately 2.2 billion visits during the examined period.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
You can also access an API version of this dataset.
TMS (traffic monitoring system) daily-updated traffic counts API
Important note: due to the size of this dataset, you won't be able to open it fully in Excel. Use notepad / R / any software package which can open more than a million rows.
Data reuse caveats: as per license.
Data quality statement: please read the accompanying user manual, which explains:
- how this data is collected
- identification of count stations
- traffic monitoring technology
- monitoring hierarchy and conventions
- typical survey specification
- data calculation
- TMS operation.
Traffic monitoring for state highways: user manual [PDF 465 KB]
The data is at daily granularity. However, the actual update frequency of the data depends on the contract the site falls within. For telemetry sites it's once a week on a Wednesday. Some regional sites are fortnightly, and some monthly or quarterly. Some are only 4 weeks a year, with timing depending on contractors’ programme of work.
Data quality caveats: you must use this data in conjunction with the user manual and the following caveats.
- The road sensors used in data collection are subject to both technical errors and environmental interference.
- Data is compiled from a variety of sources. Accuracy may vary and the data should only be used as a guide.
- As not all road sections are monitored, a direct calculation of Vehicle Kilometres Travelled (VKT) for a region is not possible.
- Data is sourced from Waka Kotahi New Zealand Transport Agency TMS data.
- For sites that use dual loops, classification is by length. Vehicles with a length of less than 5.5m are classed as light vehicles. Vehicles over 11m long are classed as heavy vehicles. Vehicles between 5.5 and 11m are split 50:50 into light and heavy.
- In September 2022, the National Telemetry contract was handed to a new contractor. During the handover process, due to some missing documents and aged technology, 40 of the 96 national telemetry traffic count sites went offline. The current contractor has continued to upload data from all active sites and has gradually worked to bring most offline sites back online. Please note and account for possible gaps in data from National Telemetry sites.
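The dual-loop length rule above can be expressed as a small function. This is a sketch: it assumes vehicles measuring exactly 5.5m or 11m fall into the 50:50 band (the caveat does not say how boundaries are handled), and `split_light_heavy` is a hypothetical helper, not part of the TMS data or API.

```python
def split_light_heavy(lengths_m):
    """Return (light, heavy) vehicle totals for a dual-loop site.

    Under 5.5 m is light, over 11 m is heavy, and each vehicle in between
    counts half towards each class (the 50:50 split). Boundary values are
    assumed to fall in the split band.
    """
    light = heavy = 0.0
    for length in lengths_m:
        if length < 5.5:
            light += 1
        elif length > 11:
            heavy += 1
        else:
            light += 0.5
            heavy += 0.5
    return light, heavy


print(split_light_heavy([4.2, 5.5, 8.0, 11.0, 12.5, 18.0]))  # prints (2.5, 3.5)
```

The fractional totals are why light and heavy counts from dual-loop sites need not be whole numbers even though the total always is.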
The NZTA Vehicle Classification Relationships diagram below shows the length classification (typically dual loops) and axle classification (typically pneumatic tube counts), and how these map to the Monetised benefits and costs manual, table A37, page 254.
Monetised benefits and costs manual [PDF 9 MB]
For the full TMS classification schema, see Appendix A of the traffic counting manual vehicle classification scheme (NZTA 2011), below.
Traffic monitoring for state highways: user manual [PDF 465 KB]
State highway traffic monitoring (map)
State highway traffic monitoring sites
https://www.semrush.com/company/legal/terms-of-service/
reddit.com is ranked #5 in US with 4.66B Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Autonomous Vehicles Navigation: The "Carla traffic dataset" can be used to develop and improve algorithms for autonomous vehicles, enabling them to effectively identify other road users, traffic lights, and various traffic signs, improving the cars’ ability to navigate safely in different weather conditions including fog.
Traffic Management Systems: The dataset could be leveraged to create advanced traffic management systems, identifying car, bike, or pedestrian movement, detecting traffic light states, and understanding if road users respect speed limits (30, 60, 90 km/h signs). This could improve urban traffic flow and increase overall road safety.
Driver Assistance Systems: The dataset could be used to develop advanced driver assistance systems (ADAS) that could alert drivers of pedestrians, other vehicles, traffic signs, and the status of traffic lights, particularly in foggy or difficult conditions.
Safety Testing for Vehicle Manufacturers: Companies manufacturing cars, bikes, or motorbikes could use the data to carry out safety testing under different situations, including different weather conditions and traffic light changes.
Virtual Driving Simulation: Game developers or driving schools could use this model to develop realistic driving simulations. The players or trainee drivers would need to respond correctly and promptly to real-world traffic situations like recognizing speed signs, traffic lights, and other road users.
Mobile accounts for more than half of web traffic worldwide. In the last quarter of 2024, mobile devices (excluding tablets) generated 62.54 percent of global website traffic. Mobiles and smartphones had consistently hovered around the 50 percent mark since the beginning of 2017, before surpassing it in 2020.
Mobile traffic
Due to limited infrastructure and financial constraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana, and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States.
Mobile usage
The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage, and accessing social media. Apps are a very popular way to watch video on the go, and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video, and Amazon Prime Video.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of more than 2300 trajectories of pedestrians and 1000 trajectories of cyclists recorded by a research vehicle of the University of Applied Sciences Aschaffenburg (Kooperative Automatisierte Verkehrssysteme) in urban traffic. In addition to the actual trajectory, the data set contains 3D poses, a representation of the body posture in three-dimensional space, and semantic maps describing the surrounding of the respective vulnerable road user (VRU).
The trajectories were sampled using a sliding window approach and split into training, validation, and test datasets. Each sample contains the trajectory, 3D poses, and semantic maps of the past second, as well as the future trajectory to be predicted and the semantic maps for the next 2.52 s. In addition, each sample is assigned its current type of motion. The motion types were annotated manually. For a more detailed description of the dataset, please refer to the following publication:
Viktor Kress, Fabian Jeske, Stefan Zernetsch, Konrad Doll, Bernhard Sick: Pose and Semantic Map Based Probabilistic Forecast of Vulnerable Road Users' Trajectories. 2021, arXiv: 2106.02598, https://arxiv.org/abs/2106.02598
We provide files for the training/validation dataset and the test dataset for pedestrians and cyclists, respectively. To read the provided data, unzip the files first. Each file contains a zarr directory. Zarr is a format for the storage of chunked, compressed, N-dimensional arrays (https://zarr.readthedocs.io). To read the data:
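import zarr
# Placeholder path (hypothetical): point it at the zarr directory from the unzipped file
data = zarr.open("path/to/unzipped_directory.zarr", mode="r")
print(list(data.keys()))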
Each zarr directory contains the following keys:
pre_trajectories_and_poses: input trajectories of 13 body joint positions, format: [sample, timestep, x, y, z coordinates (coordinates 1-13: x, 14-26: y, 27-39: z)]
pre_smaps: input semantic maps, format: [sample, timestep (-0.96 s, -0.48 s, 0.00 s)], codes: static obstacles: 0, dynamic obstacles: 1, person: 2, sidewalk: 3, road: 4, walkable vegetation: 5, unknown obstacle: 6, unknown free space: 7, unknown: 8
pos_trajectories: ground truth future trajectories of the head, format: [sample, x, y coordinates (coordinates 1-63: x, 64-126: y, for the timesteps +0.04 s, +0.08 s, ..., +2.52 s)]
pos_smaps: future semantic maps, format: [sample, timestep (+0.44 s, +0.96 s, +1.48 s, +2.00 s, +2.52 s)]
fold: affiliation to the test, validation, or training set, format: [sample], codes: test set: 0, validation set: 1, training set: 2
augmentation: affiliation to the augmentation loop (0-2), format: [sample]
move, start, stop, wait, tl, tr: current motion type as boolean arrays, format: [sample]
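The flattened coordinate layouts above can be reshaped into per-joint and per-timestep arrays with NumPy. This sketch follows the key descriptions (x, y, z blocks of 13 values; x and y blocks of 63 values) but has not been checked against the released files, so verify the shapes after loading. The helper names are our own, not part of the dataset.

```python
import numpy as np

N_JOINTS = 13      # body joints per pose
FUTURE_STEPS = 63  # +0.04 s ... +2.52 s in 0.04 s steps

# Time offsets (s) of the 63 future head positions in pos_trajectories.
FUTURE_TIMES = np.arange(1, FUTURE_STEPS + 1) * 0.04


def joints_xyz(pre):
    """[sample, timestep, 39] -> [sample, timestep, joint, (x, y, z)].

    Coordinates 1-13 are x, 14-26 are y and 27-39 are z (0-based slices 0:13, 13:26, 26:39).
    """
    n, t, _ = pre.shape
    return pre.reshape(n, t, 3, N_JOINTS).transpose(0, 1, 3, 2)


def future_xy(pos):
    """[sample, 126] -> [sample, future timestep, (x, y)] for the head trajectory."""
    return pos.reshape(pos.shape[0], 2, FUTURE_STEPS).transpose(0, 2, 1)


def split_by_fold(arr, fold):
    """Split samples using the fold codes: 0 = test, 1 = validation, 2 = training."""
    fold = np.asarray(fold)
    names = (("test", 0), ("validation", 1), ("training", 2))
    return {name: arr[fold == code] for name, code in names}


# With the zarr group opened as `data` via zarr.open, for example:
#   poses = joints_xyz(np.asarray(data["pre_trajectories_and_poses"]))
#   parts = split_by_fold(poses, data["fold"])
```

Keeping the joint axis separate from the coordinate axis makes it straightforward to compute per-joint quantities such as velocities with `np.diff` along the timestep axis.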
This work was supported by “Zentrum Digitalisierung.Bayern”. In addition, the work is backed by the project DeCoInt2, supported by the German Research Foundation (DFG) within the priority program SPP 1835: “Kooperativ interagierende Automobile”, grant numbers DO 1186/1-2 and SI 674/11-2.
https://www.semrush.com/company/legal/terms-of-service/
amazon.com is ranked #3 in US with 2.82B Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!
https://www.semrush.com/company/legal/terms-of-service/
faire.com is ranked #2885 in US with 5.89M Traffic. Categories: Retail. Learn more about website traffic, market share, and more!
https://www.semrush.com/company/legal/terms-of-service/
chatgpt.com is ranked #10 in US with 5.24B Traffic. Categories: AI. Learn more about website traffic, market share, and more!
https://www.marketreportanalytics.com/privacy-policy
The retail people counting market, valued at $1556 million in 2025, is experiencing robust growth, projected to expand at a compound annual growth rate (CAGR) of 8.7% from 2025 to 2033. This expansion is fueled by several key factors. Firstly, the increasing adoption of advanced technologies like AI-powered video analytics, WiFi and Bluetooth sensing, and infrared sensors provides retailers with more accurate and granular data on customer traffic patterns. This allows for optimized store layouts, staffing levels, and marketing campaigns, leading to improved operational efficiency and enhanced customer experiences. Secondly, the growing demand for data-driven decision-making across the retail sector is driving the adoption of people counting systems. Retailers are increasingly realizing the importance of understanding customer behavior to personalize their offerings and optimize their strategies for better profitability. Finally, the increasing availability of affordable and user-friendly people counting solutions, including cloud-based platforms and mobile applications, is making this technology accessible to a wider range of businesses, from small and medium-sized enterprises (SMEs) to large multinational corporations. While the market faces challenges such as the initial investment costs associated with implementing these systems and concerns about data privacy, these are being mitigated by the long-term return on investment (ROI) generated through optimized operations and improved sales conversions. The market is segmented by application (SMEs and large enterprises) and technology (Wi-Fi and Bluetooth sensing, video-based counting, infrared sensors, time-of-flight sensors, and others). Key players in the market, including V-Count, Visionarea, Beonic (Blix), Retail Next, and ShopperTrak, are constantly innovating and expanding their product offerings to cater to the evolving needs of retailers. 
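The figures above imply a 2033 market size that the summary does not state. Assuming the 8.7% CAGR compounds annually on the 2025 base of $1556 million for the 8 years to 2033, a short calculation gives it:

```python
def compound(value, rate, years):
    """Grow `value` at a constant annual `rate` (0.087 for 8.7%) for `years` years."""
    return value * (1 + rate) ** years


base_2025 = 1556  # USD million, from the text above
implied_2033 = compound(base_2025, 0.087, 2033 - 2025)
print(f"Implied 2033 market size: ${implied_2033:,.0f} million")
```

This works out to roughly $3.0 billion, about 1.95 times the 2025 value; it is an implication of the stated CAGR under the annual-compounding assumption, not a figure from the report.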
The competitive landscape is dynamic, with ongoing mergers, acquisitions, and the development of new technologies driving market evolution. The continued focus on enhancing the customer experience and leveraging data analytics will ensure sustained growth in the retail people counting market throughout the forecast period.
In 2023, an average mobile subscriber in Latin America generated ***** gigabytes of data traffic per month. It has been projected that the number will increase to ** gigabytes by 2030. The number of unique mobile subscribers is expected to increase from *** to *** million between 2023 and 2030.
https://www.semrush.com/company/legal/terms-of-service/
perplexity.ai is ranked #89 in IN with 173.58M Traffic. Categories: AI. Learn more about website traffic, market share, and more!
https://www.semrush.com/company/legal/terms-of-service/
craigslist.org is ranked #73 in US with 125.7M Traffic. Categories: Online Services, Real Estate. Learn more about website traffic, market share, and more!
This dataset contains user behavior traffic in Tor, I2P, ZeroNet and Freenet.
We divide darknet user behaviors into 8 categories: Browsing, Chat, E-mail, Audio-streaming, Video-streaming, File Transfer, P2P, and VoIP. We investigated the applications commonly used in Tor, I2P, ZeroNet, and Freenet to simulate various user behaviors.
After capturing pcap, we use CICFlowMeter for feature extraction. Since our user behavior hierarchical classifier consists of 6 local classifiers, we divide the dataset into 6 csv files. The statistics of traffic data are shown in the following table.
| Network | Browsing | Chat | E-mail | File Transfer | P2P | Audio | Video | VoIP | Total |
|---|---|---|---|---|---|---|---|---|---|
| Tor | 1281 | 841 | 553 | 1077 | 1018 | 1567 | 1703 | 592 | 8632 |
| I2P | 1921 | 442 | 1084 | 1791 | 2910 | - | - | - | 8148 |
| ZeroNet | 7972 | 1531 | 352 | 2157 | 1394 | 820 | 1251 | - | 15477 |
| Freenet | 4990 | 1123 | 2980 | 4897 | - | - | 2397 | - | 16387 |
Original pcap datasets: Google Drive (download)
@INPROCEEDINGS{9343185,
author={Hu, Yuzong and Zou, Futai and Li, Linsen and Yi, Ping},
booktitle={2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)},
title={Traffic Classification of User Behaviors in Tor, I2P, ZeroNet, Freenet},
year={2020},
volume={},
number={},
pages={418-424},
doi={10.1109/TrustCom50675.2020.00064}}
Photo by Leon Seibert on Unsplash
As of March 2025, smartphone users with a 5G data plan in South Korea used around **** GB per subscription in that month. The number of 5G subscribers in the country had reached around ***** million users in the same month that year.