https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here was obtained on a Kali Linux machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures with Wireshark for 1 hour during the evening of October 9, 2023. The resulting dataset consists of 394,137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be utilised for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.
Content :
This network traffic dataset consists of 7 features. Each instance contains information such as the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.
The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
No: Number of the instance
Timestamp: Timestamp of the network traffic instance
Source IP: IP address of the source
Destination IP: IP address of the destination
Protocol: Protocol used by the instance
Length: Length of the instance
Info: Information about the traffic instance
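As a rough illustration of how the seven columns above might be consumed, the following sketch counts packets and bytes per protocol. The header names follow a typical Wireshark CSV export ("No.", "Time", "Source", ...), which is an assumption; the actual file may label the columns differently. The sample rows are hypothetical and use documentation-range IP addresses.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the layout of a Wireshark CSV export; the real
# file's header names and values may differ.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.0.2.5","192.0.2.1","DNS","74","Standard query A example.org"
"2","0.013200","192.0.2.1","192.0.2.5","DNS","90","Standard query response A"
"3","0.020100","192.0.2.5","198.51.100.7","TCP","66","443 [SYN]"
'''

def protocol_summary(handle):
    """Return (packet count, total bytes) per protocol as two Counters."""
    counts, sizes = Counter(), Counter()
    for row in csv.DictReader(handle):
        counts[row["Protocol"]] += 1
        sizes[row["Protocol"]] += int(row["Length"])
    return counts, sizes

counts, sizes = protocol_summary(io.StringIO(SAMPLE))
print(counts.most_common())
print(dict(sizes))
```

To run this on the real file, replace `io.StringIO(SAMPLE)` with an open file handle and adjust the column names if they differ.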
Acknowledgements :
I would like to thank the University of Cincinnati for providing the infrastructure for generating this network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration : This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it is meant to support machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
**Dataset License:** CC0: Public Domain
Dataset Usages : This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
ML techniques that benefit from this dataset :
This dataset is highly useful because it consists of 394,137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains features that are relevant to most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:
1. Network Performance Monitoring : This large network traffic dataset can be utilised for analysing network traffic to identify patterns in the network. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection : A large network traffic dataset can be utilised for training machine learning models to find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection : This large dataset could be utilised for training machine learning algorithms and designing models for the detection of traffic issues, malicious traffic, network attacks, and DoS attacks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888.
We recommend using the CESNET DataZoo python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.
The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.
Data capture
The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done for four weeks between 31.10.2022 and 27.11.2022. The following list provides per-week flow count, capture period, and uncompressed size:
W-2022-44
Uncompressed Size: 19 GB
Capture Period: 31.10.2022 - 6.11.2022
Number of flows: 32.6M
W-2022-45
Uncompressed Size: 25 GB
Capture Period: 7.11.2022 - 13.11.2022
Number of flows: 42.6M
W-2022-46
Uncompressed Size: 20 GB
Capture Period: 14.11.2022 - 20.11.2022
Number of flows: 33.7M
W-2022-47
Uncompressed Size: 25 GB
Capture Period: 21.11.2022 - 27.11.2022
Number of flows: 44.1M
CESNET-QUIC22
Uncompressed Size: 89 GB
Capture Period: 31.10.2022 - 27.11.2022
Number of flows: 153M
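As a quick consistency check, the weekly figures listed above can be summed and compared with the CESNET-QUIC22 totals. The snippet below is only arithmetic on the numbers quoted in this description; the dictionary layout and key names are illustrative.

```python
# Weekly figures copied from the list above (sizes in GB, flows in millions).
weeks = {
    "W-2022-44": {"size_gb": 19, "flows_m": 32.6},
    "W-2022-45": {"size_gb": 25, "flows_m": 42.6},
    "W-2022-46": {"size_gb": 20, "flows_m": 33.7},
    "W-2022-47": {"size_gb": 25, "flows_m": 44.1},
}

total_size_gb = sum(week["size_gb"] for week in weeks.values())
# Round to one decimal to avoid floating-point noise in the sum.
total_flows_m = round(sum(week["flows_m"] for week in weeks.values()), 1)

print(total_size_gb, "GB")
print(total_flows_m, "M flows")
```

The sums agree with the stated dataset totals of 89 GB and 153M flows.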
Data description
The dataset consists of network flows describing encrypted QUIC communications. Flows were created using ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and with fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.
Packet Sequences
Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1, +1 meaning a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip.
Flow statistics
Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason - either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections.
Dataset structure
The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service and total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts of a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes SNI domains used for ground truth labeling.
The following list describes flow data fields in CSV files:
ID: Unique identifier
SRC_IP: Source IP address
DST_IP: Destination IP address
DST_ASN: Destination Autonomous System number
SRC_PORT: Source port
DST_PORT: Destination port
PROTOCOL: Transport protocol
QUIC_VERSION: QUIC protocol version
QUIC_SNI: Server Name Indication domain
QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet
TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
DURATION: Duration of the flow in seconds
BYTES: Number of transmitted bytes from client to server
BYTES_REV: Number of transmitted bytes from server to client
PACKETS: Number of packets transmitted from client to server
PACKETS_REV: Number of packets transmitted from server to client
PPI: Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]]
PPI_LEN: Number of packets in the PPI sequence
PPI_DURATION: Duration of the PPI sequence in seconds
PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
PHIST_SRC_SIZES: Histogram of packet sizes from client to server
PHIST_DST_SIZES: Histogram of packet sizes from server to client
PHIST_SRC_IPT: Histogram of inter-packet times from client to server
PHIST_DST_IPT: Histogram of inter-packet times from server to client
APP: Web service label
CATEGORY: Service category
FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
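The PPI format and the eight histogram bins described above can be illustrated with a small sketch. It assumes that the PPI column is stored as JSON-compatible text and that binned values are integers; neither detail is confirmed here, and the sample PPI string is hypothetical. The function names are illustrative and are not part of ipfixprobe or CESNET DataZoo. The direction-change count is one literal reading of the roundtrip definition, so it may not match PPI_ROUNDTRIPS exactly.

```python
import json
from bisect import bisect_right

# Lower edges of bins 1..7; bin 0 covers 0-15 and bin 7 covers >1024,
# following the intervals listed in the description.
LOWER_EDGES = [16, 32, 64, 128, 256, 512, 1025]

def histogram(values):
    """Count integer values (bytes or milliseconds) into the eight bins."""
    counts = [0] * 8
    for value in values:
        counts[bisect_right(LOWER_EDGES, value)] += 1
    return counts

def parse_ppi(text):
    """Split a PPI string into inter-packet times, directions, and sizes."""
    times, directions, sizes = json.loads(text)
    if not len(times) == len(directions) == len(sizes):
        raise ValueError("PPI sub-sequences differ in length")
    return times, directions, sizes

def direction_changes(directions):
    """Number of positions where the packet direction flips."""
    return sum(1 for a, b in zip(directions, directions[1:]) if a != b)

# Hypothetical PPI value, not taken from the dataset.
ppi = "[[0, 12, 3, 40], [1, -1, -1, 1], [1200, 1200, 350, 80]]"
times, directions, sizes = parse_ppi(ppi)
# Sizes of client-to-server packets only (direction +1).
client_sizes = [s for s, d in zip(sizes, directions) if d == 1]
print(len(directions), direction_changes(directions), histogram(client_sizes))
```

Note that the PHIST_* fields in the dataset cover the entire flow, while a histogram built from PPI only covers the first packets of the flow, so the two are not expected to match for long flows.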
Link to other CESNET datasets
https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/
https://github.com/CESNET/cesnet-datazoo
Please cite the original data article:
@article{CESNETQUIC22,
  author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
  title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
  journal = {Data in Brief},
  pages = {108888},
  year = {2023},
  issn = {2352-3409},
  doi = {https://doi.org/10.1016/j.dib.2023.108888},
  url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
}
This dataset comprises NetFlow records that capture the outbound network traffic of 8 commercial IoT devices and 5 non-IoT devices, collected over a period of 37 days in a lab at Ben-Gurion University of the Negev. The dataset was collected in order to develop a method for telecommunication providers to detect vulnerable IoT models behind home NATs. Each NetFlow record is labeled with the device model that produced it; for research reproducibility, each NetFlow record is also allocated to either the "training" or "test" set, in accordance with the partitioning described in:
Y. Meidan, V. Sachidananda, H. Peng, R. Sagron, Y. Elovici, and A. Shabtai, A novel approach for detecting vulnerable IoT devices connected behind a home NAT, Computers & Security, Volume 97, 2020, 101968, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2020.101968. (http://www.sciencedirect.com/science/article/pii/S0167404820302418)
Please note:
# NetFlow features, used in the related paper for analysis
'FIRST_SWITCHED': System uptime at which the first packet of this flow was switched
'IN_BYTES': Incoming counter for the number of bytes associated with an IP Flow
'IN_PKTS': Incoming counter for the number of packets associated with an IP Flow
'IPV4_DST_ADDR': IPv4 destination address
'L4_DST_PORT': TCP/UDP destination port number
'L4_SRC_PORT': TCP/UDP source port number
'LAST_SWITCHED': System uptime at which the last packet of this flow was switched
'PROTOCOL': IP protocol byte (6: TCP, 17: UDP)
'SRC_TOS': Type of Service byte setting when there is an incoming interface
'TCP_FLAGS': Cumulative of all the TCP flags seen for this flow
# Features added by the authors
'IP': Prefix of the destination IP address, representing the network (without the host)
'DURATION': Time (seconds) between first/last packet switching
# Label
'device_model':
# Partition
'partition': Training or test
# Additional NetFlow features (mostly zero-variance)
'SRC_AS': Source BGP autonomous system number
'DST_AS': Destination BGP autonomous system number
'INPUT_SNMP': Input interface index
'OUTPUT_SNMP': Output interface index
'IPV4_SRC_ADDR': IPv4 source address
'MAC': MAC address of the source
# Additional data
'category': IoT or non-IoT
'type': IoT, access_point, smartphone, laptop
'date': Datepart of FIRST_SWITCHED
'inter_arrival_time': Time (seconds) between successive flows of the same device (identified by its MAC address)
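The derived fields above ('DURATION' and 'inter_arrival_time') and the 'partition' column can be illustrated with a minimal sketch. The records are hypothetical, their timestamps are treated as plain seconds, and the inter-arrival time is taken between the FIRST_SWITCHED values of successive flows from the same MAC address. These are assumptions made for illustration and not necessarily the authors' exact procedure.

```python
# Hypothetical NetFlow-like records; times are treated as seconds here.
flows = [
    {"MAC": "aa:aa", "FIRST_SWITCHED": 100.0, "LAST_SWITCHED": 104.5, "partition": "training"},
    {"MAC": "bb:bb", "FIRST_SWITCHED": 101.0, "LAST_SWITCHED": 101.0, "partition": "training"},
    {"MAC": "aa:aa", "FIRST_SWITCHED": 130.0, "LAST_SWITCHED": 131.0, "partition": "test"},
]

def add_derived(records):
    """Return copies of the records with DURATION and inter_arrival_time."""
    last_start = {}  # most recent FIRST_SWITCHED seen per MAC address
    derived = []
    for record in sorted(records, key=lambda r: r["FIRST_SWITCHED"]):
        row = dict(record)
        row["DURATION"] = record["LAST_SWITCHED"] - record["FIRST_SWITCHED"]
        previous = last_start.get(record["MAC"])
        # The first flow of a device has no predecessor, so leave it as None.
        row["inter_arrival_time"] = (
            None if previous is None else record["FIRST_SWITCHED"] - previous
        )
        last_start[record["MAC"]] = record["FIRST_SWITCHED"]
        derived.append(row)
    return derived

derived = add_derived(flows)
training = [row for row in derived if row["partition"] == "training"]
test_set = [row for row in derived if row["partition"] == "test"]
print([row["DURATION"] for row in derived])
print([row["inter_arrival_time"] for row in derived])
print(len(training), len(test_set))
```

Filtering on the provided 'partition' column, rather than re-splitting at random, keeps results comparable with the partitioning used in the related paper.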
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NetFlow traffic generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.
The NetFlow flows have been captured with different sampling rates at the packet level. Sampling means that 1 out of every X packets is selected for flow generation, while the remaining packets are not considered.
The version of NetFlow used to build the datasets is 5.
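The idea of 1-out-of-X packet sampling can be sketched as follows. This is a deterministic (systematic) variant chosen for simplicity; whether DOROTHEA samples deterministically or at random is not stated here, so the function below is an assumption and not a description of the tool.

```python
def sample_one_in(packets, x):
    """Keep every x-th packet, starting with the first one."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return [packet for index, packet in enumerate(packets) if index % x == 0]

# Ten hypothetical packets, identified only by their sequence number.
packets = list(range(10))
sampled = sample_one_in(packets, 4)
print(sampled)  # packets 0, 4 and 8 remain
```

Only the sampled packets would then contribute to flow records, so byte and packet counters in sampled flows underestimate the real traffic by roughly a factor of X.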
The UNSW IoT traffic data (UNSW-IoTraffic) is a dataset comprising (a) raw network packet traces with full headers and payload, (b) flow-level metadata summarizing fine-grained bidirectional activity behaviors, and (c) protocol parameters describing network protocol characteristics. The dataset also includes scripts written in Java for flow extraction and protocol matching using protocol data models, along with data models for six dominant protocols (TLS, HTTP, DNS, DHCP, SSDP, and NTP). The dataset contains 95.5 million packets of IoT communications captured over 203 days, organized into 27 per-device packet capture (PCAP) files. Derived flow data, categorized based on the 5-tuple attributes (source IP address, destination IP address, protocol number, source port number, destination port number), are provided as 27 per-device CSV files. Additionally, protocol-specific parameters for 70% flows are extracted into a total of 450 CSV files across 27 device types, covering 25 pr...
# UNSW IoT traffic data with packets, flows, and protocols
Dataset DOI: 10.5061/dryad.w0vt4b94b
UNSW-IoTraffic is a multi-resolution network traffic dataset of consumer IoT devices captured from a lab testbed. It includes device-specific raw PCAPs (full headers and payloads), flow-level CSVs (bidirectional 5-tuple flows with statistics), and protocol-parameter CSVs (request/response attributes for selected protocols). The capture covers 27 devices, spans ≈203 days of operation (setup, idle, and interactions), and totals 95,543,405 packets, 4,944,041 flows, and ≈26.9 GB of PCAPs. All timestamps are recorded in UTC.
We provide five ZIP archives so you can fetch only what you need:
.pcap
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains detailed information about network packets captured using Wireshark on a Windows machine through the Wi-Fi interface. The data represents various protocols such as MDNS, SSDP, and DHCP, commonly involved in local network service discovery and IP address management. It is ideal for network performance analysis, anomaly detection, or cybersecurity research.
Device: Windows Laptop
Interface: Wireless (Wi-Fi)
Tool Used: Wireshark
Format: CSV (custom-exported from packet analysis)
Time Period: Epoch timestamps (Unix time format)
Column Name | Description |
---|---|
time | Epoch timestamp when the packet was captured |
src_ip | Source IP address (converted to integer) |
dst_ip | Destination IP address (converted to integer) |
protocol | Network protocol used (e.g., MDNS, SSDP, DHCP) |
packet_length | Size of the packet in kilobytes |
tcp_src_port | TCP source port number |
tcp_dst_port | TCP destination port number |
ttl | Time to Live – how many hops a packet can take before being dropped |
tcp_flags | TCP control flags (bitfield) – indicates connection control info |
window_size | TCP window size for flow control |
ack_rtt | Round-trip time observed for the acknowledgment |
retransmission | Indicates whether the packet is a retransmission (binary: 0 or 1) |
time_delta | Time difference between current and previous packet |
avg_latency | Average latency over the session |
jitter | Variation in packet delay |
total_data | Total data exchanged in the session (in KB) |
session_duration | Duration of the session in seconds |
bandwidth | Estimated bandwidth used in the session (KB/s) |
spike_anomaly | Indicates whether the packet is part of an anomalous spike (binary: 0 or 1) |
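Because the table above stores IP addresses as integers and capture times as epoch timestamps, a small decoding sketch may help when inspecting rows. It assumes the usual 32-bit integer representation of IPv4 addresses, which the description does not confirm; the helper names are illustrative.

```python
import ipaddress
from datetime import datetime, timezone

def int_to_ip(value):
    """Convert an integer-encoded IPv4 address to dotted-decimal text."""
    return str(ipaddress.ip_address(int(value)))

def epoch_to_utc(timestamp):
    """Convert a Unix epoch timestamp (seconds) to a UTC datetime."""
    return datetime.fromtimestamp(float(timestamp), tz=timezone.utc)

print(int_to_ip(3232235777))        # 192.168.1.1
print(epoch_to_utc(0).isoformat())  # 1970-01-01T00:00:00+00:00
```

If the CSV stores the addresses as floating-point text (for example "3232235777.0"), convert with `float` first before passing the value to `int_to_ip`.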
This graph shows the forecast growth in global data center internet protocol (IP) traffic from 2013 to 2021. Data center IP traffic is expected to grow steadily, reaching around **** zettabytes per year by 2018.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created by a LoRaWAN sniffer and contains packets, which are thoroughly analyzed in the paper Exploring LoRaWAN Traffic: In-Depth Analysis of IoT Network Communications (not yet published). Data from the LoRaWAN sniffer was collected in four cities: Liege (Belgium), Graz (Austria), Vienna (Austria), and Brno (Czechia).
Gateway ID: b827ebafac000001
Gateway ID: b827ebafac000002
Gateway ID: b827ebafac000003
To open the pcap files, you need Wireshark with current support for LoRaTap and LoRaWAN protocols. This support will be available in the official 4.1.0 release. A working version for Windows is accessible in the automated build system.
The source data is available in the log.zip file, which contains the complete dataset obtained by the sniffer. A set of conversion tools for log processing is available on Github. The converted logs, available in Wireshark format, are stored in pcap.zip. For the LoRaWAN decoder, you can use the attached root and session keys. The processed outputs are stored in csv.zip, and graphical statistics are available in png.zip.
This data represents a unique, geographically identifiable selection from the full log, cleaned of any errors. The records from Brno include communication between the gateway and a node with known keys.
Test file :: 00_Test
Brno, Czech Republic :: 01_Brno
70b3d5cee0000042
d494d49a7b4053302bdcf96f1defa65a
00d85395
c417540b8b2afad8930c82fcf7ea54bb
421fea9bedd2cc497f63303edf5adf8e
Liege, Belgium :: 02_Liege
:: evaluated in the paper
Brno, Czech Republic :: 03_Brno_join
70b3d5cee0000042
d494d49a7b4053302bdcf96f1defa65a
01e65ddc
e2898779a03de59e2317b149abf00238
59ca1ac91922887093bc7b236bd1b07f
Graz, Austria :: 04_Graz
:: evaluated in the paper
Vienna, Austria :: 05_Wien
:: evaluated in the paper
Brno, Czech Republic :: 07_Brno
:: evaluated in the paper
Documentation for Network Traffic Dataset
Dataset Overview
This dataset consists of network traffic captured from a Kali Linux machine, aimed at helping the development and evaluation of machine learning models for distinguishing between normal and malicious (specifically flood attack) network activities. It includes a variety of features essential for identifying potential cybersecurity threats alongside labels indicating whether each packet is part of flood traffic.
Data Collection Methodology
The dataset was carefully compiled using network traffic captured from a dedicated Kali Linux setup. The capture environment consisted of a Kali Linux machine configured to generate and capture both normal and malicious network traffic and a target machine running a Windows OS to simulate a real-world network environment.
Traffic Generation:
Normal Traffic: Involved routine network activities such as web browsing and pinging between the Kali Linux machine and the Windows machine.
Malicious Traffic: Utilized hping3 to simulate flood attacks, specifically ICMP flood attacks, targeting the Windows machine from the Kali Linux machine [1].
Capture Process: Wireshark was used on the Kali Linux machine to capture all incoming and outgoing network traffic [2]. The capture was set up to record detailed packet information, including timestamps, source and destination IP addresses, ports, and protocols. The captures were conducted with careful monitoring to precisely mark the start and end times of the flood attack for accurate dataset labeling.
Dataset Description
The dataset is a CSV file containing a comprehensive collection of network traffic packets labeled to distinguish between normal and malicious traffic. It includes the following columns:
Timestamp: The capture time of each packet, providing insights into the traffic flow and enabling analysis of traffic patterns over time.
Source IP Address: Identifies the origin of the packet, crucial for pinpointing potential sources of attacks.
Destination IP Address: Indicates the packet's intended recipient, useful for identifying targeted resources.
Source Port and Destination Port: Offer insights into the services involved in the communication.
Protocol: Specifies the protocol used, such as TCP, UDP, or ICMP, essential for analyzing the nature of the traffic.
Length: The size of the packet in bytes, which can signal unusual traffic patterns often associated with malicious activities.
bad_packet: A binary label with 1 indicating traffic identified as part of a flood attack and 0 denoting normal traffic.
Precise timestamps marking the start and end of flood attacks were used to accurately label this column. Packets captured within these defined intervals were marked as malicious (bad_packet = 1), whereas all others were considered normal traffic. Python and Pandas were used for the labeling process [3][4].
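The interval-based labeling described above can be sketched in a few lines. The authors report using Python and Pandas; the version below uses only the standard library to stay self-contained. The attack window, the sample packets, and the inclusive treatment of the interval boundaries are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical attack window; the real start and end times are not given here.
ATTACK_WINDOWS = [
    (datetime(2024, 1, 1, 12, 0, 0), datetime(2024, 1, 1, 12, 5, 0)),
]

def label_packet(timestamp, windows=ATTACK_WINDOWS):
    """Return 1 if the timestamp falls inside any attack window, else 0."""
    return int(any(start <= timestamp <= end for start, end in windows))

# Hypothetical packets, reduced to the fields needed for labeling.
packets = [
    {"Timestamp": datetime(2024, 1, 1, 11, 59, 30), "Protocol": "TCP"},
    {"Timestamp": datetime(2024, 1, 1, 12, 2, 0), "Protocol": "ICMP"},
    {"Timestamp": datetime(2024, 1, 1, 12, 6, 0), "Protocol": "ICMP"},
]
for packet in packets:
    packet["bad_packet"] = label_packet(packet["Timestamp"])

print([packet["bad_packet"] for packet in packets])
```

A consequence of labeling purely by time is that any benign packet captured during an attack window is also marked as malicious, which is worth keeping in mind when interpreting model errors.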
Potential Applications
a. Intrusion Detection Systems (IDS): The dataset can be used in training models to enhance IDS capabilities, enabling more effective detection of flood-based network attacks.
b. Network Traffic Monitoring: Tools making use of machine learning can leverage the dataset for more accurate network traffic monitoring, identifying and alerting suspicious activities in real time.
c. Cybersecurity Training: Educational institutions and training programs can use the dataset to provide practical experience in machine learning-based threat detection.
Proposed Machine Learning Technique: Supervised Machine Learning, specifically Deep Learning with Convolutional Neural Networks (CNNs).
CNNs, even though they are usually used for image processing, have shown promise in analyzing sequential data. The spatial hierarchy in network packets (from individual bytes to overall packet structure) can be analogous to the patterns CNNs excel at identifying. Utilizing CNNs could allow for the extraction of complex patterns in network traffic that indicate malicious activities, potentially improving detection accuracy beyond traditional methods.
Conclusion
This dataset represents a significant step towards using machine learning for cybersecurity, specifically in the field of intrusion detection and network monitoring. By providing a detailed and accurately labeled dataset of normal and malicious network traffic, it lays the groundwork for developing complex models capable of identifying and mitigating flood attacks in real-time. In the future, we could include a broader range of attack types and more traffic patterns, further enhancing the dataset's utility and the effectiveness of models trained on it.
References
[1] https://linux.die.net/man/8/hping3
[2] https://www.wireshark.org/docs/
[3] https://pandas.pydata.org/docs/
[4] https://docs.python.org/3/tutorial/index.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Some network traffic protocols.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises network traffic collected from 24 Internet of Things (IoT) devices over a span of 119 days, capturing a total of over 110 million packets. The devices represent 19 distinct types and were monitored in a controlled environment under normal operating conditions, reflecting a variety of functions and behaviors typical of consumer IoT products (pcapIoT). The packet capture (pcap) files preserve complete packet information across all protocol layers, including ARP, TCP, HTTP, and various application-layer protocols. Raw pcap files (pcapFull) are also provided, which contain traffic from 36 non-IoT devices present in the network. To facilitate device-specific analysis, a CSV file is included that maps each IoT device to its unique MAC address. This mapping simplifies the identification and filtering of packets belonging to each device within the pcap files. Three additional CSV files (CSVs) provide metadata about the states that the devices were in at different times. Additionally, Python scripts (Scripts) are provided to assist in extracting and processing packets. These scripts include functionalities such as packet filtering based on MAC addresses and protocol-specific data extraction, serving as practical examples for data manipulation and analysis techniques. This dataset is valuable for researchers interested in network behavior analysis, anomaly detection, and the development of IoT-specific network policies. It enables the study and differentiation of network behaviors based on device functions and supports behavior-based profiling to identify irregular activities or potential security threats.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
NetFlow traffic generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device. The NetFlow flows have been captured by sampling at the packet level. Sampling means that 1 out of every X packets is selected for flow generation, while the remaining packets are not considered. In the construction of the datasets, different percentages of flows considered attacks and flows considered normal traffic have been used. These datasets have been used to test previously trained models.
https://dataintelo.com/privacy-and-policy
The global network protocol analyzer market size was valued at approximately USD 1.2 billion in 2023 and is projected to reach around USD 2.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 8.5% during the forecast period. The growth factors propelling this market include the rising complexity of network infrastructures, the increasing adoption of cloud services, and heightened concerns over network security and performance monitoring.
One of the primary growth drivers for the network protocol analyzer market is the growing complexity and scale of modern network infrastructures. With the rise of the Internet of Things (IoT), 5G technologies, and the proliferation of connected devices, networks have become more intricate and expansive. This complexity necessitates advanced tools for monitoring, managing, and diagnosing network issues, thereby driving the demand for network protocol analyzers. Additionally, the ongoing digital transformation across various industries has further emphasized the need for robust network management solutions.
Another significant factor contributing to market growth is the increasing adoption of cloud services. As businesses migrate their operations to cloud environments, the need for effective network performance monitoring tools becomes critical. Cloud-based network protocol analyzers offer scalability, flexibility, and real-time analysis capabilities, making them indispensable for maintaining optimal network performance and security. Furthermore, the shift towards remote work and the growing reliance on cloud applications have underscored the importance of maintaining secure and efficient network operations.
Heightened concerns over network security are also playing a pivotal role in driving the market for network protocol analyzers. Cybersecurity threats are becoming more sophisticated, and organizations are increasingly focused on protecting their network infrastructures from breaches and attacks. Network protocol analyzers provide in-depth visibility into network traffic, enabling the detection of anomalies, potential threats, and vulnerabilities. This proactive approach to network security is crucial for safeguarding sensitive data and ensuring business continuity.
In the realm of network management, the role of a Network Traffic Monitor is becoming increasingly vital. As organizations strive to maintain optimal network performance, these monitors provide real-time insights into data flow across networks. They enable IT teams to identify and address potential bottlenecks, ensuring smooth and efficient operations. By analyzing traffic patterns, Network Traffic Monitors help in detecting anomalies that could indicate security threats or inefficiencies. This proactive approach not only enhances network reliability but also aids in capacity planning and resource allocation, making them indispensable tools in modern network infrastructures.
Regionally, the North American market is expected to dominate the network protocol analyzer market during the forecast period. This dominance can be attributed to the region's advanced technological landscape, high adoption rates of new technologies, and significant investments in cybersecurity. Moreover, the presence of major market players and a strong focus on research and development activities further bolster the market's growth in this region. However, other regions such as Asia Pacific are also witnessing substantial growth due to increasing digitization, expanding IT infrastructure, and rising awareness about network security and management solutions.
The network protocol analyzer market is segmented into hardware-based and software-based types. Hardware-based analyzers are physical devices that connect to network components to capture and analyze data packets. These devices are typically used in environments where high performance and low latency are critical. Hardware-based analyzers provide real-time data capture and analysis, making them ideal for large-scale network deployments and data centers. They offer robust functionalities but can be more expensive and less flexible compared to their software-based counterparts.
On the other hand, software-based analyzers are applications that can be installed on standard computing devices to monitor and analyze network traffic. These analyzers offer greater flexibility and scalability, making them sui
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data presented here was collected in a network section from Universidad Del Cauca, Popayán, Colombia by performing packet captures at different hours, during morning and afternoon, over six days (April 26, 27, 28 and May 9, 11 and 15) of 2017. A total of 3,577,296 instances were collected and are currently stored in a CSV (Comma Separated Values) file.
This dataset contains 87 features. Each instance holds the information of an IP flow generated by a network device, i.e., source and destination IP addresses, ports, inter-arrival times, and the layer 7 protocol (application) used on that flow as the class, among others. Most of the attributes are numeric, but there are also nominal types and a date type due to the Timestamp.
The flow statistics (IP addresses, ports, inter-arrival times, etc.) were obtained using CICFlowMeter (http://www.unb.ca/cic/research/applications.html - https://github.com/ISCX/CICFlowMeter). The application layer protocol was obtained by performing Deep Packet Inspection (DPI) on the flows with ntopng (https://www.ntop.org/products/traffic-analysis/ntop/ - https://github.com/ntop/ntopng).
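To make the notion of an IP flow and its inter-arrival times concrete, the following is a minimal sketch that groups packets by their 5-tuple and derives a few per-flow statistics. It is not CICFlowMeter's implementation (which, among other things, tracks both directions of a connection and computes many more features); the packet tuples, field order, and the `summarize_flows` helper are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packets:
# (timestamp_s, src_ip, src_port, dst_ip, dst_port, protocol, length_bytes)
packets = [
    (0.00, "10.0.0.5", 50000, "93.184.216.34", 443, "TCP", 60),
    (0.10, "10.0.0.5", 50000, "93.184.216.34", 443, "TCP", 1500),
    (0.15, "10.0.0.7", 53124, "8.8.8.8", 53, "UDP", 80),
    (0.40, "10.0.0.5", 50000, "93.184.216.34", 443, "TCP", 400),
]


def summarize_flows(packets):
    """Group packets by 5-tuple and compute simple per-flow statistics."""
    flows = defaultdict(list)
    for ts, src, sport, dst, dport, proto, length in packets:
        flows[(src, sport, dst, dport, proto)].append((ts, length))

    summary = {}
    for key, pkts in flows.items():
        pkts.sort()  # order by timestamp
        times = [t for t, _ in pkts]
        # Inter-arrival time: gap between consecutive packets of the same flow.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        summary[key] = {
            "packets": len(pkts),
            "bytes": sum(length for _, length in pkts),
            "duration": times[-1] - times[0],
            "mean_iat": mean(gaps) if gaps else 0.0,
        }
    return summary


flow_stats = summarize_flows(packets)
```

Each key of `flow_stats` corresponds to one flow, roughly analogous to one row of the CSV; the class label (the application) would come from a separate DPI step.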
For further information and if you find this dataset useful, please read and cite the following papers:
Springer: https://link.springer.com/chapter/10.1007/978-3-319-95168-3_37
IEEE Xplore: https://ieeexplore.ieee.org/document/8845576
ResearchGate: https://www.researchgate.net/publication/345990587_Smart_User_Consumption_Profiling_Incremental_Learning-based_OTT_Service_Degradation
IEEE Xplore: https://ieeexplore.ieee.org/document/9258898
I would like to thank Universidad Del Cauca for supporting the research that generated this dataset and Colciencias for my PhD scholarship.
Most network traffic classification datasets aim only at identifying the type of application an IP flow holds (WWW, DNS, FTP, P2P, Telnet, etc.). This dataset goes a step further by supporting machine learning models capable of detecting specific applications such as Facebook, YouTube, Instagram, etc., from IP flow statistics (currently 75 applications).
By the end of the third quarter of 2019, the percentage of internet traffic that came from Internet Protocol version * (IPv6) addresses in Brazil was at nearly **** percent, slightly under the **** percent registered at the end of the second quarter that same year.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IEC 60870-5-104
Intrusion Detection Dataset
Readme File
ITHACA – University of Western Macedonia - https://ithaca.ece.uowm.gr/
Authors: Panagiotis Radoglou-Grammatikis, Thomas Lagkas, Vasileios Argyriou, Panagiotis Sarigiannidis
Publication Date: September 23, 2022
1. Introduction
The evolution of the Industrial Internet of Things (IIoT) introduces several benefits, such as real-time monitoring, pervasive control and self-healing. However, despite the valuable services, security and privacy issues still remain given the presence of legacy and insecure communication protocols like IEC 60870-5-104. IEC 60870-5-104 is an industrial protocol widely applied in critical infrastructures, such as the smart electrical grid and industrial healthcare systems. The IEC 60870-5-104 Intrusion Detection Dataset was implemented in the context of the research paper entitled "Modeling, Detecting, and Mitigating Threats Against Industrial Healthcare Systems: A Combined Software Defined Networking and Reinforcement Learning Approach" [1], in the context of two H2020 projects: ELECTRON: rEsilient and seLf-healed EleCTRical pOwer Nanogrid (101021936) and SDN-microSENSE: SDN - microgrid reSilient Electrical eNergy SystEm (833955). This dataset includes labelled Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics (Comma-Separated Values (CSV) format) and IEC 60870-5-104 flow statistics (CSV format) related to twelve IEC 60870-5-104 cyberattacks. In particular, the cyberattacks are related to unauthorised commands and Denial of Service (DoS) activities against IEC 60870-5-104. Moreover, the relevant Packet Capture (PCAP) files are available. The dataset can be utilised for Artificial Intelligence (AI)-based Intrusion Detection Systems (IDS), taking full advantage of Machine Learning (ML) and Deep Learning (DL).
2. Instructions
The IEC 60870-5-104 dataset was implemented following the methodology of A. Gharib et al. in [2], including eleven features: (a) Complete Network Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d) Complete Interaction, (e) Complete Capture, (f) Available Protocols, (g) Attack Diversity, (h) Heterogeneity, (i) Feature Set and (j) Metadata.
A network topology consisting of (a) seven industrial entities, (b) one Human Machine Interface (HMI) and (c) three cyberattackers was used to construct the IEC 60870-5-104 Intrusion Detection Dataset. The industrial entities use IEC TestServer[1], while the HMI uses QTester104[2]. On the other hand, the cyberattackers use Kali Linux[3] equipped with Metasploit[4], OpenMUC j60870[5] and Ettercap[6]. The cyberattacks were performed during the following days.
For each attack, a 7zip file is provided, including the network traffic and the network flow statistics for each entity. Moreover, a relevant diagram is provided, illustrating the corresponding cyberattack. In particular, for each entity, a folder is given, including (a) the relevant pcap file, (b) Transmission Control Protocol (TCP) / Internet Protocol (IP) network flow statistics in Comma-Separated Values (CSV) format and (c) IEC 60870-5-104 flow statistics in CSV format. The TCP/IP network flow statistics were generated by CICFlowMeter[7], while the IEC 60870-5-104 flow statistics were generated by a Custom IEC 60870-5-104 Python Parser[8], taking full advantage of Scapy[9].
3. Dataset Structure
The dataset consists of the following files:
Each 7zip file includes respective folders related to the entities/devices (described in the following section) participating in each attack. In particular, for each entity/device, there is a folder including (a) the overall network traffic (pcap file) related to this entity/device during each attack, (b) the TCP/IP network flow statistics (CSV file) from CICFlowMeter for the overall network traffic, (c) the IEC 60870-5-104 network traffic (pcap file) related to this entity/device during each attack, (d) the TCP/IP network flow statistics (CSV file) from CICFlowMeter for the IEC 60870-5-104 network traffic, (e) the IEC 60870-5-104 flow statistics (CSV file) from the Custom IEC 60870-5-104 Python Parser for the IEC 60870-5-104 network traffic and finally, (f) an image showing how the attack was executed. It is noteworthy that the network flows from both CICFlowMeter and the Custom IEC 60870-5-104 Python Parser in each CSV file are labelled based on the IEC 60870-5-104 cyberattacks executed for the generation of this dataset. The description of these attacks is given in the following section, while the various features from CICFlowMeter and the Custom IEC 60870-5-104 Python Parser are presented in Section 5.
4. Testbed & IEC 60870-5-104 Attacks
The testbed created for generating this dataset is composed of five virtual RTU devices emulated by IEC TestServer and two real RTU devices. Moreover, there is another workstation which plays the role of Master Terminal Unit (MTU) and HMI, sending legitimate IEC 60870-5-104 commands to the corresponding RTUs. For this purpose, the workstation uses QTester104. In addition, there are three attackers that act as malicious insiders executing the following cyberattacks against the aforementioned RTUs. Finally, the network traffic data of each entity/device was captured through tshark.
Table 1: IEC 60870-5-104 Cyberattacks Description
IEC 60870-5-104 Cyberattack Description | Description | Dataset Files |
MITM Drop | During this attack, the cyberattacker is placed between two endpoints, thus monitoring and dropping the network traffic |
VLC Data: A Multi-Class Network Traffic Dataset Covering Diverse Applications and Platforms
Valencia Data (VLC Data) is a network traffic dataset collected from various applications and platforms. It includes both encrypted and, when applicable, unencrypted protocols, capturing realistic usage scenarios and application-specific behavior.
The dataset covers 18.5 hours, 58 pcapng files, and 24.26 GB, with traffic from:
Video streaming: Netflix and Prime Video (10–50 min) via Firefox.
Gaming: Roblox sessions on Windows (20–35 min), recorded outside of virtual machines, despite VM support.
Video conferencing: Microsoft Teams (20 min) via Firefox.
Web browsing: Wikipedia, BBC, Google, LinkedIn, Amazon, and OWIN6G (2–5 min) via Firefox or Chrome.
Audio streaming: Spotify (30–33 min) on multiple OS.
Web streaming: YouTube in 4K and Full HD (20–30 min).
This dataset is publicly available for traffic analysis across different apps, protocols, and systems.
Table Description:
| Type | Applications | Platform | Time [min] | Comments | Filename | Size (MB) |
|---|---|---|---|---|---|---|
| Video Streaming | Netflix | Linux | 10 | Running Netflix on Firefox Browser | netflix_linux_10m_01 | 95.1 |
| Video Streaming | Netflix | Linux | 20 | Running Netflix on Firefox Browser | netflix_linux_20m_01 | 167.7 |
| Video Streaming | Netflix | Linux | 20 | Running Netflix on Firefox Browser | netflix_linux_20m_02 | 237.9 |
| Video Streaming | Netflix | Linux | 20 | Running Netflix on Firefox Browser | netflix_linux_20m_03 | 212.6 |
| Video Streaming | Netflix | Linux | 25 | Running Netflix on Firefox, but 2 min in Menu | netflix_linux_25m_01 | 610.7 |
| Video Streaming | Netflix | Linux | 35 | Running Netflix on Firefox, but 1 min in Menu | netflix_linux_35m_01 | 534.8 |
| Video Streaming | Netflix | Linux | 50 | Running Netflix on Firefox Browser | netflix_linux_50m_01 | 660.9 |
| Video Streaming | Netflix | Windows | 10 | Running Netflix on Firefox Browser | netflix_windows_10m_01 | 132.1 |
| Video Streaming | Netflix | Windows | 20 | Running Netflix on Firefox Browser | netflix_windows_20m_01 | 506.4 |
| Video Streaming | Prime Video | Linux | 20 | Running Prime Video on Firefox Browser | prime_linux_20m_01 | 767.3 |
| Video Streaming | Prime Video | Linux | 20 | Running Prime Video on Firefox Browser | prime_linux_20m_02 | 569.3 |
| Video Streaming | Prime Video | Windows | 20 | Running Prime Video on Firefox Browser | prime_windows_20m_01 | 512.3 |
| Video Streaming | Prime Video | Windows | 20 | Running Prime Video on Firefox Browser | prime_windows_20m_02 | 364.2 |
| Gaming | Roblox | Windows | 20 | Doesn't run in VM | roblox_windows_20m_01 | 127.5 |
| Gaming | Roblox | Windows | 20 | Doesn't run in VM | roblox_windows_20m_02 | 378.5 |
| Gaming | Roblox | Windows | 20 | Doesn't run in VM | roblox_windows_20m_03 | 458.9 |
| Gaming | Roblox | Windows | 30 | Doesn't run in VM | roblox_windows_30m_01 | 519.8 |
| Gaming | Roblox | Windows | 30 | Doesn't run in VM | roblox_windows_30m_02 | 357.3 |
| Gaming | Roblox | Windows | 35 | Doesn't run in VM | roblox_windows_35m_01 | 880.4 |
| Audio Streaming | Spotify | Linux | 30 | Running Spotify app on Ubuntu-Linux | spotify_linux_30m_01 | 98.2 |
| Audio Streaming | Spotify | Linux | 30 | Running Spotify app on Ubuntu-Linux | spotify_linux_30m_02 | 112.2 |
| Audio Streaming | Spotify | Linux | 30 | Running Spotify app on Ubuntu-Linux | spotify_linux_30m_03 | 175.5 |
| Audio Streaming | Spotify | Windows | 30 | Running Spotify app on Windows | spotify_windows_30m_01 | 50.7 |
| Audio Streaming | Spotify | Windows | 30 | Doesn't run in VM | spotify_windows_30m_02 | 63.2 |
| Audio Streaming | Spotify | Windows | 33 | Running Spotify app on Windows | spotify_windows_33m_01 | 70.9 |
| Video Conferencing | Teams | Linux | 20 | Running Teams on Firefox Browser | teams_linux_20m_01 | 134.6 |
| Video Conferencing | Teams | Linux | 20 | Running Teams on Firefox Browser | teams_linux_20m_02 | 343.3 |
| Video Conferencing | Teams | Linux | 20 | Running Teams on Firefox Browser | teams_linux_20m_03 | 376.6 |
| Video Conferencing | Teams | Windows | 20 | Running Teams on Firefox Browser | teams_windows_20m_01 | 634.1 |
| Video Conferencing | Teams | Windows | 20 | Running Teams on Firefox Browser | teams_windows_20m_02 | 517.8 |
| Video Conferencing | Teams | Windows | 20 | Running Teams on Firefox Browser | teams_windows_20m_03 | 629.9 |
| Web Browsing | Web | Linux | 2 | OWIN6G website on Firefox Browser | web_linux_2m_owin6g | 1.2 |
| Web Browsing | Web | Linux | 2 | Wikipedia website on Firefox Browser | web_linux_2m_wikipedia | 19.7 |
| Web Browsing | Web | Linux | 3 | OWIN6G website on Firefox Browser | web_linux_3m_owin6g | 4.5 |
| Web Browsing | Web | Linux | 3 | Wikipedia website on Firefox Browser | web_linux_3m_wikipedia | 23.5 |
| Web Browsing | Web | Linux | 5 | Amazon website on Chrome Browser | web_linux_5m_amazon | 262.9 |
| Web Browsing | Web | Linux | 5 | BBC website on Firefox Browser | web_linux_5m_bbc | 55.7 |
| Web Browsing | Web | Linux | 5 | Google website on Firefox Browser | web_linux_5m_google | 22.6 |
| Web Browsing | Web | Linux | 5 | Linkedin website on Firefox Browser | web_linux_5m_linkedin | 39.8 |
| Web Browsing | Web | Windows | 3 | OWIN6G website on Firefox Browser | web_windows_3m_owin6g | 32.6 |
| Web Browsing | Web | Windows | 3 | Wikipedia website on Firefox Browser | web_windows_3m_wikipedia | 94.9 |
| Web Browsing | Web | Windows | 5 | Amazon website on Chrome Browser | web_windows_5m_amazon | 104.0 |
| Web Browsing | Web | Windows | 5 | BBC website on Firefox Browser | web_windows_5m_bbc | 23.1 |
| Web Browsing | Web | Windows | 5 | Google website on Firefox Browser | web_windows_5m_google | 31.5 |
| Web Browsing | Web | Windows | 5 | Linkedin website on Firefox Browser | web_windows_5m_linkedin | 104.1 |
| Web Streaming | Youtube | Linux | 20 | One Video Streaming, 4K | youtube_linux_20m_01 | 1,145.6 |
| Web Streaming | Youtube | Linux | 20 | One Video Streaming, FullHD | youtube_linux_20m_02 | 389.4 |
| Web Streaming | Youtube | Linux | 20 | One Video Streaming, FullHD | youtube_linux_20m_03 | 2,007.1 |
| Web Streaming | Youtube | Linux | 20 | One Video Streaming, 4K | youtube_linux_20m_04 | 390.4 |
| Web Streaming | Youtube | Linux | 20 | One Video Streaming, FullHD | youtube_linux_20m_05 | 410.1 |
| Web Streaming | Youtube | Linux | 20 | One Video Streaming, FullHD | youtube_linux_20m_06 | 571.9 |
| Web Streaming | Youtube | Linux | 25 | One Video Streaming, FullHD | youtube_linux_25m_04 | 617.0 |
| Web Streaming | Youtube | Linux | 30 | One Video Streaming, FullHD | youtube_linux_30m_01 | 422.9 |
| Web Streaming | Youtube | Linux | 30 | One Video Streaming, FullHD | youtube_linux_30m_02 | 494.1 |
| Web Streaming | Youtube | Linux | 30 | One Video Streaming, 4K | youtube_linux_30m_03 | 871.0 |
| Web Streaming | Youtube | Windows | 20 | One Video Streaming, 4K | youtube_windows_20m_01 | 4,243.4 |
| Web Streaming | Youtube | Windows | 25 | One Video Streaming, FullHD | youtube_windows_25m_01 | 284.7 |
| Web Streaming | Youtube | Windows | 25 | One Video Streaming, FullHD | youtube_windows_25m_02 | 291.9 |
| Total | | | 18.5 (hours) | | | 24,260.3 |
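The filenames in the table appear to follow an `application_platform_<minutes>m_<suffix>` pattern, so the metadata can be recovered from the name alone. The sketch below assumes that pattern holds for every file and that files may carry a `.pcapng` extension; the `Capture` type and `parse_capture_name` helper are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple


class Capture(NamedTuple):
    application: str
    platform: str
    minutes: int
    suffix: str  # run index ("01") or website name ("wikipedia")


# Assumed naming pattern, inferred from the filenames listed in the table.
_NAME = re.compile(
    r"^(?P<app>[a-z]+)_(?P<platform>linux|windows)_(?P<minutes>\d+)m_(?P<suffix>\w+)$"
)


def parse_capture_name(name: str) -> Capture:
    """Split a capture filename into application, platform, duration, suffix."""
    stem = name[: -len(".pcapng")] if name.endswith(".pcapng") else name
    match = _NAME.match(stem)
    if match is None:
        raise ValueError(f"unexpected capture name: {name!r}")
    return Capture(
        match["app"], match["platform"], int(match["minutes"]), match["suffix"]
    )


# Example: total labelled minutes for a handful of files from the table.
names = ["netflix_linux_10m_01", "roblox_windows_35m_01", "web_linux_2m_owin6g"]
total_minutes = sum(parse_capture_name(n).minutes for n in names)
```

Parsing the name rather than maintaining a separate lookup keeps labels and files in sync, at the cost of relying on the naming convention being applied consistently.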
https://www.technavio.com/content/privacy-notice
Internet Protocol Television (IPTV) Market Size 2024-2028
The internet protocol television (IPTV) market size is forecast to increase by USD 128.41 billion, at a CAGR of 23.31% from 2023 to 2028. The rise in mobile and smart device adoption will drive the internet protocol television (IPTV) market.
Major Market Trends & Insights
North America dominated the market and accounted for a 30% growth during the forecast period.
By Component - Hardware segment was valued at USD 14.84 billion in 2022
By Type - Wired segment accounted for the largest market revenue share in 2022
Market Size & Forecast
Market Opportunities: USD 520.06 billion
Market Future Opportunities: USD 128.41 billion
CAGR: 23.31%
North America: Largest market in 2022
Market Summary
The market represents a dynamic and ever-evolving sector in the global media and entertainment industry. Core technologies, including adaptive streaming and ultra-high definition (UHD), continue to shape the market's landscape, enabling high-quality, real-time streaming of live TV and video-on-demand (VOD) content. IPTV applications, such as education, healthcare, and hospitality, are witnessing significant growth, particularly in regions with robust broadband infrastructure. As of 2021, mobile and smart device adoption has surged, with approximately 62% of consumers worldwide using mobile devices to access IPTV services. This trend is fueled by the convenience and flexibility offered by mobile IPTV, which allows users to stream content anytime, anywhere.
However, challenges persist, including piracy and illegal streaming, which accounted for an estimated 11% of global internet traffic in 2020. Regulations, such as the European Union's Audiovisual Media Services Directive and the United States' Children's Television Act, play a crucial role in shaping the IPTV market. Regional markets, including North America and Europe, dominate the market share due to their advanced broadband infrastructure and early adoption of IPTV services. The ongoing evolution of the IPTV market presents numerous opportunities for innovation and growth, with the potential to disrupt traditional broadcasting models and redefine the way we consume media.
What will be the Size of the Internet Protocol Television (IPTV) Market during the forecast period?
How is the Internet Protocol Television (IPTV) Market Segmented and what are the key trends of market segmentation?
The internet protocol television (IPTV) industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Component
  Hardware
  Software
  Services
Type
  Wired
  Wireless
Geography
  North America
    US
  Europe
    UK
  APAC
    China
    Japan
    South Korea
  Rest of World (ROW)
By Component Insights
The hardware segment is estimated to witness significant growth during the forecast period.
In the dynamic and evolving IPTV market, high-definition video delivery continues to dominate, with video quality assessment playing a crucial role in ensuring optimal user experience. Video transcoding workflows streamline the process of converting video formats, enabling seamless broadcast infrastructure for remote control applications. Video on demand and digital rights management are integral components, with content delivery networks and bandwidth optimization techniques ensuring efficient delivery and minimizing video buffering. HLS video delivery and MPEG-DASH adaptive bitrate streaming offer enhanced user experience by adapting to varying network conditions. Personalized content delivery, CDN content delivery, and secure video transmission are essential for catering to diverse consumer preferences and ensuring data security.
Video compression codecs, user interface design, and IPTV set-top boxes facilitate easy access to a wide range of content. Latency reduction techniques and ad insertion technology cater to the demands of interactive TV services and applications, while network congestion control and IP multicast streaming maintain service quality. Subscriber management systems and service level agreements ensure customer satisfaction and revenue growth. According to recent reports, the hardware segment held a significant market share of approximately 45% in 2023. This segment comprises essential components like set-top boxes, routers, switches, and other networking equipment. Specifically, set-top boxes accounted for around 30% of the market share, with an estimated 55% year-on-year growth in demand
The Hardware segment was valued at USD 14.84 billion in 2018 and showed a gradual increase during the forecast period.
Routers and switches, mean
An open source data platform and multidisciplinary online repository, managed by Scayle, where research groups and other organizations store and publish their datasets. A collection of public datasets is available through open.scayle.es and can be reused. NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. NetFlow datasets have been used to train machine learning models.
To generate a representative dataset of real-world traffic in ISCX we defined a set of tasks, assuring that our dataset is rich enough in diversity and quantity. We created accounts for users Alice and Bob in order to use services like Skype, Facebook, etc. Below we provide the complete list of different types of traffic and applications considered in our dataset for each traffic type (VoIP, P2P, etc.)
We captured a regular session and a session over VPN, therefore we have a total of 14 traffic categories: VOIP, VPN-VOIP, P2P, VPN-P2P, etc. We also give a detailed description of the different types of traffic generated:
Browsing: Under this label we have HTTPS traffic generated by users while browsing or performing any task that includes the use of a browser. For instance, when we captured voice-calls using hangouts, even though browsing is not the main activity, we captured several browsing flows.
Email: The traffic samples generated using a Thunderbird client, and Alice and Bob Gmail accounts. The clients were configured to deliver mail through SMTP/S, and receive it using POP3/SSL in one client and IMAP/SSL in the other.
Chat: The chat label identifies instant-messaging applications. Under this label we have Facebook and Hangouts via web browsers, Skype, and AIM and ICQ using an application called Pidgin [14].
Streaming: The streaming label identifies multimedia applications that require a continuous and steady stream of data. We captured traffic from Youtube (HTML5 and flash versions) and Vimeo services using Chrome and Firefox.
File Transfer: This label identifies traffic applications whose main purpose is to send or receive files and documents. For our dataset we captured Skype file transfers, FTP over SSH (SFTP) and FTP over SSL (FTPS) traffic sessions.
VoIP: The Voice over IP label groups all traffic generated by voice applications. Within this label we captured voice calls using Facebook, Hangouts and Skype.
P2P: This label is used to identify file-sharing protocols like Bittorrent. To generate this traffic we downloaded different .torrent files from a public repository and captured traffic sessions using the uTorrent and Transmission applications.
The traffic was captured using Wireshark and tcpdump, generating a total amount of 28GB of data. For the VPN, we used an external VPN service provider and connected to it using OpenVPN (UDP mode). To generate SFTP and FTPS traffic we also used an external service provider and Filezilla as a client.
To facilitate the labeling process, all unnecessary services and applications were closed while capturing the traffic. (The only application executed was the objective of the capture, e.g., Skype voice-call, SFTP file transfer, etc.) We used a filter to capture only the packets whose source or destination IP was the address of the local client (Alice or Bob).
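The source does not state which filter expression was used. As an illustration of the idea, the sketch below keeps only packets in which a given client address appears as source or destination; the addresses and the `involves_client` helper are hypothetical. At capture time, the `host <address>` primitive of tcpdump's filter syntax expresses the same condition.

```python
import ipaddress

# Hypothetical address for the local client (Alice); the real one is not given.
ALICE_IP = "192.168.1.20"


def involves_client(src: str, dst: str, client: str) -> bool:
    """Return True if the client address is the packet's source or destination."""
    client_ip = ipaddress.ip_address(client)
    return client_ip in (ipaddress.ip_address(src), ipaddress.ip_address(dst))


# Hypothetical (source, destination) pairs standing in for captured packets.
packets = [
    ("192.168.1.20", "203.0.113.7"),   # sent by the client
    ("203.0.113.7", "192.168.1.20"),   # received by the client
    ("192.168.1.31", "198.51.100.9"),  # unrelated host on the same LAN
]

kept = [p for p in packets if involves_client(p[0], p[1], ALICE_IP)]
```

Comparing parsed `ip_address` objects rather than raw strings avoids mismatches caused by different textual forms of the same address.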
The full research paper outlining the details of the dataset and its underlying principles:
Gerard Draper-Gil, Arash Habibi Lashkari, Mohammad Mamun, Ali A. Ghorbani, "Characterization of Encrypted and VPN Traffic Using Time-Related Features", In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP 2016), pages 407-414, Rome, Italy.
ISCXFlowMeter is written in Java; it reads the pcap files and creates the CSV file based on selected features. The UNB ISCX Network Traffic (VPN-nonVPN) dataset consists of labeled network traffic; the full packets in pcap format and the CSV files (flows generated by ISCXFlowMeter) are publicly available to researchers.
For more information contact cic@unb.ca.
The UNB ISCX Network Traffic Dataset content
Traffic: Content
Web Browsing: Firefox and Chrome
Email: SMTPS, POP3S and IMAPS
Chat: ICQ, AIM, Skype, Facebook and Hangouts
Streaming: Vimeo and Youtube
File Transfer: Skype, FTPS and SFTP using Filezilla and an external service
VoIP: Facebook, Skype and Hangouts voice calls (1h duration)
P2P: uTorrent and Transmission (Bittorrent)
https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here was obtained on a Kali Linux machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures with Wireshark for 1 hour during the evening of October 9, 2023. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection and anomaly detection.
The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.
Content :
This network traffic dataset consists of 7 features. Each instance contains, among other fields, the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal types and a date type due to the Timestamp.
The network traffic flow statistics (No., Time, Source, Destination, Protocol, Length, Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
No: Number of the instance
Timestamp: Timestamp of the network traffic instance
Source IP: IP address of the source
Destination IP: IP address of the destination
Protocol: Protocol used by the instance
Length: Length of the instance
Info: Information about the traffic instance
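As a minimal sketch of working with such a file, the code below reads a Wireshark-style CSV export and tallies packets and bytes per protocol. The header names follow the Wireshark column list given above (No., Time, Source, Destination, Protocol, Length, Info); the sample rows and the `protocol_totals` helper are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the shape of a Wireshark CSV export.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.168.1.10","8.8.8.8","DNS","74","Standard query A example.org"
"2","0.021500","8.8.8.8","192.168.1.10","DNS","90","Standard query response A 93.184.216.34"
"3","0.030100","192.168.1.10","93.184.216.34","TCP","66","50000 > 443 [SYN]"
'''


def protocol_totals(fileobj):
    """Return (packet count, total bytes) per protocol from a CSV export."""
    packets = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(fileobj):
        packets[row["Protocol"]] += 1
        total_bytes[row["Protocol"]] += int(row["Length"])
    return packets, total_bytes


packets_by_proto, bytes_by_proto = protocol_totals(io.StringIO(SAMPLE))
```

For the real file, `io.StringIO(SAMPLE)` would be replaced by `open(path, newline="")`, where `path` points to the downloaded CSV.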
Acknowledgements :
I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it supports machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
**Dataset License:** CC0: Public Domain
Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection and anomaly detection.
ML techniques that benefit from this dataset:
This dataset is useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private and enterprise networks. The dataset also contains features that are relevant to most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:
1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection: The dataset can be used to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection: The dataset could be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks and DoS attacks.
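As a very simple baseline for the anomaly detection use case, the sketch below flags packets whose Length deviates strongly from the mean using a z-score. This is a plain statistical rule swapped in for illustration, not a trained machine learning model and not a method described by the dataset authors; the threshold of 3 and the sample lengths are assumptions.

```python
from statistics import mean, pstdev


def length_outliers(lengths, threshold=3.0):
    """Return indices of values more than `threshold` std devs from the mean."""
    mu = mean(lengths)
    sigma = pstdev(lengths)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, x in enumerate(lengths) if abs(x - mu) / sigma > threshold]


# Hypothetical Length column: twenty small packets and one large one.
lengths = [60] * 20 + [1500]
suspicious = length_outliers(lengths)
```

A rule like this only looks at one feature in isolation; models trained on several columns (protocol, timing, addresses) would be needed to capture the kinds of attacks listed above.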