Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NetFlow traffic generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.
NetFlow flows were captured with packet-level sampling at a rate of 250. A sampling rate of X means that 1 out of every X packets is selected to contribute to a flow, while the remaining packets are ignored.
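As an illustration of 1-in-X sampling, the following minimal Python sketch keeps every 250th packet of a stream. It assumes systematic (deterministic) sampling; the description does not say whether the sensor samples deterministically or at random, and the function name `sample_packets` is hypothetical.

```python
from itertools import islice


def sample_packets(packets, rate=250):
    """Return an iterator over 1 out of every `rate` packets.

    Assumption: systematic sampling (every rate-th packet); a real
    exporter may instead pick packets at random with probability 1/rate.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    # islice(iterable, start, stop, step) picks items rate-1, 2*rate-1, ...
    return islice(packets, rate - 1, None, rate)


# 1,000 dummy packet ids: only 4 of them survive sampling at rate 250.
sampled = list(sample_packets(range(1000), rate=250))
print(len(sampled))
```

With this rate, short flows of fewer than 250 packets may leave no trace at all in the exported data.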
The version of NetFlow used to build the datasets is 5.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The network data schema follows the NetFlow v9 format. The dataset consists of two files, 'train_net.csv' and 'test_net.csv'; train_net.csv indicates when a particular ALERT occurs. The dataset contains 4 classes: 'None', 'Port Scanning', 'Denial of Service', and 'Malware'.
SIMARGL Project – Secure Intelligent Methods for Advanced RecoGnition of malware and stegomalware, with the support of the European Commission and the Horizon 2020 Program, under Grant Agreement No. 833042.
Maria-Elena Mihailescu, Darius Mihai, Mihai Carabas, Mikolaj Komisarek, Marek Pawlicki, Witold Holubowicz, Rafal Kozik: The Proposition and Evaluation of the RoEduNet-SIMARGL2021 Network Intrusion Detection Dataset. Sensors 21(13): 4319 (2021)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Blockchain data query: exchange netflow - example
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union-query SQL injection and Blind SQL injection. The SQLMAP tool was used to perform the attacks.
NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.
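To make the definition of a flow concrete, here is a minimal Python sketch that groups packets into unidirectional flows. It assumes the "common properties" are the usual 5-tuple (source and destination address, source and destination port, protocol); NetFlow v5 also keys on fields such as type of service and input interface, which are left out for brevity. The `Packet` type and `aggregate_flows` function are hypothetical names, not part of DOROTHEA.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP
    size: int      # packet size in bytes


def aggregate_flows(packets):
    """Group packets into unidirectional flows keyed by the 5-tuple."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        # Direction matters: A->B and B->A produce two different keys.
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)


# Made-up packets: two from client to server, one reply.
packets = [
    Packet("10.0.0.1", "10.0.0.2", 40000, 80, 6, 100),
    Packet("10.0.0.1", "10.0.0.2", 40000, 80, 6, 200),
    Packet("10.0.0.2", "10.0.0.1", 80, 40000, 6, 500),
]
flows = aggregate_flows(packets)
print(len(flows))
```

Because flows are unidirectional, the request and the reply of one connection end up in two separate flow records.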
Datasets
The first dataset (D1) was collected to train the detection models. The second (D2) was collected using different attacks from those used in training, in order to test the models and ensure their generalization.
The datasets contain both benign and malicious traffic. All collected datasets are balanced.
The version of NetFlow used to build the datasets is 5.
| Dataset | Aim | Samples | Benign-malicious traffic ratio |
|---|---|---|---|
| D1 | Training | 400,003 | 50% |
| D2 | Test | 57,239 | 50% |
Infrastructure and implementation
Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has the ipt_netflow sensor installed. The sensor is a Linux kernel module that uses Iptables to process the packets and convert them to NetFlow flows.
DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
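The two export conditions can be sketched as a simple predicate. This is only an illustration of the timeout rule using the 15-second and 1800-second values stated above; the function name, the timestamp arguments, and the use of `>=` at the boundary are assumptions, and the actual logic inside ipt_netflow may differ.

```python
INACTIVE_TIMEOUT = 15    # seconds without new packets (value from the text)
ACTIVE_TIMEOUT = 1800    # maximum lifetime of a flow in seconds (from the text)


def should_export(first_seen, last_seen, now):
    """Return True if a flow should be exported at time `now`.

    All arguments are timestamps in seconds. The boundary behaviour
    (>= rather than >) is an assumption made for this sketch.
    """
    idle_time = now - last_seen   # time since the last packet of the flow
    age = now - first_seen        # time since the first packet of the flow
    return idle_time >= INACTIVE_TIMEOUT or age >= ACTIVE_TIMEOUT


# A flow that started at t=0 and last saw a packet at t=10.
print(should_export(first_seen=0, last_seen=10, now=20))
print(should_export(first_seen=0, last_seen=10, now=25))
```

The active timeout means that a long-lived connection is split into several flow records, each covering at most 30 minutes.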
Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts. Users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks. On the one hand, it routes packets to the Internet. On the other hand, it sends them to a NetFlow data generation node (packets received from the Internet are handled in the same way).
The malicious traffic collected (SQLI attacks) was generated using SQLMAP. SQLMAP is a penetration testing tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.
The attacks were launched from 16 nodes, each running SQLMAP with the parameters listed in the following table.
| Parameters | Description |
|---|---|
| '--banner','--current-user','--current-db','--hostname','--is-dba','--users','--passwords','--privileges','--roles','--dbs','--tables','--columns','--schema','--count','--dump','--comments' | Enumerate users, password hashes, privileges, roles, databases, tables and columns |
| --level=5 | Increase the probability of a false positive identification |
| --risk=3 | Increase the probability of extracting data |
| --random-agent | Select the User-Agent randomly |
| --batch | Never ask for user input, use the default behavior |
| --answers="follow=Y" | Predefined answers to yes |
Every node executed SQLIA against 200 victim nodes. Each victim node deployed a web form vulnerable to Union-type injection attacks, connected to either a MySQL or a SQL Server database engine (50% of the victim nodes deployed MySQL and the other 50% deployed SQL Server).
The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24.
The malicious traffic in the two datasets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQL Server databases.
For D2, however, Blind SQL injection attacks were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic-generating nodes and 140.30.20.1/24 for victim nodes.
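The address spaces listed for D1 and D2 can be used to tell traffic generators from victims when inspecting flow records. The sketch below uses Python's standard `ipaddress` module; the `ADDRESS_SPACES` mapping, the role names, and the `role_of` function are hypothetical helpers, not part of the dataset. Several prefixes are written with host bits set (for example 182.168.1.1/24), so the networks are parsed with `strict=False`.

```python
import ipaddress

# Address spaces as stated in the dataset description.
ADDRESS_SPACES = {
    "D1": {"generators": "182.168.1.1/24", "victims": "126.52.30.0/24"},
    "D2": {"generators": "152.148.48.1/24", "victims": "140.30.20.1/24"},
}


def role_of(address, dataset):
    """Return 'generators', 'victims', or 'external' for an IPv4 address."""
    ip = ipaddress.ip_address(address)
    for role, cidr in ADDRESS_SPACES[dataset].items():
        # strict=False accepts prefixes whose host bits are not zero.
        if ip in ipaddress.ip_network(cidr, strict=False):
            return role
    return "external"


print(role_of("126.52.30.5", "D1"))
print(role_of("8.8.8.8", "D2"))
```

Since D1 and D2 use disjoint address spaces, a model that memorizes addresses from D1 gains nothing on D2.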
The MySQL server ran MariaDB version 10.4.12.
Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used for the other database engines.
According to our latest research, the global NetFlow Analyzer market size in 2024 stands at USD 1.42 billion, reflecting robust demand across diverse industries. Driven by the increasing need for comprehensive network visibility and security, the market is set to expand at a Compound Annual Growth Rate (CAGR) of 11.3% from 2025 to 2033. By the end of the forecast period, the NetFlow Analyzer market is expected to reach USD 3.72 billion. This remarkable growth trajectory is attributed to the surge in network complexity, escalating cyber threats, and the widespread adoption of cloud-based infrastructure, all of which are compelling organizations to invest in advanced network traffic analysis solutions.
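The relationship between the base value, the CAGR, and the forecast can be checked with the standard compound-growth formula. The sketch below assumes nine compounding years (2024 as the base year, 2033 as the end of the forecast); the function name is hypothetical.

```python
def project_value(base, cagr, years):
    """Compound `base` at annual rate `cagr` for `years` years."""
    return base * (1 + cagr) ** years


# USD 1.42 billion in 2024, growing at 11.3% per year until 2033.
projected = project_value(1.42, 0.113, 2033 - 2024)
print(f"{projected:.2f}")  # 3.72
```

Under that assumption, the result agrees with the stated USD 3.72 billion.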
A primary growth driver for the NetFlow Analyzer market is the escalating sophistication of cyber threats and the corresponding need for enhanced network security and visibility. Enterprises across sectors are increasingly recognizing the importance of real-time network monitoring to detect anomalies, prevent data breaches, and ensure regulatory compliance. With the proliferation of IoT devices, cloud computing, and mobile endpoints, network architectures have become more intricate, making traditional monitoring tools insufficient. NetFlow Analyzer solutions, leveraging flow-based monitoring, offer granular insights into network traffic, enabling organizations to identify suspicious patterns, optimize bandwidth usage, and maintain robust security postures. This growing awareness and the critical need for proactive threat detection are fueling the rapid adoption of NetFlow Analyzer solutions globally.
Another significant factor propelling market growth is the digital transformation wave sweeping across industries. Enterprises are increasingly migrating workloads to the cloud, adopting hybrid environments, and leveraging virtualized networks to enhance agility and scalability. These trends, while beneficial, introduce new challenges in network management and performance optimization. NetFlow Analyzer tools, with their ability to provide deep visibility into network flows, traffic patterns, and application performance, have become indispensable for IT teams striving to maintain high availability and user experience. The demand for advanced analytics, real-time reporting, and automated alerting capabilities is further driving innovation in the NetFlow Analyzer market, as vendors continuously enhance their offerings to meet evolving enterprise requirements.
The market is also benefiting from the growing emphasis on regulatory compliance and data privacy. Regulations such as GDPR, HIPAA, and PCI DSS mandate stringent controls over network traffic, data access, and incident response. NetFlow Analyzer solutions play a pivotal role in helping organizations monitor network activities, generate audit trails, and demonstrate compliance with industry standards. The ability to correlate flow data with security events, generate compliance reports, and support forensic investigations is prompting organizations in highly regulated sectors, such as BFSI, healthcare, and government, to invest in robust NetFlow Analyzer platforms. This compliance-driven adoption is expected to sustain market growth over the forecast period.
From a regional perspective, North America continues to dominate the NetFlow Analyzer market, accounting for the largest revenue share in 2024. The region’s leadership is underpinned by the presence of major technology providers, early adoption of advanced network management tools, and a high incidence of cyberattacks. Europe and Asia Pacific are also witnessing substantial growth, fueled by increasing investments in IT infrastructure, expanding digital economies, and the rising need for network visibility in emerging markets. Latin America and the Middle East & Africa, while smaller in market size, are experiencing steady adoption as organizations modernize their network operations and prioritize security. The global nature of digital transformation and the universal imperative for network security are ensuring that demand for NetFlow Analyzer solutions remains strong across all regions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
| Field Name | Description |
|---|---|
| FLOW_ID | Unique identifier of the flow |
| IPV4_SRC_ADDR | IPv4 source address |
| IPV4_DST_ADDR | IPv4 destination address |
| IN_PKTS | Number of incoming packets |
| IN_BYTES | Number of incoming bytes |
| OUT_PKTS | Number of outgoing packets |
| OUT_BYTES | Number of outgoing bytes |
| FIRST_SWITCHED | Time of first packet in the flow |
| LAST_SWITCHED | Time of last packet in the flow |
| L4_SRC_PORT | Layer 4 source port |
| L4_DST_PORT | Layer 4 destination port |
| TCP_FLAGS | TCP flags |
| PROTOCOL | Protocol |
| PROTOCOL_MAP | Protocol map |
| TOTAL_FLOWS_EXP | Total flows exported |
| L7_PROTO | Layer 7 protocol |
| L7_PROTO_NAME | Layer 7 protocol name |
| ANOMALY_CATEGORY | Name of the flow's classification category |
| ANOMALY | Binary classification of the flow |
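A record with these fields can be read with Python's standard `csv` module. The sketch below is only illustrative: it assumes the data is distributed as CSV with the field names as column headers, uses a subset of the columns, and the sample rows, category names, and 0/1 encoding of ANOMALY are all made up for the example.

```python
import csv
import io
from collections import Counter

# Made-up sample rows using a subset of the documented field names.
SAMPLE = """FLOW_ID,IPV4_SRC_ADDR,IPV4_DST_ADDR,IN_PKTS,IN_BYTES,PROTOCOL,ANOMALY_CATEGORY,ANOMALY
1,10.0.0.1,10.0.0.2,3,180,6,benign,0
2,10.0.0.3,10.0.0.2,40,2400,6,attack,1
3,10.0.0.4,10.0.0.2,5,300,17,benign,0
"""


def count_by_category(csv_file):
    """Count flows per ANOMALY_CATEGORY value in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["ANOMALY_CATEGORY"] for row in reader)


counts = count_by_category(io.StringIO(SAMPLE))
print(counts)
```

The same function works on a real file by passing `open(path, newline="")` instead of the in-memory sample.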
This work is co-funded under the APPRAISE Project – fAcilitating Public & Private secuRity operAtors to mitigate terrorIsm Scenarios against soft targEts, with the support of the European Commission and the Horizon 2020 Program, under Grant Agreement No. 101021981.
According to our latest research, the global NetFlow-to-Kafka Bridge market size in 2024 stands at USD 412 million, registering a robust growth trajectory. The market is projected to expand at a CAGR of 18.4% from 2025 to 2033, reaching a forecasted value of approximately USD 1,964 million by 2033. This remarkable growth is primarily driven by the increasing adoption of real-time data streaming solutions, the proliferation of networked devices, and the heightened demand for actionable network intelligence across diverse industries.
One of the key growth factors propelling the NetFlow-to-Kafka Bridge market is the surge in enterprise digital transformation initiatives. Organizations across industries are migrating from legacy systems to modern data architectures that emphasize real-time visibility and actionable insights. This transition is fueling demand for solutions that can seamlessly bridge network flow data with scalable streaming platforms like Apache Kafka. As businesses strive for operational efficiency and enhanced security, the ability to ingest, process, and analyze high-volume network telemetry data in real time has become indispensable. The NetFlow-to-Kafka Bridge plays a pivotal role in enabling this capability, allowing enterprises to proactively monitor, detect anomalies, and ensure optimal network performance.
Another significant driver is the exponential growth in network traffic, spurred by the adoption of cloud computing, IoT devices, and remote work models. The complexity of modern network environments has made traditional monitoring and analytics tools insufficient for handling the sheer volume and velocity of data. NetFlow-to-Kafka Bridge solutions address this challenge by providing scalable, low-latency data pipelines that transport network flow records directly into Kafka clusters for downstream processing and analytics. This capability is especially crucial for large-scale enterprises and service providers who require high granularity and flexibility in network monitoring and security analytics. As a result, the demand for these bridge solutions is expected to rise steadily, particularly among organizations with complex, distributed IT infrastructures.
Furthermore, regulatory compliance and the increasing sophistication of cyber threats are catalyzing the adoption of advanced network analytics platforms. Governments and industries such as BFSI, healthcare, and telecommunications are under mounting pressure to safeguard sensitive data and ensure uninterrupted service delivery. The NetFlow-to-Kafka Bridge serves as a critical enabler for real-time security analytics, anomaly detection, and forensic investigations by facilitating the seamless integration of network telemetry with big data and security information event management (SIEM) systems. This trend is expected to intensify as regulatory frameworks evolve and cyber risks become more pervasive, further boosting market growth.
From a regional perspective, North America currently dominates the NetFlow-to-Kafka Bridge market, accounting for the largest share in 2024, driven by the early adoption of advanced IT infrastructure and a strong focus on cybersecurity. However, the Asia Pacific region is poised for the fastest growth during the forecast period, fueled by rapid digitalization, expanding enterprise IT investments, and increasing awareness of network security. Europe also represents a significant market, characterized by stringent data protection regulations and a mature technology landscape. Latin America and the Middle East & Africa are emerging markets, witnessing gradual uptake of NetFlow-to-Kafka Bridge solutions as digital transformation initiatives gain momentum. Overall, the global market outlook remains highly favorable, with strong growth anticipated across all major regions.
The NetFlow-to-Kafka Bridge market is segmented by component into software, hardware, and services, each playing a distinct and vit