Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Traffic Flow Analysis: The dataset can be used in machine learning models to analyze traffic flow in cities. It can identify the type of vehicles on the city roads at different times of the day, helping in planning and traffic management.
Vehicle Class Based Toll Collection: Toll booths can use this model to automatically classify and charge vehicles based on their type, enabling a more efficient and automated system.
Parking Management System: Parking lot owners can use this model to easily classify vehicles as they enter for better space management. Knowing the vehicle type can help assign it to the most suitable parking spot.
Traffic Rule Enforcement: The dataset can be used to create a computer vision model to automatically detect any traffic violations like wrong lane driving by different vehicle types, and notify law enforcement agencies.
Smart Ambulance Tracking: The system can help in identifying and tracking ambulances and other emergency vehicles, enabling traffic management systems to provide priority routing during emergencies.
https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures for 1 hour during the evening of Oct 9th, 2023 using Wireshark. The dataset consists of 394137 instances, stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.
Content :
This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.
The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
No: Number of the instance.
Timestamp: Timestamp of the network traffic instance.
Source IP: IP address of the source.
Destination IP: IP address of the destination.
Protocol: Protocol used by the instance.
Length: Length of the instance.
Info: Information about the traffic instance.
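As a minimal sketch of how these columns might be read, the snippet below parses a few rows shaped like a Wireshark CSV export and counts packets per protocol. The sample rows, the header spelling ("No.", "Time", "Source", ...) and the `summarize` helper are assumptions for illustration; the actual file name and header of the published CSV are not stated here.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape of a Wireshark CSV export. The addresses
# are documentation-range placeholders, not values from the dataset.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.0.2.5","192.0.2.1","DNS","74","Standard query A example.com"
"2","0.000412","192.0.2.1","192.0.2.5","DNS","90","Standard query response A example.com"
"3","0.001100","192.0.2.5","198.51.100.7","TCP","66","51000 > 443 [SYN]"
'''

def summarize(handle):
    """Count packets per protocol and sum the Length column."""
    reader = csv.DictReader(handle)
    protocols = Counter()
    total_bytes = 0
    for row in reader:
        protocols[row["Protocol"]] += 1
        total_bytes += int(row["Length"])
    return protocols, total_bytes

protocols, total_bytes = summarize(io.StringIO(SAMPLE))
print(protocols.most_common())
print(total_bytes)
```

For the real file, the `io.StringIO(SAMPLE)` argument would be replaced by an opened file handle, and the column names adjusted to match its header row.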
Acknowledgements :
I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it can be used to build machine learning models that identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
**Dataset License:** CC0: Public Domain
Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
ML techniques that benefit from this dataset:
This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains important features that can be used for most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:
1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection: A large network traffic dataset can be used to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks, and DoS attacks.
https://dataintelo.com/privacy-and-policy
The global real-time traffic data market size is anticipated to reach USD 15.3 billion by 2032 from an estimated USD 6.5 billion in 2023, exhibiting a robust CAGR of 10.1% over the forecast period. This substantial growth is driven by the increasing need for efficient traffic management systems and the rising adoption of smart city initiatives worldwide. Governments and commercial entities are investing heavily in advanced technologies to optimize traffic flow and enhance urban mobility, thus fostering market expansion.
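The growth rate quoted above can be sanity-checked with the standard compound annual growth rate formula. This is a small sketch that assumes 2023 as the base year and nine compounding years to 2032; the `cagr` helper is illustrative and the report's own base year and rounding may differ.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# USD 6.5 billion (2023 estimate) growing to USD 15.3 billion (2032 forecast).
rate = cagr(6.5, 15.3, 2032 - 2023)
print(f"{rate:.1%}")
```

Under these assumptions the result comes out at roughly 10%, which is in line with the reported figure.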
The surge in urbanization and the consequent rise in vehicle ownership have led to severe traffic congestion issues in many metropolitan areas. This has necessitated the implementation of real-time traffic data systems that can provide accurate and timely information to manage traffic effectively. With the integration of sophisticated technologies such as IoT, AI, and big data analytics, these systems are becoming more efficient, thereby driving market growth. Furthermore, the growing emphasis on reducing carbon emissions and enhancing road safety is also propelling the adoption of real-time traffic data solutions.
Technological advancements are playing a pivotal role in shaping the real-time traffic data market. Innovations in sensor technology, the proliferation of GPS devices, and the widespread use of mobile data are providing rich sources of real-time traffic information. The ability to integrate data from multiple sources and deliver actionable insights is significantly enhancing traffic management capabilities. Additionally, the development of cloud-based solutions is enabling scalable and cost-effective deployment of traffic data systems, further contributing to market growth.
Another critical growth factor is the increasing investment in smart city projects. Governments across the globe are prioritizing the development of smart transportation infrastructure to improve urban mobility and reduce traffic-related issues. Real-time traffic data systems are integral to these initiatives, providing essential data for optimizing traffic flow, enabling route optimization, and enhancing public transport efficiency. The involvement of private sector players in these projects is also fueling market growth by introducing innovative solutions and fostering public-private partnerships.
The exponential rise in Mobile Data Traffic is another significant factor influencing the real-time traffic data market. As more people rely on smartphones and mobile applications for navigation and traffic updates, the demand for real-time data has surged. Mobile data provides a wealth of information about traffic patterns and congestion levels, enabling more accurate and timely traffic management. The integration of mobile data with other data sources, such as GPS and sensor data, enhances the overall effectiveness of traffic data systems. This trend is particularly evident in urban areas where mobile devices are ubiquitous, and the need for efficient traffic management is critical. The ability to harness mobile data for traffic insights is driving innovation and growth in the market, as companies develop new solutions to leverage this valuable resource.
Regionally, North America and Europe are leading the market due to their early adoption of advanced traffic management technologies and significant investments in smart city projects. However, the Asia Pacific region is expected to witness the highest growth rate over the forecast period, driven by rapid urbanization, increasing vehicle ownership, and growing government initiatives to develop smart transportation infrastructure. Emerging economies in Latin America and the Middle East & Africa are also showing promising growth potential, fueled by ongoing infrastructure development and increasing awareness of the benefits of real-time traffic data solutions.
The real-time traffic data market by component is segmented into software, hardware, and services. Each component plays a crucial role in the overall functionality and effectiveness of traffic data systems. The software segment includes traffic management software, route optimization software, and other analytical tools that help process and analyze traffic data. The hardware segment comprises sensors, GPS devices, and other data collection tools. The services segment includes installation, maintenance, and consulting services that support the deployment and operation of traffic data systems.
Traffic Analysis Zones (TAZ) for the COG/TPB Modeled Region from Metropolitan Washington Council of Governments. The TAZ dataset is used to join several types of zone-based transportation modeling data. For more information, visit https://plandc.dc.gov/page/traffic-analysis-zone.
https://www.mordorintelligence.com/privacy-policy
Network Traffic Analysis Market is Segmented by Deployment (On-Premise, Cloud-Based, and Hybrid), Component (Solutions and Services), Organization Size (Large Enterprises and Small and Medium Enterprises), End-User Industry (BFSI, IT and Telecom, and More), and Geography. The Market Sizes and Forecasts are Provided in Value (in USD Million) for all the Above Segments.
The census count of vehicles on city streets is normally reported in the form of Average Daily Traffic (ADT) counts. These counts provide a good estimate of the actual number of vehicles on an average weekday at select street segments. Specific block segments are selected for a count because they are deemed representative of a larger segment on the same roadway. ADT counts are used by transportation engineers, economists, real estate agents, planners, and other professionals for planning and operational analysis. The frequency of each count varies depending on City staff’s needs for analysis in any given area. This report covers the counts taken in our City during approximately the past 12 years.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network traffic datasets created by Single Flow Time Series Analysis
Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:
J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.
This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf
In the following table is a description of each dataset file:
File name | Detection problem | Citation of original raw dataset |
botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020 |
doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022 |
dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020 |
ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23 |
ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This traffic dataset contains balanced amounts of encrypted malicious and legitimate traffic for encrypted malicious traffic detection and analysis. The dataset is secondary CSV feature data composed from six public traffic datasets.
Our dataset is curated based on two criteria: The first criterion is to combine widely considered public datasets which contain enough encrypted malicious or encrypted legitimate traffic in existing works, such as the Malware Capture Facility Project datasets. The second criterion is to ensure that the final dataset is balanced between encrypted malicious and legitimate network traffic.
Based on the criteria, 6 public datasets are selected. After data pre-processing, details of each selected public dataset and the size of different encrypted traffic are shown in the “Dataset Statistic Analysis Document”. The document summarizes the malicious and legitimate traffic size we selected from each selected public dataset, the traffic size of each malicious traffic type, and the total traffic size of the composed dataset. From the table, we can observe that encrypted malicious and legitimate traffic each contribute approximately 50% of the final composed dataset.
The datasets now made available were prepared for encrypted malicious traffic detection. Since the dataset is intended for machine learning or deep learning model training, sample train and test sets are also provided. The train and test datasets are split at a ratio of 1:4. Such datasets can be used for machine learning or deep learning model training and testing based on selected features or after further data pre-processing.
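A 1:4 split that keeps the malicious/legitimate balance in both parts could be sketched as follows. This is only an illustration: the description does not say which side of the 1:4 ratio is the larger one or whether the authors stratified by label, so the `stratified_split` helper, the `small_share` parameter, and the toy rows are all assumptions.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_of, small_share=0.2, seed=0):
    """Split rows into (small_part, large_part), preserving label proportions.

    small_share=0.2 corresponds to a 1:4 ratio between the two parts.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    rng = random.Random(seed)
    small, large = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * small_share)
        small.extend(group[:cut])
        large.extend(group[cut:])
    return small, large

# Toy balanced data: 50 "malicious" and 50 "legitimate" placeholder flows.
rows = [("flow%d" % i, "malicious" if i % 2 else "legitimate") for i in range(100)]
small, large = stratified_split(rows, label_of=lambda r: r[1])
print(len(small), len(large))
```

Splitting per label rather than over the whole table keeps the roughly 50/50 class balance in both parts.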
Urban SDK is a GIS data management platform and global provider of mobility, urban characteristics, and alt datasets. Urban SDK Traffic data provides traffic volume, average speed, average travel time and congestion for logistics, transportation planning, traffic monitoring, routing and urban planning. Traffic data is generated from cars, trucks and mobile devices for major road networks in US and Canada.
"With the old data I used, it took me 3-4 weeks to create a presentation. I will be able to do 3-4x the work with your Urban SDK traffic data."
Traffic Volume, Speed and Congestion Data Type Profile:
Industry Solutions include:
Use cases:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Traffic Flow Analysis: This model could be used in smart cities to monitor and analyze traffic patterns across different times of the day, week or year. It can provide detailed insights into the types of vehicles and number of pedestrians using specific roads or intersections, thereby helping in urban planning strategies.
Traffic Management Systems: The model could be incorporated into traffic management systems to dynamically control traffic lights depending on the type and volume of traffic. For instance, if a greater influx of cars and trucks is detected, traffic light timings could be adjusted to improve flow and decrease congestion.
Parking Lot Management: Retail centers, airports, or other facilities with large parking areas could use this technology to count the vehicles entering and exiting their premises, enabling efficient parking management and planning.
Transport Research: Research institutions could use the model to carry out comprehensive studies on transportation patterns, commuting trends, and the usage of different types of vehicles in different regions.
Safety Monitoring: The system could be used to detect anomalous events in traffic such as an increased number of pedestrians on the road or unusual vehicle patterns that could potentially lead to accidents. This could assist in devising safety measures and regulations.
This layer contains the geographical boundaries of the Metropolitan Washington Council of Government's Traffic Analysis Zones (TAZ) of Loudoun County, Virginia. TAZs are designed to be relatively homogeneous units with respect to population, economic, and transportation characteristics. These TAZ boundaries were delineated by Loudoun County Government and adopted by the Metropolitan Washington Council of Governments.
https://www.datainsightsmarket.com/privacy-policy
The global traffic modeling and simulation software market is projected to reach a value of USD XXX million by 2033, registering a CAGR of XX% during the forecast period (2025-2033). Urbanization, increasing traffic congestion, and the need for efficient transportation systems are driving the demand for traffic modeling and simulation software. The software helps analyze traffic patterns, identify bottlenecks, and optimize infrastructure design to improve traffic flow and reduce congestion. Key market drivers include the increasing adoption of smart city initiatives, the growing focus on sustainable transportation, and the need for real-time traffic management. Cloud-based traffic modeling and simulation software is gaining traction due to its scalability, cost-effectiveness, and ease of access. Major players in the market include AnyLogic, PTV Group, AECOM, ETAP, Systra, Dassault Systèmes, Mosimtec, VI-grade, Berkeley Simulation, and Gamma Technologies. The market is fragmented, with regional players holding significant market share in their respective regions. The global traffic modeling and simulation software market surpassed USD 1.2 billion in 2021 and is projected to reach approximately USD 2.3 billion by 2029, exhibiting a CAGR of 7.9% during the forecast period (2022-2029).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
🇬🇧 English:
This synthetic dataset provides location-based traffic congestion levels on an hourly basis over the last 30 days. It can be used to train time series models like LSTM and XGBoost to forecast traffic intensity.
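Forecasting models such as LSTM or XGBoost are usually trained on fixed-length windows of past values. The sketch below shows one common way to build such windows from an hourly series. The `make_windows` helper, the 24-hour lookback, and the sine-shaped stand-in series are assumptions for illustration; the dataset's real column names are not listed here.

```python
import math

def make_windows(series, lookback, horizon=1):
    """Turn a 1-D series into (inputs, targets) pairs for supervised training.

    Each input holds `lookback` consecutive values; the target is the value
    `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback + horizon - 1])
    return inputs, targets

# Stand-in for 30 days of hourly congestion values at one location.
hours = 30 * 24
series = [50 + 30 * math.sin(2 * math.pi * (h % 24) / 24) for h in range(hours)]

X, y = make_windows(series, lookback=24)
print(len(X), len(X[0]))
```

With 720 hourly values and a 24-hour lookback, each training example uses one full day to predict the following hour.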
Use this dataset to:
Features:
🇹🇷 Turkish:
This synthetic dataset provides location-based hourly traffic congestion information for the last 30 days. It is suitable for training time series models aimed at forecasting traffic congestion.
With this dataset:
Features:
https://www.datainsightsmarket.com/privacy-policy
The real-time traffic data market, currently valued at $36.9 billion in 2025, is experiencing robust growth, projected to expand at a Compound Annual Growth Rate (CAGR) of 12.5% from 2025 to 2033. This significant expansion is fueled by several key factors. The increasing adoption of connected vehicles and the rise of smart city initiatives are driving demand for accurate and timely traffic information. Furthermore, the logistics and transportation sectors heavily rely on real-time data for efficient route optimization, delivery scheduling, and fleet management, contributing substantially to market growth. Government agencies are also significant consumers, leveraging this data for urban planning, traffic management, and emergency response systems. The market is segmented by application (Government, Logistics, Infrastructure Construction, Automobile, and Other) and data type (Traffic Data, Mobility Data, Car Traffic Data), with the Government and Logistics segments exhibiting particularly strong growth potential due to their increasing reliance on data-driven decision-making. Technological advancements such as improved sensor technologies and the development of sophisticated analytical tools are further enhancing the capabilities and accuracy of real-time traffic data solutions. Competitive dynamics within the real-time traffic data market are characterized by a mix of established players and emerging technology companies. Key players like TomTom, HERE Technologies, and INRIX are leveraging their existing mapping and navigation expertise to provide comprehensive real-time traffic data solutions. However, newer companies are entering the market with innovative data aggregation and analysis techniques, leading to increased competition and potentially lower prices. 
The geographic distribution of market share is expected to be dominated by North America and Europe initially, given the higher adoption rates of smart city technologies and connected vehicle infrastructure in these regions. However, rapid infrastructure development and increasing urbanization in Asia-Pacific are projected to drive substantial market growth in this region over the forecast period. The market's continued growth hinges on continued investment in smart city infrastructure, the expanding adoption of connected car technology, and the continuous development of more sophisticated data analytics.
https://www.datainsightsmarket.com/privacy-policy
The global traffic counter market is estimated to reach a value of XXXX million by 2033, exhibiting a CAGR of XX% during the forecast period (2025-2033). The market growth is primarily driven by the increasing demand for efficient traffic management systems, rising urbanization, and the need for data-driven traffic analysis. The adoption of advanced technologies, such as radar monitoring and video recognition, is further fueling the market expansion. The market is segmented based on application into road, parking lot, and others. The road segment holds the largest market share due to the extensive use of traffic counters to monitor traffic flow and congestion on highways and roads. The parking lot segment is also witnessing significant growth owing to the rising need for efficient parking management systems in commercial and residential areas. In terms of region, North America is expected to dominate the market, followed by Europe and Asia Pacific. Key drivers in the North American market include the presence of advanced traffic management systems and the growing adoption of smart city initiatives. Europe is also a significant market for traffic counters, with a high demand for traffic monitoring and analysis solutions in urban areas.
This feature layer displays the Traffic Analysis Zones layer for the City of Tallahassee and Leon County, Florida. A TAZ analysis is conducted every 5 years as part of the Capital Regional Transportation Planning Agency’s (CRTPA) Regional Mobility Plan. This TAZ analysis is part of CRTPA's Connections 2045 Regional Mobility Plan.
Traffic Analysis Zone: A traffic analysis zone (TAZ) is a special area delineated by state and/or local transportation officials for tabulating traffic-related data, especially journey-to-work and place-of-work statistics. A TAZ usually consists of one or more census blocks, block groups, or census tracts.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created by a LoRaWAN sniffer and contains packets, which are thoroughly analyzed in the paper Exploring LoRaWAN Traffic: In-Depth Analysis of IoT Network Communications (not yet published). Data from the LoRaWAN sniffer was collected in four cities: Liege (Belgium), Graz (Austria), Vienna (Austria), and Brno (Czechia).
Gateway ID: b827ebafac000001
Gateway ID: b827ebafac000002
Gateway ID: b827ebafac000003
To open the pcap files, you need Wireshark with current support for LoRaTap and LoRaWAN protocols. This support will be available in the official 4.1.0 release. A working version for Windows is accessible in the automated build system.
The source data is available in the log.zip file, which contains the complete dataset obtained by the sniffer. A set of conversion tools for log processing is available on GitHub. The converted logs, available in Wireshark format, are stored in pcap.zip. For the LoRaWAN decoder, you can use the attached root and session keys. The processed outputs are stored in csv.zip, and graphical statistics are available in png.zip.
This data represents a unique, geographically identifiable selection from the full log, cleaned of any errors. The records from Brno include communication between the gateway and a node with known keys.
Test file :: 00_Test
Brno, Czech Republic :: 01_Brno
70b3d5cee0000042
d494d49a7b4053302bdcf96f1defa65a
00d85395
c417540b8b2afad8930c82fcf7ea54bb
421fea9bedd2cc497f63303edf5adf8e
Liege, Belgium :: 02_Liege
:: evaluated in the paper
Brno, Czech Republic :: 03_Brno_join
70b3d5cee0000042
d494d49a7b4053302bdcf96f1defa65a
01e65ddc
e2898779a03de59e2317b149abf00238
59ca1ac91922887093bc7b236bd1b07f
Graz, Austria :: 04_Graz
:: evaluated in the paper
Vienna, Austria :: 05_Wien
:: evaluated in the paper
Brno, Czech Republic :: 07_Brno
:: evaluated in the paper
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Traffic Analysis Aerial View is a dataset for object detection tasks - it contains Cars Trucks Buses Cycles annotations for 2,757 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Traffic Analysis Zones
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code:
Packet_Features_Generator.py & Features.py
To run this code:
pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j
-h, --help    show this help message and exit
-i TXTFILE    input text file
-x X          Add first X number of total packets as features.
-y Y          Add first Y number of negative packets as features.
-z Z          Add first Z number of positive packets as features.
-ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S          Generate samples using size s.
-j
Purpose:
Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.
Uses Features.py to calculate the features.
startMachineLearning.sh & machineLearning.py
To run this code:
bash startMachineLearning.sh
This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.
Options (to be edited within this file):
--evaluate-only to test 5-fold cross validation accuracy
--test-scaling-normalization to test 6 different combinations of scalers and normalizers
Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use
--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'
Purpose:
Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
Data
Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.
Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:
First number is a classification number to denote what website, query, or vr action is taking place.
The remaining numbers in each line denote:
The size of a packet,
and the direction it is traveling.
negative numbers denote incoming packets
positive numbers denote outgoing packets
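The line format described above can be parsed in a few lines. The sketch below is not the repository's Features.py; it assumes whitespace-separated integers (the delimiter is not specified), and the `parse_capture_line` and `basic_features` helpers, the zero-padding, and the sample line are hypothetical.

```python
def parse_capture_line(line):
    """Split one line into (class label, list of signed packet sizes)."""
    values = [int(token) for token in line.split()]  # assumed whitespace-delimited
    return values[0], values[1:]

def basic_features(packets, first_x=3):
    """Derive a few simple features from signed packet sizes.

    Negative sizes are incoming packets, positive sizes are outgoing packets.
    `first_x` loosely mirrors the idea of using the first X packets as
    features; padding short captures with zeros is an assumption.
    """
    incoming = [-size for size in packets if size < 0]
    outgoing = [size for size in packets if size > 0]
    head = packets[:first_x] + [0] * max(0, first_x - len(packets))
    return {
        "n_incoming": len(incoming),
        "n_outgoing": len(outgoing),
        "bytes_incoming": sum(incoming),
        "bytes_outgoing": sum(outgoing),
        "first_packets": head,
    }

label, packets = parse_capture_line("7 565 -1500 -1500 120 -60")
features = basic_features(packets)
print(label, features)
```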
Figure 4 Data
This data uses specific lines from the Virtual Reality.txt file.
The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.
The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.
The .xlsx and .csv files are identical.
Each file includes (from right to left):
The original packet data,
each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,
and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
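The steps listed above (sort the sizes, compute mean and standard deviation, then a CDF) can be sketched as below. The description does not say whether the CDF is empirical or a normal CDF built from the mean and standard deviation, nor whether a sample or population standard deviation was used, so the sketch shows both CDF variants and uses the sample standard deviation as an assumption. The packet sizes are made up.

```python
import statistics

def empirical_cdf(packet_sizes):
    """Return (size, cumulative fraction) pairs for sizes sorted ascending.

    Tied sizes appear once per packet, each with its own cumulative fraction.
    """
    ordered = sorted(packet_sizes)
    n = len(ordered)
    return [(size, (rank + 1) / n) for rank, size in enumerate(ordered)]

sizes = [120, 60, 1500, 565, 1500]      # hypothetical packet sizes
mean = statistics.mean(sizes)
stdev = statistics.stdev(sizes)         # sample standard deviation (assumption)

cdf = empirical_cdf(sizes)

# Alternative: CDF of a normal distribution fitted to the same capture.
fitted = statistics.NormalDist(mean, stdev)
normal_cdf = [(size, fitted.cdf(size)) for size in sorted(sizes)]

print(mean, round(stdev, 1))
print(cdf)
```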