CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore our detailed website traffic dataset featuring key metrics like page views, session duration, bounce rate, traffic source, and conversion rates.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Traffic Flow Analysis: The dataset can be used in machine learning models to analyze traffic flow in cities. It can identify the type of vehicles on the city roads at different times of the day, helping in planning and traffic management.
Vehicle Class Based Toll Collection: Toll booths can use this model to automatically classify and charge vehicles based on their type, enabling a more efficient and automated system.
Parking Management System: Parking lot owners can use this model to easily classify vehicles as they enter for better space management. Knowing the vehicle type can help assign it to the most suitable parking spot.
Traffic Rule Enforcement: The dataset can be used to create a computer vision model to automatically detect any traffic violations like wrong lane driving by different vehicle types, and notify law enforcement agencies.
Smart Ambulance Tracking: The system can help in identifying and tracking ambulances and other emergency vehicles, enabling traffic management systems to provide priority routing during emergencies.
Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly
https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here were obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures with Wireshark for 1 hour during the evening of Oct 9th, 2023. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.
Content :
This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types, the latter due to the Timestamp.
The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
No: Number of the instance.
Timestamp: Timestamp of the network traffic instance.
Source IP: IP address of the source.
Destination IP: IP address of the destination.
Protocol: Protocol used by the instance.
Length: Length of the instance.
Info: Information about the traffic instance.
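A minimal sketch of reading such a capture export and counting packets per protocol, using only the Python standard library. The header names follow the Wireshark column list quoted above (No., Time, Source, Destination, Protocol, Length, Info); the actual CSV header may differ, and the three sample rows and the `protocol_counts` helper are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the shape of a Wireshark CSV export (not real data).
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.0.2.5","192.0.2.1","DNS","74","Standard query A example.org"
"2","0.000412","192.0.2.1","192.0.2.5","DNS","90","Standard query response A example.org"
"3","0.001020","192.0.2.5","198.51.100.7","TCP","66","49152 > 443 [SYN]"
'''

def protocol_counts(csv_file):
    """Count rows per value of the Protocol column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Protocol"] for row in reader)

counts = protocol_counts(io.StringIO(SAMPLE))
for protocol, n in counts.most_common():
    print(protocol, n)
```

For the real file, the same function could be called on `open(path, newline="")` instead of the in-memory sample.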
Acknowledgements :
I would like to thank the University of Cincinnati for providing the infrastructure for generating the network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it supports building machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
**Dataset License:** CC0: Public Domain
Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
ML techniques that benefit from this dataset:
This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains important features that can be used for most machine learning applications in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:
1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection: The large network traffic dataset can be used to train machine learning models to find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks, and DoS attacks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network traffic datasets created by Single Flow Time Series Analysis
Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:
J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.
This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf
In the following table is a description of each dataset file:
File name | Detection problem | Citation of original raw dataset |
botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020 |
doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022 |
dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020 |
ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018. |
ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018. |
ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23 |
ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related Features. In ICISSP, pages 407–414, 2016. |
vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related Features. In ICISSP, pages 407–414, 2016. |
vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This traffic dataset contains a balanced amount of encrypted malicious and legitimate traffic for encrypted malicious traffic detection and analysis. The dataset is secondary CSV feature data composed from six public traffic datasets.
Our dataset is curated based on two criteria. The first criterion is to combine widely considered public datasets that, according to existing works, contain enough encrypted malicious or encrypted legitimate traffic, such as the Malware Capture Facility Project datasets. The second criterion is to ensure that the final dataset is balanced between encrypted malicious and legitimate network traffic.
Based on these criteria, 6 public datasets were selected. After data pre-processing, details of each selected public dataset and the size of the different encrypted traffic types are shown in the “Dataset Statistic Analysis Document”. The document summarizes the malicious and legitimate traffic size we selected from each public dataset, the traffic size of each malicious traffic type, and the total traffic size of the composed dataset. From the table, we observe that encrypted malicious and legitimate traffic each contribute approximately 50% of the final composed dataset.
The datasets made available here were prepared for encrypted malicious traffic detection. Since the dataset is intended for machine learning or deep learning model training, sample train and test sets are also provided. The train and test datasets are split in a 1:4 ratio. Such datasets can be used for machine learning or deep learning model training and testing based on selected features or after further data pre-processing.
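As an illustration of a 1:4 split, here is a minimal sketch in plain Python. The description does not say which side of the ratio is the training set, so the hypothetical helper `split_by_ratio` takes both parts as parameters and simply returns the smaller and larger portions; the integer rows stand in for feature rows and are not real data.

```python
import random

def split_by_ratio(rows, first_part, second_part, seed=0):
    """Shuffle rows and split them into two lists sized first_part : second_part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = len(shuffled) * first_part // (first_part + second_part)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 feature rows
small, large = split_by_ratio(rows, 1, 4)
print(len(small), len(large))
```

For a balanced dataset like this one, a stratified split (splitting malicious and legitimate rows separately with the same ratio) would keep both portions balanced as well; that is a design option, not something the description confirms.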
Public (anonymized) road traffic prediction datasets from Huawei Munich Research Center.
Datasets from a variety of traffic sensors (i.e. induction loops) for traffic prediction. The data is useful for forecasting traffic patterns and adjusting stop-light control parameters, i.e. cycle length, offset and split times.
The dataset contains recorded data from 6 intersections in the urban area over the last 56 days, in the form of flow time series depicting the number of vehicles passing every 5 minutes for a whole day (i.e. 12 readings/h, 288 readings/day, 16128 readings / 56 days).
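The reading counts quoted above follow directly from the 5-minute interval. The short sketch below reproduces the arithmetic; the total across all six locations assumes one complete series per location, which the description does not state explicitly.

```python
INTERVAL_MIN = 5   # one reading every 5 minutes
DAYS = 56
LOCATIONS = 6      # assumption: one complete series per location

readings_per_hour = 60 // INTERVAL_MIN          # 12
readings_per_day = readings_per_hour * 24       # 288
readings_per_series = readings_per_day * DAYS   # 16128
readings_total = readings_per_series * LOCATIONS

print(readings_per_hour, readings_per_day, readings_per_series, readings_total)
```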
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The main aim of this dataset is to enable detection of traffic congestion from surveillance cameras using one-stage object detectors. The dataset contains congested and uncongested traffic scenes with their respective labels. It was collected from the video footage of different surveillance cameras. To prepare the dataset, frames were extracted from the video sources and resized to 500 x 500 pixels in .jpg format. The images were annotated with the LabelImg software. Each label is a .txt file with the same name as its image. The dataset is mainly prepared for YOLO models, but it can be converted to other models' formats.
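Converting the labels to another format mostly means undoing YOLO's normalization. The sketch below assumes the usual YOLO text layout (`class x_center y_center width height`, all coordinates normalized to 0..1) and the 500 x 500 image size mentioned above; the helper name `yolo_to_pixel_box` and the sample label line are illustrative, and the mapping from class ids to congested/uncongested is not given in the description.

```python
def yolo_to_pixel_box(line, img_w=500, img_h=500):
    """Convert one YOLO label line to (class_id, (xmin, ymin, xmax, ymax)) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

# Illustrative label line: a box centred in the image, a quarter wide and half high.
class_id, box = yolo_to_pixel_box("0 0.5 0.5 0.25 0.5")
print(class_id, box)
```

Corner coordinates in pixels are what Pascal VOC style annotations expect, so this is the core step of such a conversion.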
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is a countrywide traffic congestion dataset that covers 49 states of the USA. The congestion events data were collected from February 2016 to September 2022, using multiple APIs that provide streaming traffic incident (or event) data. These APIs broadcast traffic data captured by various entities, including the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road networks. The dataset contains approximately 33 million congestion records. We also provide a sampled version of data that includes 2 million events for easier processing and handling for those who prefer to work with a smaller amount of data.
If you use this dataset, please kindly cite the following paper:
The US Traffic Congestion dataset can be used for numerous applications, such as traffic modeling, simulated routing, identifying traffic hotspot locations, and exploring intrinsic traffic patterns and how they evolve over time.
Please note that the dataset may be missing data for certain days, which could be due to network connectivity issues during data collection. The dataset will not be updated, and this version should be considered the latest.
This dataset is being distributed solely for research purposes under the Creative Commons Attribution-Noncommercial-ShareAlike license (CC BY-NC-SA 4.0). By downloading the dataset, you agree to use it only for non-commercial, research, or academic applications. If you use this dataset, it is necessary to cite the paper mentioned above.
For any inquiries or assistance, please contact Sobhan Moosavi at sobhan.mehr84@gmail.com
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Update Notes
Mar 16 2024, remove spaces in the file and folder names.
Mar 31 2024, delete the underscore in the city names with a space (such as San Francisco) in the '02_TransCAD_results' folder to ensure correct data loading by TransCAD (software version: 9.0).
Aug 31 2024, add the 'cityname_link_LinkFlows.csv' file in the '02_TransCAD_results' folder to match the link from input data and the link from TransCAD results (LinkFlows) with the same Link_ID.
Introduction
This is a unified and validated traffic dataset for 20 US cities. There are 3 folders for each city.
01 Input data
- the initial network data obtained from OpenStreetMap (OSM)
- the visualization of the OSM data
- processed node / link / od data
02 TransCAD results (software version: 9.0)
- cityname.dbd : geographical network database of the city supported by TransCAD (version 9.0)
- cityname_link.shp / cityname_node.shp : network data supported by GIS software, which can be imported into TransCAD manually. Then the corresponding '.dbd' file can be generated for TransCAD with a version lower than 9.0
- od.mtx : OD matrix supported by TransCAD
- LinkFlows.bin / LinkFlows.csv : traffic assignment results by TransCAD
- cityname_link_LinkFlows.csv : the input link attributes with the traffic assignment results by TransCAD
- ShortestPath.mtx / ue_travel_time.csv : the travel time (min) between OD pairs by TransCAD
03 AequilibraE results (software version: 0.9.3)
- cityname.shp : shapefile network data of the city supported by QGIS or other GIS software
- od_demand.aem : OD matrix supported by AequilibraE
- network.csv : the network file used for traffic assignment in AequilibraE
- assignment_result.csv : traffic assignment results by AequilibraE
Publication
Xu, X., Zheng, Z., Hu, Z. et al. (2024). A unified dataset for the city-scale traffic assignment model in 20 U.S. cities. Sci Data 11, 325. https://doi.org/10.1038/s41597-024-03149-8
Usage Notes
If you use this dataset in your research or any other work, please cite both the dataset and paper above. A brief introduction about how to use this dataset can be found in GitHub. A more detailed illustration of compiling the traffic dataset on AequilibraE can be found in the GitHub code or Colab code.
Contact
If you have any inquiries, please contact Xiaotong Xu (email: kid-a.xu@connect.polyu.hk).
New York City Department of Transportation (NYC DOT) uses Automated Traffic Recorders (ATR) to collect traffic sample volume counts at bridge crossings and roadways. These counts do not cover the entire year, and the number of days counted per location may vary from year to year. Also see Automated Traffic Volume Counts: https://data.cityofnewyork.us/Transportation/Automated-Traffic-Volume-Counts/7ym2-wayt
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DESCRIPTION OF THE RESEARCH AND DATA: This work presents the Madrid Traffic Dataset (MTD), a comprehensive resource for the analysis and modeling of traffic patterns in Madrid. The dataset integrates data from traffic sensors, weather observations, calendar information, road infrastructure, and geolocation data to support advanced studies of urban mobility and predictive modeling.
In addition to the core data sources, the dataset includes temporal sequences and a traffic adjacency matrix, enabling the application of time-series analysis and graph-based modeling approaches.
-COMPLETE DATASET: The complete version of the MTD includes data from 554 traffic sensors distributed across the Madrid region, covering a total of 30 months (from June 2022 to November 2024).
-SUBSET DATASET: A more compact version derived from the complete dataset, focused on a subset of 300 traffic sensors with 17 months of data (from June 2022 to October 2023). This subset is designed for researchers requiring a lighter dataset.
DATA ORGANIZATION: The dataset is organized in a main directory containing a subfolder identified by the configuration data hash. This subfolder includes all key components: datasets, temporal sequences, adjacency matrices, and configuration files. The structure ensures that all resources are clearly arranged to facilitate easy access and reproducibility for researchers.
For more details, see [Submitted to IEEE Internet of Things Journal].
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Traffic Dataset - 500 Videos
Dataset comprises 500 videos of urban traffic captured by surveillance cameras, providing real-time traffic data enriched with bounding box annotations for vehicles and pedestrians. Designed for traffic monitoring and safety research, the dataset supports tasks like vehicle detection, traffic flow analysis, and accident prediction. By leveraging this dataset, researchers and engineers can advance real-time object detection, traffic surveillance systems… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/real-time-traffic-video-dataset.
Feature layer containing authoritative traffic count points for Sioux Falls, South Dakota. The traffic counts listed are 24-hour, weekday, two-directional counts. Traffic counts are normally collected during the summer months, but may be taken any season, as weather permits. The traffic counts are factored by the day of the week as well as by the month of the year to become an Average Annual Daily Total (AADT). Traffic volumes (i.e. count data) can fluctuate depending on the month, week, day of collection; the weather, type of road surface, nearby construction, etc. All of the historical data should be averaged to reflect the "normal" traffic count. More specific count data (time, date, hourly volume) can be obtained from the Sioux Falls Engineering Division at 367-8601.
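A rough sketch of how such factoring might work, assuming the adjustment is a simple product of a day-of-week factor and a month factor. The description does not give the formula or the factor values, so `estimate_aadt`, the multiplicative form, and the numbers below are all illustrative assumptions.

```python
def estimate_aadt(raw_24h_count, day_factor, month_factor):
    """Scale a raw 24-hour count by day-of-week and month factors (assumed multiplicative)."""
    return round(raw_24h_count * day_factor * month_factor)

# Hypothetical values: a 12,000-vehicle count, a weekday factor of 0.95
# and a summer-month factor of 1.10.
aadt = estimate_aadt(12000, day_factor=0.95, month_factor=1.10)
print(aadt)
```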
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network Address Translation (NAT)
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset contains 145063 time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2022-06-30. This is an extended version of the dataset that was used in the Kaggle Wikipedia Web Traffic forecasting competition. For consistency, the same Wikipedia pages that were used in the competition have been used in this dataset as well. The colons (:) in article names have been replaced by dashes (-) to make the .tsf file readable using our data loaders.
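The renaming step and the span of each series can be sketched in a few lines. The helper name `tsf_safe_name` and the sample page name are illustrative, and the day count assumes one observation per day, which the description does not state explicitly.

```python
from datetime import date

def tsf_safe_name(article_name):
    """Replace colons with dashes, as described for the .tsf file."""
    return article_name.replace(":", "-")

# Number of calendar days covered, counting both end dates
# (assumes daily observations).
start, end = date(2015, 7, 1), date(2022, 6, 30)
days_per_series = (end - start).days + 1

print(tsf_safe_name("Special:Search"), days_per_series)
```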
The data were downloaded from the Wikimedia REST API. According to the conditions of the API, this dataset is licensed under CC-BY-SA 3.0 and GFDL licenses.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed for traffic surveillance anomaly detection, originally from the WSAL (Weakly-Supervised Anomaly Localization) repository. It consists of 500 short video clips totaling approximately 25 hours of footage. Each clip averages around 1,075 frames, and anomalies, when present, typically span around 80 frames.
Each video is labeled to indicate whether it contains an anomaly or not, enabling both supervised training and evaluation. You can use the labels to develop or compare different anomaly detection methods.
If you use this dataset for your research, please cite the following paper:
@article{wsal_tip21,
author = {Hui Lv and
Chuanwei Zhou and
Zhen Cui and
Chunyan Xu and
Yong Li and
Jian Yang},
title = {Localizing Anomalies from Weakly-Labeled Videos},
journal = {IEEE Transactions on Image Processing (TIP)},
year = {2021}
}
For more details about how the dataset was created and used, see the original WSAL GitHub repository.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Traffic-related data collected by the Boston Transportation Department, as well as other City departments and State agencies. Various types of counts: Turning Movement Counts, Automated Traffic Recordings, Pedestrian Counts, Delay Studies, and Gap Studies.
Turning Movement Counts (TMC) present the number of motor vehicles, pedestrians, and cyclists passing through the particular intersection. Specific movements and crossings are recorded for all street approaches involved with the intersection. This data is used in traffic signal retiming programs and for signal requests. Counts are typically conducted for 2-, 4-, 11-, and 12-Hr periods.
Automated Traffic Recordings (ATR) record the volume of motor vehicles traveling along a particular road, measures of travel speeds, and approximations of the class of the vehicles (motorcycle, 2-axle, large box truck, bus, etc). This type of count is conducted only along a street link/corridor, to gather data between two intersections or points of interest. This data is used in travel studies, as well as to review concerns about street use, speeding, and capacity. Counts are typically conducted for 12- & 24-Hr periods.
Pedestrian Counts (PED) record the volume of individual persons crossing a given street, whether at an existing intersection or a mid-block crossing. This data is used to review concerns about crossing safety, as well as for access analysis for points of interest. Counts are typically conducted for 2-, 4-, 11-, and 12-Hr periods.
Delay Studies (DEL) measure the delay experienced by motor vehicles due to the effects of congestion. Counts are typically conducted for a 1-Hr period at a given intersection or point of intersecting vehicular traffic.
Gap Studies (GAP) record the number of gaps which are typically present between groups of vehicles traveling through an intersection or past a point on a street. This data is used to assess opportunities for pedestrians to cross the street and for analyses on vehicular “platooning”. Counts are typically conducted for a specific 1-Hr period at a single point of crossing.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PeMS traffic datasets have been collected by the California Department of Transportation (Caltrans) at 30-second granularity, and the raw and aggregated data are publicly available on their website (https://pems.dot.ca.gov/?dnode=Clearinghouse&type=meta&district_id=7&submit=Submit). We have gathered the 5-minute aggregated vehicular traffic state (i.e. traffic speed) dataset for districts four and seven of California for 2022.
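To make the two granularities concrete: a 5-minute window holds ten 30-second samples. The sketch below averages consecutive groups of ten speed samples. How PeMS itself aggregates is not described here, so the plain mean and the helper `aggregate_speed` are assumptions for illustration, and the sample values are made up.

```python
from statistics import mean

SAMPLES_PER_WINDOW = (5 * 60) // 30  # ten 30-second samples per 5-minute window

def aggregate_speed(samples_30s):
    """Average consecutive groups of ten samples; a trailing partial window is dropped."""
    n_windows = len(samples_30s) // SAMPLES_PER_WINDOW
    return [
        mean(samples_30s[i * SAMPLES_PER_WINDOW:(i + 1) * SAMPLES_PER_WINDOW])
        for i in range(n_windows)
    ]

# Made-up speeds: two full windows followed by three leftover samples.
samples = [60.0] * 10 + [50.0] * 10 + [40.0] * 3
five_minute_speeds = aggregate_speed(samples)
print(five_minute_speeds)
```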
We have used the Bing Distance Matrix API to compute the driving distance between each pair of sensors. The API can compute driving distances between one or more sources and one or more destinations at once.
In addition, the weather datasets have been collected from https://www.visualcrossing.com/weather/weather-data-services and the datasets have one-hour granularity, and we have only removed some of the unnecessary columns.
The Indian road, unlike other geographies, demands a constant need for observation and prediction, a demand that can challenge even the most skilled drivers.
Building a high-performing AI solution that can handle this challenge requires access to large amounts of annotated data, and building this on your own is immensely time-consuming. We are here to help!
Get access to feeds with
- A million 2D bounding box annotations
- 150K+ images (and adding more)
- City, highway & suburban roads
- Day, night and twilight lighting conditions
- 1080p and 720p high resolution images
- Classes include: Bicycle, Car, Motorcycle, Bus, Truck, Traffic light, Traffic signs, People, Dog, Cow, Barricade