9 datasets found
  1. Network Traffic Dataset

    • kaggle.com
    Updated Oct 31, 2023
    Cite
    Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ravikumar Gattu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The data presented here was obtained on a Kali Linux machine at the University of Cincinnati, Cincinnati, Ohio, by running packet captures with Wireshark for 1 hour during the evening of Oct 9th, 2023. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

    The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

    Content :

    This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.

    The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

    Dataset Columns:

    • No: Number of the instance
    • Timestamp: Timestamp of the instance of network traffic
    • Source IP: IP address of the source
    • Destination IP: IP address of the destination
    • Protocol: Protocol used by the instance
    • Length: Length of the instance
    • Info: Information about the traffic instance
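    As a rough illustration of working with these columns, the sketch below reads rows in this layout and counts packets and bytes per protocol. The header spelling (taken from the Wireshark field list above) and the sample rows are assumptions made up for the example; the real file's header should be checked before reusing this.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the column layout named above; the real file's
# exact header spelling is an assumption and should be checked first.
SAMPLE_CSV = """No.,Time,Source,Destination,Protocol,Length,Info
1,0.000000,10.0.0.5,10.0.0.1,DNS,74,Standard query A example.org
2,0.000410,10.0.0.1,10.0.0.5,DNS,90,Standard query response A example.org
3,0.012300,10.0.0.5,93.184.216.34,TCP,66,"443 [SYN] Seq=0 Win=64240"
4,0.034100,93.184.216.34,10.0.0.5,TCP,66,"443 [SYN, ACK] Seq=0 Ack=1"
"""


def summarize_by_protocol(rows):
    """Return ({protocol: packet count}, {protocol: total bytes})."""
    packets = Counter()
    total_bytes = Counter()
    for row in rows:
        packets[row["Protocol"]] += 1
        total_bytes[row["Protocol"]] += int(row["Length"])
    return dict(packets), dict(total_bytes)


# DictReader uses the first line as the header and handles quoted commas.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
packets, total_bytes = summarize_by_protocol(rows)
print(packets)
print(total_bytes)
```

    To run this on the downloaded file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an open file handle.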

    Acknowledgements :

    I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.

    Ravikumar Gattu , Susmitha Choppadandi

    Inspiration : This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it is intended for building machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).

    Dataset License : CC0: Public Domain

    Dataset Usages : This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

    ML techniques that benefit from this dataset :

    This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains features that are relevant to most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:

    1. Network Performance Monitoring : This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.

    2. Anomaly Detection : This large network traffic dataset can be used to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.

    3. Network Intrusion Detection : This large dataset could be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks, and DoS attacks.
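    To make the anomaly detection use case concrete, here is a minimal sketch that flags unusually busy seconds from packet timestamps using a z-score. It is a simple statistical baseline, not a method described by the dataset authors; the function name, the threshold, and the synthetic timestamps are all assumptions for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def flag_busy_seconds(timestamps, threshold=2.0):
    """Return the whole seconds whose packet count lies more than
    `threshold` standard deviations above the mean per-second count.
    Seconds with no packets at all are not part of the statistics."""
    counts = Counter(int(t) for t in timestamps)
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly uniform traffic, nothing stands out
    return sorted(s for s, c in counts.items() if (c - mu) / sigma > threshold)


# Synthetic capture: 2 packets in every second from 0 to 9 ...
timestamps = [s + 0.25 * i for s in range(10) for i in range(2)]
# ... plus a burst of 28 extra packets during second 5.
timestamps += [5 + 0.03 * i for i in range(28)]

busy = flag_busy_seconds(timestamps)
print(busy)
```

    A trained model would replace the z-score rule, but the per-second aggregation step is a common starting point for features built from a Timestamp column.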

  2. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 12, 2024
    Cite
    Homan, Sophia (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Chan-Tin, Eric
    Soni, Shreena
    Homan, Sophia
    Honig, Joshua
    Moran, Madeline
    Ferrell, Nathan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help    show this help message and exit
    -i TXTFILE    input text file
    -x X          Add first X number of total packets as features.
    -y Y          Add first Y number of negative packets as features.
    -z Z          Add first Z number of positive packets as features.
    -ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S          Generate samples using size s.
    -j

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

    Uses Features.py to calculate the features.

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
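    For readers unfamiliar with the 5 fold cross validation mentioned above, the sketch below shows how samples can be partitioned so that each one is used for testing exactly once. It is a plain-Python illustration only; `k_fold_indices` is a hypothetical helper, and machineLearning.py presumably relies on a library routine rather than anything like this.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds.
    The first n_samples % k folds receive one extra sample."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(12, k=5))
for train, test in folds:
    print(len(train), "train /", len(test), "test ->", test)
```

    In practice the samples are usually shuffled (and often stratified by class) before splitting, which this sketch leaves out.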

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

    The first number is a classification number that denotes which website, query, or VR action is taking place.

    The remaining numbers in each line denote:

    The size of a packet,

    and the direction it is traveling.

    negative numbers denote incoming packets

    positive numbers denote outgoing packets
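    The line format described above, together with the -x, -y and -z options of the feature generator, can be sketched as follows. This is not the authors' Features.py: the function names are hypothetical, and both the separator handling (whitespace or commas) and the zero-padding of short traces are assumptions.

```python
def parse_trace(line):
    """Split one line into (class label, list of signed packet sizes).
    Accepts whitespace- or comma-separated integers (an assumption)."""
    fields = [int(tok) for tok in line.replace(",", " ").split()]
    return fields[0], fields[1:]


def first_n_features(packets, x=0, y=0, z=0):
    """Concatenate the first x packets overall, the first y incoming
    (negative) packets and the first z outgoing (positive) packets.
    Each group is zero-padded to its requested length (an assumption)."""
    def pad(values, n):
        return (values + [0] * n)[:n]

    incoming = [p for p in packets if p < 0]
    outgoing = [p for p in packets if p > 0]
    return pad(packets, x) + pad(incoming, y) + pad(outgoing, z)


label, packets = parse_trace("7 120 -1500 -1500 64 -800")
features = first_n_features(packets, x=3, y=2, z=3)
print(label, features)
```

    Padding keeps every feature vector the same length, which most classifiers require when traces differ in packet count.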

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
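    The processing steps listed above (sort the packet sizes, compute mean and standard deviation, then build the CDF) can be sketched as below. The sample sizes are made up, and whether the spreadsheet uses the sample or the population standard deviation is not stated, so the choice of `stdev` here is an assumption.

```python
from statistics import mean, stdev


def empirical_cdf(sizes):
    """Return the sizes sorted ascending and, for each position, the
    fraction of packets at or below it. For tied sizes, the value at
    the last tied position is the CDF at that size."""
    ordered = sorted(sizes)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]


sizes = [1500, 64, 800, 64]  # hypothetical packet sizes from one capture
ordered, cdf = empirical_cdf(sizes)
average = mean(sizes)
spread = stdev(sizes)  # sample standard deviation (assumption)

for size, fraction in zip(ordered, cdf):
    print(size, fraction)
print("mean:", average, "stdev:", round(spread, 1))
```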

  3. U.S. Vessel Traffic App

    • oceans-esrioceans.hub.arcgis.com
    Updated Apr 8, 2021
    Cite
    Esri (2021). U.S. Vessel Traffic App [Dataset]. https://oceans-esrioceans.hub.arcgis.com/datasets/esri::u-s-vessel-traffic-app
    Dataset updated
    Apr 8, 2021
    Dataset authored and provided by
    Esri (http://esri.com/)
    Area covered
    United States
    Description

    The U.S. Vessel Traffic application is a web-based visualization and data-access utility created by Esri. Explore U.S. maritime activity, look for patterns, and download manageable subsets of this massive data set. Vessel traffic data are an invaluable resource made available to our community by the US Coast Guard, NOAA and BOEM through Marine Cadastre. This information can help marine spatial planners better understand users of ocean space and identify potential space-use conflicts. To download this data for your own analysis, explore the Download Options, navigate to a NOAA Electronic Navigation Chart area of interest, and make your selection. This data was sourced from the Automatic Identification System (AIS) provided by USCG, NOAA, and BOEM through Marine Cadastre and aggregated for visualization and sharing in ArcGIS Pro. This application was built with the ArcGIS API for JavaScript. Access this data as an ArcGIS Online collection here. Learn more about AIS tracking here. Find more ocean and maritime resources in Living Atlas. Inquiries can be sent to Keith VanGraafeiland.

  4. USTC-TFC2016

    • scidb.cn
    Updated Jun 18, 2025
    Cite
    Wang Wei; Zhu Ming; Zeng Xuewen; Ye Xiaozhou; Sheng Yiqiang (2025). USTC-TFC2016 [Dataset]. http://doi.org/10.57760/sciencedb.18772
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Wang Wei; Zhu Ming; Zeng Xuewen; Ye Xiaozhou; Sheng Yiqiang
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The USTC-TFC2016 dataset is mainly used for network traffic classification research. It includes both malicious traffic and normal application traffic and was produced jointly by the University of Science and Technology of China and the Institute of Acoustics of the Chinese Academy of Sciences. The data comes from two sources: the first is 10 types of malicious traffic selected from the CTU dataset, which researchers at the Czech Technical University (CTU) collected from real environments between 2011 and 2015; the second is 10 types of normal application traffic generated by network instrument simulation. The dataset therefore consists of 20 types of traffic, corresponding to 20 data files, all in pcap format. To save space, some pcap files are compressed before upload. After decompression, the total size of the pcap files is 3.71 GB. For more information about this dataset, please refer to: 1) Wei Wang, Ming Zhu, Xuewen Zeng, Xiaozhou Ye and Yiqiang Sheng, “Malware traffic classification using convolutional neural network for representation learning”, ICOIN 2017, pp. 712-717; 2) Wang Wei, Research on Network Traffic Classification and Anomaly Detection Methods Based on Deep Learning, Ph.D. Thesis, University of Science and Technology of China, 2018. The dataset and preprocessing tool were released in 2018 at https://github.com/echowei/ and many researchers in China and abroad are using the dataset. Because of bandwidth and capacity constraints, it often could not be downloaded from there, so it has been uploaded to the "Science Database" website of the Chinese Academy of Sciences for long-term storage and easy download. We also look forward to relevant researchers uploading and sharing new malicious traffic and encrypted traffic that uses domestic cryptographic protocols, and to expanding this dataset with more types of malicious and normal traffic (such as 100 each), forming a richer and more comprehensive dataset that better approximates real network traffic.

  5. VPN and Non-VPN Application Traffic (CIC-VPN2016)

    • kaggle.com
    Updated Aug 2, 2025
    Cite
    Krish Agarwal (2025). VPN and Non-VPN Application Traffic (CIC-VPN2016) [Dataset]. https://www.kaggle.com/datasets/noobbcoder2/vpn-and-non-vpn-application-traffic-cic-vpn2016
    Dataset updated
    Aug 2, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Krish Agarwal
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context

    This dataset is a consolidated and cleaned CSV version of the ISCX VPN-nonVPN 2016 dataset from the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick. The original dataset was created to characterize and identify different types of network traffic, which is crucial for network management, Quality of Service (QoS) optimization, and cybersecurity.

    This single CSV file combines the multiple .arff files from the original dataset, making it easier to use for data analysis and machine learning projects in Python.

    Content

    The dataset contains network flow features extracted from packet captures (PCAPs). Each row represents a single network flow and has been labeled with the specific application type and whether it was routed through a VPN.

    Features (X): Include over 20 time-related flow features like duration, flowBytesPerSecond, flowPktsPerSecond, min_active, max_idle, etc. These features describe the timing, duration, and volume of the data flows.

    Target (y): The target column, traffic_type, is a multi-class label describing the application and connection type (e.g., VPN-CHAT, NonVPN-STREAMING, VPN-Browse).

    Potential Uses & Inspiration 🚀

    Multi-Class Classification: Can you build a model to accurately identify the specific application generating the traffic?

    Binary Classification: Can you distinguish between VPN and Non-VPN traffic, regardless of the application?

    Resource Allocation: Predict which types of traffic (e.g., Streaming) require more bandwidth, helping to build smarter network management tools.

    Federated Learning: This dataset is ideal for simulating a Federated Learning environment where data from different "users" (applications) is used to train a central model without sharing raw data.
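    For the binary VPN versus Non-VPN task mentioned above, the binary target can be derived from the multi-class traffic_type label. The sketch below assumes every label follows the prefix-application pattern of the examples given (VPN-CHAT, NonVPN-STREAMING, VPN-Browse); `split_traffic_type` is a hypothetical helper, and the actual label values in the CSV should be checked first.

```python
def split_traffic_type(traffic_type):
    """Split a label such as 'VPN-CHAT' into (is_vpn, application).
    Assumes the part before the first '-' is either 'VPN' or 'NonVPN'."""
    prefix, _, application = traffic_type.partition("-")
    return prefix.upper() == "VPN", application


labels = ["VPN-CHAT", "NonVPN-STREAMING", "VPN-Browse"]
parsed = [split_traffic_type(label) for label in labels]
print(parsed)
```

    The first element of each pair serves as the binary target, while the second keeps the application available for the multi-class task.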

  6. U.S. Vessel Traffic

    • fiu-srh-open-data-hub-fiugis.hub.arcgis.com
    Updated Apr 7, 2021
    Cite
    Esri (2021). U.S. Vessel Traffic [Dataset]. https://fiu-srh-open-data-hub-fiugis.hub.arcgis.com/maps/7765c67c91344f018988910212e855b0
    Dataset updated
    Apr 7, 2021
    Dataset authored and provided by
    Esri (http://esri.com/)
    Area covered
    Description

    These layers are used in the U.S. Vessel Traffic application, a web-based visualization and data-access utility created by Esri. Explore U.S. maritime activity, look for patterns of vessel activity such as around ports and fishing grounds, or download manageable subsets of this massive data set. Vessel traffic data are an invaluable resource made available to our community by the US Coast Guard, NOAA and BOEM through Marine Cadastre. This information can help marine spatial planners better understand users of ocean space and identify potential space-use conflicts. To download this data for your own analysis, explore the Download Options, navigate to a NOAA Electronic Navigation Chart area of interest, and make your selection. This data was sourced from the Automatic Identification System (AIS) provided by USCG, NOAA, and BOEM through Marine Cadastre and aggregated for visualization and sharing in ArcGIS Pro. This application was built with the ArcGIS API for JavaScript. Access this data as an ArcGIS Online collection here. Learn more about AIS tracking here. Find more ocean and maritime resources in Living Atlas. Inquiries can be sent to Keith VanGraafeiland.

  7. Traffic Crash Data

    • data.milwaukee.gov
    csv
    Updated Aug 30, 2025
    Cite
    Milwaukee Police Department (2025). Traffic Crash Data [Dataset]. https://data.milwaukee.gov/dataset/traffic_crash
    Available download formats: csv (122571597)
    Dataset updated
    Aug 30, 2025
    Dataset authored and provided by
    Milwaukee Police Department (http://city.milwaukee.gov/police)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Update Frequency: Daily

    This data-set includes traffic crash information including case number, accident date and the location.

    • Reportable crash reports can take up to 10 business days to appear after the date of the crash if there are no issues with the report.

    • If you cannot find your crash report after 10 business days, please call the Milwaukee Police Department Open Records Section at (414) 935-7435 for further assistance.

    • Non-reportable crash reports can only be obtained by contacting the Open Records Section and will not show up in a search on this site. A non-reportable crash is any accident that does not:

    1) result in injury or death to any person

    2) damage government-owned non-vehicle property to an apparent extent of $200 or more

    3) result in total damage to property owned by any one person to an apparent extent of $1000 or more.

    • All MV4000 crash reports, completed by MPD officers, will be available from the Wisconsin Department of Transportation (WisDOT) Division of Motor Vehicles (DMV) Accident Records Unit, generally 10 days after the incident.

    Online Request: Request your Crash Report online at WisDOT-DMV website, https://app.wi.gov/crashreports.

    Mail: Wisconsin Department of Transportation Crash Records Unit P.O. Box 7919 Madison, WI 53707-7919

    Phone: (608) 266-8753

    To download XML and JSON files, click the CSV option below and click the down arrow next to the Download button in the upper right on its page.
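    The three non-reportable criteria listed in this description amount to a simple rule: a crash is reportable if any one of the conditions holds. A minimal sketch follows, with a hypothetical function name and parameters; it only restates the thresholds given above and is not an official determination.

```python
def is_reportable(injury_or_death, government_property_damage,
                  max_damage_to_one_owner):
    """Return True if a crash meets any reportable condition.

    government_property_damage: apparent damage in dollars to
        government-owned non-vehicle property.
    max_damage_to_one_owner: largest apparent total damage in dollars
        to property owned by any one person.
    """
    return bool(injury_or_death
                or government_property_damage >= 200
                or max_damage_to_one_owner >= 1000)


# A minor collision with $999 damage and no injuries is non-reportable,
# so it would not appear in a search of this dataset.
print(is_reportable(False, 0, 999))
print(is_reportable(False, 0, 1000))
```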

  8. Facebook users worldwide 2017-2027

    • statista.com
    • es.statista.com
    Cite
    Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, and therefore a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
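    The figures in this description can be cross-checked with a little arithmetic: an increase of 391 million that equals +14.36 percent implies a 2023 base, and adding the increase should land near the stated 3.1 billion. The sketch below works only from the numbers quoted above; the variable names are illustrative.

```python
increase_millions = 391      # forecast growth from 2023 to 2027
increase_percent = 14.36     # the same growth, relative to 2023

# increase = base * percent / 100, so base = increase / (percent / 100)
users_2023_millions = increase_millions / (increase_percent / 100)
users_2027_millions = users_2023_millions + increase_millions

print(round(users_2023_millions), "million users implied for 2023")
print(round(users_2027_millions / 1000, 1), "billion users implied for 2027")
```

    The implied 2027 value rounds to the 3.1 billion users stated in the description, so the three quoted figures are mutually consistent.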

  9. TikTok global quarterly downloads 2018-2024

    • statista.com
    • es.statista.com
    Updated Feb 5, 2025
    Cite
    Statista Research Department (2025). TikTok global quarterly downloads 2018-2024 [Dataset]. https://www.statista.com/topics/1002/mobile-app-usage/
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    In the fourth quarter of 2024, TikTok generated around 186 million downloads from users worldwide. First launched in China by ByteDance as Douyin, the short-video format was popularized by TikTok and took over the global social media environment in 2020. In the first quarter of 2020, TikTok downloads peaked at over 313.5 million worldwide, up by 62.3 percent compared to the first quarter of 2019.

    TikTok interactions: is there a magic formula for content success?

    In 2024, TikTok registered an engagement rate of approximately 4.64 percent on video content hosted on its platform. During the same year, the social video app recorded over 1,100 interactions on average. These interactions were primarily composed of likes, with fewer than 20 comments per piece of content on average in 2024. The platform has been actively monitoring the issue of fake interactions, as it removed around 236 million fake likes during the first quarter of 2024. Though there is no secret formula for maximizing these metrics, recommended video length can possibly contribute to the success of content on TikTok. As of the first quarter of 2024, it was recommended that tiny TikTok accounts with up to 500 followers post videos that are around 2.6 minutes long, while the ideal video duration for huge TikTok accounts with over 50,000 followers was 7.28 minutes. The average length of TikTok videos posted by creators in 2024 was around 43 seconds.

    What’s trending on TikTok Shop?

    Since its launch in September 2023, TikTok Shop has become one of the most popular online shopping platforms, offering consumers a wide variety of products. In 2023, TikTok shops featuring beauty and personal care items sold over 370 million products worldwide. TikTok shops featuring womenswear and underwear, as well as food and beverages, followed with 285 and 138 million products sold, respectively. Similarly, in the United States market, health and beauty products were the best-selling items, accounting for 85 percent of sales made via the TikTok Shop feature during the first month of its launch. In 2023, Indonesia was the market with the largest number of TikTok Shops, hosting over 20 percent of all TikTok Shops. Thailand and Vietnam followed with 18.29 and 17.54 percent of the total shops listed on the short video platform, respectively.


Network Traffic Dataset

Use this dataset to analyze network traffic and design applications.

2 scholarly articles cite this dataset (View in Google Scholar)
