The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for INTERNATIONAL INTERNET BANDWIDTH BITS PER reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
| Property | Value |
|---|---|
| Data Set Characteristics | Multivariate |
| Number of Instances | 65532 |
| Area | Computer |
| Attribute Characteristics | N/A |
| Number of Attributes | 12 |
| Date Donated | 2019-02-04 |
| Associated Tasks | Classification |
| Missing Values? | N/A |
| Number of Web Hits | 701 |
Fatih Ertam, fatih.ertam '@' firat.edu.tr, Firat University, Turkey.
There are 12 features in total. The Action feature is used as the class label. There are 4 classes in total: allow, deny, drop, and reset-both.
Source Port,Destination Port,NAT Source Port,NAT Destination Port,Action,Bytes,Bytes Sent,Bytes Received,Packets,Elapsed Time (sec),pkts_sent,pkts_received
F. Ertam and M. Kaya, “Classification of firewall log files with multiclass support vector machine,” in 6th International Symposium on Digital Forensic and Security, ISDFS 2018 - Proceeding, 2018.
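As a minimal sketch of how the column list above might be used, the following Python snippet reads rows with that header and separates the Action class label from the 11 remaining features. The two data rows and their label values are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Header copied from the column list above; the two data rows are made up.
SAMPLE = """Source Port,Destination Port,NAT Source Port,NAT Destination Port,Action,Bytes,Bytes Sent,Bytes Received,Packets,Elapsed Time (sec),pkts_sent,pkts_received
57222,53,54587,53,allow,177,94,83,2,30,1,1
56258,3389,0,0,drop,66,66,0,1,0,1,0
"""

def split_features_and_labels(text):
    """Return (features, labels): features are dicts of ints, labels are Action strings."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        labels.append(row.pop("Action"))  # the class column
        features.append({name: int(value) for name, value in row.items()})
    return features, labels

features, labels = split_features_and_labels(SAMPLE)
class_counts = Counter(labels)  # how many rows fall into each class
```

Keeping the label apart from the numeric columns is the usual first step before passing the data to a classifier such as the multiclass SVM named in the citation.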
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains traffic data collected from an Internet of Things (IoT) network using ESP32 microcontrollers and a Raspberry Pi acting as a gateway. The goal is to monitor and forecast various network performance parameters in an IoT environment using time series models, particularly ARIMA. Each ESP32 device collects environmental and network performance data over time and sends it to a centralized Raspberry Pi gateway. The data was gathered over a 24-hour period and exported into CSV format for further analysis and modeling.
timestamp: The date and time of the data collection.
temperature: Temperature readings in degrees Celsius.
humidity(%): Humidity percentage from DHT sensor.
latency(ms): Network latency in milliseconds.
rssi(dBm): Received Signal Strength Indicator in dBm.
packet_loss(%): Estimated packet loss in percentage.
throughput(bytes/sec): Throughput in bytes per second.
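ARIMA models difference a series before fitting their autoregressive and moving-average terms. The sketch below shows only that differencing step in plain Python, applied to a hypothetical latency(ms) series; the values are invented, and a complete ARIMA fit would normally rely on a statistics library rather than this snippet.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times (the 'I' step of ARIMA)."""
    values = list(series)
    for _ in range(order):
        # Each pass replaces the series by the change between consecutive samples.
        values = [b - a for a, b in zip(values, values[1:])]
    return values

# Hypothetical latency(ms) readings; not taken from the dataset.
latency_ms = [20.0, 22.0, 25.0, 24.0, 28.0]
first_diff = difference(latency_ms)
second_diff = difference(latency_ms, order=2)
```

Each differencing pass shortens the series by one sample, so a 24-hour capture loses only a negligible number of points.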
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We present a dataset targeting a large set of popular pages (Alexa top-500), collected from probes in several ISP networks with different browser software (Chrome, Firefox) and viewport combinations, for over 200,000 experiments carried out in 2019.
We purposely collected two distinct sets with two different tools, namely Web Page Test (WPT) and Web View (WV), varying a number of relevant parameters and conditions, for a total of 200K+ web sessions, roughly equally split among WV and WPT. Our dataset comprises variations in terms of geographical coverage, scale, diversity and representativeness (location, targets, protocol, browser, viewports, metrics).
For Web Page Test, we used the online service www.webpagetest.org at different locations worldwide (Europe, Asia, USA) and private WPT instances in three locations in China (Beijing, Shanghai, Dongguan). The list of target URLs comprised the main pages and five random subpages from Alexa top-500 worldwide and China. We varied network conditions: native connections and 4G, FIOS, 3GFast, DSL, and custom shaping/loss conditions. The other elements in the configuration were fixed: Chrome browser on desktop with a fixed screen resolution, HTTP/2 protocol and IPv4.
For Web View, we collected experiments from three machines located in France. We selected two versions of two browser families (Chrome 75/77, Firefox 63/68), two screen sizes (1920x1080, 1440x900), and employed different browser configurations (one half of the experiments activate the AdBlock plugin) from two different access technologies (fiber and ADSL). From a protocol standpoint, we used both IPv4 and IPv6, with HTTP/2 and QUIC, and performed repeated experiments with cached objects/DNS.
Given the settings diversity, we restricted the number of websites to about 50 among the Alexa top-500 websites, to ensure statistical relevance of the collected samples for each page.
The two archives IFIPNetworking2020_WebViewOrange.zip and IFIPNetworking2020_Webpagetest.zip correspond respectively to the Web View experiments and to the Web Page Test experiments.
Each archive contains three files and one folder:
- config.csv: Description of parameters and conditions for each run
- metrics.csv: Value of different metrics collected by the browser
- progressionCurves.csv: Progression curves of the bytes progress as seen by the network, from 0 to 10 seconds by steps of 100 milliseconds
- listUrl folder: Indexes the sets of urls
Regarding config.csv, the columns are:
- index: Index for this set of conditions
- location: Location of the machine
- listUrl: List of urls, located in the folder listUrl
- browserUsed: Internet browser and version
- terminal: Desktop or Mobile
- collectionEnvironment: Identification of the collection environment
- networkConditionsTrafficShaping (WPT only): Whether native condition or traffic shaping (4G, FIOS, 3GFast, DSL, or custom Emulator conditions)
- networkConditionsBandwidth (WPT only): Bandwidth of the network
- networkConditionsDelay (WPT only): Delay in the network
- networkConditions (WV only): network conditions
- ipMode (WV only): requested L3 protocol
- requestedProtocol (WV only): requested L7 protocol
- adBlocker (WV only): Whether adBlocker is used or not
- winSize (WV only): Window size
Regarding metrics.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- DOM Content Loaded Event End (ms): DOM time
- First Paint (ms) (WV only): First paint time
- Load Event End (ms): Page Load Time from W3C
- RUM Speed Index (ms) (WV only): RUM Speed Index
- Speed Index (ms) (WPT only): Speed Index
- Time for Full Visual Rendering (ms) (WV only): Time for Full Visual Rendering
- Visible portion (%) (WV only): Visible portion
- Time to First Byte (ms) (WPT only): Time to First Byte
- Visually Complete (ms) (WPT only): Visually Complete used to compute the Speed Index
- aatf: aatf using ATF-chrome-plugin
- bi_aatf: bi_aatf using ATF-chrome-plugin
- bi_plt: bi_plt using ATF-chrome-plugin
- dom: dom using ATF-chrome-plugin
- ii_aatf: ii_aatf using ATF-chrome-plugin
- ii_plt: ii_plt using ATF-chrome-plugin
- last_css: last_css using ATF-chrome-plugin
- last_img: last_img using ATF-chrome-plugin
- last_js: last_js using ATF-chrome-plugin
- nb_ress_css: nb_ress_css using ATF-chrome-plugin
- nb_ress_img: nb_ress_img using ATF-chrome-plugin
- nb_ress_js: nb_ress_js using ATF-chrome-plugin
- num_origins: num_origins using ATF-chrome-plugin
- num_ressources: num_ressources using ATF-chrome-plugin
- oi_aatf: oi_aatf using ATF-chrome-plugin
- oi_plt: oi_plt using ATF-chrome-plugin
- plt: plt using ATF-chrome-plugin
Regarding progressionCurves.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- url: Url of the current page. SUBPAGE stands for a path.
- run: Current run (linked with index of the config for WPT)
- filename: Filename of the pcap
- fullname: Fullname of the pcap
- har_size: Size of the HAR for this experiment
- pagedata_size: Size of the page data report
- pcap_size: Size of the pcap
- App Byte Index (ms): Application Byte Index as computed from the har file (in the browser)
- bytesIn_APP: Total bytes in as seen in the browser
- bytesIn_NET: Total bytes in as seen in the network
- X_BI_net: Network Byte Index computed from the pcap file (in the network)
- X_bin_0_for_B_completion to X_bin_99_for_B_completion: X_bin_k_for_B_completion is the bytes progress reached after k*100 milliseconds
If you use these datasets in your research, please cite the corresponding paper:
@inproceedings{qoeNetworking2020,
  title={Revealing QoE of Web Users from Encrypted Network Traffic},
  author={Huet, Alexis and Saverimoutou, Antoine and Ben Houidi, Zied and Shi, Hao and Cai, Shengming and Xu, Jinchun and Mathieu, Bertrand and Rossi, Dario},
  booktitle={2020 IFIP Networking Conference (IFIP Networking)},
  year={2020},
  organization={IEEE}
}
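The progression-curve columns of progressionCurves.csv (X_bin_0_for_B_completion to X_bin_99_for_B_completion) sample the byte progress every 100 milliseconds. A byte index is commonly defined as the integral of (1 - progress) over time, and the sketch below approximates that integral with a Riemann sum. It assumes, without confirmation from the source, that progress is stored as a fraction between 0 and 1; the dataset's own X_BI_net values may have been computed differently.

```python
STEP_MS = 100  # bin width stated in the dataset description

def byte_index_ms(progress, step_ms=STEP_MS):
    """Approximate the integral of (1 - progress) over time with a left Riemann sum.

    `progress` holds the byte completion at 0, step_ms, 2*step_ms, ... and is
    assumed (not confirmed by the source) to be a fraction between 0 and 1.
    """
    return sum((1.0 - x) * step_ms for x in progress)

# Hypothetical curve: nothing at t=0, half the bytes by 100 ms, all by 200 ms.
curve = [0.0, 0.5, 1.0, 1.0]
bi = byte_index_ms(curve)  # (1.0 + 0.5 + 0.0 + 0.0) * 100
```

A lower value means the bytes arrived earlier, which is why such an index can serve as a network-side proxy for perceived page load speed.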
https://www.promarketreports.com/privacy-policy
The Physical Internet ecosystem encompasses a range of interconnected components working in synergy:
Logistic Nodes: These are strategically located physical facilities, acting as hubs for storage, handling, consolidation, and distribution of goods. They are designed for efficient material flow and optimized throughput.
Logistic Network: This encompasses the comprehensive infrastructure connecting these nodes, including diverse transportation modes (road, rail, sea, air), communication networks, and advanced information systems ensuring seamless data flow and real-time visibility.
Solutions: Software and hardware technologies, including Warehouse Management Systems (WMS), Transportation Management Systems (TMS), and advanced analytics platforms, enable the integration and optimization of logistical processes, driving efficiency and reducing operational costs.
Services: A wide array of value-added services are offered, such as inventory management, cross-docking, last-mile delivery solutions, customs brokerage, and reverse logistics, enhancing overall supply chain agility and responsiveness.
Recent developments include: Amazon.com Inc., for example, wants to vertically integrate its logistics, as part of an effort to build a network where boxes are like bytes, travelling through the supply chain network in the same way that data travels on the internet. In Europe, the European Technology Platform (ETP) Alliance for Logistics Innovation via Collaboration in Europe (ALICE) was founded to provide a holistic approach to logistics and supply chain management research, innovation, and market deployment.
Key drivers for this market are: Developing Interconnectivity; Internet of Things (IoT) Integral Towards Revolutionizing Logistics Paradigm.
Potential restraints include: Need for Mental Shift Towards Physical Internet; Restraint Impact Analysis.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was developed from real data on the usage of the corporate data network at the Universidade Federal do Rio Grande do Norte (UFRN). The main objective is to enable detailed observation of the university's network infrastructure and make this data available to the academic community. Data collection started on August 30, 2023, with the last query conducted on February 7, 2025, covering a total of approximately 19 months of continuous observations. During this period, about 1.5 months of data were lost due to failures in the data collection process or maintenance of the system responsible for capturing the data.
The data collections cover administrative, academic, and classroom sectors, spanning a total of 13 buildings within the university, providing a broad view of the network across different environments.
The dataset contains a total of 1,675,843 entries, each with 49 attributes. It is available in CSV format.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.
Please do cite the aforementioned article when using this dataset.
The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a machine learning model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and fed it into our machine learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.
The ZIP folder comprises two main components: Captures and Datasets. Within the captures folder, we have included all the captures used in this project. These captures are organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. Similarly, the datasets folder follows a similar organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.
To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.
This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
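The practical difference between the two formats can be seen in a small round trip. The sketch below uses a hypothetical record whose field names are illustrative only and do not reflect the dataset's actual schema: values read back from CSV arrive as strings, while pickle restores the original Python types and nested structures.

```python
import csv
import io
import pickle

# Hypothetical record; the field names are illustrative, not the dataset's schema.
record = {"frame.len": 60, "is_attack": False, "flags": [1, 0, 1]}

# CSV round trip: every value comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
buffer.seek(0)
from_csv = next(csv.DictReader(buffer))

# pickle round trip: Python types and nested structures are preserved.
from_pickle = pickle.loads(pickle.dumps(record))
```

Because unpickling can execute arbitrary code, pickle files should only be loaded when they come from a trusted source; the CSV files are the safer choice otherwise.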
Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.
Identified Key Features Within Bluetooth Dataset
| Feature | Meaning |
|---|---|
| btle.advertising_header | BLE Advertising Packet Header |
| btle.advertising_header.ch_sel | BLE Advertising Channel Selection Algorithm |
| btle.advertising_header.length | BLE Advertising Length |
| btle.advertising_header.pdu_type | BLE Advertising PDU Type |
| btle.advertising_header.randomized_rx | BLE Advertising Rx Address |
| btle.advertising_header.randomized_tx | BLE Advertising Tx Address |
| btle.advertising_header.rfu.1 | Reserved For Future 1 |
| btle.advertising_header.rfu.2 | Reserved For Future 2 |
| btle.advertising_header.rfu.3 | Reserved For Future 3 |
| btle.advertising_header.rfu.4 | Reserved For Future 4 |
| btle.control.instant | Instant Value Within a BLE Control Packet |
| btle.crc.incorrect | Incorrect CRC |
| btle.extended_advertising | Advertiser Data Information |
| btle.extended_advertising.did | Advertiser Data Identifier |
| btle.extended_advertising.sid | Advertiser Set Identifier |
| btle.length | BLE Length |
| frame.cap_len | Frame Length Stored Into the Capture File |
| frame.interface_id | Interface ID |
| frame.len | Frame Length Wire |
| nordic_ble.board_id | Board ID |
| nordic_ble.channel | Channel Index |
| nordic_ble.crcok | Indicates if CRC is Correct |
| nordic_ble.flags | Flags |
| nordic_ble.packet_counter | Packet Counter |
| nordic_ble.packet_time | Packet time (start to end) |
| nordic_ble.phy | PHY |
| nordic_ble.protover | Protocol Version |
Identified Key Features Within IP-Based Packets Dataset
| Feature | Meaning |
|---|---|
| http.content_length | Length of content in an HTTP response |
| http.request | HTTP request being made |
| http.response.code | Status code of an HTTP response |
| http.response_number | Sequential number of an HTTP response |
| http.time | Time taken for an HTTP transaction |
| tcp.analysis.initial_rtt | Initial round-trip time for TCP connection |
| tcp.connection.fin | TCP connection termination with a FIN flag |
| tcp.connection.syn | TCP connection initiation with SYN flag |
| tcp.connection.synack | TCP connection establishment with SYN-ACK flags |
| tcp.flags.cwr | Congestion Window Reduced flag in TCP |
| tcp.flags.ecn | Explicit Congestion Notification flag in TCP |
| tcp.flags.fin | FIN flag in TCP |
| tcp.flags.ns | Nonce Sum flag in TCP |
| tcp.flags.res | Reserved flags in TCP |
| tcp.flags.syn | SYN flag in TCP |
| tcp.flags.urg | Urgent flag in TCP |
| tcp.urgent_pointer | Pointer to urgent data in TCP |
| ip.frag_offset | Fragment offset in IP packets |
| eth.dst.ig | IG (individual/group) bit of the Ethernet destination address |
| eth.src.ig | IG (individual/group) bit of the Ethernet source address |
| eth.src.lg | LG (locally/globally administered) bit of the Ethernet source address |
| eth.src_not_group | Warning that the Ethernet source address is a group address, which it must not be |
| arp.isannouncement | Indicates if an ARP message is an announcement |
Identified Key Features Within IP-Based Flows Dataset
| Feature | Meaning |
|---|---|
| proto | Transport layer protocol of the connection |
| service | Identification of an application protocol |
| orig_bytes | Originator payload bytes |
| resp_bytes | Responder payload bytes |
| history | Connection state history |
| orig_pkts | Originator sent packets |
| resp_pkts | Responder sent packets |
| flow_duration | Length of the flow in seconds |
| fwd_pkts_tot | Forward packets total |
| bwd_pkts_tot | Backward packets total |
| fwd_data_pkts_tot | Forward data packets total |
| bwd_data_pkts_tot | Backward data packets total |
| fwd_pkts_per_sec | Forward packets per second |
| bwd_pkts_per_sec | Backward packets per second |
| flow_pkts_per_sec | Flow packets per second |
| fwd_header_size | Forward header bytes |
| bwd_header_size | Backward header bytes |
| fwd_pkts_payload | Forward payload bytes |
| bwd_pkts_payload | Backward payload bytes |
| flow_pkts_payload | Flow payload bytes |
| fwd_iat | Forward inter-arrival time |
| bwd_iat | Backward inter-arrival time |
| flow_iat | Flow inter-arrival time |
| active | Flow active duration |
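Several of the flow features above are rates. As a rough sketch, and assuming (the source does not say how the flow exporter derives them) that each rate is a total divided by flow_duration, they could be recomputed as follows. The record values are hypothetical.

```python
def per_second(total, duration_s):
    """Return total/duration, or 0.0 for a zero-length flow to avoid dividing by zero."""
    return total / duration_s if duration_s > 0 else 0.0

# Hypothetical flow record using field names from the table above.
flow = {"flow_duration": 2.0, "fwd_pkts_tot": 10, "bwd_pkts_tot": 6}

fwd_rate = per_second(flow["fwd_pkts_tot"], flow["flow_duration"])
bwd_rate = per_second(flow["bwd_pkts_tot"], flow["flow_duration"])
flow_rate = per_second(
    flow["fwd_pkts_tot"] + flow["bwd_pkts_tot"], flow["flow_duration"]
)
```

Recomputing the rates this way can serve as a sanity check on a loaded file, bearing in mind that single-packet flows have zero duration and need the guard shown here.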
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
The ever-changing mobile landscape is a challenging space to navigate. The share of mobile over desktop keeps increasing. Android holds about 53.2% of the smartphone market, while iOS holds 43%. To get more people to download your app, you need to make sure they can easily find it. Mobile app analytics is a great way to understand the existing strategy and to drive growth and retention of future users.
With millions of apps available today, this data set is useful for identifying top trending apps in the iOS App Store. It contains details of more than 7000 Apple iOS mobile applications. The data was extracted from the iTunes Search API at the Apple Inc website. R and Linux web scraping tools were used for this study.
The full interactive Shiny app can be seen here: https://multiscal.shinyapps.io/appStore/
Data collection date (from API): July 2017
Dimension of the data set: 7197 rows and 16 columns
"id": App ID
"track_name": App Name
"size_bytes": Size (in Bytes)
"currency": Currency Type
"price": Price amount
"rating_count_tot": User Rating counts (for all versions)
"rating_count_ver": User Rating counts (for current version)
"user_rating": Average User Rating value (for all versions)
"user_rating_ver": Average User Rating value (for current version)
"ver": Latest version code
"cont_rating": Content Rating
"prime_genre": Primary Genre
"sup_devices.num": Number of supporting devices
"ipadSc_urls.num": Number of screenshots shown for display
"lang.num": Number of supported languages
"vpp_lic": Vpp Device Based Licensing Enabled
Reference: R package
From github, with
devtools::install_github("ramamet/applestoreR")
Copyright (c) 2018 Ramanathan Perumal
Indicates, for each day and each zone, the amount of data sent to the Internet; the value is expressed in bytes (1 byte = 8 bits).
Eurecom ElasticMon 5G Dataset
This dataset, sourced from the Eurecom ElasticMon 5G monitoring framework, includes a range of metrics that are pivotal for analyzing the performance of 4G and 5G Radio Access Networks (RAN). It covers various aspects of network performance, including signal strength, data transmission volumes, and quality indicators. The data is crucial for developing machine learning models for predictive analysis and optimization of network performance.
Columns Description:
date_index: Timestamp or index indicating the date and time of the data record.
rsrp (Reference Signal Received Power): Measures the power level of the signal received by the UE (User Equipment).
rsrq (Reference Signal Received Quality): Indicates the quality of the received reference signal.
wbcqi (Wideband Channel Quality Indicator): Provides information about the quality of the downlink channel.
macStats_phr (MAC layer Power HeadRoom): Indicates the available power capacity of the UE.
dlCqiReport_sfnSn (Downlink CQI Report with SFN and SN): Downlink Channel Quality Indicator with System Frame Number and Subframe Number.
macStats_totalBytesSdusDl: Total number of bytes for Service Data Units on the Downlink at the MAC layer.
macStats_totalTbsUl: Total Transport Block Size for Uplink.
macStats_mcs1Ul: Modulation and Coding Scheme for the first transport block in Uplink.
macStats_totalPduDl: Total number of Protocol Data Units in Downlink.
macStats_totalBytesSdusUl: Total number of bytes for Service Data Units on the Uplink at the MAC layer.
macStats_tbsDl: Transport Block Size for Downlink.
macStats_totalPrbUl: Total Physical Resource Blocks used in Uplink.
macStats_macSdusDl_sduLength: Length of the Service Data Unit in the Downlink.
macStats_macSdusDl_lcid: Logical Channel ID for Downlink.
macStats_prbUl: Physical Resource Blocks used in Uplink.
macStats_totalPduUl: Total number of Protocol Data Units in Uplink.
macStats_mcs1Dl: Modulation and Coding Scheme for the first transport block in Downlink.
macStats_mcs2Dl: Modulation and Coding Scheme for the second transport block in Downlink.
macStats_prbDl: Physical Resource Blocks used in Downlink.
macStats_totalPrbDl: Total Physical Resource Blocks used in Downlink.
macStats_prbRetxDl: Physical Resource Blocks used for retransmissions in Downlink.
macStats_totalTbsDl: Total Transport Block Size for Downlink.
ulCqiReport_sfnSn (Uplink CQI Report with SFN and SN): Uplink Channel Quality Indicator with System Frame Number and Subframe Number.
pdcpStats_pktRx: Number of PDCP packets received.
pdcpStats_pktRxW: PDCP packets received with waiting.
pdcpStats_pktRxAiatW: Average Inter Arrival Time for PDCP packets received with waiting.
pdcpStats_pktRxOo: PDCP packets received out of order.
pdcpStats_pktRxBytesW: Bytes of PDCP packets received with waiting.
pdcpStats_pktRxSn: Sequence number of the last PDCP packet received.
pdcpStats_pktTxBytesW: Bytes of PDCP packets transmitted with waiting.
pdcpStats_pktTxSn: Sequence number of the last PDCP packet transmitted.
pdcpStats_pktTxBytes: Bytes of PDCP packets transmitted.
pdcpStats_pktRxAiat: Average Inter Arrival Time for PDCP packets received.
pdcpStats_pktRxBytes: Bytes of PDCP packets received.
pdcpStats_pktTx: Number of PDCP packets transmitted.
pdcpStats_pktTxW: PDCP packets transmitted with waiting.
pdcpStats_pktTxAiatW: Average Inter Arrival Time for PDCP packets
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888.
We recommend using the CESNET DataZoo Python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.
The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.
Data capture
The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done for four weeks between 31.10.2022 and 27.11.2022. The following list provides per-week flow count, capture period, and uncompressed size:
| Name | Uncompressed Size | Capture Period | Number of flows |
|---|---|---|---|
| W-2022-44 | 19 GB | 31.10.2022 - 6.11.2022 | 32.6M |
| W-2022-45 | 25 GB | 7.11.2022 - 13.11.2022 | 42.6M |
| W-2022-46 | 20 GB | 14.11.2022 - 20.11.2022 | 33.7M |
| W-2022-47 | 25 GB | 21.11.2022 - 27.11.2022 | 44.1M |
| CESNET-QUIC22 | 89 GB | 31.10.2022 - 27.11.2022 | 153M |
Data description
The dataset consists of network flows describing encrypted QUIC communications. Flows were created using ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and with fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.
Packet Sequences
Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1, +1 meaning a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip.
Flow statistics
Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason - either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections.
Dataset structure
The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service and total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts of a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes SNI domains used for ground truth labeling. The following list describes flow data fields in CSV files:
ID: Unique identifier
SRC_IP: Source IP address
DST_IP: Destination IP address
DST_ASN: Destination Autonomous System number
SRC_PORT: Source port
DST_PORT: Destination port
PROTOCOL: Transport protocol
QUIC_VERSION: QUIC protocol version
QUIC_SNI: Server Name Indication domain
QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet
TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
DURATION: Duration of the flow in seconds
BYTES: Number of transmitted bytes from client to server
BYTES_REV: Number of transmitted bytes from server to client
PACKETS: Number of packets transmitted from client to server
PACKETS_REV: Number of packets transmitted from server to client
PPI: Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]]
PPI_LEN: Number of packets in the PPI sequence
PPI_DURATION: Duration of the PPI sequence in seconds
PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
PHIST_SRC_SIZES: Histogram of packet sizes from client to server
PHIST_DST_SIZES: Histogram of packet sizes from server to client
PHIST_SRC_IPT: Histogram of inter-packet times from client to server
PHIST_DST_IPT: Histogram of inter-packet times from server to client
APP: Web service label
CATEGORY: Service category
FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
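To make the PPI layout and the eight histogram bins more concrete, here is a small Python sketch. It assumes that the PPI column can be parsed as JSON, which the source does not state, and the sample value is invented. The direction-change counter only illustrates the idea behind the roundtrip field; it is not claimed to reproduce PPI_ROUNDTRIPS exactly.

```python
import json

# Upper bounds of the first seven bins; anything above 1024 falls in the eighth.
BIN_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]

def histogram_bin(value):
    """Return the 0-based bin index for a packet size (B) or inter-packet time (ms)."""
    for index, upper in enumerate(BIN_UPPER_BOUNDS):
        if value <= upper:
            return index
    return len(BIN_UPPER_BOUNDS)

def parse_ppi(ppi_text):
    """Split a PPI string into (inter-packet times, directions, sizes)."""
    times, directions, sizes = json.loads(ppi_text)
    return times, directions, sizes

def direction_changes(directions):
    """Count how often consecutive packets travel in opposite directions."""
    return sum(1 for a, b in zip(directions, directions[1:]) if a != b)

# Hypothetical PPI value; assumed to be JSON-compatible, which the source does not state.
ppi = "[[0, 12, 3, 40], [1, -1, -1, 1], [1200, 1350, 90, 64]]"
times, directions, sizes = parse_ppi(ppi)

# Rebuild an eight-bin size histogram from the packet sizes in the sequence.
size_histogram = [0] * 8
for size in sizes:
    size_histogram[histogram_bin(size)] += 1
```

Since a PPI sequence covers at most the first 30 packets, a histogram rebuilt this way describes only the start of a flow, whereas the PHIST_* fields cover the entire flow.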
Link to other CESNET datasets
https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/
https://github.com/CESNET/cesnet-datazoo
Please cite the original data article:
@article{CESNETQUIC22,
  author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
  title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
  journal = {Data in Brief},
  pages = {108888},
  year = {2023},
  issn = {2352-3409},
  doi = {https://doi.org/10.1016/j.dib.2023.108888},
  url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
}
https://www.gnu.org/licenses/gpl-3.0.html
This dataset contains web traffic records collected through AWS CloudWatch, aimed at detecting suspicious activities and potential attack attempts.
The data were generated by monitoring traffic to a production web server, using various detection rules to identify anomalous patterns.
In today's cloud environments, cybersecurity is more crucial than ever. The ability to detect and respond to threats in real time can protect organizations from significant consequences. This dataset provides a view of web traffic that has been labeled as suspicious, offering a valuable resource for developers, data scientists, and security experts to enhance threat detection techniques.
Each entry in the dataset represents a stream of traffic to a web server, including the following columns:
bytes_in: Bytes received by the server.
bytes_out: Bytes sent from the server.
creation_time: Timestamp of when the record was created.
end_time: Timestamp of when the connection ended.
src_ip: Source IP address.
src_ip_country_code: Country code of the source IP.
protocol: Protocol used in the connection.
response.code: HTTP response code.
dst_port: Destination port on the server.
dst_ip: Destination IP address.
rule_names: Name of the rule that identified the traffic as suspicious.
observation_name: Observations associated with the traffic.
source.meta: Metadata related to the source.
source.name: Name of the traffic source.
time: Timestamp of the detected event.
detection_types: Type of detection applied.
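As a rough illustration of how these columns might be used, the sketch below sums bytes_in per source country and derives connection durations from creation_time and end_time. The two sample rows are hypothetical, only a subset of the columns is shown, and the ISO 8601 timestamp format is an assumption; the actual file may differ.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical two-row sample; the real file may use a different timestamp format.
SAMPLE_CSV = """bytes_in,bytes_out,creation_time,end_time,src_ip,src_ip_country_code
5602,12990,2024-04-25T23:00:00Z,2024-04-25T23:10:00Z,198.51.100.7,AE
30912,18186,2024-04-25T23:00:00Z,2024-04-25T23:10:00Z,203.0.113.9,US
"""


def parse_time(text):
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def summarize(csv_text):
    """Return (bytes_in per country code, list of connection durations in seconds)."""
    bytes_by_country = defaultdict(int)
    durations = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        bytes_by_country[row["src_ip_country_code"]] += int(row["bytes_in"])
        delta = parse_time(row["end_time"]) - parse_time(row["creation_time"])
        durations.append(delta.total_seconds())
    return dict(bytes_by_country), durations


bytes_by_country, durations = summarize(SAMPLE_CSV)
```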
This dataset is ideal for:
Original dataset page, license, context, and description are available at the link below: https://www.unb.ca/cic/datasets/darknet2020.html
This dataset was gathered to test novel methods for classifying darknet traffic. It was collected by the Canadian Institute for Cybersecurity at the University of New Brunswick.
Each unique sample has a flow id. Additional columns include:
Src IP: Source IP Address
Src Port: Source Port
Dst IP: Destination IP Address
Dst Port: Destination Port
Protocol: Internet Protocol Version
Timestamp: Timestamp for when traffic was sent
Flow Duration: Duration
Total Fwd Packet: Total number of packets from source to destination
Total Bwd packets
The remaining columns are listed without descriptions: Total Length of Fwd Packet, Total Length of Bwd Packet, Fwd Packet Length Max, Fwd Packet Length Min, Fwd Packet Length Mean, Fwd Packet Length Std, Bwd Packet Length Max, Bwd Packet Length Min, Bwd Packet Length Mean, Bwd Packet Length Std, Flow Bytes/s, Flow Packets/s, Flow IAT Mean, Flow IAT Std, Flow IAT Max, Flow IAT Min, Fwd IAT Total, Fwd IAT Mean, Fwd IAT Std, Fwd IAT Max, Fwd IAT Min, Bwd IAT Total, Bwd IAT Mean, Bwd IAT Std, Bwd IAT Max, Bwd IAT Min, Fwd PSH Flags, Bwd PSH Flags, Fwd URG Flags, Bwd URG Flags, Fwd Header Length, Bwd Header Length, Fwd Packets/s, Bwd Packets/s, Packet Length Min, Packet Length Max, Packet Length Mean, Packet Length Std, Packet Length Variance, FIN Flag Count, SYN Flag Count, RST Flag Count, PSH Flag Count, ACK Flag Count, URG Flag Count, CWE Flag Count, ECE Flag Count, Down/Up Ratio, Average Packet Size, Fwd Segment Size Avg, Bwd Segment Size Avg, Fwd Bytes/Bulk Avg, Fwd Packet/Bulk Avg, Fwd Bulk Rate Avg, Bwd Bytes/Bulk Avg, Bwd Packet/Bulk Avg, Bwd Bulk Rate Avg, Subflow Fwd Packets, Subflow Fwd Bytes, Subflow Bwd Packets, Subflow Bwd Bytes, FWD Init Win Bytes, Bwd Init Win Bytes, Fwd Act Data Pkts, Fwd Seg Size Min, Active Mean, Active Std, Active Max, Active Min, Idle Mean, Idle Std, Idle Max, Idle Min, Label, Label.1
Original Paper: Arash Habibi Lashkari, Gurdip Kaur, and Abir Rahali, “DIDarknet: A Contemporary Approach to Detect and Characterize the Darknet Traffic using Deep Image Learning”, 10th International Conference on Communication and Network Security, Tokyo, Japan, November 2020
Motivation: to better understand how darknet routing works and how to examine the traffic that goes through it.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For each day and each area, indicates the amount of data sent to the Internet; values are expressed in bytes (8 bits each).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This resource includes input data used in the work "Machine-Learning Based Prediction of Multiple Types of Network Traffic" by Aleksandra Knapińska, Piotr Lechowicz, and Krzysztof Walkowiak, published in International Conference on Computational Science (ICCS) 2021, Lecture Notes in Computer Science, vol. 12742, pp. 122-136. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_12
The work was supported by the National Science Centre, Poland, under Grant 2019/35/B/ST7/04272.
Both seattle_november.xml and seattle_december.xml files include internet traffic data from the Seattle Internet Exchange Point. The european.xml file includes internet traffic data from one of the European Internet Exchange Points.
Each file includes the traffic volume decomposed into specific frame size ranges. Each file starts with a metadata section providing general information. The period covered by a specific file is indicated by its 'start' and 'end' tags. They provide Unix timestamps in the GMT timezone. It should be noted that Seattle lies in the PST time zone, and the European IXP is located in the CET timezone, so the start and end times should be adjusted accordingly. The step parameter is given in seconds, so the samples are stored every 5 minutes in all three files.
Each file has multiple columns providing traffic data in bits per second for different frame size ranges. Column names specify the ranges in bytes. The 'total' column stores information about the total aggregate traffic volume, which is a sum of values in all the remaining columns in each row.
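The time zone adjustment described above can be sketched as follows. The start, end, and step values are hypothetical, the end timestamp is assumed to be inclusive, and fixed UTC offsets are used for PST and CET, so daylight saving time is deliberately ignored.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
PST = timezone(timedelta(hours=-8), "PST")  # fixed offset, daylight saving ignored
CET = timezone(timedelta(hours=1), "CET")   # fixed offset, daylight saving ignored


def sample_times(start, end, step, tz):
    """Yield the local time of every sample between two Unix timestamps (end inclusive)."""
    for ts in range(start, end + 1, step):
        yield datetime.fromtimestamp(ts, UTC).astimezone(tz)


# Hypothetical metadata: 2020-12-01 00:00:00 GMT, 15 minutes covered, 300 s step.
start, end, step = 1606780800, 1606781700, 300
seattle = [t.isoformat() for t in sample_times(start, end, step, PST)]
european = [t.isoformat() for t in sample_times(start, end, step, CET)]
```

With these values, the first Seattle sample falls at 16:00 on the previous calendar day in local time, which is why the adjustment matters when aligning daily traffic patterns.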
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This resource includes input data used in the work "Long-term prediction of multiple types of time-varying network traffic using chunk-based ensemble learning" by Aleksandra Knapińska, Piotr Lechowicz, Weronika Węgier, and Krzysztof Walkowiak.
The work was supported by the National Science Centre, Poland, under Grants 2019/35/B/ST7/04272, 2018/31/D/ST6/0304, and 2019/35/B/ST6/04442.
The SIX2021.xml file includes internet traffic data from the Seattle Internet Exchange Point collected for one year.
The file contains information about the traffic volume decomposed into specific frame size ranges. It starts with a metadata section providing general information. The covered period is indicated by the 'start' and 'end' tags. They provide Unix timestamps in the GMT timezone. It should be noted that Seattle lies in the PST time zone, so the start and end times should be adjusted accordingly. The step parameter is given in seconds, so the samples are stored every 5 minutes.
The file has multiple columns providing traffic data in bits per second for different frame size ranges. Column names specify the ranges in bytes. The 'total' column stores information about the total aggregate traffic volume, which is a sum of values in all the remaining columns in each row.
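The relationship between the 'total' column and the per-range columns, and the conversion from a rate to a volume, can be checked with a small sketch. The column names and values below are hypothetical, and the conversion assumes each sample is the average rate over its 5-minute step.

```python
STEP_SECONDS = 300  # five-minute sampling step given in the file metadata


def check_total(row, tolerance=1e-6):
    """Return True when 'total' equals the sum of all other columns in the row."""
    parts = sum(value for name, value in row.items() if name != "total")
    return abs(parts - row["total"]) <= tolerance * max(1.0, abs(row["total"]))


def bytes_in_step(bits_per_second, step=STEP_SECONDS):
    """Convert an average rate in bit/s into the number of bytes carried during one step."""
    return bits_per_second * step / 8


# Hypothetical row; column names only resemble frame size ranges in bytes.
row = {"64-127": 2.0e9, "128-255": 1.5e9, "1024-1518": 6.5e9, "total": 1.0e10}
ok = check_total(row)
volume = bytes_in_step(row["total"])
```

Here a total of 10 Gbit/s sustained over one step corresponds to 375 GB of traffic.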
Every year, people summarize the year that has passed. Movie lovers look for top 10 lists of the best movies from top websites and their favorite YouTubers. The end of this year (2017) is no different. Here is a compilation of lists containing the top 10 movies of 2017 from the internet.
The data is collected from various sources on the internet. There are two files:
Top 10 Movies 2017.csv: the list of movie names ranked from 10 to 1 and the source from which they were collected. Some lists are not ranked, which is indicated in the Ordered column.
IMBD Links.csv: movie names and their associated IMDb links.
The data is collected from different websites and YouTube using the search keyword "top 10 movies of 2017". All the sources are mentioned in the data file "Top 10 movies 2017.csv" in the "url" column. Movies are included only if the source lists exactly 10 movies; lists containing more than 10 movies are ignored.
Metacritic has collected different lists and produced a ranking based on them. This data is not collected from Metacritic; however, there may be some overlaps: http://www.metacritic.com/feature/film-critics-list-the-top-10-movies-of-2017