27 datasets found
  1. Data generation volume worldwide 2010-2029

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). Data generation volume worldwide 2010-2029 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.
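    The tripling claim above implies a yearly growth rate, which can be checked with a compound-growth formula. This is a minimal sketch, assuming growth compounds once per year over the four years from 2025 to 2029; the function name is illustrative and not part of the Statista data.

```python
def implied_annual_growth(multiple: float, years: int) -> float:
    """Return the constant yearly growth rate that yields `multiple` after `years`."""
    if multiple <= 0 or years <= 0:
        raise ValueError("multiple and years must be positive")
    return multiple ** (1.0 / years) - 1.0


# Tripling between 2025 and 2029 spans four yearly steps (an assumption).
rate = implied_annual_growth(3.0, 2029 - 2025)
print(f"Implied growth: {rate:.1%} per year")
```

    Under these assumptions, tripling in four years corresponds to growth of roughly 32 percent per year.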

  2. INTERNATIONAL INTERNET BANDWIDTH BITS PER by Country Dataset

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Dec 20, 2021
    Cite
    TRADING ECONOMICS (2021). INTERNATIONAL INTERNET BANDWIDTH BITS PER by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/international-internet-bandwidth-bits-per-
    Explore at:
    Available download formats: csv, xml, excel, json
    Dataset updated
    Dec 20, 2021
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    World
    Description

    This dataset provides values for INTERNATIONAL INTERNET BANDWIDTH BITS PER reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

  3. Data from: Internet Firewall

    • kaggle.com
    zip
    Updated Aug 29, 2020
    Cite
    Gary (2020). Internet Firewall [Dataset]. https://www.kaggle.com/sgd825344491/internet-firewall
    Explore at:
    Available download formats: zip (772612 bytes)
    Dataset updated
    Aug 29, 2020
    Authors
    Gary
    Description
    Abstract
    • Data Set Characteristics: Multivariate
    • Number of Instances: 65532
    • Area: Computer
    • Attribute Characteristics: N/A
    • Number of Attributes: 12
    • Date Donated: 2019-02-04
    • Associated Tasks: Classification
    • Missing Values?: N/A
    • Number of Web Hits: 701

    Source:

    Fatih Ertam, fatih.ertam '@' firat.edu.tr, Firat University, Turkey.

    Data Set Information:

    There are 12 features in total. The Action feature is used as the class label. There are 4 classes in total: allow, deny, drop and reset-both.

    Attribute Information:

    Source Port,Destination Port,NAT Source Port,NAT Destination Port,Action,Bytes,Bytes Sent,Bytes Received,Packets,Elapsed Time (sec),pkts_sent,pkts_received
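    A minimal sketch of how the class distribution could be inspected, assuming the archive unpacks to a CSV whose header matches the attribute list above. The sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample using the header from the attribute list; values are invented.
SAMPLE = """Source Port,Destination Port,NAT Source Port,NAT Destination Port,Action,Bytes,Bytes Sent,Bytes Received,Packets,Elapsed Time (sec),pkts_sent,pkts_received
57222,53,54587,53,allow,177,94,83,2,30,1,1
56258,3389,56258,3389,allow,4768,1600,3168,19,17,10,9
50002,443,0,0,drop,70,70,0,1,0,1,0
"""


def count_actions(handle) -> Counter:
    """Count rows per value of the Action column (the class label)."""
    reader = csv.DictReader(handle)
    return Counter(row["Action"] for row in reader)


counts = count_actions(io.StringIO(SAMPLE))
print(counts)
```

    For the real file, an open file handle can be passed to count_actions in place of the in-memory sample.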

    Relevant Papers:

    F. Ertam and M. Kaya, “Classification of firewall log files with multiclass support vector machine,” in 6th International Symposium on Digital Forensic and Security, ISDFS 2018 - Proceeding, 2018.

  4. Internet of Things Network Traffic

    • kaggle.com
    zip
    Updated May 22, 2025
    Cite
    Fadel Achmad Daniswara (2025). Internet of Things Network Traffic [Dataset]. https://www.kaggle.com/datasets/fadelachmaddaniswara/internet-of-things-network-traffic
    Explore at:
    Available download formats: zip (675457 bytes)
    Dataset updated
    May 22, 2025
    Authors
    Fadel Achmad Daniswara
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Internet of Things Network Traffic

    Description:

    This dataset contains traffic data collected from an Internet of Things (IoT) network using ESP32 microcontrollers and a Raspberry Pi acting as a gateway. The goal is to monitor and forecast various network performance parameters in an IoT environment using time series models, particularly ARIMA. Each ESP32 device collects environmental and network performance data over time and sends it to a centralized Raspberry Pi gateway. The data was gathered over a 24-hour period and exported into CSV format for further analysis and modeling.

    Columns:

    • timestamp: The date and time of the data collection.
    • temperature: Temperature readings in degrees Celsius.
    • humidity(%): Humidity percentage from DHT sensor.
    • latency(ms): Network latency in milliseconds.
    • rssi(dBm): Received Signal Strength Indicator in dBm.
    • packet_loss(%): Estimated packet loss in percentage.
    • throughput(bytes/sec): Throughput in bytes per second.

    Use Cases:

    • Time series forecasting (ARIMA, SARIMA, LSTM)
    • IoT network performance analysis
    • Anomaly detection in traffic
    • Edge computing and predictive maintenance experiments

    Devices:

    • ESP32 A & ESP32 B (clients)
    • Raspberry Pi (gateway)
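
    A minimal sketch of reading the columns listed above and smoothing the latency series. It uses a plain trailing moving average as a simple preprocessing step, not ARIMA itself. The rows and the timestamp format are invented for illustration.

```python
import csv
import io
from statistics import mean

# Invented rows using the column names listed above; not real measurements.
SAMPLE = """timestamp,temperature,humidity(%),latency(ms),rssi(dBm),packet_loss(%),throughput(bytes/sec)
2025-05-01 00:00:00,27.1,61.0,12.0,-58,0.0,1500
2025-05-01 00:01:00,27.2,60.8,18.0,-60,0.5,1420
2025-05-01 00:02:00,27.2,60.7,15.0,-59,0.0,1480
2025-05-01 00:03:00,27.3,60.5,30.0,-65,1.0,1200
"""


def moving_average(values, window):
    """Trailing moving average; the first window-1 points are skipped."""
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
latency = [float(r["latency(ms)"]) for r in rows]
smoothed = moving_average(latency, window=3)
print(smoothed)  # [15.0, 21.0]
```

    The column keys keep their unit suffixes, such as latency(ms), because csv.DictReader uses the header text verbatim.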
  5. Data from: Revealing QoE of Web Users from Encrypted Network Traffic

    • figshare.com
    zip
    Updated Jun 16, 2020
    Cite
    Alexis Huet; Antoine Saverimoutou; Zied Ben Houidi; Hao Shi; Shengming Cai; Jinchun Xu; Bertrand Mathieu; Dario Rossi (2020). Revealing QoE of Web Users from Encrypted Network Traffic [Dataset]. http://doi.org/10.6084/m9.figshare.12459293.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 16, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Alexis Huet; Antoine Saverimoutou; Zied Ben Houidi; Hao Shi; Shengming Cai; Jinchun Xu; Bertrand Mathieu; Dario Rossi
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We present a dataset targeting a large set of popular pages (Alexa top-500), from probes from several ISPs networks, browsers software (Chrome, Firefox) and viewport combinations, for over 200,000 experiments realized in 2019.

    We purposely collect two distinct sets with two different tools, namely Web Page Test (WPT) and Web View (WV), varying a number of relevant parameters and conditions, for a total of 200K+ web sessions, roughly equally split among WV and WPT. Our dataset comprises variations in terms of geographical coverage, scale, diversity and representativeness (location, targets, protocol, browser, viewports, metrics).

    For Web Page Test, we used the online service www.webpagetest.org at different locations worldwide (Europe, Asia, USA) and private WPT instances in three locations in China (Beijing, Shanghai, Dongguan). The list of target URLs comprised the main pages and five random subpages from Alexa top-500 worldwide and China. We varied network conditions: native connections and 4G, FIOS, 3GFast, DSL, and custom shaping/loss conditions. The other elements in the configuration were fixed: Chrome browser on desktop with a fixed screen resolution, HTTP/2 protocol and IPv4.

    For Web View, we collected experiments from three machines located in France. We selected two versions of two browser families (Chrome 75/77, Firefox 63/68), two screen sizes (1920x1080, 1440x900), and employ different browser configurations (one half of the experiments activate the AdBlock plugin) from two different access technologies (fiber and ADSL). From a protocol standpoint, we used both IPv4 and IPv6, with HTTP/2 and QUIC, and performed repeated experiments with cached objects/DNS. Given the settings diversity, we restricted the number of websites to about 50 among the Alexa top-500 websites, to ensure statistical relevance of the collected samples for each page.

    The two archives IFIPNetworking2020_WebViewOrange.zip and IFIPNetworking2020_Webpagetest.zip correspond respectively to the Web View experiments and to the Web Page Test experiments.

    Each archive contains three files:

    • config.csv: Description of parameters and conditions for each run
    • metrics.csv: Value of different metrics collected by the browser
    • progressionCurves.csv: Progression curves of the bytes progress as seen by the network, from 0 to 10 seconds by steps of 100 milliseconds
    • listUrl folder: Indexes the sets of urls

    Regarding config.csv, the columns are:

    • index: Index for this set of conditions
    • location: Location of the machine
    • listUrl: List of urls, located in the folder listUrl
    • browserUsed: Internet browser and version
    • terminal: Desktop or Mobile
    • collectionEnvironment: Identification of the collection environment
    • networkConditionsTrafficShaping (WPT only): Whether native condition or traffic shaping (4G, FIOS, 3GFast, DSL, or custom Emulator conditions)
    • networkConditionsBandwidth (WPT only): Bandwidth of the network
    • networkConditionsDelay (WPT only): Delay in the network
    • networkConditions (WV only): network conditions
    • ipMode (WV only): requested L3 protocol
    • requestedProtocol (WV only): requested L7 protocol
    • adBlocker (WV only): Whether adBlocker is used or not
    • winSize (WV only): Window size

    Regarding metrics.csv, the columns are:

    • id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
    • DOM Content Loaded Event End (ms): DOM time
    • First Paint (ms) (WV only): First paint time
    • Load Event End (ms): Page Load Time from W3C
    • RUM Speed Index (ms) (WV only): RUM Speed Index
    • Speed Index (ms) (WPT only): Speed Index
    • Time for Full Visual Rendering (ms) (WV only): Time for Full Visual Rendering
    • Visible portion (%) (WV only): Visible portion
    • Time to First Byte (ms) (WPT only): Time to First Byte
    • Visually Complete (ms) (WPT only): Visually Complete used to compute the Speed Index
    • aatf: aatf using ATF-chrome-plugin
    • bi_aatf: bi_aatf using ATF-chrome-plugin
    • bi_plt: bi_plt using ATF-chrome-plugin
    • dom: dom using ATF-chrome-plugin
    • ii_aatf: ii_aatf using ATF-chrome-plugin
    • ii_plt: ii_plt using ATF-chrome-plugin
    • last_css: last_css using ATF-chrome-plugin
    • last_img: last_img using ATF-chrome-plugin
    • last_js: last_js using ATF-chrome-plugin
    • nb_ress_css: nb_ress_css using ATF-chrome-plugin
    • nb_ress_img: nb_ress_img using ATF-chrome-plugin
    • nb_ress_js: nb_ress_js using ATF-chrome-plugin
    • num_origins: num_origins using ATF-chrome-plugin
    • num_ressources: num_ressources using ATF-chrome-plugin
    • oi_aatf: oi_aatf using ATF-chrome-plugin
    • oi_plt: oi_plt using ATF-chrome-plugin
    • plt: plt using ATF-chrome-plugin

    Regarding progressionCurves.csv, the columns are:

    • id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
    • url: Url of the current page. SUBPAGE stands for a path.
    • run: Current run (linked with index of the config for WPT)
    • filename: Filename of the pcap
    • fullname: Fullname of the pcap
    • har_size: Size of the HAR for this experiment
    • pagedata_size: Size of the page data report
    • pcap_size: Size of the pcap
    • App Byte Index (ms): Application Byte Index as computed from the har file (in the browser)
    • bytesIn_APP: Total bytes in as seen in the browser
    • bytesIn_NET: Total bytes in as seen in the network
    • X_BI_net: Network Byte Index computed from the pcap file (in the network)
    • X_bin_0_for_B_completion to X_bin_99_for_B_completion: X_bin_k_for_B_completion is the bytes progress reached after k*100 milliseconds

    If you use these datasets in your research, you can reference to the appropriate paper:

    @inproceedings{qoeNetworking2020,
      title={Revealing QoE of Web Users from Encrypted Network Traffic},
      author={Huet, Alexis and Saverimoutou, Antoine and Ben Houidi, Zied and Shi, Hao and Cai, Shengming and Xu, Jinchun and Mathieu, Bertrand and Rossi, Dario},
      booktitle={2020 IFIP Networking Conference (IFIP Networking)},
      year={2020},
      organization={IEEE}
    }
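
    The progression-curve columns can be turned into a single byte-index style number. The sketch below is an illustration under stated assumptions: each X_bin_k_for_B_completion value is taken to be a fraction between 0 and 1, and the index is approximated as the area above the progress curve. The dataset's own X_BI_net values may have been computed differently, and the function name is hypothetical.

```python
def byte_index_ms(progress, step_ms=100.0):
    """Approximate the area above a byte-progress curve, in milliseconds.

    `progress[k]` is assumed to be the fraction (0.0 to 1.0) of bytes
    received after k * step_ms milliseconds.
    """
    if any(p < 0.0 or p > 1.0 for p in progress):
        raise ValueError("progress values must lie in [0, 1]")
    # Left Riemann sum of (1 - progress) over the sampled interval.
    return sum((1.0 - p) * step_ms for p in progress)


# Hypothetical curve: half the bytes after 100 ms, all bytes after 300 ms.
curve = [0.0, 0.5, 0.75, 1.0, 1.0]
print(byte_index_ms(curve))  # 175.0
```

    A lower value means the bytes arrived earlier, so two page loads with the same total load time can still be told apart.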

  6. Data from: Internet Firewall Data Set

    • kaggle.com
    zip
    Updated Apr 14, 2021
    Cite
    Bojan Tunguz (2021). Internet Firewall Data Set [Dataset]. https://www.kaggle.com/tunguz/internet-firewall-data-set
    Explore at:
    Available download formats: zip (772604 bytes)
    Dataset updated
    Apr 14, 2021
    Authors
    Bojan Tunguz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Source:

    Fatih Ertam, fatih.ertam '@' firat.edu.tr, Firat University, Turkey.

    Data Set Information:

    There are 12 features in total. The Action feature is used as the class label. There are 4 classes in total: allow, deny, drop and reset-both.

    Attribute Information:

    Source Port,Destination Port,NAT Source Port,NAT Destination Port,Action,Bytes,Bytes Sent,Bytes Received,Packets,Elapsed Time (sec),pkts_sent,pkts_received

    Relevant Papers:

    F. Ertam and M. Kaya, “Classification of firewall log files with multiclass support vector machine,” in 6th International Symposium on Digital Forensic and Security, ISDFS 2018 - Proceeding, 2018.

  7. Physical Internet Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Aug 20, 2025
    Cite
    Pro Market Reports (2025). Physical Internet Market Report [Dataset]. https://www.promarketreports.com/reports/physical-internet-market-8586
    Explore at:
    Available download formats: ppt, doc, pdf
    Dataset updated
    Aug 20, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Physical Internet ecosystem encompasses a range of interconnected components working in synergy:

    • Logistic Nodes: These are strategically located physical facilities, acting as hubs for storage, handling, consolidation, and distribution of goods. They are designed for efficient material flow and optimized throughput.
    • Logistic Network: This encompasses the comprehensive infrastructure connecting these nodes, including diverse transportation modes (road, rail, sea, air), communication networks, and advanced information systems ensuring seamless data flow and real-time visibility.
    • Solutions: Software and hardware technologies, including Warehouse Management Systems (WMS), Transportation Management Systems (TMS), and advanced analytics platforms, enable the integration and optimization of logistical processes, driving efficiency and reducing operational costs.
    • Services: A wide array of value-added services are offered, such as inventory management, cross-docking, last-mile delivery solutions, customs brokerage, and reverse logistics, enhancing overall supply chain agility and responsiveness.

    Recent developments include:

    • Amazon.com Inc., for example: The physical internet is about to get a lot more involved with an effort to build a network where boxes are bytes travelling through the supply chain network in the same way that data travels on the internet. Amazon wants to vertically integrate its logistics.
    • In order to provide a holistic approach for logistics and supply chain management invention research, innovation, and market deployment in Europe, the European Technology Platform (ETP) Alliance for Logistics Innovation via Collaboration in Europe (ALICE) was founded.

    Key drivers for this market are:

    • Developing Interconnectivity
    • Internet of Things (IoT) Integral Towards Revolutionizing Logistics Paradigm

    Potential restraints include:

    • Need for Mental Shift Towards Physical Internet
    • Restraint Impact Analysis

  8. Corporate network dataset

    • kaggle.com
    zip
    Updated Apr 25, 2025
    Cite
    Luis Fhelipe Ribeiro (2025). Corporate network dataset [Dataset]. https://www.kaggle.com/datasets/luisfheliperibeiro/corporate-network-dataset
    Explore at:
    Available download formats: zip (116752218 bytes)
    Dataset updated
    Apr 25, 2025
    Authors
    Luis Fhelipe Ribeiro
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    General Description

    This dataset was developed from real data on the usage of the corporate data network at the Universidade Federal do Rio Grande do Norte (UFRN). The main objective is to enable detailed observation of the university's network infrastructure and make this data available to the academic community. Data collection started on August 30, 2023, with the last query conducted on February 7, 2025, covering a total of approximately 19 months of continuous observations. During this period, about 1.5 months of data were lost due to failures in the data collection process or maintenance of the system responsible for capturing the data.

    The data collections cover administrative, academic, and classroom sectors, spanning a total of 13 buildings within the university, providing a broad view of the network across different environments.

    The dataset contains a total of 1,675,843 entries, each with 49 attributes.

    Dataset Attributes, by Category

    1. Connected Machines and ARP (8 attributes)

    • Number of Access, Wi-Fi, Security, and VoIP Machines: Indicates the number of machines connected to each type of network, providing insight into the network size and device load.
    • ARP Value for Access, Wi-Fi, Security, and VoIP: Refers to the number of entries in the Address Resolution Protocol (ARP) table associated with each type of network. ARP is used to map IP addresses to MAC addresses and can indicate potential connectivity issues.

    2. Traffic Metrics (18 attributes)

    • Packet and Byte: Indicates whether the information queried is accounted in packets or transmitted bytes, with positive (1) or negative (-1) values.
    • Downlink and Uplink Bandwidth by Packets (Access, Wi-Fi, Security, VoIP): Refers to the number of packets received or sent by devices connected to each network type.
    • Downlink and Uplink Bandwidth by Bytes (Access, Wi-Fi, Security, VoIP): Refers to the number of bytes received or sent by devices connected to each network type.

    3. Collection Context (5 attributes)

    • Sector: The sector from which the data was collected (academic, administrative, or classroom).
    • Date: The date of the data collection.
    • Time of Day: The time period of the collection (morning, afternoon, or evening).
    • Day of the Week: The day of the week when the collection occurred.
    • Hour: The hour of the collection.

    4. Asset Identification (4 attributes)

    • Asset IP: The IP address of the monitored device.
    • Asset Model: The model of the network device.
    • Asset Part Number: The part number of the device.
    • Asset Firmware: The firmware version in use on the device.

    5. Asset Performance (6 attributes)

    • CPU Usage (% - 1 min and 5 min): The percentage of CPU usage on the device in the last minute and the last five minutes.
    • Memory Used (%): The percentage of memory used by the device.
    • Total and Used Memory (Kb): The total amount and the used amount of memory on the device, measured in Kb.
    • Temperature (°C): The temperature of the device in degrees Celsius.

    6. Port Packet Metrics (8 attributes)

    • Packet In and Out Counter: The number of packets of data that have entered and exited all the device's ports.
    • Broadcast Packet In and Out Counter: The number of broadcast packets that have entered and exited all the device's ports.
    • Multicast Packet In and Out Counter: The number of multicast packets that have entered and exited all the device's ports.
    • Packet Error In and Out Counter: The number of error packets that have entered and exited all the device's ports.

    Size and Format

    The dataset contains approximately 1,675,843 entries, with 49 attributes per entry. It is available in CSV format.

  9. Global SME Big Data Market 2014-2018

    • technavio.com
    pdf
    Updated May 30, 2014
    Cite
    Technavio (2014). Global SME Big Data Market 2014-2018 [Dataset]. https://www.technavio.com/report/global-sme-big-data-market-2014-2018
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 30, 2014
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Description

    About SME Big Data

    Big data solutions include a wide range of hardware, software, and services required for processing and analyzing structured and unstructured data that is too big for traditional data processing tools to manage. These data are generated by various sources such as mobile devices, digital repositories, and enterprise applications and range in size from terabytes (10^12 bytes) to petabytes (10^15 bytes) and even exabytes (10^18 bytes). Due to the considerably large size of big data, it is difficult for SMEs to manage and analyze the data using existing traditional data processing tools. Big data solutions are being used for a wide range of applications such as conversation analysis in social networking websites, fraud management in the BFSI sector, and disease diagnosis in the Healthcare sector. Due to the increasing need for big data solutions, the Global SME Big Data market is expected to witness rapid growth during the forecast period. TechNavio's analysts forecast the Global SME Big Data market will grow at a CAGR of 42.94 percent over the period 2013-2018.

    Covered in this Report

    This report covers the present scenario and the growth prospects of the Global SME Big Data market for the period 2014-2018. To calculate the market size, the report considers revenue generated from sales of:

    • Hardware: Big data storage, servers, and networking components
    • Software applications: Apache Hadoop, NoSQL, Cassandra, and other big data software applications
    • Services: Big data analytics and consulting, implementation, support, and professional services

    TechNavio's report, the Global SME Big Data Market 2014-2018, has been prepared based on an in-depth market analysis with inputs from industry experts. The report covers the APAC, the EMEA, and the Americas regions; it also covers the Global SME Big Data market landscape and its growth prospects in the coming years. The report also includes a discussion of the key vendors operating in this market.

    Key Regions

    • Americas
    • APAC
    • EMEA

    Key Vendors

    • Hewlett-Packard Co.
    • IBM Corp.
    • Oracle Corp.
    • Teradata Corp.

    Other Prominent Vendors

    • Amazon Web Services, Inc.
    • Cloudera, Inc.
    • Couchbase Inc.
    • EMC Corp.
    • Google Inc.
    • Microsoft Corp.
    • SAP AG
    • Splunk Inc.

    Key Market Driver

    • Increasing Need to Improve Business Processes Efficiency.
    • For a full, detailed list, view our report.

    Key Market Challenge

    • Lack of Awareness among SMEs about Potential of Big Data Solutions.
    • For a full, detailed list, view our report.

    Key Market Trend

    • Increasing Market Consolidation.
    • For a full, detailed list, view our report.

    Key Questions Answered in this Report

    • What will the market size be in 2018 and what will the growth rate be?
    • What are the key market trends?
    • What is driving this market?
    • What are the challenges to market growth?
    • Who are the key vendors in this market space?
    • What are the market opportunities and threats faced by the key vendors?
    • What are the strengths and weaknesses of the key vendors?

    You can request one free hour of our analyst’s time when you purchase this market report. Details are provided within the report.

  10. IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    Updated Aug 30, 2024
    Cite
    José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. http://doi.org/10.5281/zenodo.8116338
    Explore at:
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Article Information

    The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.

    Please do cite the aforementioned article when using this dataset.

    Abstract

    The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a Machine Learning Model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and provided it into our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

    ZIP Folder Content

    The ZIP folder comprises two main components: Captures and Datasets. Within the captures folder, we have included all the captures used in this project. These captures are organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. Similarly, the datasets folder follows a similar organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

    To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.

    This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
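
    The practical difference between the two formats can be sketched with the Python standard library. The records below are invented, with field names borrowed from the dataset's flow features; the actual files may be structured differently. Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import csv
import io
import pickle

# Invented flow records; field names borrowed from the dataset's flow features.
records = [
    {"proto": "tcp", "orig_bytes": 120, "flow_duration": 0.25},
    {"proto": "udp", "orig_bytes": 64, "flow_duration": 0.01},
]

# CSV round trip: every value comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records)
buffer.seek(0)
from_csv = list(csv.DictReader(buffer))

# Pickle round trip: Python types are preserved.
from_pickle = pickle.loads(pickle.dumps(records))

print(type(from_csv[0]["orig_bytes"]).__name__)     # str
print(type(from_pickle[0]["orig_bytes"]).__name__)  # int
```

    With CSV, numeric columns need an explicit conversion after loading, while the pickled objects can be used directly.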

    Datasets' Content

    Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.

    Identified Key Features Within Bluetooth Dataset

    Feature: Meaning

    • btle.advertising_header: BLE Advertising Packet Header
    • btle.advertising_header.ch_sel: BLE Advertising Channel Selection Algorithm
    • btle.advertising_header.length: BLE Advertising Length
    • btle.advertising_header.pdu_type: BLE Advertising PDU Type
    • btle.advertising_header.randomized_rx: BLE Advertising Rx Address
    • btle.advertising_header.randomized_tx: BLE Advertising Tx Address
    • btle.advertising_header.rfu.1: Reserved For Future 1
    • btle.advertising_header.rfu.2: Reserved For Future 2
    • btle.advertising_header.rfu.3: Reserved For Future 3
    • btle.advertising_header.rfu.4: Reserved For Future 4
    • btle.control.instant: Instant Value Within a BLE Control Packet
    • btle.crc.incorrect: Incorrect CRC
    • btle.extended_advertising: Advertiser Data Information
    • btle.extended_advertising.did: Advertiser Data Identifier
    • btle.extended_advertising.sid: Advertiser Set Identifier
    • btle.length: BLE Length
    • frame.cap_len: Frame Length Stored Into the Capture File
    • frame.interface_id: Interface ID
    • frame.len: Frame Length Wire
    • nordic_ble.board_id: Board ID
    • nordic_ble.channel: Channel Index
    • nordic_ble.crcok: Indicates if CRC is Correct
    • nordic_ble.flags: Flags
    • nordic_ble.packet_counter: Packet Counter
    • nordic_ble.packet_time: Packet time (start to end)
    • nordic_ble.phy: PHY
    • nordic_ble.protover: Protocol Version

    Identified Key Features Within IP-Based Packets Dataset

    Feature: Meaning

    • http.content_length: Length of content in an HTTP response
    • http.request: HTTP request being made
    • http.response.code: Status code of an HTTP response
    • http.response_number: Sequential number of an HTTP response
    • http.time: Time taken for an HTTP transaction
    • tcp.analysis.initial_rtt: Initial round-trip time for TCP connection
    • tcp.connection.fin: TCP connection termination with a FIN flag
    • tcp.connection.syn: TCP connection initiation with SYN flag
    • tcp.connection.synack: TCP connection establishment with SYN-ACK flags
    • tcp.flags.cwr: Congestion Window Reduced flag in TCP
    • tcp.flags.ecn: Explicit Congestion Notification flag in TCP
    • tcp.flags.fin: FIN flag in TCP
    • tcp.flags.ns: Nonce Sum flag in TCP
    • tcp.flags.res: Reserved flags in TCP
    • tcp.flags.syn: SYN flag in TCP
    • tcp.flags.urg: Urgent flag in TCP
    • tcp.urgent_pointer: Pointer to urgent data in TCP
    • ip.frag_offset: Fragment offset in IP packets
    • eth.dst.ig: Ethernet destination is in the internal network group
    • eth.src.ig: Ethernet source is in the internal network group
    • eth.src.lg: Ethernet source is in the local network group
    • eth.src_not_group: Ethernet source is not in any network group
    • arp.isannouncement: Indicates if an ARP message is an announcement

    Identified Key Features Within IP-Based Flows Dataset

    Feature: Meaning
    proto: Transport layer protocol of the connection
    service: Identification of an application protocol
    orig_bytes: Originator payload bytes
    resp_bytes: Responder payload bytes
    history: Connection state history
    orig_pkts: Originator sent packets
    resp_pkts: Responder sent packets
    flow_duration: Length of the flow in seconds
    fwd_pkts_tot: Forward packets total
    bwd_pkts_tot: Backward packets total
    fwd_data_pkts_tot: Forward data packets total
    bwd_data_pkts_tot: Backward data packets total
    fwd_pkts_per_sec: Forward packets per second
    bwd_pkts_per_sec: Backward packets per second
    flow_pkts_per_sec: Flow packets per second
    fwd_header_size: Forward header bytes
    bwd_header_size: Backward header bytes
    fwd_pkts_payload: Forward payload bytes
    bwd_pkts_payload: Backward payload bytes
    flow_pkts_payload: Flow payload bytes
    fwd_iat: Forward inter-arrival time
    bwd_iat: Backward inter-arrival time
    flow_iat: Flow inter-arrival time
    active: Flow active duration
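
The per-second features in the flows table can plausibly be related to the packet totals and the flow duration. The following is a minimal sketch under that assumption; the listing only gives feature names and short meanings, so the formula (total divided by duration), the zero-duration convention, and the sample record are all illustrative rather than taken from the dataset.

```python
def per_second(count: int, duration_s: float) -> float:
    """Return count / duration; 0.0 for a zero-length flow (assumed convention)."""
    return count / duration_s if duration_s > 0 else 0.0


# Hypothetical flow record that reuses the feature names from the table.
flow = {"flow_duration": 2.5, "fwd_pkts_tot": 10, "bwd_pkts_tot": 5}

rates = {
    "fwd_pkts_per_sec": per_second(flow["fwd_pkts_tot"], flow["flow_duration"]),
    "bwd_pkts_per_sec": per_second(flow["bwd_pkts_tot"], flow["flow_duration"]),
    "flow_pkts_per_sec": per_second(
        flow["fwd_pkts_tot"] + flow["bwd_pkts_tot"], flow["flow_duration"]
    ),
}
print(rates)
```
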
  11. Mobile App Store ( 7200 apps)

    • kaggle.com
    zip
    Updated Jun 10, 2018
    Cite
    Ramanathan Perumal (2018). Mobile App Store ( 7200 apps) [Dataset]. https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps
    Available download formats: zip (5905027 bytes)
    Dataset updated
    Jun 10, 2018
    Authors
    Ramanathan Perumal
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Mobile App Statistics (Apple iOS app store)

    The ever-changing mobile landscape is a challenging space to navigate. The share of mobile over desktop usage is only increasing. Android holds about 53.2% of the smartphone market, while iOS holds 43%. To get more people to download your app, you need to make sure they can easily find it. Mobile app analytics is a great way to understand the existing strategy and to drive growth and retention of future users.

    With millions of apps available today, this data set is a useful resource for identifying top trending apps in the iOS App Store. It contains details of more than 7000 Apple iOS mobile applications. The data was extracted from the iTunes Search API at the Apple Inc. website. R and Linux web scraping tools were used for this study.

    The full interactive Shiny app can be seen here: https://multiscal.shinyapps.io/appStore/

    Data collection date (from API): July 2017

    Dimensions of the data set: 7197 rows and 16 columns

    Content:

    appleStore.csv

    1. "id": App ID
    2. "track_name": App name
    3. "size_bytes": Size (in bytes)
    4. "currency": Currency type
    5. "price": Price amount
    6. "rating_count_tot": User rating count (for all versions)
    7. "rating_count_ver": User rating count (for the current version)
    8. "user_rating": Average user rating value (for all versions)
    9. "user_rating_ver": Average user rating value (for the current version)
    10. "ver": Latest version code
    11. "cont_rating": Content rating
    12. "prime_genre": Primary genre
    13. "sup_devices.num": Number of supported devices
    14. "ipadSc_urls.num": Number of screenshots shown for display
    15. "lang.num": Number of supported languages
    16. "vpp_lic": VPP device-based licensing enabled

    appleStore_description.csv

    1. id : App ID
    2. track_name: Application name
    3. size_bytes: Memory size (in Bytes)
    4. app_desc: Application description
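
Both files share the "id" column, so they can be combined on it. The sketch below uses only the Python standard library and small inline excerpts with made-up rows; the sample values are hypothetical and only a subset of the 16 columns is shown. It attaches each description to its app and then averages "user_rating" per "prime_genre" as one possible starting point for the questions in the Inspiration section.

```python
import csv
import io

# Hypothetical excerpts; the real files are appleStore.csv and
# appleStore_description.csv with the columns listed above.
APPS_CSV = """id,track_name,size_bytes,price,user_rating,prime_genre
1,Alpha,1000,0.0,4.5,Games
2,Beta,2000,1.99,3.0,Games
3,Gamma,3000,0.0,4.0,Utilities
"""
DESC_CSV = """id,track_name,size_bytes,app_desc
1,Alpha,1000,A puzzle game
3,Gamma,3000,A file manager
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


apps = load_rows(APPS_CSV)
descriptions = {row["id"]: row["app_desc"] for row in load_rows(DESC_CSV)}

# Left join on "id": apps without a description get an empty string.
for app in apps:
    app["app_desc"] = descriptions.get(app["id"], "")

# Mean user rating per primary genre.
totals = {}
for app in apps:
    rating_sum, count = totals.get(app["prime_genre"], (0.0, 0))
    totals[app["prime_genre"]] = (rating_sum + float(app["user_rating"]), count + 1)
mean_rating = {genre: s / n for genre, (s, n) in totals.items()}
print(mean_rating)
```
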

    Acknowledgements

    The data was extracted from the iTunes Search API at the Apple Inc. website. R and Linux web scraping tools were used for this study.

    Inspiration

    1. How do the app details contribute to the user ratings?
    2. How do app statistics compare across different groups?

    Reference: R package From github, with devtools::install_github("ramamet/applestoreR")

    Licence

    Copyright (c) 2018 Ramanathan Perumal

  12. Open Wifi Milan: Daily upload traffic | gimi9.com

    • gimi9.com
    Cite
    Open Wifi Milan: Daily upload traffic | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_ds922/
    Area covered
    Milan
    Description

    Indicates, for each day and for each zone, the amount of data sent to the Internet; the value is expressed in bytes (8 bits).

  13. Eurecom ElasticMon 5G

    • kaggle.com
    zip
    Updated Dec 23, 2023
    Cite
    Abdo_pros (2023). Eurecom ElasticMon 5G [Dataset]. https://www.kaggle.com/datasets/abdopros/eurcom-network
    Available download formats: zip (2196304 bytes)
    Dataset updated
    Dec 23, 2023
    Authors
    Abdo_pros
    Description

    Eurecom ElasticMon 5G Dataset

    This dataset, sourced from the Eurecom ElasticMon 5G monitoring framework, includes a range of metrics that are pivotal for analyzing the performance of 4G and 5G Radio Access Networks (RAN). It covers various aspects of network performance, including signal strength, data transmission volumes, and quality indicators. The data is crucial for developing machine learning models for predictive analysis and optimization of network performance.

    Columns Description:

    date_index: Timestamp or index indicating the date and time of the data record.
    rsrp (Reference Signal Received Power): Measures the power level of the signal received by the UE (User Equipment).
    rsrq (Reference Signal Received Quality): Indicates the quality of the received reference signal.
    wbcqi (Wideband Channel Quality Indicator): Provides information about the quality of the downlink channel.
    macStats_phr (MAC layer Power HeadRoom): Indicates the available power capacity of the UE.
    dlCqiReport_sfnSn (Downlink CQI Report with SFN and SN): Downlink Channel Quality Indicator with System Frame Number and Subframe Number.
    macStats_totalBytesSdusDl: Total number of bytes for Service Data Units on the Downlink at the MAC layer.
    macStats_totalTbsUl: Total Transport Block Size for Uplink.
    macStats_mcs1Ul: Modulation and Coding Scheme for the first transport block in Uplink.
    macStats_totalPduDl: Total number of Protocol Data Units in Downlink.
    macStats_totalBytesSdusUl: Total number of bytes for Service Data Units on the Uplink at the MAC layer.
    macStats_tbsDl: Transport Block Size for Downlink.
    macStats_totalPrbUl: Total Physical Resource Blocks used in Uplink.
    macStats_macSdusDl_sduLength: Length of the Service Data Unit in the Downlink.
    macStats_macSdusDl_lcid: Logical Channel ID for Downlink.
    macStats_prbUl: Physical Resource Blocks used in Uplink.
    macStats_totalPduUl: Total number of Protocol Data Units in Uplink.
    macStats_mcs1Dl: Modulation and Coding Scheme for the first transport block in Downlink.
    macStats_mcs2Dl: Modulation and Coding Scheme for the second transport block in Downlink.
    macStats_prbDl: Physical Resource Blocks used in Downlink.
    macStats_totalPrbDl: Total Physical Resource Blocks used in Downlink.
    macStats_prbRetxDl: Physical Resource Blocks used for retransmissions in Downlink.
    macStats_totalTbsDl: Total Transport Block Size for Downlink.
    ulCqiReport_sfnSn (Uplink CQI Report with SFN and SN): Uplink Channel Quality Indicator with System Frame Number and Subframe Number.
    pdcpStats_pktRx: Number of PDCP packets received.
    pdcpStats_pktRxW: PDCP packets received with waiting.
    pdcpStats_pktRxAiatW: Average Inter Arrival Time for PDCP packets received with waiting.
    pdcpStats_pktRxOo: PDCP packets received out of order.
    pdcpStats_pktRxBytesW: Bytes of PDCP packets received with waiting.
    pdcpStats_pktRxSn: Sequence number of the last PDCP packet received.
    pdcpStats_pktTxBytesW: Bytes of PDCP packets transmitted with waiting.
    pdcpStats_pktTxSn: Sequence number of the last PDCP packet transmitted.
    pdcpStats_pktTxBytes: Bytes of PDCP packets transmitted.
    pdcpStats_pktRxAiat: Average Inter Arrival Time for PDCP packets received.
    pdcpStats_pktRxBytes: Bytes of PDCP packets received.
    pdcpStats_pktTx: Number of PDCP packets transmitted.
    pdcpStats_pktTxW: PDCP packets transmitted with waiting.
    pdcpStats_pktTxAiatW: Average Inter Arrival Time for PDCP packets
    
  14. Data from: CESNET-QUIC22: A large one-month QUIC network traffic dataset...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 29, 2024
    Cite
    Luxemburk, Jan; Hynek, Karel; Čejka, Tomáš; Lukačovič, Andrej; Šiška, Pavel (2024). CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7409923
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    CESNET (http://www.cesnet.cz/)
    FIT Czech Technical University in Prague
    Authors
    Luxemburk, Jan; Hynek, Karel; Čejka, Tomáš; Lukačovič, Andrej; Šiška, Pavel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888.

    We recommend using the CESNET DataZoo python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

    The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

    Data capture

    The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done for four weeks between 31.10.2022 and 27.11.2022. The following list provides per-week flow count, capture period, and uncompressed size:

    W-2022-44: Uncompressed Size: 19 GB; Capture Period: 31.10.2022 - 6.11.2022; Number of flows: 32.6M
    W-2022-45: Uncompressed Size: 25 GB; Capture Period: 7.11.2022 - 13.11.2022; Number of flows: 42.6M
    W-2022-46: Uncompressed Size: 20 GB; Capture Period: 14.11.2022 - 20.11.2022; Number of flows: 33.7M
    W-2022-47: Uncompressed Size: 25 GB; Capture Period: 21.11.2022 - 27.11.2022; Number of flows: 44.1M
    CESNET-QUIC22: Uncompressed Size: 89 GB; Capture Period: 31.10.2022 - 27.11.2022; Number of flows: 153M
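
As a quick consistency check, the per-week figures can be summed and compared with the CESNET-QUIC22 totals. The numbers below are copied from the list; the dictionary layout and variable names are only illustrative.

```python
# Per-week figures as listed: uncompressed size in GB, flows in millions.
weeks = {
    "W-2022-44": {"size_gb": 19, "flows_m": 32.6},
    "W-2022-45": {"size_gb": 25, "flows_m": 42.6},
    "W-2022-46": {"size_gb": 20, "flows_m": 33.7},
    "W-2022-47": {"size_gb": 25, "flows_m": 44.1},
}

total_size_gb = sum(week["size_gb"] for week in weeks.values())
# Round to one decimal to absorb floating-point noise in the sum.
total_flows_m = round(sum(week["flows_m"] for week in weeks.values()), 1)

print(total_size_gb, total_flows_m)
```

The sums come to 89 GB and 153.0M flows, which agrees with the totals stated for the whole dataset.
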

    Data description

    The dataset consists of network flows describing encrypted QUIC communications. Flows were created using ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and with fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.

    Packet Sequences

    Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1, +1 meaning a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip.

    Flow statistics

    Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason - either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections.

    Dataset structure

    The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service and total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts of a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes SNI domains used for ground truth labeling. The following list describes flow data fields in CSV files:
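
The eight histogram intervals can be illustrated with a small binning function. This is a sketch of the interval layout quoted in the description, not the PHISTS plugin's implementation; the function names and the sample values are hypothetical, and integer inputs (bytes or whole milliseconds) are assumed.

```python
from bisect import bisect_left

# Inclusive upper edges of the first seven bins; everything above 1024
# falls into the eighth bin (">1024").
UPPER_EDGES = [15, 31, 63, 127, 255, 511, 1024]


def bin_index(value: int) -> int:
    """Return the bin (0..7) for a packet size in B or an inter-packet time in ms."""
    return bisect_left(UPPER_EDGES, value)


def histogram(values):
    """Count how many values fall into each of the eight bins."""
    counts = [0] * (len(UPPER_EDGES) + 1)
    for value in values:
        counts[bin_index(value)] += 1
    return counts


# Hypothetical packet sizes in bytes, chosen to hit the bin boundaries.
sizes = [0, 15, 16, 100, 512, 1024, 1025, 1400]
hist = histogram(sizes)
print(hist)
```
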

    ID: Unique identifier
    SRC_IP: Source IP address
    DST_IP: Destination IP address
    DST_ASN: Destination Autonomous System number
    SRC_PORT: Source port
    DST_PORT: Destination port
    PROTOCOL: Transport protocol
    QUIC_VERSION: QUIC protocol version
    QUIC_SNI: Server Name Indication domain
    QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet
    TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
    TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
    DURATION: Duration of the flow in seconds
    BYTES: Number of transmitted bytes from client to server
    BYTES_REV: Number of transmitted bytes from server to client
    PACKETS: Number of packets transmitted from client to server
    PACKETS_REV: Number of packets transmitted from server to client
    PPI: Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]]
    PPI_LEN: Number of packets in the PPI sequence
    PPI_DURATION: Duration of the PPI sequence in seconds
    PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
    PHIST_SRC_SIZES: Histogram of packet sizes from client to server
    PHIST_DST_SIZES: Histogram of packet sizes from server to client
    PHIST_SRC_IPT: Histogram of inter-packet times from client to server
    PHIST_DST_IPT: Histogram of inter-packet times from server to client
    APP: Web service label
    CATEGORY: Service category
    FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
    FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
    FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
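
The PPI field packs three parallel sequences into one value. The sketch below assumes the field is stored as JSON-compatible text in the documented order (inter-packet times, packet directions, packet sizes); the sample string is invented. It derives the sequence length, the payload bytes per direction, and the number of direction changes. The direction-change count follows the literal wording of the description and may not match the stored PPI_ROUNDTRIPS value exactly, so the dataset's own field should be treated as authoritative.

```python
import json

# Hypothetical PPI value: [[inter-packet times], [directions], [sizes]].
ppi_text = "[[0, 12, 3, 40], [1, -1, -1, 1], [1200, 1350, 800, 90]]"

inter_packet_times, directions, sizes = json.loads(ppi_text)

# Number of packets in the sequence (what PPI_LEN describes).
ppi_len = len(sizes)

# Payload bytes per direction: +1 is client to server, -1 is server to client.
client_bytes = sum(size for d, size in zip(directions, sizes) if d == 1)
server_bytes = sum(size for d, size in zip(directions, sizes) if d == -1)

# Changes of direction between consecutive packets.
direction_changes = sum(
    1 for prev, cur in zip(directions, directions[1:]) if prev != cur
)

print(ppi_len, client_bytes, server_bytes, direction_changes)
```
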

    Link to other CESNET datasets

    https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/
    https://github.com/CESNET/cesnet-datazoo

    Please cite the original data article:

    @article{CESNETQUIC22,
      author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
      title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
      journal = {Data in Brief},
      pages = {108888},
      year = {2023},
      issn = {2352-3409},
      doi = {https://doi.org/10.1016/j.dib.2023.108888},
      url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
    }

  15. Cybersecurity: Suspicious Web Threat Interactions

    • kaggle.com
    Updated Apr 27, 2024
    Cite
    JanCSG (2024). Cybersecurity: Suspicious Web Threat Interactions [Dataset]. https://www.kaggle.com/datasets/jancsg/cybersecurity-suspicious-web-threat-interactions
    Dataset updated
    Apr 27, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    JanCSG
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This dataset contains web traffic records collected through AWS CloudWatch, aimed at detecting suspicious activities and potential attack attempts.

    The data were generated by monitoring traffic to a production web server, using various detection rules to identify anomalous patterns.

    Context

    In today's cloud environments, cybersecurity is more crucial than ever. The ability to detect and respond to threats in real time can protect organizations from significant consequences. This dataset provides a view of web traffic that has been labeled as suspicious, offering a valuable resource for developers, data scientists, and security experts to enhance threat detection techniques.

    Dataset Content

    Each entry in the dataset represents a stream of traffic to a web server, including the following columns:

    bytes_in: Bytes received by the server.

    bytes_out: Bytes sent from the server.

    creation_time: Timestamp of when the record was created.

    end_time: Timestamp of when the connection ended.

    src_ip: Source IP address.

    src_ip_country_code: Country code of the source IP.

    protocol: Protocol used in the connection.

    response.code: HTTP response code.

    dst_port: Destination port on the server.

    dst_ip: Destination IP address.

    rule_names: Name of the rule that identified the traffic as suspicious.

    observation_name: Observations associated with the traffic.

    source.meta: Metadata related to the source.

    source.name: Name of the traffic source.

    time: Timestamp of the detected event.

    detection_types: Type of detection applied.
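
A few simple derived values follow directly from these columns, such as the connection duration and the total bytes exchanged. The sketch below assumes the timestamps are ISO 8601 strings with an explicit UTC offset, which the description does not confirm; the record itself is made up for illustration.

```python
from datetime import datetime

# Hypothetical record using the column names listed above.
record = {
    "bytes_in": "5602",
    "bytes_out": "12990",
    "creation_time": "2024-04-25T23:00:00+00:00",  # assumed timestamp format
    "end_time": "2024-04-25T23:10:00+00:00",
}

start = datetime.fromisoformat(record["creation_time"])
end = datetime.fromisoformat(record["end_time"])

# Connection duration in seconds.
duration_s = (end - start).total_seconds()

# Total traffic in both directions and its average rate over the connection.
total_bytes = int(record["bytes_in"]) + int(record["bytes_out"])
bytes_per_second = total_bytes / duration_s if duration_s > 0 else 0.0

print(duration_s, total_bytes, round(bytes_per_second, 2))
```
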

    Potential Uses

    This dataset is ideal for:

    • Anomaly Detection: Developing models to detect unusual behaviors in web traffic.
    • Classification Models: Training models to automatically classify traffic as normal or suspicious.
    • Security Analysis: Conducting security analyses to understand the tactics, techniques, and procedures of attackers.
  16. CIC-Darknet2020 Internet Traffic

    • kaggle.com
    zip
    Updated Sep 25, 2020
    Cite
    Peter Friedrich (2020). CIC-Darknet2020 Internet Traffic [Dataset]. https://www.kaggle.com/peterfriedrich1/cicdarknet2020-internet-traffic
    Available download formats: zip (16730787 bytes)
    Dataset updated
    Sep 25, 2020
    Authors
    Peter Friedrich
    Description

    Context

    Original dataset page, license, context, and description at link below: https://www.unb.ca/cic/datasets/darknet2020.html

    This is a dataset gathered to test novel methods for classifying darknet traffic. Dataset gathered by the Canadian Institute for Cybersecurity at the University of New Brunswick.

    Content

    Each unique sample has a flow id. Additional columns include:

    Src IP: Source IP Address
    Src Port: Source Port
    Dst IP: Destination IP Address
    Dst Port: Destination Port
    Protocol: Internet Protocol Version
    Timestamp: Timestamp for when traffic was sent
    Flow Duration: Duration
    Total Fwd Packet: Total number of packets from source to destination

    The remaining columns are listed without descriptions: Total Bwd packets, Total Length of Fwd Packet, Total Length of Bwd Packet, Fwd Packet Length Max, Fwd Packet Length Min, Fwd Packet Length Mean, Fwd Packet Length Std, Bwd Packet Length Max, Bwd Packet Length Min, Bwd Packet Length Mean, Bwd Packet Length Std, Flow Bytes/s, Flow Packets/s, Flow IAT Mean, Flow IAT Std, Flow IAT Max, Flow IAT Min, Fwd IAT Total, Fwd IAT Mean, Fwd IAT Std, Fwd IAT Max, Fwd IAT Min, Bwd IAT Total, Bwd IAT Mean, Bwd IAT Std, Bwd IAT Max, Bwd IAT Min, Fwd PSH Flags, Bwd PSH Flags, Fwd URG Flags, Bwd URG Flags, Fwd Header Length, Bwd Header Length, Fwd Packets/s, Bwd Packets/s, Packet Length Min, Packet Length Max, Packet Length Mean, Packet Length Std, Packet Length Variance, FIN Flag Count, SYN Flag Count, RST Flag Count, PSH Flag Count, ACK Flag Count, URG Flag Count, CWE Flag Count, ECE Flag Count, Down/Up Ratio, Average Packet Size, Fwd Segment Size Avg, Bwd Segment Size Avg, Fwd Bytes/Bulk Avg, Fwd Packet/Bulk Avg, Fwd Bulk Rate Avg, Bwd Bytes/Bulk Avg, Bwd Packet/Bulk Avg, Bwd Bulk Rate Avg, Subflow Fwd Packets, Subflow Fwd Bytes, Subflow Bwd Packets, Subflow Bwd Bytes, FWD Init Win Bytes, Bwd Init Win Bytes, Fwd Act Data Pkts, Fwd Seg Size Min, Active Mean, Active Std, Active Max, Active Min, Idle Mean, Idle Std, Idle Max, Idle Min, Label, Label.1

    Acknowledgements

    • Canadian Institute for Cyber Security
    • University of New Brunswick
    • Kaggle

    Original Paper: Arash Habibi Lashkari, Gurdip Kaur, and Abir Rahali, “DIDarknet: A Contemporary Approach to Detect and Characterize the Darknet Traffic using Deep Image Learning”, 10th International Conference on Communication and Network Security, Tokyo, Japan, November 2020

    Inspiration

    Wanting to better understand how Darknet routing works, and how to examine the traffic that goes through it.

  17. Open Wifi Milan: Daily upload traffic

    • ckan.mobidatalab.eu
    csv, json
    Updated Nov 9, 2023
    Cite
    Direzione Innovazione Tecnologica e Digitale (2023). Open Wifi Milan: Daily upload traffic [Dataset]. https://ckan.mobidatalab.eu/hu/dataset/ds922-openwifimilano-sessionuploadtraffic
    Available download formats: json (1187541), csv (687555)
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Direzione Innovazione Tecnologica e Digitale
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Milan
    Description

    Indicates, for each day and for each area, the amount of data sent to the Internet; the value is expressed in bytes (8 bits).

  18. Internet traffic data for different frame size ranges

    • azon.e-science.pl
    • zasobynauki.pl
    Updated 2020
    Cite
    Aleksandra Knapińska; Piotr Lechowicz; Krzysztof Walkowiak (2020). Internet traffic data for different frame size ranges [Dataset]. https://azon.e-science.pl/zasoby/internet-traffic-data-for-different-frame-size-ranges,56566/
    Dataset updated
    2020
    Authors
    Aleksandra Knapińska; Piotr Lechowicz; Krzysztof Walkowiak
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This resource includes input data used in the work "Machine-Learning Based Prediction of Multiple Types of Network Traffic" by Aleksandra Knapińska, Piotr Lechowicz, and Krzysztof Walkowiak, published in International Conference on Computational Science (ICCS) 2021, Lecture Notes in Computer Science, vol 12742, pp. 122-136, Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_12. The work was supported by the National Science Centre, Poland, under Grant 2019/35/B/ST7/04272. Both the seattle_november.xml and seattle_december.xml files include internet traffic data from the Seattle Internet Exchange Point. The european.xml file includes internet traffic data from one of the European Internet Exchange Points. Each file includes the traffic volume decomposed into specific frame size ranges. Each file starts with a metadata section providing general information. The period covered by a specific file is indicated by its 'start' and 'end' tags. They provide Unix timestamps in the GMT time zone. It should be noted that Seattle lies in the PST time zone, and the European IXP is located in the CET time zone, so the start and end times should be adjusted accordingly. The step parameter is given in seconds, and the samples are stored every 5 minutes in all three files. Each file has multiple columns providing traffic data in bits per second for different frame size ranges. Column names specify the ranges in bytes. The 'total' column stores the total aggregate traffic volume, which is the sum of the values in all the remaining columns in each row.
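
The time zone adjustment described above can be sketched as follows. The start timestamp is a hypothetical value (it corresponds to 2020-12-01 00:00:00 GMT), not one read from the files, and PST is modelled as a fixed UTC-8 offset, which ignores daylight saving time. The 300-second step corresponds to the stated 5-minute sampling.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset PST (UTC-8); an assumption that ignores daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")

start_ts = 1606780800  # hypothetical 'start' tag value, 2020-12-01 00:00:00 GMT
step_s = 300           # 5 minutes expressed in seconds


def sample_times(start: int, step: int, count: int, tz: timezone):
    """Return local datetimes for the first `count` samples of a series."""
    return [datetime.fromtimestamp(start + i * step, tz=tz) for i in range(count)]


times_pst = sample_times(start_ts, step_s, 3, PST)
for moment in times_pst:
    print(moment.isoformat())
```

For the European file, the same function could be called with a CET offset (UTC+1) in place of PST.
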

  19. Internet traffic data from Seattle Internet Exchange Point for different...

    • azon.e-science.pl
    • zasobynauki.pl
    Updated 2021
    Cite
    Aleksandra Knapińska; Piotr Lechowicz; Krzysztof Walkowiak; Weronika Węgier (2021). Internet traffic data from Seattle Internet Exchange Point for different frame size ranges (2021) [Dataset]. https://azon.e-science.pl/zasoby/internet-traffic-data-from-seattle-internet-exchange-point-for-different-frame-size-ranges-2021,67873/
    Dataset updated
    2021
    Authors
    Aleksandra Knapińska; Piotr Lechowicz; Krzysztof Walkowiak; Weronika Węgier
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This resource includes input data used in the work "Long-term prediction of multiple types of time-varying network traffic using chunk-based ensemble learning" by Aleksandra Knapińska, Piotr Lechowicz, Weronika Węgier, and Krzysztof Walkowiak. The work was supported by the National Science Centre, Poland, under Grants 2019/35/B/ST7/04272, 2018/31/D/ST6/0304, and 2019/35/B/ST6/04442.
    The SIX2021.xml file includes internet traffic data from the Seattle Internet Exchange Point collected for one year. The file contains information about the traffic volume decomposed into specific frame size ranges. It starts with a metadata section providing general information. The covered period is indicated by the 'start' and 'end' tags. They provide Unix timestamps in the GMT timezone. It should be noted that Seattle lies in the PST time zone, so the start and end times should be adjusted accordingly. The step parameter is given in seconds, so the samples are stored every 5 minutes. The file has multiple columns providing traffic data in bits per second for different frame size ranges. Column names specify the ranges in bytes. The 'total' column stores information about the total aggregate traffic volume, which is a sum of values in all the remaining columns in each row.
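
Since the values are rates in bits per second sampled every 5 minutes, a rough traffic volume per sample can be estimated, and the 'total' column can be checked against the other columns. The sketch below uses a made-up row with hypothetical column names, and it assumes each value is the average rate over its 5-minute interval, which the description does not state.

```python
STEP_S = 300  # sampling step: 5 minutes in seconds

# Hypothetical row: frame size ranges in bytes mapped to rates in bits per second.
row = {"64-127": 8_000_000.0, "128-255": 4_000_000.0, "total": 12_000_000.0}

# 'total' is described as the sum of all remaining columns in the row.
partial_sum = sum(rate for name, rate in row.items() if name != "total")
total_matches = abs(partial_sum - row["total"]) < 1e-6

# Estimated bytes carried during one step, assuming the rate is an interval average.
bytes_in_step = row["total"] * STEP_S / 8

print(total_matches, bytes_in_step)
```
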

  20. Top Movies of 2017

    • kaggle.com
    zip
    Updated Jan 10, 2018
    Cite
    Deepak (2018). Top Movies of 2017 [Dataset]. https://www.kaggle.com/dmail44/top-movies-of-2017
    Available download formats: zip (5819 bytes)
    Dataset updated
    Jan 10, 2018
    Authors
    Deepak
    Description

    What's this about

    Every year, people summarize the year that has passed. Movie lovers look for the top 10 best movies from top websites and their favorite YouTubers. The end of 2017 is no different. Here is a compilation of lists containing the top 10 movies of 2017 from the internet.

    How is the data collected

    The data is collected from various sources on the internet. There are two files:

    • Top 10 Movies 2017.csv: the list of movie names ranked from 10 to 1 and the source from which they were collected. Some lists are not ranked, which is indicated in the column Ordered.
    • IMBD Links.csv: movie names and their associated IMDB links.

    Where is the data from

    The data is collected from different websites and YouTube by using the search keyword "top 10 movies of 2017". All the sources are mentioned in the data file "Top 10 movies 2017.csv" in the "url" column. Movies are only included if the source lists exactly 10 movies; lists containing more than 10 movies are ignored.

    Extra bytes

    Metacritic has collected different lists and produced a ranking based on them. This data is not collected from Metacritic; however, there may be some overlaps: http://www.metacritic.com/feature/film-critics-list-the-top-10-movies-of-2017

    What can we do

    • Can data science techniques find the real reasons for a movie to be in a top 10 list?
    • Do only a big budget, famous actors, or a famous movie crew push a movie to the top position?