27 datasets found

Data generation volume worldwide 2010-2029
statista.com
Updated Nov 19, 2025
Cite
Statista (2025). Data generation volume worldwide 2010-2029 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Explore at:
Dataset updated
Nov 19, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.
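As a rough worked example of the "triple between 2025 and 2029" claim, the sketch below derives the constant annual growth rate that such a tripling would imply. It assumes smooth compound growth over the four years, which the source does not state; the function name is illustrative only.

```python
def implied_annual_growth(multiple: float, years: int) -> float:
    """Constant yearly growth rate that turns 1.0 into `multiple` after `years` years."""
    return multiple ** (1.0 / years) - 1.0


# Tripling between 2025 and 2029 spans four yearly steps.
rate = implied_annual_growth(3.0, 2029 - 2025)
print(f"{rate:.1%}")  # roughly 31.6% per year under the compound-growth assumption
```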
INTERNATIONAL INTERNET BANDWIDTH BITS PER by Country Dataset
tradingeconomics.com
csv, excel, json, xml
Updated Dec 20, 2021
Cite
TRADING ECONOMICS (2021). INTERNATIONAL INTERNET BANDWIDTH BITS PER by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/international-internet-bandwidth-bits-per-
Explore at:
Available download formats: csv, xml, excel, json
Dataset updated
Dec 20, 2021
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2025
Area covered
World
Description
This dataset provides values for INTERNATIONAL INTERNET BANDWIDTH BITS PER reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

Data from: Internet Firewall

kaggle.com

zip

Updated Aug 29, 2020

Cite

Gary (2020). Internet Firewall [Dataset]. https://www.kaggle.com/sgd825344491/internet-firewall

Explore at:

Available download formats: zip (772612 bytes)

Dataset updated

Aug 29, 2020

Authors

Gary

Description

Abstract
Data Set Characteristics: Multivariate
Number of Instances: 65532
Area: Computer
Attribute Characteristics: N/A
Number of Attributes: 12
Date Donated: 2019-02-04
Associated Tasks: Classification
Missing Values?: N/A
Number of Web Hits: 701

Source:

Fatih Ertam, fatih.ertam '@' firat.edu.tr, Firat University, Turkey.

Data Set Information:

There are 12 features in total. The Action feature is used as the class label. There are 4 classes in total: allow, deny, drop and reset-both.

Attribute Information:

Source Port,Destination Port,NAT Source Port,NAT Destination Port,Action,Bytes,Bytes Sent,Bytes Received,Packets,Elapsed Time (sec),pkts_sent,pkts_received

Relevant Papers:

F. Ertam and M. Kaya, “Classification of firewall log files with multiclass support vector machine,” in 6th International Symposium on Digital Forensic and Security, ISDFS 2018 - Proceeding, 2018.
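A minimal sketch of how the class distribution could be inspected, assuming the CSV header matches the attribute list above. The inline rows are made-up placeholders, not records from the dataset, and `class_counts` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Header copied from the attribute list; the three data rows are invented for illustration.
SAMPLE_CSV = """Source Port,Destination Port,NAT Source Port,NAT Destination Port,Action,Bytes,Bytes Sent,Bytes Received,Packets,Elapsed Time (sec),pkts_sent,pkts_received
57222,53,54587,53,allow,177,94,83,2,30,1,1
56258,3389,56258,3389,allow,4768,1600,3168,19,17,10,9
50002,443,0,0,drop,70,70,0,1,0,1,0
"""


def class_counts(csv_text: str) -> Counter:
    """Count how often each value of the Action column (the class label) occurs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Action"] for row in reader)


counts = class_counts(SAMPLE_CSV)
print(counts)
```

With the real file, the same function could be fed the file contents read from disk; a strongly skewed count would suggest stratified splitting before training a classifier.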

Internet of Things Network Traffic
kaggle.com
zip
Updated May 22, 2025
Cite
Fadel Achmad Daniswara (2025). Internet of Things Network Traffic [Dataset]. https://www.kaggle.com/datasets/fadelachmaddaniswara/internet-of-things-network-traffic
Explore at:
Available download formats: zip (675457 bytes)
Dataset updated
May 22, 2025
Authors
Fadel Achmad Daniswara
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Internet of Things Network Traffic

Description:

This dataset contains traffic data collected from an Internet of Things (IoT) network using ESP32 microcontrollers and a Raspberry Pi acting as a gateway. The goal is to monitor and forecast various network performance parameters in an IoT environment using time series models, particularly ARIMA. Each ESP32 device collects environmental and network performance data over time and sends it to a centralized Raspberry Pi gateway. The data was gathered over a 24-hour period and exported into CSV format for further analysis and modeling.

Columns:

timestamp: The date and time of the data collection.

temperature: Temperature readings in degrees Celsius.

humidity(%): Humidity percentage from DHT sensor.

latency(ms): Network latency in milliseconds.

rssi(dBm): Received Signal Strength Indicator in dBm.

packet_loss(%): Estimated packet loss in percentage.

throughput(bytes/sec): Throughput in bytes per second.

Use Cases:

Time series forecasting (ARIMA, SARIMA, LSTM)

IoT network performance analysis

Anomaly detection in traffic

Edge computing and predictive maintenance experiments

Devices:

ESP32 A & ESP32 B (clients)

Raspberry Pi (gateway)
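For the anomaly-detection use case, a simple baseline can be sketched with a rolling z-score over the latency(ms) column. This is not ARIMA and not the dataset author's method; the window size, threshold, and sample readings are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def flag_anomalies(values, window=5, threshold=3.0):
    """Flag points that deviate from the mean of the previous `window` points
    by more than `threshold` standard deviations."""
    flags = []
    for i, value in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)  # not enough history yet
            continue
        mu = mean(history)
        sigma = pstdev(history)
        flags.append(sigma > 0 and abs(value - mu) > threshold * sigma)
    return flags


# Made-up latency readings in milliseconds with one obvious spike.
latency_ms = [20.0, 22.0, 21.0, 19.0, 20.0, 21.0, 95.0, 20.0]
flags = flag_anomalies(latency_ms)
print(flags)
```

A trailing window keeps the check causal, so it could run on the gateway as readings arrive; a forecasting model such as ARIMA would replace the rolling mean with a model-based prediction.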
Data from: Revealing QoE of Web Users from Encrypted Network Traffic
figshare.com
zip
Updated Jun 16, 2020
Cite
Alexis Huet; Antoine Saverimoutou; Zied Ben Houidi; Hao Shi; Shengming Cai; Jinchun Xu; Bertrand Mathieu; Dario Rossi (2020). Revealing QoE of Web Users from Encrypted Network Traffic [Dataset]. http://doi.org/10.6084/m9.figshare.12459293.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12459293.v1
Dataset updated
Jun 16, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Alexis Huet; Antoine Saverimoutou; Zied Ben Houidi; Hao Shi; Shengming Cai; Jinchun Xu; Bertrand Mathieu; Dario Rossi
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
We present a dataset targeting a large set of popular pages (Alexa top-500), collected from probes in several ISP networks with different browser software (Chrome, Firefox) and viewport combinations, for over 200,000 experiments carried out in 2019.

We purposely collect two distinct sets with two different tools, namely Web Page Test (WPT) and Web View (WV), varying a number of relevant parameters and conditions, for a total of 200K+ web sessions, roughly equally split between WV and WPT. Our dataset comprises variations in terms of geographical coverage, scale, diversity and representativeness (location, targets, protocol, browser, viewports, metrics).

For Web Page Test, we used the online service www.webpagetest.org at different locations worldwide (Europe, Asia, USA) and private WPT instances in three locations in China (Beijing, Shanghai, Dongguan). The list of target URLs comprised the main pages and five random subpages from Alexa top-500 worldwide and China. We varied network conditions: native connections and 4G, FIOS, 3GFast, DSL, and custom shaping/loss conditions. The other elements in the configuration were fixed: Chrome browser on desktop with a fixed screen resolution, HTTP/2 protocol and IPv4.

For Web View, we collected experiments from three machines located in France. We selected two versions of two browser families (Chrome 75/77, Firefox 63/68), two screen sizes (1920x1080, 1440x900), and employed different browser configurations (one half of the experiments activate the AdBlock plugin) from two different access technologies (fiber and ADSL). From a protocol standpoint, we used both IPv4 and IPv6, with HTTP/2 and QUIC, and performed repeated experiments with cached objects/DNS. Given the diversity of settings, we restricted the number of websites to about 50 among the Alexa top-500 websites, to ensure statistical relevance of the collected samples for each page.

The two archives IFIPNetworking2020_WebViewOrange.zip and IFIPNetworking2020_Webpagetest.zip correspond respectively to the Web View experiments and to the Web Page Test experiments.

Each archive contains three files:
- config.csv: Description of parameters and conditions for each run,
- metrics.csv: Value of different metrics collected by the browser,
- progressionCurves.csv: Progression curves of the bytes progress as seen by the network, from 0 to 10 seconds by steps of 100 milliseconds,
- listUrl folder: Indexes the sets of urls.

Regarding config.csv, the columns are:
- index: Index for this set of conditions,
- location: Location of the machine,
- listUrl: List of urls, located in the folder listUrl
- browserUsed: Internet browser and version
- terminal: Desktop or Mobile
- collectionEnvironment: Identification of the collection environment
- networkConditionsTrafficShaping (WPT only): Whether native condition or traffic shaping (4G, FIOS, 3GFast, DSL, or custom Emulator conditions)
- networkConditionsBandwidth (WPT only): Bandwidth of the network
- networkConditionsDelay (WPT only): Delay in the network
- networkConditions (WV only): network conditions
- ipMode (WV only): requested L3 protocol,
- requestedProtocol (WV only): requested L7 protocol
- adBlocker (WV only): Whether adBlocker is used or not
- winSize (WV only): Window size

Regarding metrics.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- DOM Content Loaded Event End (ms): DOM time,
- First Paint (ms) (WV only): First paint time,
- Load Event End (ms): Page Load Time from W3C,
- RUM Speed Index (ms) (WV only): RUM Speed Index,
- Speed Index (ms) (WPT only): Speed Index,
- Time for Full Visual Rendering (ms) (WV only): Time for Full Visual Rendering
- Visible portion (%) (WV only): Visible portion,
- Time to First Byte (ms) (WPT only): Time to First Byte,
- Visually Complete (ms) (WPT only): Visually Complete used to compute the Speed Index,
- aatf: aatf using ATF-chrome-plugin
- bi_aatf: bi_aatf using ATF-chrome-plugin
- bi_plt: bi_plt using ATF-chrome-plugin
- dom: dom using ATF-chrome-plugin
- ii_aatf: ii_aatf using ATF-chrome-plugin
- ii_plt: ii_plt using ATF-chrome-plugin
- last_css: last_css using ATF-chrome-plugin
- last_img: last_img using ATF-chrome-plugin
- last_js: last_js using ATF-chrome-plugin
- nb_ress_css: nb_ress_css using ATF-chrome-plugin
- nb_ress_img: nb_ress_img using ATF-chrome-plugin
- nb_ress_js: nb_ress_js using ATF-chrome-plugin
- num_origins: num_origins using ATF-chrome-plugin
- num_ressources: num_ressources using ATF-chrome-plugin
- oi_aatf: oi_aatf using ATF-chrome-plugin
- oi_plt: oi_plt using ATF-chrome-plugin
- plt: plt using ATF-chrome-plugin

Regarding progressionCurves.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- url: Url of the current page. SUBPAGE stands for a path.
- run: Current run (linked with index of the config for WPT)
- filename: Filename of the pcap
- fullname: Fullname of the pcap
- har_size: Size of the HAR for this experiment,
- pagedata_size: Size of the page data report
- pcap_size: Size of the pcap
- App Byte Index (ms): Application Byte Index as computed from the har file (in the browser)
- bytesIn_APP: Total bytes in as seen in the browser,
- bytesIn_NET: Total bytes in as seen in the network,
- X_BI_net: Network Byte Index computed from the pcap file (in the network)
- X_bin_0_for_B_completion to X_bin_99_for_B_completion: X_bin_k_for_B_completion is the bytes progress reached after k*100 milliseconds

If you use these datasets in your research, you can cite the following paper:

@inproceedings{qoeNetworking2020,
  title={Revealing QoE of Web Users from Encrypted Network Traffic},
  author={Huet, Alexis and Saverimoutou, Antoine and Ben Houidi, Zied and Shi, Hao and Cai, Shengming and Xu, Jinchun and Mathieu, Bertrand and Rossi, Dario},
  booktitle={2020 IFIP Networking Conference (IFIP Networking)},
  year={2020},
  organization={IEEE}
}
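To illustrate how a byte index relates to the progression-curve bins, the sketch below integrates the area above a bytes-progress curve sampled every 100 ms, by analogy with how a Speed Index is usually defined. It assumes the X_bin_k_for_B_completion values are fractions between 0 and 1; whether this matches the exact computation behind X_BI_net in the dataset is not confirmed by the description, and the sample curve is invented.

```python
def byte_index_ms(progress, step_ms=100.0):
    """Approximate the area above a bytes-progress curve, in milliseconds.

    `progress` holds the fraction of bytes received at each sampling step
    (assumed to lie in [0, 1]); values outside that range are clamped.
    """
    total = 0.0
    for fraction in progress:
        clamped = min(max(fraction, 0.0), 1.0)
        total += (1.0 - clamped) * step_ms
    return total


# Made-up curve: nothing at 0 ms, 25% at 100 ms, 50% at 200 ms, complete from 300 ms on.
curve = [0.0, 0.25, 0.5, 1.0, 1.0]
index_ms = byte_index_ms(curve)
print(index_ms)  # 100 + 75 + 50 + 0 + 0 = 225.0
```

A lower value means the bytes arrived earlier in the session, which is why such an index can serve as a network-side proxy for page-load progress.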
Data from: Internet Firewall Data Set
kaggle.com
zip
Updated Apr 14, 2021
Cite
Bojan Tunguz (2021). Internet Firewall Data Set [Dataset]. https://www.kaggle.com/tunguz/internet-firewall-data-set
Explore at:
Available download formats: zip (772604 bytes)
Dataset updated
Apr 14, 2021
Authors
Bojan Tunguz
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Source:

Fatih Ertam, fatih.ertam '@' firat.edu.tr, Firat University, Turkey.

Data Set Information:

There are 12 features in total. The Action feature is used as the class label. There are 4 classes in total: allow, deny, drop and reset-both.

Attribute Information:

Source Port,Destination Port,NAT Source Port,NAT Destination Port,Action,Bytes,Bytes Sent,Bytes Received,Packets,Elapsed Time (sec),pkts_sent,pkts_received

Relevant Papers:

F. Ertam and M. Kaya, “Classification of firewall log files with multiclass support vector machine,” in 6th International Symposium on Digital Forensic and Security, ISDFS 2018 - Proceeding, 2018.
Physical Internet Market Report
promarketreports.com
doc, pdf, ppt
Updated Aug 20, 2025
Cite
Pro Market Reports (2025). Physical Internet Market Report [Dataset]. https://www.promarketreports.com/reports/physical-internet-market-8586
Explore at:
Available download formats: ppt, doc, pdf
Dataset updated
Aug 20, 2025
Dataset authored and provided by
Pro Market Reports
License
https://www.promarketreports.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Physical Internet ecosystem encompasses a range of interconnected components working in synergy:

Logistic Nodes: These are strategically located physical facilities, acting as hubs for storage, handling, consolidation, and distribution of goods. They are designed for efficient material flow and optimized throughput.

Logistic Network: This encompasses the comprehensive infrastructure connecting these nodes, including diverse transportation modes (road, rail, sea, air), communication networks, and advanced information systems ensuring seamless data flow and real-time visibility.

Solutions: Software and hardware technologies, including Warehouse Management Systems (WMS), Transportation Management Systems (TMS), and advanced analytics platforms, enable the integration and optimization of logistical processes, driving efficiency and reducing operational costs.

Services: A wide array of value-added services are offered, such as inventory management, cross-docking, last-mile delivery solutions, customs brokerage, and reverse logistics, enhancing overall supply chain agility and responsiveness.

Recent developments include: Amazon.com Inc., for example, is working to build a network in which boxes are treated like bytes, travelling through the supply chain network in the same way that data travels on the internet; Amazon wants to vertically integrate its logistics. In Europe, the European Technology Platform (ETP) Alliance for Logistics Innovation through Collaboration in Europe (ALICE) was founded to provide a holistic approach to logistics and supply chain management research, innovation, and market deployment.

Key drivers for this market are: Developing Interconnectivity; Internet of Things (IoT) Integral Towards Revolutionizing Logistics Paradigm. Potential restraints include: Need for Mental Shift Towards Physical Internet; Restraint Impact Analysis.
Corporate network dataset
kaggle.com
zip
Updated Apr 25, 2025
Cite
Luis Fhelipe Ribeiro (2025). Corporate network dataset [Dataset]. https://www.kaggle.com/datasets/luisfheliperibeiro/corporate-network-dataset
Explore at:
Available download formats: zip (116752218 bytes)
Dataset updated
Apr 25, 2025
Authors
Luis Fhelipe Ribeiro
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
General Description

This dataset was developed from real data on the usage of the corporate data network at the Universidade Federal do Rio Grande do Norte (UFRN). The main objective is to enable detailed observation of the university's network infrastructure and make this data available to the academic community. Data collection started on August 30, 2023, with the last query conducted on February 7, 2025, covering a total of approximately 19 months of continuous observations. During this period, about 1.5 months of data were lost due to failures in the data collection process or maintenance of the system responsible for capturing the data.

The data collections cover administrative, academic, and classroom sectors, spanning a total of 13 buildings within the university, providing a broad view of the network across different environments.

The dataset contains a total of 1,675,843 entries, each with 49 attributes.

Dataset Attributes, by Category

1. Connected Machines and ARP (8 attributes)

Number of Access, Wi-Fi, Security, and VoIP Machines: Indicates the number of machines connected to each type of network, providing insight into the network size and device load.

ARP Value for Access, Wi-Fi, Security, and VoIP: Refers to the number of entries in the Address Resolution Protocol (ARP) table associated with each type of network. ARP is used to map IP addresses to MAC addresses and can indicate potential connectivity issues.

2. Traffic Metrics (18 attributes)

Packet and Byte: Indicates whether the information queried is accounted in packets or transmitted bytes, with positive (1) or negative (-1) values.

Downlink and Uplink Bandwidth by Packets (Access, Wi-Fi, Security, VoIP): Refers to the number of packets received or sent by devices connected to each network type.

Downlink and Uplink Bandwidth by Bytes (Access, Wi-Fi, Security, VoIP): Refers to the number of bytes received or sent by devices connected to each network type.

3. Collection Context (5 attributes)

Sector: The sector from which the data was collected (academic, administrative, or classroom).

Date: The date of the data collection.

Time of Day: The time period of the collection (morning, afternoon, or evening).

Day of the Week: The day of the week when the collection occurred.

Hour: The hour of the collection.

4. Asset Identification (4 attributes)

Asset IP: The IP address of the monitored device.

Asset Model: The model of the network device.

Asset Part Number: The part number of the device.

Asset Firmware: The firmware version in use on the device.

5. Asset Performance (6 attributes)

CPU Usage (% - 1 min and 5 min): The percentage of CPU usage on the device in the last minute and the last five minutes.

Memory Used (%): The percentage of memory used by the device.

Total and Used Memory (Kb): The total amount and the used amount of memory on the device, measured in Kb.

Temperature (°C): The temperature of the device in degrees Celsius.

6. Port Packet Metrics (8 attributes)

Packet In and Out Counter: The number of packets of data that have entered and exited all the device's ports.

Broadcast Packet In and Out Counter: The number of broadcast packets that have entered and exited all the device's ports.

Multicast Packet In and Out Counter: The number of multicast packets that have entered and exited all the device's ports.

Packet Error In and Out Counter: The number of error packets that have entered and exited all the device's ports.

Size and Format

The dataset contains 1,675,843 entries, with 49 attributes per entry. It is available in CSV format.
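As a quick consistency check on the schema described above, the sketch below sums the per-category attribute counts and shows how the Memory Used (%) value could relate to the total and used memory fields. The relationship used/total is an assumption about how the percentage was derived, and the example numbers are invented.

```python
# Attribute counts per category, as listed in the description.
ATTRIBUTE_GROUPS = {
    "Connected Machines and ARP": 8,
    "Traffic Metrics": 18,
    "Collection Context": 5,
    "Asset Identification": 4,
    "Asset Performance": 6,
    "Port Packet Metrics": 8,
}

total_attributes = sum(ATTRIBUTE_GROUPS.values())
print(total_attributes)  # 49, matching the stated number of attributes per entry


def memory_used_percent(total_kb: float, used_kb: float) -> float:
    """Percentage of memory in use, assuming it is simply used / total."""
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    return 100.0 * used_kb / total_kb


print(memory_used_percent(4096, 1024))  # 25.0 for these made-up values
```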
Global SME Big Data Market 2014-2018
technavio.com
pdf
Updated May 30, 2014
Cite
Technavio (2014). Global SME Big Data Market 2014-2018 [Dataset]. https://www.technavio.com/report/global-sme-big-data-market-2014-2018
Explore at:
Available download formats: pdf
Dataset updated
May 30, 2014
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-notice
Description
About SME Big Data

Big data solutions include a wide range of hardware, software, and services required for processing and analyzing structured and unstructured data that is too big for traditional data processing tools to manage. These data are generated by various sources such as mobile devices, digital repositories, and enterprise applications and range in size from terabytes (10^12 bytes) to petabytes (10^15 bytes) and even exabytes (10^18 bytes). Due to the considerably large size of big data, it is difficult for SMEs to manage and analyze the data using existing traditional data processing tools. Big data solutions are being used for a wide range of applications such as conversation analysis in social networking websites, fraud management in the BFSI sector, and disease diagnosis in the Healthcare sector. Due to the increasing need for big data solutions, the Global SME Big Data market is expected to witness rapid growth during the forecast period. TechNavio's analysts forecast the Global SME Big Data market will grow at a CAGR of 42.94 percent over the period 2013-2018.

Covered in this Report

This report covers the present scenario and the growth prospects of the Global SME Big Data market for the period 2014-2018. To calculate the market size, the report considers revenue generated from sales of:

Hardware: Big data storage, servers, and networking components
Software applications: Apache Hadoop, NoSQL, Cassandra, and other big data software applications
Services: Big data analytics and consulting, implementation, support, and professional services

TechNavio's report, the Global SME Big Data Market 2014-2018, has been prepared based on an in-depth market analysis with inputs from industry experts. The report covers the APAC, the EMEA, and the Americas regions; it also covers the Global SME Big Data market landscape and its growth prospects in the coming years. The report also includes a discussion of the key vendors operating in this market.

Key Regions
Americas
APAC
EMEA

Key Vendors
Hewlett-Packard Co.
IBM Corp.
Oracle Corp.
Teradata Corp.

Other Prominent Vendors
Amazon Web Services, Inc.
Cloudera, Inc.
Couchbase Inc.
EMC Corp.
Google Inc.
Microsoft Corp.
SAP AG
Splunk Inc.

Key Market Driver
Increasing Need to Improve Business Processes Efficiency.
For a full, detailed list, view our report.

Key Market Challenge
Lack of Awareness among SMEs about Potential of Big Data Solutions.
For a full, detailed list, view our report.

Key Market Trend
Increasing Market Consolidation.
For a full, detailed list, view our report.

Key Questions Answered in this Report
What will the market size be in 2018 and what will the growth rate be?
What are the key market trends?
What is driving this market?
What are the challenges to market growth?
Who are the key vendors in this market space?
What are the market opportunities and threats faced by the key vendors?
What are the strengths and weaknesses of the key vendors?

You can request one free hour of our analyst’s time when you purchase this market report. Details are provided within the report.
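To make the forecast CAGR of 42.94 percent over 2013-2018 more concrete, the sketch below compounds it over the five yearly steps and reports the implied overall growth multiple. The base market size is not given in the description, so only the multiple is computed; the function name is illustrative.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Overall growth factor after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


# 2013 -> 2018 is five compounding periods.
multiple = growth_multiple(0.4294, 2018 - 2013)
print(f"{multiple:.2f}x")  # a little under six times the 2013 market size
```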

IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT

zenodo.org
data.niaid.nih.gov

Updated Aug 30, 2024

Cite

José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. http://doi.org/10.5281/zenodo.8116338

Explore at:

Unique identifier

https://doi.org/10.5281/zenodo.8116338

Dataset updated

Aug 30, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Article Information

The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.

Please do cite the aforementioned article when using this dataset.

Abstract

The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a Machine Learning Model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and provided it into our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

ZIP Folder Content

The ZIP folder comprises two main components: Captures and Datasets. Within the captures folder, we have included all the captures used in this project. These captures are organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. Similarly, the datasets folder follows a similar organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.

This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.

Datasets' Content

Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.

Identified Key Features Within Bluetooth Dataset

Feature	Meaning
btle.advertising_header	BLE Advertising Packet Header
btle.advertising_header.ch_sel	BLE Advertising Channel Selection Algorithm
btle.advertising_header.length	BLE Advertising Length
btle.advertising_header.pdu_type	BLE Advertising PDU Type
btle.advertising_header.randomized_rx	BLE Advertising Rx Address
btle.advertising_header.randomized_tx	BLE Advertising Tx Address
btle.advertising_header.rfu.1	Reserved For Future 1
btle.advertising_header.rfu.2	Reserved For Future 2
btle.advertising_header.rfu.3	Reserved For Future 3
btle.advertising_header.rfu.4	Reserved For Future 4
btle.control.instant	Instant Value Within a BLE Control Packet
btle.crc.incorrect	Incorrect CRC
btle.extended_advertising	Advertiser Data Information
btle.extended_advertising.did	Advertiser Data Identifier
btle.extended_advertising.sid	Advertiser Set Identifier
btle.length	BLE Length
frame.cap_len	Frame Length Stored Into the Capture File
frame.interface_id	Interface ID
frame.len	Frame Length Wire
nordic_ble.board_id	Board ID
nordic_ble.channel	Channel Index
nordic_ble.crcok	Indicates if CRC is Correct
nordic_ble.flags	Flags
nordic_ble.packet_counter	Packet Counter
nordic_ble.packet_time	Packet time (start to end)
nordic_ble.phy	PHY
nordic_ble.protover	Protocol Version

Identified Key Features Within IP-Based Packets Dataset

Feature	Meaning
http.content_length	Length of content in an HTTP response
http.request	HTTP request being made
http.response.code	Status code of an HTTP response
http.response_number	Sequential number of an HTTP response
http.time	Time taken for an HTTP transaction
tcp.analysis.initial_rtt	Initial round-trip time for TCP connection
tcp.connection.fin	TCP connection termination with a FIN flag
tcp.connection.syn	TCP connection initiation with SYN flag
tcp.connection.synack	TCP connection establishment with SYN-ACK flags
tcp.flags.cwr	Congestion Window Reduced flag in TCP
tcp.flags.ecn	Explicit Congestion Notification flag in TCP
tcp.flags.fin	FIN flag in TCP
tcp.flags.ns	Nonce Sum flag in TCP
tcp.flags.res	Reserved flags in TCP
tcp.flags.syn	SYN flag in TCP
tcp.flags.urg	Urgent flag in TCP
tcp.urgent_pointer	Pointer to urgent data in TCP
ip.frag_offset	Fragment offset in IP packets
eth.dst.ig	Individual/group (I/G) bit of the Ethernet destination address
eth.src.ig	Individual/group (I/G) bit of the Ethernet source address
eth.src.lg	Locally administered (L/G) bit of the Ethernet source address
eth.src_not_group	Ethernet source address is not a group address
arp.isannouncement	Indicates if an ARP message is an announcement

Identified Key Features Within IP-Based Flows Dataset

Feature	Meaning
proto	Transport layer protocol of the connection
service	Identification of an application protocol
orig_bytes	Originator payload bytes
resp_bytes	Responder payload bytes
history	Connection state history
orig_pkts	Originator sent packets
resp_pkts	Responder sent packets
flow_duration	Length of the flow in seconds
fwd_pkts_tot	Forward packets total
bwd_pkts_tot	Backward packets total
fwd_data_pkts_tot	Forward data packets total
bwd_data_pkts_tot	Backward data packets total
fwd_pkts_per_sec	Forward packets per second
bwd_pkts_per_sec	Backward packets per second
flow_pkts_per_sec	Flow packets per second
fwd_header_size	Forward header bytes
bwd_header_size	Backward header bytes
fwd_pkts_payload	Forward payload bytes
bwd_pkts_payload	Backward payload bytes
flow_pkts_payload	Flow payload bytes
fwd_iat	Forward inter-arrival time
bwd_iat	Backward inter-arrival time
flow_iat	Flow inter-arrival time
active	Flow active duration
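The per-second flow features listed above can be illustrated with a small sketch. It assumes that the rate features are simply packet totals divided by flow_duration; the tooling used to build the dataset may compute them differently, and the example record is invented.

```python
def derived_rates(flow: dict) -> dict:
    """Compute packets-per-second rates from packet totals and flow duration (seconds)."""
    duration = flow["flow_duration"]
    if duration <= 0:
        # Zero-length flows have no meaningful rate; report zeros instead of dividing by zero.
        return {"fwd_pkts_per_sec": 0.0, "bwd_pkts_per_sec": 0.0, "flow_pkts_per_sec": 0.0}
    fwd = flow["fwd_pkts_tot"] / duration
    bwd = flow["bwd_pkts_tot"] / duration
    return {
        "fwd_pkts_per_sec": fwd,
        "bwd_pkts_per_sec": bwd,
        "flow_pkts_per_sec": fwd + bwd,
    }


# Hypothetical flow record using feature names from the table.
example_flow = {"flow_duration": 2.0, "fwd_pkts_tot": 10, "bwd_pkts_tot": 6}
rates = derived_rates(example_flow)
print(rates)
```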

Mobile App Store ( 7200 apps)
kaggle.com
zip
Updated Jun 10, 2018
Cite
Ramanathan Perumal (2018). Mobile App Store ( 7200 apps) [Dataset]. https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps
Explore at:
Available download formats: zip (5905027 bytes)
Dataset updated
Jun 10, 2018
Authors
Ramanathan Perumal
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Mobile App Statistics (Apple iOS app store)

The ever-changing mobile landscape is a challenging space to navigate. The percentage of mobile over desktop is only increasing. Android holds about 53.2% of the smartphone market, while iOS holds 43%. To get more people to download your app, you need to make sure they can easily find it. Mobile app analytics is a great way to understand the existing strategy and to drive growth and retention of future users.

With millions of apps around nowadays, the following data set has become key to finding the top trending apps in the iOS app store. This data set contains details of more than 7000 Apple iOS mobile applications. The data was extracted from the iTunes Search API at the Apple Inc website. R and Linux web scraping tools were used for this study.

The full interactive Shiny app can be seen here: https://multiscal.shinyapps.io/appStore/

Data collection date (from API): July 2017

Dimension of the data set: 7197 rows and 16 columns

Content:

appleStore.csv

"id": App ID

"track_name": App Name

"size_bytes": Size (in Bytes)

"currency": Currency Type

"price": Price amount

"rating_count_tot": User Rating counts (for all versions)

"rating_count_ver": User Rating counts (for current version)

"user_rating": Average User Rating value (for all versions)

"user_rating_ver": Average User Rating value (for current version)

"ver": Latest version code

"cont_rating": Content Rating

"prime_genre": Primary Genre

"sup_devices.num": Number of supporting devices

"ipadSc_urls.num": Number of screenshots shown for display

"lang.num": Number of supported languages

"vpp_lic": VPP Device Based Licensing Enabled

appleStore_description.csv

id : App ID

track_name: Application name

size_bytes: Memory size (in Bytes)

app_desc: Application description

Acknowledgements

The data was extracted from the iTunes Search API at the Apple Inc website. R and Linux web scraping tools were used for this study.

Inspiration

How do the app details contribute to the user ratings?

How do app statistics compare across different groups?
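One way to start on the group comparison is to average `user_rating` per `prime_genre`. The following is a minimal sketch using only the Python standard library. The rows in `SAMPLE_CSV` are made up for illustration and only follow the column names listed for appleStore.csv; the function name `mean_rating_by_genre` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows using appleStore.csv column names (not real data).
SAMPLE_CSV = """id,track_name,price,user_rating,prime_genre
1,Alpha,0.0,4.5,Games
2,Beta,2.99,3.5,Games
3,Gamma,0.0,4.0,Productivity
"""


def mean_rating_by_genre(csv_text):
    """Return {prime_genre: mean user_rating} for the rows in csv_text."""
    ratings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ratings[row["prime_genre"]].append(float(row["user_rating"]))
    return {genre: mean(values) for genre, values in ratings.items()}


if __name__ == "__main__":
    for genre, average in sorted(mean_rating_by_genre(SAMPLE_CSV).items()):
        print(f"{genre}: {average:.2f}")
```

To run this on the real file, the in-memory text would be replaced by the contents of appleStore.csv, assuming the file uses the column names described above.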

Reference: R package from GitHub, installable with devtools::install_github("ramamet/applestoreR")

Licence

Copyright (c) 2018 Ramanathan Perumal
Open Wifi Milan: Daily upload traffic | gimi9.com
gimi9.com
Cite
Open Wifi Milan: Daily upload traffic | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_ds922/
Explore at:
Area covered
Milan
Description
Indicates, for each day and each zone, the amount of data sent to the Internet; the value is expressed in bytes (8 bits).

Eurecom ElasticMon 5G

kaggle.com

zip

Updated Dec 23, 2023

Cite

Abdo_pros (2023). Eurecom ElasticMon 5G [Dataset]. https://www.kaggle.com/datasets/abdopros/eurcom-network

Explore at:

Available download formats: zip (2196304 bytes)

Dataset updated

Dec 23, 2023

Authors

Abdo_pros

Description

Eurecom ElasticMon 5G Dataset

This dataset, sourced from the Eurecom ElasticMon 5G monitoring framework, includes a range of metrics that are pivotal for analyzing the performance of 4G and 5G Radio Access Networks (RAN). It covers various aspects of network performance, including signal strength, data transmission volumes, and quality indicators. The data is crucial for developing machine learning models for predictive analysis and optimization of network performance.

Columns Description:

date_index: Timestamp or index indicating the date and time of the data record.
rsrp (Reference Signal Received Power): Measures the power level of the signal received by the UE (User Equipment).
rsrq (Reference Signal Received Quality): Indicates the quality of the received reference signal.
wbcqi (Wideband Channel Quality Indicator): Provides information about the quality of the downlink channel.
macStats_phr (MAC layer Power HeadRoom): Indicates the available power capacity of the UE.
dlCqiReport_sfnSn (Downlink CQI Report with SFN and SN): Downlink Channel Quality Indicator with System Frame Number and Subframe Number.
macStats_totalBytesSdusDl: Total number of bytes for Service Data Units on the Downlink at the MAC layer.
macStats_totalTbsUl: Total Transport Block Size for Uplink.
macStats_mcs1Ul: Modulation and Coding Scheme for the first transport block in Uplink.
macStats_totalPduDl: Total number of Protocol Data Units in Downlink.
macStats_totalBytesSdusUl: Total number of bytes for Service Data Units on the Uplink at the MAC layer.
macStats_tbsDl: Transport Block Size for Downlink.
macStats_totalPrbUl: Total Physical Resource Blocks used in Uplink.
macStats_macSdusDl_sduLength: Length of the Service Data Unit in the Downlink.
macStats_macSdusDl_lcid: Logical Channel ID for Downlink.
macStats_prbUl: Physical Resource Blocks used in Uplink.
macStats_totalPduUl: Total number of Protocol Data Units in Uplink.
macStats_mcs1Dl: Modulation and Coding Scheme for the first transport block in Downlink.
macStats_mcs2Dl: Modulation and Coding Scheme for the second transport block in Downlink.
macStats_prbDl: Physical Resource Blocks used in Downlink.
macStats_totalPrbDl: Total Physical Resource Blocks used in Downlink.
macStats_prbRetxDl: Physical Resource Blocks used for retransmissions in Downlink.
macStats_totalTbsDl: Total Transport Block Size for Downlink.
ulCqiReport_sfnSn (Uplink CQI Report with SFN and SN): Uplink Channel Quality Indicator with System Frame Number and Subframe Number.
pdcpStats_pktRx: Number of PDCP packets received.
pdcpStats_pktRxW: PDCP packets received with waiting.
pdcpStats_pktRxAiatW: Average Inter Arrival Time for PDCP packets received with waiting.
pdcpStats_pktRxOo: PDCP packets received out of order.
pdcpStats_pktRxBytesW: Bytes of PDCP packets received with waiting.
pdcpStats_pktRxSn: Sequence number of the last PDCP packet received.
pdcpStats_pktTxBytesW: Bytes of PDCP packets transmitted with waiting.
pdcpStats_pktTxSn: Sequence number of the last PDCP packet transmitted.
pdcpStats_pktTxBytes: Bytes of PDCP packets transmitted.
pdcpStats_pktRxAiat: Average Inter Arrival Time for PDCP packets received.
pdcpStats_pktRxBytes: Bytes of PDCP packets received.
pdcpStats_pktTx: Number of PDCP packets transmitted.
pdcpStats_pktTxW: PDCP packets transmitted with waiting.
pdcpStats_pktTxAiatW: Average Inter Arrival Time for PDCP packets

Data from: CESNET-QUIC22: A large one-month QUIC network traffic dataset...
data.niaid.nih.gov
zenodo.org
Updated Feb 29, 2024
Cite
Luxemburk, Jan; Hynek, Karel; Čejka, Tomáš; Lukačovič, Andrej; Šiška, Pavel (2024). CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7409923
Explore at:
Dataset updated
Feb 29, 2024
Dataset provided by
CESNET (http://www.cesnet.cz/)
FIT Czech Technical University in Prague
Authors
Luxemburk, Jan; Hynek, Karel; Čejka, Tomáš; Lukačovič, Andrej; Šiška, Pavel
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888.

We recommend using the CESNET DataZoo Python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

Data capture

The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done for four weeks between 31.10.2022 and 27.11.2022. The following list provides per-week flow count, capture period, and uncompressed size:

W-2022-44: Uncompressed Size: 19 GB, Capture Period: 31.10.2022 - 6.11.2022, Number of flows: 32.6M

W-2022-45: Uncompressed Size: 25 GB, Capture Period: 7.11.2022 - 13.11.2022, Number of flows: 42.6M

W-2022-46: Uncompressed Size: 20 GB, Capture Period: 14.11.2022 - 20.11.2022, Number of flows: 33.7M

W-2022-47: Uncompressed Size: 25 GB, Capture Period: 21.11.2022 - 27.11.2022, Number of flows: 44.1M

CESNET-QUIC22 (whole dataset): Uncompressed Size: 89 GB, Capture Period: 31.10.2022 - 27.11.2022, Number of flows: 153M

Data description

The dataset consists of network flows describing encrypted QUIC communications. Flows were created using ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and with fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.

Packet Sequences

Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1, +1 meaning a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip.

Flow statistics

Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of flow, and packet histograms.

Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason - either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections.

Dataset structure

The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service and total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts of a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes SNI domains used for ground truth labeling.

The following list describes flow data fields in CSV files:
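The eight histogram intervals quoted in the description can be reproduced with a small lookup. The sketch below is an illustration only and is not taken from the PHISTS plugin; the names `BIN_UPPER_BOUNDS`, `bin_index`, and `histogram` are hypothetical. It treats the listed bounds as inclusive, so a value of 1024 lands in the 512-1024 bin.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first seven bins (ms or B); the eighth bin is ">1024".
BIN_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]


def bin_index(value):
    """Return the 0-based index of the bin that a packet size or inter-packet time falls into."""
    return bisect_left(BIN_UPPER_BOUNDS, value)


def histogram(values):
    """Count how many values fall into each of the eight bins."""
    counts = [0] * (len(BIN_UPPER_BOUNDS) + 1)
    for value in values:
        counts[bin_index(value)] += 1
    return counts


if __name__ == "__main__":
    # Made-up packet sizes in bytes.
    print(histogram([10, 20, 600, 1200, 1500]))
```

Using `bisect_left` on the upper bounds keeps the boundary handling in one place: a value equal to a bound stays in the lower bin, which matches the intervals as written.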

ID: Unique identifier
SRC_IP: Source IP address
DST_IP: Destination IP address
DST_ASN: Destination Autonomous System number
SRC_PORT: Source port
DST_PORT: Destination port
PROTOCOL: Transport protocol
QUIC_VERSION: QUIC protocol version
QUIC_SNI: Server Name Indication domain
QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet
TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
DURATION: Duration of the flow in seconds
BYTES: Number of transmitted bytes from client to server
BYTES_REV: Number of transmitted bytes from server to client
PACKETS: Number of packets transmitted from client to server
PACKETS_REV: Number of packets transmitted from server to client
PPI: Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]]
PPI_LEN: Number of packets in the PPI sequence
PPI_DURATION: Duration of the PPI sequence in seconds
PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
PHIST_SRC_SIZES: Histogram of packet sizes from client to server
PHIST_DST_SIZES: Histogram of packet sizes from server to client
PHIST_SRC_IPT: Histogram of inter-packet times from client to server
PHIST_DST_IPT: Histogram of inter-packet times from server to client
APP: Web service label
CATEGORY: Service category
FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
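As an illustration of the PPI field, the sketch below parses one sequence and derives its length and the number of direction changes. It assumes the column text is a JSON-compatible nested list in the order given above, which the description does not confirm. The function names and the sample value are hypothetical. The description mentions both "changes in the communication direction" and request/response pairs; this sketch counts direction changes only, so it may not reproduce PPI_ROUNDTRIPS exactly.

```python
import json


def parse_ppi(ppi_text):
    """Split a PPI string into (inter-packet times, directions, sizes)."""
    inter_packet_times, directions, sizes = json.loads(ppi_text)
    if not (len(inter_packet_times) == len(directions) == len(sizes)):
        raise ValueError("PPI sub-sequences must have equal length")
    return inter_packet_times, directions, sizes


def direction_changes(directions):
    """Count positions where the packet direction differs from the previous packet."""
    return sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)


if __name__ == "__main__":
    # Made-up PPI value: +1 is client to server, -1 is server to client.
    sample = "[[0, 12, 3, 40], [1, -1, -1, 1], [1200, 1350, 900, 80]]"
    times, directions, sizes = parse_ppi(sample)
    print("packets in sequence:", len(directions))
    print("direction changes:", direction_changes(directions))
```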

Link to other CESNET datasets

https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/
https://github.com/CESNET/cesnet-datazoo

Please cite the original data article:

@article{CESNETQUIC22,
  author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
  title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
  journal = {Data in Brief},
  pages = {108888},
  year = {2023},
  issn = {2352-3409},
  doi = {https://doi.org/10.1016/j.dib.2023.108888},
  url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
}
Cybersecurity: Suspicious Web Threat Interactions
kaggle.com
Updated Apr 27, 2024
Cite
JanCSG (2024). Cybersecurity: Suspicious Web Threat Interactions [Dataset]. https://www.kaggle.com/datasets/jancsg/cybersecurity-suspicious-web-threat-interactions
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 27, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
JanCSG
License
https://www.gnu.org/licenses/gpl-3.0.html
Description
This dataset contains web traffic records collected through AWS CloudWatch, aimed at detecting suspicious activities and potential attack attempts.

The data were generated by monitoring traffic to a production web server, using various detection rules to identify anomalous patterns.

Context

In today's cloud environments, cybersecurity is more crucial than ever. The ability to detect and respond to threats in real time can protect organizations from significant consequences. This dataset provides a view of web traffic that has been labeled as suspicious, offering a valuable resource for developers, data scientists, and security experts to enhance threat detection techniques.

Dataset Content

Each entry in the dataset represents a stream of traffic to a web server, including the following columns:

bytes_in: Bytes received by the server.

bytes_out: Bytes sent from the server.

creation_time: Timestamp of when the record was created.

end_time: Timestamp of when the connection ended.

src_ip: Source IP address.

src_ip_country_code: Country code of the source IP.

protocol: Protocol used in the connection.

response.code: HTTP response code.

dst_port: Destination port on the server.

dst_ip: Destination IP address.

rule_names: Name of the rule that identified the traffic as suspicious.

observation_name: Observations associated with the traffic.

source.meta: Metadata related to the source.

source.name: Name of the traffic source.

time: Timestamp of the detected event.

detection_types: Type of detection applied.

Potential Uses

This dataset is ideal for:

Anomaly Detection: Developing models to detect unusual behaviors in web traffic.

Classification Models: Training models to automatically classify traffic as normal or suspicious.

Security Analysis: Conducting security analyses to understand the tactics, techniques, and procedures of attackers.
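As a starting point for the anomaly-detection use case, the sketch below flags records whose `bytes_in` value is far from the median, using the modified z-score (median and median absolute deviation). This technique is chosen here for illustration and is not described by the dataset author. The function name, the 3.5 threshold, and the sample values are assumptions.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        # All values are (almost) identical, so nothing stands out.
        return []
    return [
        index
        for index, value in enumerate(values)
        if 0.6745 * abs(value - center) / mad > threshold
    ]


if __name__ == "__main__":
    # Made-up bytes_in values; the last one is deliberately extreme.
    bytes_in = [500, 520, 480, 510, 495, 50000]
    print(flag_outliers(bytes_in))
```

The median-based score is less sensitive to the outliers themselves than a mean and standard deviation would be, which matters when a few very large transfers dominate the column.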
CIC-Darknet2020 Internet Traffic
kaggle.com
zip
Updated Sep 25, 2020
Cite
Peter Friedrich (2020). CIC-Darknet2020 Internet Traffic [Dataset]. https://www.kaggle.com/peterfriedrich1/cicdarknet2020-internet-traffic
Explore at:
Available download formats: zip (16730787 bytes)
Dataset updated
Sep 25, 2020
Authors
Peter Friedrich
Description
Context

Original dataset page, license, context, and description at link below: https://www.unb.ca/cic/datasets/darknet2020.html

This is a dataset gathered to test novel methods for classifying darknet traffic. Dataset gathered by the Canadian Institute for Cybersecurity at the University of New Brunswick.

Content

Each unique sample has a flow id. Additional columns include:

Src IP: Source IP Address
Src Port: Source Port
Dst IP: Destination IP Address
Dst Port: Destination Port
Protocol: Internet Protocol Version
Timestamp: Timestamp for when traffic was sent
Flow Duration: Duration
Total Fwd Packet: Total number of packets from source to destination

The remaining columns are listed without descriptions: Total Bwd packets, Total Length of Fwd Packet, Total Length of Bwd Packet, Fwd Packet Length Max, Fwd Packet Length Min, Fwd Packet Length Mean, Fwd Packet Length Std, Bwd Packet Length Max, Bwd Packet Length Min, Bwd Packet Length Mean, Bwd Packet Length Std, Flow Bytes/s, Flow Packets/s, Flow IAT Mean, Flow IAT Std, Flow IAT Max, Flow IAT Min, Fwd IAT Total, Fwd IAT Mean, Fwd IAT Std, Fwd IAT Max, Fwd IAT Min, Bwd IAT Total, Bwd IAT Mean, Bwd IAT Std, Bwd IAT Max, Bwd IAT Min, Fwd PSH Flags, Bwd PSH Flags, Fwd URG Flags, Bwd URG Flags, Fwd Header Length, Bwd Header Length, Fwd Packets/s, Bwd Packets/s, Packet Length Min, Packet Length Max, Packet Length Mean, Packet Length Std, Packet Length Variance, FIN Flag Count, SYN Flag Count, RST Flag Count, PSH Flag Count, ACK Flag Count, URG Flag Count, CWE Flag Count, ECE Flag Count, Down/Up Ratio, Average Packet Size, Fwd Segment Size Avg, Bwd Segment Size Avg, Fwd Bytes/Bulk Avg, Fwd Packet/Bulk Avg, Fwd Bulk Rate Avg, Bwd Bytes/Bulk Avg, Bwd Packet/Bulk Avg, Bwd Bulk Rate Avg, Subflow Fwd Packets, Subflow Fwd Bytes, Subflow Bwd Packets, Subflow Bwd Bytes, FWD Init Win Bytes, Bwd Init Win Bytes, Fwd Act Data Pkts, Fwd Seg Size Min, Active Mean, Active Std, Active Max, Active Min, Idle Mean, Idle Std, Idle Max, Idle Min, Label, Label.1

Acknowledgements

Canadian Institute for Cyber Security

University of New Brunswick

Kaggle

Original Paper: Arash Habibi Lashkari, Gurdip Kaur, and Abir Rahali, “DIDarknet: A Contemporary Approach to Detect and Characterize the Darknet Traffic using Deep Image Learning”, 10th International Conference on Communication and Network Security, Tokyo, Japan, November 2020

Inspiration

Wanting to better understand how Darknet routing works, and how to examine the traffic that goes through it.
Open Wifi Milan: Daily upload traffic
ckan.mobidatalab.eu
csv, json
Updated Nov 9, 2023
Cite
Direzione Innovazione Tecnologica e Digitale (2023). Open Wifi Milan: Daily upload traffic [Dataset]. https://ckan.mobidatalab.eu/hu/dataset/ds922-openwifimilano-sessionuploadtraffic
Explore at:
Available download formats: json (1187541), csv (687555)
Dataset updated
Nov 9, 2023
Dataset provided by
Direzione Innovazione Tecnologica e Digitale
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Milan
Description
Indicates, for each day and each area, the amount of data sent to the Internet; the value is expressed in bytes (8 bits).
Internet traffic data for different frame size ranges
azon.e-science.pl
zasobynauki.pl
Updated 2020
Cite
Aleksandra Knapińska; Piotr Lechowicz; Krzysztof Walkowiak (2020). Internet traffic data for different frame size ranges [Dataset]. https://azon.e-science.pl/zasoby/internet-traffic-data-for-different-frame-size-ranges,56566/
Explore at:
Dataset updated
2020
Authors
Aleksandra Knapińska; Piotr Lechowicz; Krzysztof Walkowiak
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This resource includes input data used in the work "Machine-Learning Based Prediction of Multiple Types of Network Traffic" by Aleksandra Knapińska, Piotr Lechowicz, and Krzysztof Walkowiak, published in International Conference on Computational Science (ICCS) 2021, Lecture Notes in Computer Science, vol 12742, pp. 122-136. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_12 The work was supported by the National Science Centre, Poland, under Grant 2019/35/B/ST7/04272.
The seattle_november.xml and seattle_december.xml files both include internet traffic data from the Seattle Internet Exchange Point. The european.xml file includes internet traffic data from one of the European Internet Exchange Points. Each file includes the traffic volume decomposed into specific frame size ranges. Each file starts with a metadata section providing general information. The period covered by a specific file is indicated by its 'start' and 'end' tags, which provide Unix timestamps in the GMT time zone. Note that Seattle lies in the PST time zone and the European IXP is located in the CET time zone, so the start and end times should be adjusted accordingly. The step parameter is given in seconds; samples are stored every 5 minutes (a step of 300 seconds) in all three files. Each file has multiple columns providing traffic data in bits per second for different frame size ranges. Column names specify the ranges in bytes. The 'total' column stores the total aggregate traffic volume, which is the sum of the values in all the remaining columns in each row.
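The time zone adjustment described above can be sketched as follows. The start, end, and step values are made up rather than read from the real XML files, and the fixed UTC offsets for PST and CET ignore daylight saving time, which is a simplifying assumption. The function name `sample_times` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets; these ignore daylight saving time (an assumption of this sketch).
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")


def sample_times(start, end, step, tz):
    """Return the local datetime of every sample from start to end (Unix seconds, inclusive)."""
    return [datetime.fromtimestamp(ts, tz) for ts in range(start, end + 1, step)]


if __name__ == "__main__":
    # Hypothetical metadata values: a start timestamp, an end 10 minutes later, 300 s step.
    start, end, step = 1604188800, 1604189400, 300
    for moment in sample_times(start, end, step, PST):
        print(moment.isoformat())
```

Passing `CET` instead of `PST` gives the local times for the European file under the same assumptions.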
Internet traffic data from Seattle Internet Exchange Point for different...
azon.e-science.pl
zasobynauki.pl
Updated 2021
Cite
Aleksandra Knapińska; Piotr Lechowicz; Krzysztof Walkowiak; Weronika Węgier (2021). Internet traffic data from Seattle Internet Exchange Point for different frame size ranges (2021) [Dataset]. https://azon.e-science.pl/zasoby/internet-traffic-data-from-seattle-internet-exchange-point-for-different-frame-size-ranges-2021,67873/
Explore at:
Dataset updated
2021
Authors
Aleksandra Knapińska; Piotr Lechowicz; Krzysztof Walkowiak; Weronika Węgier
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This resource includes input data used in the work "Long-term prediction of multiple types of time-varying network traffic using chunk-based ensemble learning" by Aleksandra Knapińska, Piotr Lechowicz, Weronika Węgier, and Krzysztof Walkowiak. The work was supported by the National Science Centre, Poland, under Grants 2019/35/B/ST7/04272, 2018/31/D/ST6/0304, and 2019/35/B/ST6/04442.
The SIX2021.xml file includes internet traffic data from the Seattle Internet Exchange Point collected for one year. The file contains information about the traffic volume decomposed into specific frame size ranges. It starts with a metadata section providing general information. The covered period is indicated by the 'start' and 'end' tags. They provide Unix timestamps in the GMT timezone. It should be noted that Seattle lies in the PST time zone, so the start and end times should be adjusted accordingly. The step parameter is given in seconds, so the samples are stored every 5 minutes. The file has multiple columns providing traffic data in bits per second for different frame size ranges. Column names specify the ranges in bytes. The 'total' column stores information about the total aggregate traffic volume, which is a sum of values in all the remaining columns in each row.
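The relationship between the 'total' column and the per-range columns can be checked row by row, and a rate in bits per second can be turned into a byte volume for one 5-minute step. The sketch below uses a made-up row; the range column names are hypothetical placeholders, and both function names are illustrative.

```python
def total_matches(row, tolerance=1e-6):
    """Check that 'total' equals the sum of all other columns, within a relative tolerance."""
    parts = sum(value for name, value in row.items() if name != "total")
    return abs(parts - row["total"]) <= tolerance * max(1.0, abs(row["total"]))


def bytes_in_step(bits_per_second, step_seconds=300):
    """Convert an average rate in bits per second into bytes transferred during one step."""
    return bits_per_second * step_seconds / 8


if __name__ == "__main__":
    # Made-up sample row in bits per second; the range names are placeholders.
    row = {"64-127": 1000.0, "128-255": 2500.0, "total": 3500.0}
    print(total_matches(row))
    print(bytes_in_step(row["total"]))
```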
Top Movies of 2017
kaggle.com
zip
Updated Jan 10, 2018
Cite
Deepak (2018). Top Movies of 2017 [Dataset]. https://www.kaggle.com/dmail44/top-movies-of-2017
Explore at:
Available download formats: zip (5819 bytes)
Dataset updated
Jan 10, 2018
Authors
Deepak
Description
What's this about

Every year people summarize the year that has passed. Movie lovers look for top 10 lists of the best movies from top websites and their favorite YouTubers. The end of 2017 is no different. Here is a compilation of lists containing the top 10 movies of 2017 from the internet.

How is the data collected

The data is collected from various sources on the internet. There are two files:
- Top 10 Movies 2017.csv: the list of movie names ranked from 10 to 1 and the source they were collected from. Some lists are not ranked, which is indicated in the column Ordered.
- IMBD Links.csv: movie names and their associated IMDB links.

Where is the data from

The data is collected from different websites and YouTube using the search keyword "top 10 movies of 2017". All the sources are listed in the data file "Top 10 movies 2017.csv" in the "url" column. Movies are included only if the source has exactly 10 movies in its list; lists containing more than 10 movies are ignored.

Extra bytes

Metacritic has collected different lists and ranked movies based on those lists. This data is not collected from Metacritic; however, there may be some overlap: http://www.metacritic.com/feature/film-critics-list-the-top-10-movies-of-2017

What can we do

Can data science techniques find the real reasons for a movie to be in a top 10 list?

Do only a big budget, famous actors, or a famous movie crew push a movie to the top position?
