Representative applications that can collect 5G datasets directly from mobile terminals without specialized equipment include G-NetTrack Pro and PCAPdroid. The former allows for the monitoring and logging of the header and payload information of the medium access control (MAC) frame passing through the 5G air interface. The latter is an open-source network capture and monitoring tool that works without root privileges, analyzing connections made by applications installed on the user's mobile device. PCAPdroid can also dump mobile traffic to PCAP (also known as libpcap) format and send it to the well-known Wireshark for further analysis. We created 5G datasets by measuring 5G traffic directly from a major mobile operator in South Korea. The mobile terminal used for traffic measurement was the Samsung Galaxy A90 5G, equipped with a Qualcomm Snapdragon X50 5G modem. The packet sniffer software used for traffic measurement, PCAPdroid, was installed on the terminal through Google Play. Traffic was measured sequentially per application on two stationary terminals (only one terminal was used for non-interactive services) with no background traffic. The collected dataset consists of representative resource-intensive video traffic, the type that has the greatest impact on 5G network planning and provisioning; background traffic was not mixed in, so that the unique characteristics of each type of traffic could be measured. The video streaming dataset includes data measured directly while watching Netflix and Amazon Prime, which are representative over-the-top (OTT) services, on mobile devices. The live streaming dataset was measured while watching YouTube Live and South Korea's representative live broadcasts (Naver NOW and Afreeca TV). Video conferencing data were measured by holding actual meetings on the widely used Zoom, MS Teams, and Google Meet platforms. Two types of metaverse traffic were acquired: Zepeto and Roblox.
Zepeto traffic was collected while staying in the 'camping world' for 15 hours. Roblox traffic was collected over 25 hours of playing the 'Collect All Pets' game using an auto clicker. We collected two types of mobile network gaming traffic. The first was cloud gaming, an online game setup that runs video games on remote servers and streams them directly to the user's device. The second was a traditional mobile game connected to the Internet. The dataset was collected from May to October 2022, totals 328 hours, and is provided in CSV format. It is a timestamp-mapped time series dataset with packet header information, and traffic analysis by application is possible because it includes source and destination addresses. To make it more usable as a traffic source model, Section III describes how to use it as a training dataset for the traffic simulator platform's source generator.
A 5G traffic dataset measured by PCAPdroid has been released and can be used as a training dataset for various ML models. However, because the dataset is very large, it is inconvenient to handle, and additional data preprocessing is required to use it for its intended purpose.
This dataset can be used to train GANs and time-series forecasting deep learning models.
Our implementation is available on GitHub: https://github.com/0913ktg/5G-Traffic-Generator
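As a starting point for the preprocessing mentioned above, the sketch below bins packet lengths into one-second throughput samples and totals the bytes sent to each destination address. The column names (`time`, `src_ip`, `dst_ip`, `length`) and the sample rows are assumptions made for illustration; the released CSV files may use different headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the assumed layout: one row per captured packet.
SAMPLE = """time,src_ip,dst_ip,length
0.10,10.0.0.2,203.0.113.5,1400
0.75,10.0.0.2,203.0.113.5,1400
1.20,10.0.0.2,198.51.100.7,600
2.05,10.0.0.2,203.0.113.5,1400
"""


def bytes_per_second(csv_text, time_col="time", length_col="length"):
    """Sum packet lengths into one-second bins keyed by the integer second."""
    bins = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        second = int(float(row[time_col]))  # truncate the timestamp to its second
        bins[second] += int(row[length_col])
    return dict(sorted(bins.items()))


def bytes_per_destination(csv_text, dst_col="dst_ip", length_col="length"):
    """Total bytes per destination address, a rough proxy for per-service volume."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[dst_col]] += int(row[length_col])
    return dict(totals)


if __name__ == "__main__":
    print(bytes_per_second(SAMPLE))
    print(bytes_per_destination(SAMPLE))
```

A per-second series of this kind is one plausible input for a time-series forecasting model; for files too large to hold in memory, the same loops can read from an open file handle instead of a string.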
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Streaming is by far the predominant type of traffic in communication networks. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at the network, transport, and application layers with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 different individual bandwidth settings, measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3G/4G traces, and 4,022 runs with pre-defined bandwidth changes. This corresponds to 332 GB of video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Verified dataset of 2025 device usage: share of global web traffic, mobile commerce share of transactions, US daily time spent, app vs web breakdown, and tablet decline.
This dataset encompasses mobile web clickstream behavior on any browser, collected from over 150,000 triple-opt-in first-party US Daily Active Users (DAU). Use it for measurement, attribution or path to purchase and consumer journey understanding. Full URL deliverable available including searches.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides detailed insights and best practices for tracking and measuring local SEO performance across a range of critical metrics, including Google Business Profile engagement, local keyword rankings, website traffic from local searches, citation management, mobile optimization, and ROI calculation. The data is based on expert analysis and recommendations to help local businesses optimize their local search visibility and drive measurable results.
This dataset encompasses mobile app usage, web clickstream and location visitation behavior, collected from over 150,000 triple-opt-in first-party US Daily Active Users (DAU). The only omnichannel meter at scale representing iOS and Android platforms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For the evaluation of OS fingerprinting methods, we need a dataset with the following requirements:
To overcome these issues, we have decided to create the dataset from the traffic of several web servers at our university. This allows us to address the first issue by collecting traces from thousands of devices ranging from user computers and mobile phones to web crawlers and other servers. The ground truth values are obtained from the HTTP User-Agent, which resolves the second of the presented issues. Even though most traffic is encrypted, the User-Agent can be recovered from the web server logs that record every connection’s details. By correlating the IP address and timestamp of each log record to the captured traffic, we can add the ground truth to the dataset.
For this dataset, we have selected a cluster of five web servers that host 475 unique university domains for public websites. The monitoring point recording the traffic was placed at the backbone network connecting the university to the Internet.
The dataset used in this paper was collected from approximately 8 hours of university web traffic throughout a single workday. The logs were collected from Microsoft IIS web servers and converted from W3C extended logging format to JSON. The logs are referred to as web logs and are used to annotate the records generated from packet capture obtained by using a network probe tapped into the link to the Internet.
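The annotation idea described above, matching each web log record to captured traffic by IP address and timestamp, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the field names (`src_ip`, `start`, `client_ip`, `time`, `user_agent`), the two-second tolerance, and the sample records are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical flow records exported from the packet capture.
FLOWS = [
    {"src_ip": "192.0.2.10", "start": datetime(2024, 1, 1, 9, 0, 0)},
    {"src_ip": "192.0.2.11", "start": datetime(2024, 1, 1, 9, 5, 0)},
]

# Hypothetical web log records already converted to dictionaries.
WEB_LOGS = [
    {"client_ip": "192.0.2.10", "time": datetime(2024, 1, 1, 9, 0, 1),
     "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"client_ip": "192.0.2.11", "time": datetime(2024, 1, 1, 9, 30, 0),
     "user_agent": "Mozilla/5.0 (X11; Linux x86_64)"},
]


def annotate_flows(flows, web_logs, tolerance=timedelta(seconds=2)):
    """Attach the User-Agent of the nearest-in-time log record with the same IP.

    Flows with no log record inside the tolerance window get user_agent=None.
    """
    logs_by_ip = {}
    for log in web_logs:
        logs_by_ip.setdefault(log["client_ip"], []).append(log)

    annotated = []
    for flow in flows:
        candidates = logs_by_ip.get(flow["src_ip"], [])
        # Pick the log record closest in time to the start of the flow.
        best = min(candidates,
                   key=lambda log: abs(log["time"] - flow["start"]),
                   default=None)
        matched = best is not None and abs(best["time"] - flow["start"]) <= tolerance
        annotated.append({**flow, "user_agent": best["user_agent"] if matched else None})
    return annotated


if __name__ == "__main__":
    for record in annotate_flows(FLOWS, WEB_LOGS):
        print(record["src_ip"], record["user_agent"])
```

The tolerance window matters because capture and server clocks rarely agree exactly; a window that is too wide risks attaching the wrong User-Agent when several devices share one address.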
The entire dataset creation process consists of seven steps:
The collected and enriched flows contain 111 data fields that can be used as features for OS fingerprinting or any other data analyses. The fields grouped by their area are listed below:
The details of OS distribution grouped by the OS family are summarized in the table below. The Other OS family contains records generated by web crawling bots that do not include OS information in the User-Agent.
| OS Family | Number of flows |
|---|---|
| Other | 42474 |
| Windows | 40349 |
| Android | 10290 |
| iOS | 8840 |
| Mac OS X | 5324 |
| Linux | 1589 |
| Ubuntu | 653 |
| Fedora | 88 |
| Chrome OS | 53 |
| Symbian OS | 1 |
| Slackware | 1 |
| Linux Mint | 1 |
This dataset is from the University of New Brunswick Centre for Cybersecurity.
It contains features, extracted to CSV, of network traffic across 105 Internet of Things (IoT) devices on which 33 cyberattacks were run. The attacks fall into 7 types: distributed denial of service (DDoS), denial of service (DoS), reconnaissance, web-based, brute-force, spoofing, and the Mirai botnet.
To quote the centre's website: "You may redistribute, republish, and mirror our datasets in any form; however, any use or redistribution of the data must include a citation to the dataset and the research paper listed on the webpage."
Citation E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani. "CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment," Sensor (2023) – (submitted to Journal of Sensors).
| Feature | Description |
|---|---|
| ts | Timestamp of first packet in flow |
| flow_duration | Time between first and last packet received in flow |
| Header_Length | Length of packet header in bits |
| Protocol Type | Protocol numbers, as defined by the IANA. Ex: 1 = ICMP, 6 = TCP |
| Duration | Time-to-Live (ttl) |
| Rate | Rate of packet transmission in a flow |
| Srate | Rate of outbound (sent) packets transmission in a flow |
| Drate | Rate of inbound (received) packets transmission in a flow |
| fin_flag_number | Fin flag value |
| syn_flag_number | Syn flag value |
| rst_flag_number | Rst flag value |
| psh_flag_number | Psh flag value |
| ack_flag_number | Ack flag value |
| ece_flag_number | Ece flag value |
| cwr_flag_number | Cwr flag value |
| ack_count | Number of packets with ack flag set in the same flow |
| syn_count | Number of packets with syn flag set in the same flow |
| fin_count | Number of packets with fin flag set in the same flow |
| urg_count | Number of packets with urg flag set in the same flow |
| rst_count | Number of packets with rst flag set in the same flow |
| HTTP | Indicates if the application layer protocol is HTTP |
| HTTPS | Indicates if the application layer protocol is HTTPS |
| DNS | Indicates if the application layer protocol is DNS |
| Telnet | Indicates if the application layer protocol is Telnet |
| SMTP | Indicates if the application layer protocol is SMTP |
| SSH | Indicates if the application layer protocol is SSH |
| IRC | Indicates if the application layer protocol is IRC |
| TCP | Indicates if the transport layer protocol is TCP |
| UDP | Indicates if the transport layer protocol is UDP |
| DHCP | Indicates if the application layer protocol is DHCP |
| ARP | Indicates if the link layer protocol is ARP |
| ICMP | Indicates if the network layer protocol is ICMP |
| IPv | Indicates if the network layer protocol is IP |
| LLC | Indicates if the link layer protocol is LLC |
| Tot_sum | Summation of packets lengths in flow |
| Min | Minimum packet length in the flow |
| Max | Maximum packet length in the flow |
| AVG | Average packet length in the flow |
| Std | Standard deviation of packet length in the flow |
| Tot_size | Packet’s length |
| IAT | The time difference with the previous packet |
| Number | The number of packets in the flow |
| Magnitude | sqrt(Average of the lengths of incoming packets in the flow + average of the lengths of outgoing packets in the flow) |
| Radius | sqrt(Variance of the lengths of incoming packets in the flow + variance of the lengths of outgoing packets in the flow) |
| Covariance | Covariance of the lengths of incoming and outgoing packets |
| Variance | Variance of the lengths of incoming packets in the flow/variance of the lengths of outgoing packets in the flow |
| Weight | Number of incoming packets × Number of outgoing packets |
| label | Notes the type of attack being run or 'BenignTraffic' for no attack run |
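
The derived features at the end of the table can be reproduced from per-direction packet lengths. The sketch below follows the formulas for Magnitude, Radius, Variance, and Weight as stated in the table; the function name, the choice of population variance, and the sample lengths are assumptions, since the table does not say which variance estimator was used.

```python
import math
from statistics import mean, pvariance


def flow_features(incoming, outgoing):
    """Compute the table's derived features from packet lengths per direction.

    Assumption: population variance (pvariance); the source table does not
    specify whether population or sample variance was used.
    """
    var_in = pvariance(incoming)
    var_out = pvariance(outgoing)
    return {
        # sqrt(average incoming length + average outgoing length)
        "Magnitude": math.sqrt(mean(incoming) + mean(outgoing)),
        # sqrt(variance of incoming lengths + variance of outgoing lengths)
        "Radius": math.sqrt(var_in + var_out),
        # ratio of incoming to outgoing variance; undefined when var_out is 0
        "Variance": var_in / var_out if var_out else float("nan"),
        # number of incoming packets x number of outgoing packets
        "Weight": len(incoming) * len(outgoing),
    }


if __name__ == "__main__":
    # Hypothetical packet lengths in bytes.
    print(flow_features([60, 100, 140], [40, 60]))
```
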
| Device Name | Category | MAC Address | Device Name | Category | MAC Address |
|---|---|---|---|---|---|
| Amazon Alexa Echo Dot 1 | Audio | 1C:FE:2B:98:16:DD | Lumiman bulb | Lighting | 84:E3... |
Comprehensive dataset analyzing Amazon's daily website visits, traffic patterns, seasonal trends, and comparative analysis with other ecommerce platforms based on May 2025 data.
How many people use social media?
Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.
Who uses social media?
Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.
How much time do people spend on social media?
Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America spent the most time per day on social media.
What are the most popular social media platforms?
Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
Company: BrightWave Digital
Department: Digital Marketing & SEO Team
Industry: E-commerce (fashion and lifestyle products)
Brand: UrbanScape Apparel
BrightWave Digital is a fast-growing digital marketing agency that handles full-spectrum SEO, SEM, and content marketing for various clients. The SEO team is tasked with pushing UrbanScape Apparel, a sustainable fashion brand, to the top of the search rankings. The brand sells eco-friendly clothing and accessories aimed at environmentally conscious consumers in North America.
UrbanScape Apparel has recently expanded its product lines and introduced new collections, such as “Urban Outdoors” for hiking gear and “EcoActive” for athleisure. With increased competition in the eco-fashion market, BrightWave Digital’s SEO team must optimize UrbanScape’s site performance, monitor SEO metrics closely, and demonstrate measurable improvements in organic traffic and conversions.
* Improve rankings for high-intent keywords like "eco-friendly clothing" and "sustainable outdoor gear."
* Boost organic traffic from both mobile and desktop devices.
* Increase visibility through backlinks from high domain authority (DA) sites.
* Optimize Core Web Vitals to ensure the site ranks higher in Google’s search results.
The dashboard data includes traffic, keyword rankings, click-through rates (CTR), and other performance metrics to track how well the SEO efforts are contributing to the brand’s growth.
1. Date. Definition: The specific day for which the data is collected. Importance: Allows tracking of daily trends and pinpointing specific dates of spikes or drops in performance.
2. Month. Definition: The month corresponding to the data being analyzed. Importance: Helps in understanding monthly trends and seasonal patterns in traffic and user behavior.
3. Year. Definition: The year in which the data was recorded. Importance: Essential for long-term trend analysis and year-over-year performance comparisons.
4. Quarter. Definition: The fiscal quarter (Q1, Q2, Q3, Q4) for the given data. Importance: Useful for quarterly business reviews and strategy adjustments based on performance.
5. Time Of Day. Definition: The specific time range (e.g., morning, afternoon, evening) when the traffic or engagement was recorded. Importance: Helps in understanding peak traffic times and optimizing content publishing schedules.
6. Primary Keywords. Definition: The main keywords targeted for SEO, typically with high search volume and relevance to the brand. Importance: Crucial for understanding the focus of the SEO strategy and the effectiveness of ranking for these terms.
7. Secondary Keywords. Definition: Additional keywords that complement primary keywords, often with lower competition and specific niches. Importance: Provides insights into secondary areas of focus that can still drive significant traffic and conversions.
8. Long-Tail Keywords. Definition: More specific keyword phrases usually consisting of three or more words, targeting niche search queries. Importance: Important for attracting highly targeted traffic and often associated with higher conversion rates.
9. Location. Definition: Geographic region from where the traffic is coming. Importance: Helps in understanding regional performance and tailoring content or promotions to specific markets.
10. Social Media Source. Definition: The social media platform (e.g., Instagram, Pinterest) from which traffic is referred to the site. Importance: Measures the impact of social media channels on website traffic and engagement.
11. Media Type. Definition: The format of the media content (e.g., image, video, article) driving traffic. Importance: Analyzes which media types resonate best with the audience and contribute to higher engagement.
12. Device Type. Definition: The type of device used by visitors (e.g., mobile, desktop, tablet) to access the website. Importance: Essential for optimizing user experience across different devices and identifying potential issues.
13. Organic Traffic. Definition: The number of visitors coming to the site through unpaid search results. Importance: Shows how well the site is performing in attracting users through SEO efforts without relying on paid advertising.
14. Keywords Ranking. Definition: The position of targeted keywords in search engine results pages (SERPs). Importance: Indicates the effectiveness of SEO strategies in improving keyword visibility and competitiveness.
15. Clicks. Definition: The number of times users click on the site’s links from search results. Importance: Reflects user interest and relevance of the search snippets or ads shown to users.
16. Impressions. Definition: The number of times a site appears in search r...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In 2022, over half of the web traffic was accessed through mobile devices. By reducing the energy consumption of mobile web apps, we can not only extend the battery life of our devices, but also make a significant contribution to energy conservation efforts. For example, if we could save only 5% of the energy used by web apps, we estimate that it would be enough to shut down one of the nuclear reactors in Fukushima. This paper presents a comprehensive overview of energy-saving experiments and related approaches for mobile web apps, relevant for researchers and practitioners. To achieve this objective, we conducted a systematic literature review and identified 44 primary studies for inclusion. Through the mapping and analysis of scientific papers, this work contributes: (1) an overview of the energy-draining aspects of mobile web apps, (2) a comprehensive description of the methodology used for the energy-saving experiments, and (3) a categorization and synthesis of various energy-saving approaches.
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Definition: Total traffic to 15 artificial intelligence sites from fixed and mobile computers per country.
Thematic Area: Information and Communication Technologies
Application Area: Artificial intelligence
Unit of Measurement: Number of visits
Note: Similarweb does not provide an exact number of visits for websites that receive fewer than 5,000 visits. In these cases, an approximate estimate of 4,999 is used.
Data Source: Observatorio de Desarrollo Digital (ODD) based on Similarweb
Last Update: Feb 9 2024 1:04PM
Source Organization: Economic Commission for Latin America and the Caribbean
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was developed from real data on the usage of the corporate data network at the Universidade Federal do Rio Grande do Norte (UFRN). The main objective is to enable detailed observation of the university's network infrastructure and make this data available to the academic community. Data collection started on August 30, 2023, with the last query conducted on February 7, 2025, covering a total of approximately 19 months of continuous observations. During this period, about 1.5 months of data were lost due to failures in the data collection process or maintenance of the system responsible for capturing the data.
The data collections cover administrative, academic, and classroom sectors, spanning a total of 13 buildings within the university, providing a broad view of the network across different environments.
The dataset contains a total of 1,675,843 entries, each with 49 attributes. It is available in CSV format.
The World Telecommunication/ICT Indicators Database contains time series data for the years 1960, 1965, 1970 and annually from 1975 to 2020 for more than 180 telecommunication/ICT statistics covering fixed-telephone networks, mobile-cellular telephone subscriptions, quality of service, Internet (including fixed- and mobile-broadband subscription data), traffic, staff, prices, revenue, investment and statistics on ICT access and use by households and individuals. Selected demographic, macroeconomic and broadcasting statistics are also included. Data are available for over 200 economies. However, it should be noted that since ITU relies primarily on official economy data, availability of data for the different indicators and years varies. Notes explaining data exceptions are also included. The data are collected from an annual questionnaire sent to official economy contacts, usually the regulatory authority or the ministry in charge of telecommunication and ICT. Additional data are obtained from reports provided by telecommunication ministries, regulators and operators and from ITU staff reports. In some cases, estimates are made by ITU staff; these are noted in the database.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
This dataset contains a set of 265770 analytics requests obtained from the experimental campaign of the HideDroid research, which involved 4500 apps. All analytics requests are stored in a single .json file under the key "AnalyticsRequest", whose value is a JSON array. The elements of the "AnalyticsRequest" array are indexed by the app batches analyzed during the testing campaign, with keys of the form "first_app_index-last_app_index". Each of these keys is associated with a JSON array containing all analytics requests extracted from that specific set of apps.
Each element of these arrays is a JSON object containing the following keys:
* id: id of the corresponding entry of the table, in which the request is stored
* package_name: the package name of the app that generated the hostname
* host: name of the host to which the request is delivered
* time: timestamp that indicates the time in which the request is sent
* byte_request: binary representation of the request
* method: HTTP method used to send the request
* path: path appended to the hostname
* http_protocol: HTTP protocol version
* header_json: a json object containing the set of all request headers associated to the request
* body_offset: offset of the body with respect to the headers (which are put before the body).
* body_string: the body of the request as a json object
* body_without_special_char: body without special ascii characters
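The nested layout described above can be traversed with a short helper. The sketch below is an assumption-laden illustration: the description leaves some ambiguity about whether "AnalyticsRequest" holds a single mapping of batch keys or an array of such mappings, so the code accepts both, and the sample document and host names are invented.

```python
import json
from collections import Counter

# Hypothetical miniature document in the assumed layout.
SAMPLE = json.loads("""
{"AnalyticsRequest": [
  {"0-1": [{"host": "a.example", "method": "POST"},
           {"host": "b.example", "method": "GET"}]},
  {"2-3": [{"host": "a.example", "method": "POST"}]}
]}
""")


def iter_requests(document):
    """Yield (batch_key, request) pairs from the nested structure.

    Assumption: the value under "AnalyticsRequest" is either one mapping of
    batch keys to request lists, or a list of such mappings.
    """
    batches = document["AnalyticsRequest"]
    if isinstance(batches, dict):
        batches = [batches]
    for batch in batches:
        for batch_key, requests in batch.items():
            for request in requests:
                yield batch_key, request


def requests_per_host(document):
    """Count analytics requests by destination host."""
    return Counter(request["host"] for _, request in iter_requests(document))


if __name__ == "__main__":
    print(requests_per_host(SAMPLE).most_common())
```
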
Please use the following bibtex entry to cite our work:
BibTex
@misc{caputo2021cant,
title={You can't always get what you want: towards user-controlled privacy on Android},
author={Davide Caputo and Francesco Pagano and Giovanni Bottino and Luca Verderame and Alessio Merlo},
year={2021},
eprint={2106.02483},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is available on Brisbane City Council’s open data website – data.brisbane.qld.gov.au. The site provides additional features for viewing and interacting with the data and for downloading the data in various formats.
The Brisbane City Council parking occupancy forecasting data is provided to be accessed by third party web or app developers to develop tools to provide Brisbane residents and visitors with likely parking availability within a paid parking area.
The parking occupancy forecasting data is compiled using advanced analytics and machine learning to estimate paid parking availability. The solution uses parking occupancy survey data, parking meter transaction data and other traffic and environmental data.
This dataset is linked to the open dataset called Parking — Meter locations. The field called MOBILE_ZONE is used to link the datasets. MOBILE_ZONE is a seven-digit mobile payment zone number that may include one or many parking meter numbers.
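The link through MOBILE_ZONE can be sketched as a one-to-many join. Only the MOBILE_ZONE field name comes from the description; the other field names (`METER_NO`, `FORECAST_OCCUPANCY`) and all sample values are hypothetical, so check the metadata CSV resource for the real attribute names.

```python
from collections import defaultdict

# Hypothetical rows from the meter locations dataset.
METERS = [
    {"METER_NO": "M001", "MOBILE_ZONE": "1234567"},
    {"METER_NO": "M002", "MOBILE_ZONE": "1234567"},
    {"METER_NO": "M003", "MOBILE_ZONE": "7654321"},
]

# Hypothetical rows from the occupancy forecasting dataset.
FORECASTS = [
    {"MOBILE_ZONE": "1234567", "FORECAST_OCCUPANCY": 0.82},
    {"MOBILE_ZONE": "9999999", "FORECAST_OCCUPANCY": 0.40},
]


def meters_by_zone(meter_rows):
    """Group meter numbers by zone; one zone may cover one or many meters."""
    zones = defaultdict(list)
    for row in meter_rows:
        zones[row["MOBILE_ZONE"]].append(row["METER_NO"])
    return zones


def attach_meters(forecast_rows, meter_rows):
    """Add the list of meters in each forecast's zone (empty if none match)."""
    zones = meters_by_zone(meter_rows)
    return [{**row, "meters": zones.get(row["MOBILE_ZONE"], [])}
            for row in forecast_rows]


if __name__ == "__main__":
    for row in attach_meters(FORECASTS, METERS):
        print(row)
```

Keeping the zone as a string rather than an integer avoids losing leading zeros in the seven-digit identifier.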
Additional information on parking meters can be found on the Brisbane City Council website.
The Brisbane City Council parking occupancy forecasting data includes parking data for all of Council’s parking meters. The data attributes used in this resource and their descriptions can be found in the Parking — Occupancy forecasting — metadata — CSV resource in this dataset.
The Data and resources section of this dataset contains further information for this dataset.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.
This project geolocated road traffic crashes based on crowdsourced crash reports from Ma3Route, a mobile/web/SMS platform that crowdsources transport data.
Primarily Nairobi, Kenya
Road traffic crashes
Observation data/ratings [obs]
All tweets from @Ma3Route from August 2012 to July 2023
Internet [int]