Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Streaming is by far the predominant type of traffic in communication networks. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at the network, transport, and application layers with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 different individual bandwidth settings measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3G/4G traces, and 4,022 runs with pre-defined bandwidth changes. This corresponds to 332 GB of video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.
Representative applications that can directly collect 5G datasets from mobile terminals without using specialized equipment include G-NetTrack Pro and PCAPdroid. The former allows for the monitoring and logging of the header and payload information of the medium access control (MAC) frame passing through the 5G air interface. The latter is an open-source network capture and monitoring tool that works without root privileges, analyzing connections made by applications installed on the user's mobile device. It can also dump mobile traffic to PCAP (also known as libpcap) format and send it to the well-known Wireshark for further analysis. We created 5G datasets by measuring 5G traffic directly from a major mobile operator in South Korea. The mobile terminal used for traffic measurement was a Samsung Galaxy A90 5G equipped with a Qualcomm Snapdragon X50 5G modem. The packet sniffer software used for traffic measurement, PCAPdroid, was installed on the terminal through Google Play. Traffic was measured sequentially per application on two stationary terminals (only one terminal was used for non-interactive services) with no background traffic. The collected dataset consists of representative resource-intensive video traffic, which has the greatest impact on 5G network planning and provisioning; background traffic was excluded so that the unique characteristics of each traffic type could be measured. The video streaming dataset includes data measured directly while watching Netflix and Amazon Prime, which are representative over-the-top (OTT) services, on mobile devices. The live streaming dataset was measured while watching YouTube Live and South Korea's representative live broadcasts (Naver NOW and Afreeca TV). Video conferencing data were measured by holding actual meetings on the widely used Zoom, MS Teams, and Google Meet platforms. Two types of metaverse traffic were acquired: Zepeto and Roblox.
Zepeto traffic was collected while staying in the 'camping world' for 15 hours. Roblox traffic was collected over 25 hours of playing the 'Collect All Pets' game using an auto clicker. We collected two types of mobile network gaming traffic. The first was cloud gaming, an online game setup that runs video games on remote servers and streams them directly to the user's device. The second was a traditional mobile game connected to the Internet. The dataset was collected from May to October 2022, totals 328 hours, and is provided in CSV format. It is a timestamp-mapped time-series dataset with packet header information, and because it includes source and destination addresses, traffic can be analyzed per application. To make it more usable as a traffic source model, Section III describes how to use it as a training dataset for the traffic simulator platform's source generator.
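As a rough illustration of how a timestamped packet-header CSV of this kind might be turned into a per-second traffic series, the sketch below bins packet lengths by second and direction. The column names (`timestamp`, `src`, `dst`, `length`), the sample rows, and the `local_addr` parameter are assumptions made for the example and are not taken from the published files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the column names are assumptions,
# not the published schema of the dataset.
SAMPLE = """timestamp,src,dst,length
0.10,10.0.0.2,203.0.113.5,120
0.60,203.0.113.5,10.0.0.2,1400
1.20,203.0.113.5,10.0.0.2,1400
1.90,10.0.0.2,203.0.113.5,80
"""

def bytes_per_second(csv_file, local_addr):
    """Sum packet lengths into 1-second bins, split by direction."""
    bins = defaultdict(lambda: {"up": 0, "down": 0})
    for row in csv.DictReader(csv_file):
        second = int(float(row["timestamp"]))
        # Packets sent by the terminal count as uplink, all others as downlink.
        direction = "up" if row["src"] == local_addr else "down"
        bins[second][direction] += int(row["length"])
    return dict(bins)

series = bytes_per_second(io.StringIO(SAMPLE), local_addr="10.0.0.2")
for second in sorted(series):
    print(second, series[second])
```

A fixed-width series like this is one plausible input shape for a time-series model; the bin width is a free choice.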
A 5G traffic dataset measured by PCAPdroid has been released and can be used as a training dataset for various ML models. However, since the size of this dataset is very large, it is inconvenient to handle, and additional data preprocessing is required to use it for its intended purpose.
This dataset can be used to train GANs and time-series forecasting deep learning models.
Our implementation is available on GitHub: https://github.com/0913ktg/5G-Traffic-Generator
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository is part of the ITC-NetMingledApp dataset, which includes network traffic data from 36 Android applications, with each capture featuring concurrent traffic from multiple applications and smartphones. This repository contains part #2 of the data related to the Iran-Tehran scenario. Each capture is stored in a compressed file containing the relevant PCAP files of the associated applications. The PCAP files are named according to a convention: {TimeStamp}_{Application Name}{Download-Upload Speed}.pcap Part #1 of Iran-Tehran scenario is in the Tehran Dataset #1 (https://doi.org/10.17632/9frgkybxhn.1) repository.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Verified dataset of 2025 device usage: share of global web traffic, mobile commerce share of transactions, US daily time spent, app vs web breakdown, and tablet decline.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository is part of the ITC-NetMingledApp dataset, which includes network traffic data from 36 Android applications, with each capture featuring concurrent traffic from multiple applications and smartphones. This repository contains data related to the Iran-Qom scenario. Each capture is stored in a compressed file containing the relevant PCAP files of the associated applications. The PCAP files are named according to a convention: {TimeStamp}_{Application Name}{Download-Upload Speed}.pcap
This dataset encompasses mobile web clickstream behavior on any browser, collected from over 150,000 triple-opt-in first-party US Daily Active Users (DAU). Use it for measurement, attribution or path to purchase and consumer journey understanding. Full URL deliverable available including searches.
http://rightsstatements.org/vocab/InC/1.0/http://rightsstatements.org/vocab/InC/1.0/
This competition involves advertisement data provided by BuzzCity Pte. Ltd. BuzzCity is a global mobile advertising network that has millions of consumers around the world on mobile phones and devices. In Q1 2012, over 45 billion ad banners were delivered across the BuzzCity network, which consists of more than 10,000 publisher sites that reach an average of over 300 million unique users per month. The number of smartphones active on the network has also grown significantly. Smartphones now account for more than 32% of the phones that are served advertisements across the BuzzCity network. The "raw" data used in this competition is of two types: a publisher database and a click database, both provided in CSV format. The publisher database records the publisher's (aka partner's) profile and comprises several fields:
publisherid - Unique identifier of a publisher.
Bankaccount - Bank account associated with a publisher (may be empty).
address - Mailing address of a publisher (obfuscated; may be empty).
status - Label of a publisher, which can be one of the following:
  "OK" - Publishers whom BuzzCity deems as having healthy traffic (or those who slipped past its detection mechanisms).
  "Observation" - Publishers who may have just started their traffic or whose traffic statistics deviate from the system-wide average. BuzzCity has not yet reached a conclusive stance on these publishers.
  "Fraud" - Publishers who are deemed fraudulent with clear proof. BuzzCity suspends their accounts, and their earnings will not be paid.
The click database, on the other hand, records the click traffic and has several fields:
id - Unique identifier of a particular click
numericip - Public IP address of a clicker/visitor
deviceua - Phone model used by a clicker/visitor
publisherid - Unique identifier of a publisher
adscampaignid - Unique identifier of a given advertisement campaign
usercountry - Country from which the visitor comes
clicktime - Timestamp of a given click (in YYYY-MM-DD format)
publisherchannel - Publisher's channel type, which can be one of the following:
  ad - Adult sites
  co - Community
  es - Entertainment and lifestyle
  gd - Glamour and dating
  in - Information
  mc - Mobile content
  pp - Premium portal
  se - Search, portal, services
referredurl - URL where the ad banners were clicked (obfuscated; may be empty). More details about the HTTP Referer protocol can be found in this article.

Related Publication: R. J. Oentaryo, E.-P. Lim, M. Finegold, D. Lo, F.-D. Zhu, C. Phua, E.-Y. Cheu, G.-E. Yap, K. Sim, M. N. Nguyen, K. Perera, B. Neupane, M. Faisal, Z.-Y. Aung, W. L. Woon, W. Chen, D. Patel, and D. Berrar. (2014). Detecting click fraud in online advertising: A data mining approach, Journal of Machine Learning Research, 15, 99-140.
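The two databases described above share the publisherid field, so a first exploratory step could be to aggregate clicks per publisher and attach the publisher's status label. The following is a minimal sketch under stated assumptions: the CSV header spellings, the sample rows, and the `publisher_summary` helper are hypothetical and only mirror the field names listed above.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature versions of the two CSV files; the header
# spellings are assumed from the field list, and the rows are invented.
PUBLISHERS = """publisherid,bankaccount,address,status
p1,,,OK
p2,,,Fraud
"""
CLICKS = """id,numericip,deviceua,publisherid,adscampaignid,usercountry,clicktime,publisherchannel,referredurl
1,1001,ModelA,p1,c1,sg,2012-02-09,in,
2,1002,ModelB,p2,c2,id,2012-02-09,mc,
3,1002,ModelB,p2,c2,id,2012-02-09,mc,
4,1002,ModelB,p2,c2,id,2012-02-09,mc,
"""

def publisher_summary(publisher_file, click_file):
    """Count clicks and distinct visitor IPs per publisher, with status."""
    status = {r["publisherid"]: r["status"]
              for r in csv.DictReader(publisher_file)}
    clicks = Counter()
    ips = {}
    for r in csv.DictReader(click_file):
        pid = r["publisherid"]
        clicks[pid] += 1
        ips.setdefault(pid, set()).add(r["numericip"])
    return {
        pid: {
            "status": status.get(pid, "unknown"),
            "clicks": clicks[pid],
            "unique_ips": len(ips[pid]),
        }
        for pid in clicks
    }

summary = publisher_summary(io.StringIO(PUBLISHERS), io.StringIO(CLICKS))
for pid, stats in sorted(summary.items()):
    print(pid, stats)
```

A high ratio of clicks to distinct IPs is only one simple signal one might inspect; it is not presented here as the method used in the related publication.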
This dataset encompasses mobile app usage, web clickstream and location visitation behavior, collected from over 150,000 triple-opt-in first-party US Daily Active Users (DAU). The only omnichannel meter at scale representing iOS and Android platforms.
Includes ties to consumer demographics.
In-app audio, media and social ad exposure data included. Can be commissioned to build other in-app and account level visibility.
Comprehensive dataset analyzing Amazon's daily website visits, traffic patterns, seasonal trends, and comparative analysis with other ecommerce platforms based on May 2025 data.
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.
Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.
User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.
Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.
GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.
Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.
High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.
Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.
Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For the evaluation of OS fingerprinting methods, we need a dataset with the following requirements:
To overcome these issues, we have decided to create the dataset from the traffic of several web servers at our university. This allows us to address the first issue by collecting traces from thousands of devices ranging from user computers and mobile phones to web crawlers and other servers. The ground truth values are obtained from the HTTP User-Agent, which resolves the second of the presented issues. Even though most traffic is encrypted, the User-Agent can be recovered from the web server logs that record every connection’s details. By correlating the IP address and timestamp of each log record to the captured traffic, we can add the ground truth to the dataset.
For this dataset, we have selected a cluster of five web servers that host 475 unique university domains for public websites. The monitoring point recording the traffic was placed at the backbone network connecting the university to the Internet.
The dataset used in this paper was collected from approximately 8 hours of university web traffic throughout a single workday. The logs were collected from Microsoft IIS web servers and converted from W3C extended logging format to JSON. The logs are referred to as web logs and are used to annotate the records generated from packet capture obtained by using a network probe tapped into the link to the Internet.
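The annotation step described above, matching web log records to captured flows by IP address and timestamp, could look roughly like the sketch below. The record layouts (`ip`, `time`, `user_agent`, `src_ip`, `start`), the sample values, and the two-second tolerance are assumptions for illustration and do not reproduce the authors' pipeline.

```python
from datetime import datetime, timedelta

# Hypothetical records; field names and values are illustrative only.
web_logs = [
    {"ip": "198.51.100.7", "time": datetime(2020, 1, 1, 9, 0, 1),
     "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
]
flows = [
    {"src_ip": "198.51.100.7", "start": datetime(2020, 1, 1, 9, 0, 0)},
    {"src_ip": "203.0.113.9", "start": datetime(2020, 1, 1, 9, 0, 0)},
]

def annotate(flows, web_logs, tolerance=timedelta(seconds=2)):
    """Attach the User-Agent of the closest-in-time log record per flow."""
    by_ip = {}
    for log in web_logs:
        by_ip.setdefault(log["ip"], []).append(log)
    annotated = []
    for flow in flows:
        candidates = by_ip.get(flow["src_ip"], [])
        # Closest log record from the same client IP, if there is one.
        match = min(candidates,
                    key=lambda log: abs(log["time"] - flow["start"]),
                    default=None)
        user_agent = None
        if match is not None and abs(match["time"] - flow["start"]) <= tolerance:
            user_agent = match["user_agent"]
        annotated.append({**flow, "user_agent": user_agent})
    return annotated

annotated = annotate(flows, web_logs)
for record in annotated:
    print(record["src_ip"], record["user_agent"])
```

Flows with no matching log record keep a `None` label here, so they can be dropped or inspected separately.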
The entire dataset creation process consists of seven steps:
The collected and enriched flows contain 111 data fields that can be used as features for OS fingerprinting or any other data analyses. The fields grouped by their area are listed below:
The details of OS distribution grouped by the OS family are summarized in the table below. The Other OS family contains records generated by web crawling bots that do not include OS information in the User-Agent.
OS Family | Number of flows |
---|---|
Other | 42474 |
Windows | 40349 |
Android | 10290 |
iOS | 8840 |
Mac OS X | 5324 |
Linux | 1589 |
Ubuntu | 653 |
Fedora | 88 |
Chrome OS | 53 |
Symbian OS | 1 |
Slackware | 1 |
Linux Mint | 1 |
Gain access to high-accuracy foot traffic data covering global mobile visitation patterns and dwell behavior at points of interest. This dataset is derived from billions of opt-in mobile device signals and enables you to monitor how people interact with commercial, civic, and public spaces around the world.
Each record includes visit frequency, time-on-site (dwell time), return rate, and temporal segmentation. The foot traffic data is organized for easy enrichment of mobility data, map data, and location data use cases, and integrates seamlessly into spatial analytics platforms.
Core benefits:
• Worldwide POI-level foot traffic data
• Hourly time resolution with repeat visitor logic
• Works with retail analytics, site planning, and consumer insights
• Delivered via API or S3
• Fully anonymized and CCPA/GDPR compliant
Use this foot traffic data to improve operational efficiency, inform investment decisions, and benchmark performance against global movement patterns.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides detailed insights and best practices for tracking and measuring local SEO performance across a range of critical metrics, including Google Business Profile engagement, local keyword rankings, website traffic from local searches, citation management, mobile optimization, and ROI calculation. The data is based on expert analysis and recommendations to help local businesses optimize their local search visibility and drive measurable results.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Shortcut through the IPX: Measuring Latencies in Global Mobile Roaming with Regional Breakouts

This repository contains a description and sample data for the paper "A Shortcut through the IPX: Measuring Latencies in Global Mobile Roaming with Regional Breakouts", published at the Network Traffic Measurement and Analysis (TMA) Conference 2024. In the provided README.md file, we present example snippets of the datasets, including an explanation of all contained fields. We cover the three main datasets used in the related paper:
- DT1: User plane traces captured at multiple GGSN/PGW instances of a globally operating MVNO
- DT2: GTP echo round trip times between visited network SGSN/SGWs and home network GGSN/PGWs
- DT3: IPX routing information, as extracted from BGP routing tables

For legal reasons, we are not able to publish the secondary datasets (DT4, DT5) covered in the manuscript. Finally, for privacy, security, and political reasons, certain fields in each of the datasets have been anonymized. These are indicated by the _anonymized prefix. In the case of IP addresses, the anonymization is consistent across datasets, meaning that IP addresses that are identical before anonymization remain identical after it.

Contact
For questions regarding the dataset, contact Viktoria Vomhoff (viktoria.vomhoff@uni-wuerzburg.de)
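One common way to obtain anonymization that stays consistent across datasets is a keyed hash, so that the same input address always yields the same pseudonym. The sketch below shows that general idea only; the key, the `anonymize_ip` helper, and the output format are hypothetical, and the source does not state which scheme was actually used.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept private and reused
# for every dataset that must remain joinable.
SECRET_KEY = b"example-key"

def anonymize_ip(ip, key=SECRET_KEY):
    """Map an IP address to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:12]

# The same address appearing in two datasets receives the same pseudonym,
# so records can still be joined after anonymization.
dt1_value = anonymize_ip("192.0.2.1")
dt2_value = anonymize_ip("192.0.2.1")
other_value = anonymize_ip("192.0.2.2")
print(dt1_value == dt2_value, dt1_value == other_value)
```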
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In 2022, over half of the web traffic was accessed through mobile devices. By reducing the energy consumption of mobile web apps, we can not only extend the battery life of our devices, but also make a significant contribution to energy conservation efforts. For example, if we could save only 5% of the energy used by web apps, we estimate that it would be enough to shut down one of the nuclear reactors in Fukushima. This paper presents a comprehensive overview of energy-saving experiments and related approaches for mobile web apps, relevant for researchers and practitioners. To achieve this objective, we conducted a systematic literature review and identified 44 primary studies for inclusion. Through the mapping and analysis of scientific papers, this work contributes: (1) an overview of the energy-draining aspects of mobile web apps, (2) a comprehensive description of the methodology used for the energy-saving experiments, and (3) a categorization and synthesis of various energy-saving approaches.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The table shows the number of mobile subscriptions and the number and duration of calls and data traffic from the mobile network
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Please cite our paper if you publish material based on these datasets:
G. Khodabandelou, V. Gauthier, M. El-Yacoubi, M. Fiore, "Estimation of Static and Dynamic Urban Populations with Mobile Network Metadata", in IEEE Trans. on Mobile Computing, 2018 (in Press). 10.1109/TMC.2018.2871156
Abstract
Communication-enabled devices that are physically carried by individuals are today pervasive,
which opens unprecedented opportunities for collecting digital metadata about the mobility of large populations. In this paper, we propose a novel methodology for the estimation of people density at metropolitan scales, using subscriber presence metadata collected by a mobile operator. We show that our approach suits the estimation of static population densities, i.e., of the distribution of dwelling units per urban area contained in traditional censuses. Specifically, it achieves higher accuracy than that granted by previous equivalent solutions. In addition, our approach enables the estimation of dynamic population densities, i.e., the time-varying distributions of people in a conurbation. Our results build on significant real-world mobile network metadata and relevant ground-truth information in multiple urban scenarios.
Dataset Columns
This dataset covers one month of data, collected in April 2015, for three Italian cities: Rome, Milan, and Turin. The raw data was provided during the Telecom Italia Big Data Challenge (http://www.telecomitalia.com/tit/en/innovazione/archivio/big-data-challenge-2015.html)
1. grid_id: the coordinate of the grid can be retrieved with the shapefile of a given city
2. date: format Y-M-D H:M:S
4. landuse_label: the land use label, computed with the method described in [2]
5. population: Census population of a given grid block as defined by the Istituto nazionale di statistica (ISTAT https://www.istat.it/en/censuses) in 2011
6. estimation: dynamic population density estimate (in persons), produced by the method described in [1]
7. area: surface of the "grid id" considered in km^2
8. geometry: the shape of the area considered with the EPSG:3003 coordinate system (only with quilt)
Note
Due to legal constraints, we cannot directly share the original data from the Telecom Italia Big Data Challenge that we used to build this dataset.
Easy access to this dataset with quilt
Install the dataset repository:
$ quilt install vgauthier/DynamicPopEstimate
Use the dataset with a pandas DataFrame
>>> from quilt.data.vgauthier import DynamicPopEstimate
>>> import pandas as pd
>>> df = pd.DataFrame(DynamicPopEstimate.rome())
Use the dataset with a GeoPandas GeoDataFrame
>>> from quilt.data.vgauthier import DynamicPopEstimate
>>> import geopandas as gpd
>>> df = gpd.GeoDataFrame(DynamicPopEstimate.rome())
References
[1] G. Khodabandelou, V. Gauthier, M. El-Yacoubi, M. Fiore, "Population estimation from mobile network traffic metadata", in proc of the 17th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM), pp. 1 - 9, 2016.
[2] A. Furno, M. Fiore, R. Stanica, C. Ziemlicki, and Z. Smoreda, "A tale of ten cities: Characterizing signatures of mobile traffic in urban areas," IEEE Transactions on Mobile Computing, Volume: 16, Issue: 10, 2017.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes network traffic data from more than 50 Android applications across 5 different scenarios. The applications are consistent in all scenarios, but other factors like location, device, and user vary (see Table 2 in the paper). The current repository pertains to Scenario E. Within the repository, for each application, there is a compressed file containing the relevant PCAP files. The PCAP files follow the naming convention: {Application Name}{Scenario ID}{#Trace}_Final.pcap.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fixed and mobile broadband internet traffic statistics come from International Telecommunication Union (ITU), the United Nations specialized agency for information and communication technologies (ICTs).
ITU collects Internet traffic statistics on fixed and mobile broadband (inside the country) through its annual World Telecommunication/ICT Indicators short and long questionnaires according to the methodology provided in the Handbook for the Collection of Administrative Data on Telecommunications/ICT. Mobile network operators (MNOs) and Internet service providers (ISPs) systematically measure Internet data usage (both upload and download) of their customers, which is the basis of traffic statistics. Data from MNOs and ISPs are collected and aggregated by national telecommunications/ICT regulatory authorities or ministries and reported to ITU in the World Telecommunication/ICT indicators questionnaire series. Data that are unavailable from the questionnaires are compiled from publicly available sources from regulators and ministries, and from the OECD Broadband statistics.
The statistics on the internet traffic include:
Fixed-broadband internet traffic refers to the annual total volume of data traffic generated by fixed-broadband subscribers measured at the end-user access point. It should be measured by adding up download and upload traffic. Internet traffic refers to open Internet traffic generated or consumed by users connected to the Internet. Wholesale traffic (provided for another operator), walled-garden traffic, and IPTV and cable-TV traffic should be excluded. Traffic data should be collected from fixed operators offering Internet connections or ISPs by national regulatory authorities and ministries.
Mobile broadband Internet traffic (within the country) refers to the annual total broadband traffic volumes (uploaded and downloaded) originated within the country from 3G or other more advanced mobile networks, including evolutions, or equivalent standards in terms of data transmission speeds. Wholesale and walled-garden traffic should be excluded. Traffic should be measured at the end-user access point.
Mobile broadband Internet traffic (outside the country) refers to the annual total broadband traffic volumes originated outside the country from 3G or other more advanced mobile networks, including evolutions or equivalent standards in terms of data transmission speeds. Wholesale and walled-garden traffic should be excluded. Traffic should be collected and aggregated at the country level for all customers of domestic operators roaming outside the country. Traffic should be measured at the end-user access point.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Live traffic information data showing traffic information on the strategic road network in England, maintained by Highways England (previously called the Highways Agency).

Update: 10th December 2015
Following the last announcement in 2013, the transformation to a new traffic information data service is complete and a new range of data products has replaced the legacy traffic information pull data services. The legacy Datex 1 datasets which provided live traffic event data are no longer being updated and have been removed from data.gov. The Datex publications have been replaced by lighter XML feeds showing current and planned roadworks and unplanned traffic events across our road network. The XML feeds listed below will be supplemented by a traffic data API and a JSON mobile API, which will be published on our data.gov page shortly. Please contact us if you have any immediate concerns about these changes and we will work with you to provide a replacement service where possible. Alternatively, please register as a data subscriber at http://www.trafficengland.com/subscribers for more information about the new Datex service.

Update: 12th August 2013
Following a change of supplier, the NTIS system is being re-developed and will eventually replace the legacy system. Please consult the updated document which describes the NTIS Legacy DATEX II v1.0 Publisher. New services will be delivered in DATEX II v2 format using web services to push data to subscribers. Full details of the new services can be found in document NIS P TIH 008, available from the TIH website. Potential subscribers, project managers, and engineers seeking to develop a new interface are encouraged to consider using the new services.