16 datasets found

Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and...
ieee-dataport.org
Updated Oct 21, 2024
Cite
Mohamad Amar Irsyad Mohd Aminuddin (2024). Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages [Dataset]. https://ieee-dataport.org/documents/website-fingerprinting-dataset-browsing-network-traffic-desktop-and-mobile-webpages
Explore at:
Dataset updated
Oct 21, 2024
Authors
Mohamad Amar Irsyad Mohd Aminuddin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. The data was collected with the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the configuration necessary to perform the simulation is detailed in the tool repository. The webpage URLs were selected from the first 100 websites listed at https://dataforseo.com/free-seo-stats/top-1000-websites. Each webpage URL was visited 90 times in each of the desktop and mobile browsing modes.
Mobile internet users worldwide 2020-2029
statista.com
Updated Feb 5, 2025
Cite
Statista Research Department (2025). Mobile internet users worldwide 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
Explore at:
Dataset updated
Feb 5, 2025
Dataset provided by
Statista http://statista.com/
Authors
Statista Research Department
Description
The global number of smartphone users was forecast to increase continuously between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After the ninth consecutive year of growth, the smartphone user base is estimated to reach 6.1 billion users and therefore a new peak in 2029. Notably, the number of smartphone users has increased continuously over the past years. Smartphone users here are limited to internet users of any age using a smartphone. The shown figures have been derived from survey data that has been processed to estimate missing demographics. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of smartphone users in regions such as Australia & Oceania and Asia.
ITC-Net-MingledApp: A comprehensive dataset of mixed mobile application...
data.mendeley.com
Updated Oct 7, 2024
Cite
Abolghasem Rezaei Khesal (2024). ITC-Net-MingledApp: A comprehensive dataset of mixed mobile application traffic for robust network traffic classification, domain adaptation, and generalization in diverse environments - Tehran Dataset #2 [Dataset]. http://doi.org/10.17632/4b9xpz4gd3.1
Explore at:
Unique identifier
https://doi.org/10.17632/4b9xpz4gd3.1
Dataset updated
Oct 7, 2024
Authors
Abolghasem Rezaei Khesal
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Tehran
Description
This repository is part of the ITC-Net-MingledApp dataset, which includes network traffic data from 36 Android applications, with each capture featuring concurrent traffic from multiple applications and smartphones. This repository contains part #2 of the data related to the Iran-Tehran scenario. Each capture is stored in a compressed file containing the relevant PCAP files of the associated applications. The PCAP files are named according to the convention {TimeStamp}_{Application Name}{Download-Upload Speed}.pcap. Part #1 of the Iran-Tehran scenario is in the Tehran Dataset #1 (https://doi.org/10.17632/9frgkybxhn.1) repository.
CTU-SME-11: a labeled dataset with real benign and malicious network traffic...
data.niaid.nih.gov
zenodo.org
Updated May 26, 2023
Cite
Bendl, Štěpán (2023). CTU-SME-11: a labeled dataset with real benign and malicious network traffic mimicking a small medium-size enterprise environment [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7958258
Explore at:
Dataset updated
May 26, 2023
Dataset provided by
Valeros, Veronica
Bendl, Štěpán
Garcia, Sebastian
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As technology advances, the number and complexity of cyber-attacks increase, forcing defense techniques to be updated and improved. To help develop effective tools for detecting security threats it is essential to have reliable and representative security datasets. Many existing security datasets have limitations that make them unsuitable for research, including lack of labels, unbalanced traffic, and outdated threats.

CTU-SME-11 is a labeled network dataset designed to address the limitations of previous datasets. The dataset was captured in a real network that mimics a small-medium enterprise setting. Raw network traffic (packets) was captured from 11 devices using tcpdump for a duration of 7 days, from 20 to 26 February 2023, in Prague, Czech Republic. The devices were chosen based on the enterprise setting and consist of IoT, desktop, and mobile devices, both bare metal and virtualized. The devices were infected with malware or exposed to Internet attacks, and factory reset to restore benign behavior.

The raw data was processed to generate network flows (Zeek logs), which were analyzed and labeled. The dataset contains two types of labels, a high-level label and a descriptive label, both assigned by experts. The former can take three values: benign, malicious, or background. The latter contains detailed information about the specific behavior observed in the network flows. The dataset contains 99 million labeled network flows. The overall compressed size of the dataset is 80 GB and the uncompressed size is 170 GB.
Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant
datarade.ai
.csv, .xls
Updated Jun 27, 2023
Cite
Swash (2023). Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/swash-blockchain-bitcoin-and-web3-enthusiasts-swash
Explore at:
Available download formats: .csv, .xls
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Swash
Area covered
Monaco, India, Belarus, Jamaica, Saint Vincent and the Grenadines, Uzbekistan, Latvia, Liechtenstein, Russian Federation, Jordan
Description
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
Mobile internet usage reach in North America 2020-2029
statista.com
Updated Feb 5, 2025
Cite
Statista Research Department (2025). Mobile internet usage reach in North America 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
Explore at:
Dataset updated
Feb 5, 2025
Dataset provided by
Statista http://statista.com/
Authors
Statista Research Department
Description
The population share with mobile internet access in North America was forecast to increase between 2024 and 2029 by a total of 2.9 percentage points. This overall increase does not happen continuously, notably not in 2028 and 2029. The mobile internet penetration is estimated to amount to 84.21 percent in 2029. Notably, the population share with mobile internet access has increased continuously over the past years. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the population share with mobile internet access in regions such as the Caribbean and Europe.
ITC-Net-Blend-60: A Comprehensive Dataset for Robust Mobile App...
data.mendeley.com
Updated Nov 15, 2023
Cite
Marziyeh Bayat (2023). ITC-Net-Blend-60: A Comprehensive Dataset for Robust Mobile App Identification in Real-World Network Environment - Scenario D [Dataset]. http://doi.org/10.17632/mcmf627yh5.1
Explore at:
Unique identifier
https://doi.org/10.17632/mcmf627yh5.1
Dataset updated
Nov 15, 2023
Authors
Marziyeh Bayat
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes network traffic data from more than 50 Android applications across 5 different scenarios. The applications are consistent in all scenarios, but other factors like location, device, and user vary (see Table 2 in the paper). The current repository pertains to Scenario D. Within the repository, for each application, there is a compressed file containing the relevant PCAP files. The PCAP files follow the naming convention: {Application Name}{Scenario ID}{#Trace}_Final.pcap.
Mobile internet penetration in Europe 2024, by country
statista.com
Updated Feb 5, 2025
Cite
Statista Research Department (2025). Mobile internet penetration in Europe 2024, by country [Dataset]. https://www.statista.com/topics/779/mobile-internet/
Explore at:
Dataset updated
Feb 5, 2025
Dataset provided by
Statista http://statista.com/
Authors
Statista Research Department
Description
Switzerland leads the ranking by population share with mobile internet access, recording 95.06 percent. Following closely behind is Ukraine with 95.06 percent, while Moldova is trailing the ranking with 46.83 percent, resulting in a difference of 48.23 percentage points to the ranking leader, Switzerland. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
How to choose the right product for your client?
kaggle.com
Updated Mar 23, 2020
Cite
Julia Beyers (2020). How to choose the right product for your client? [Dataset]. https://www.kaggle.com/juliabeyers/how-to-choose-the-right-product-for-your-client/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 23, 2020
Dataset provided by
Kaggle http://kaggle.com/
Authors
Julia Beyers
Description
The presence of business in the digital space is a must now. Indeed, there’s hardly any company, be it a small startup or an international corporation, that wouldn’t be available online. For this, the company may use one of two options — to develop an app or a website, or both.

In the case of a limited budget, business owners often have to make a choice. Considering that mobile traffic surpassed desktop traffic in 2016 and continues to grow, it is clear that a business should be accessible and convenient for smartphone users. But which is better: a responsive website or a mobile application?

Entrepreneurs often turn to development companies to ask this question. Lacking sufficient knowledge, they hope to get answers to their questions from people with experience in this field. So, we decided to compile a guide that will give you clear and understandable information.

Mobile app

Let's look at the stats. It will help you understand why a mobile app may be the obvious choice for your client.

In 2019, smartphone users installed about 204 billion(!) applications on their devices. On average, this is more than 26 applications per inhabitant of the planet Earth. And if this is not enough evidence, here’s one more point. The expected revenue of mobile applications will be $189 billion in 2020.

It sounds impressive, but this does not mean that a mobile application is something indispensable for every business. Not at all. Let's go through the pros and cons of a mobile application and try to understand when it is needed.

Pros

A new level of interaction. Mobile applications are a more convenient method of interaction. They load and process content faster. One more useful feature is notifications. Perhaps, applications are the best way to inform users about new updates, promotions, and other news (who will read long letters in the mail?).

Personalized targeting. Mobile applications are ideal for products or services that need to be used on an ongoing basis. The options like creating accounts, entering profile information, etc., make applications more personalized than websites. All this allows the business to target their audience more accurately without wasting money.

Offline usage. That’s another major advantage. Applications can provide users with access to content without an internet connection.

Cons

Development costs. To reach the maximum audience with a mobile app, it is necessary to cover the two main operating systems, iOS and Android. Development for each OS can be too expensive for small business owners, and they will have to make difficult choices. The way out of this situation is cross-platform development. Why? Because there is no need to guess which platform the target audience prefers, iOS or Android. Instead, you create just one app that runs seamlessly on both platforms.

Maintenance. The application is a technical product that needs constant support. Upgrades should be carried out in a timely manner. Often, users need to personally update applications by downloading a new version, which is annoying. Regular bug-fixing for various devices (smartphones, tablets) and different operating systems might be a real problem. Plus, any update should be confirmed by the store where the application is placed.

Suitable for businesses that provide interactive and personalized content (refers to all lifestyle and healthcare solutions), require regular app usage (for instance, to-do lists), rely on visual interaction and so on. For games, like Angry Birds, creating an app is also a wise choice.

Website

To be convenient for users of mobile devices, a website should be responsive. We want to emphasize this because it is critically important. Most of the traffic on the Internet comes from mobile devices, so your website should be adaptable, or in other words, mobile-friendly. If mobile users have to zoom in on elements and text to see anything, they will immediately leave your website.

On the other hand, a responsive website has the following benefits.

Pros

Maintenance. Maintaining a website is less costly. When compared to applications where the user mu...
Data from: An Urban Multi-Operator QoE-Aware Dataset for Cellular Networks...
data.mendeley.com
Updated Jun 16, 2025
Cite
Muhammad Kabeer (2025). An Urban Multi-Operator QoE-Aware Dataset for Cellular Networks in Dense Environments [Dataset]. http://doi.org/10.17632/dx5xyyfz2y.1
Explore at:
Unique identifier
https://doi.org/10.17632/dx5xyyfz2y.1
Dataset updated
Jun 16, 2025
Authors
Muhammad Kabeer
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains 30,925 labelled and cleaned records collected from a dense 2 km² urban area surrounding Sunway University, Selangor, Malaysia. Using the GNetTrack Pro mobile application and Samsung S21 Ultra devices, the data spans three anonymized commercial mobile network operators and includes both 4G and 5G technologies. The dataset captures radio signal quality metrics (RSRP, RSRQ, SNR, etc.), geospatial information, mobility patterns (walking vs. driving), and application-specific traffic scenarios (HTTP, FTP, 1080p Video Streaming).

A total of 132 physical cell sites were validated via OpenCellID and field inspections. The dataset is released in CSV format and includes Python scripts for data preprocessing and basic visualization. This makes it a valuable resource for machine learning tasks such as signal metric regression, handover optimization, and QoE modeling in heterogeneous networks, as well as for the simulation of high-density urban networks.

Key features:

Real-world 5G/4G measurements

Multi-operator and multi-mobility modes

Traffic-aware profiling

Empirical validation of base station locations

Ready for ML/DL use cases

For more information, look out for our article about the dataset in the Data in Brief journal.
Views of ABC News Digital Content (May 2016)
data.gov.au
cloud.csiss.gmu.edu
csv, docx, zip
Updated Aug 9, 2023
Cite
Australian Broadcasting Corporation (2023). Views of ABC News Digital Content (May 2016) [Dataset]. https://data.gov.au/data/dataset/abc-news-may-2016
Explore at:
Available download formats: docx, csv, zip
Dataset updated
Aug 9, 2023
Dataset authored and provided by
Australian Broadcasting Corporation http://abc.net.au/
License
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
The Views of ABC News Digital Content dataset provides both the number of page/screen views per hour for individual pieces of ABC News content and metadata related to each piece of content. The data is taken from across different ABC digital platforms during the month of May 2016. These platforms include the ABC News desktop and mobile websites and the ABC app (both iOS and Android versions). Each piece of content is represented by its ID and is consistent for the same piece of content across platforms. The URL of the content can be recreated using the platform and this ID. For example, for the “News” platform and id “7373616”, the URL is retrieved using “http://www.abc.net.au/news/7373616”. The content ID is the key which joins the Traffic data with the Content Metadata. The data set covers the period from 2016-05-01 00:00:00 to 2016-05-31 23:59:59.
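
The URL reconstruction described above can be sketched as follows. This is a minimal illustration, not the publisher's code: the function name `content_url` and the `PLATFORM_BASE_URLS` table are assumptions, and only the "News" entry is taken from the description. URL patterns for other platforms are not documented here, so they are deliberately left out.

```python
# Hypothetical lookup table from platform name to URL prefix.
# Only the "News" entry comes from the dataset description; patterns for
# other platforms are not given there and would have to be added separately.
PLATFORM_BASE_URLS = {
    "News": "http://www.abc.net.au/news/",
}


def content_url(platform: str, content_id: str) -> str:
    """Rebuild the public URL of a piece of content from its platform and ID."""
    try:
        base = PLATFORM_BASE_URLS[platform]
    except KeyError:
        raise ValueError(f"no URL pattern known for platform {platform!r}") from None
    return f"{base}{content_id}"


# The example from the description: platform "News", id "7373616".
print(content_url("News", "7373616"))
```

Because the same content ID joins the Traffic data with the Content Metadata, a helper like this could be applied to either table.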

Rights information

This data includes metadata about existing publicly available stories. In general terms, developers are free to use this data to explore ABC's content. But original stories and images should always be linked to. Stories and photos should not be reproduced in whole on another service. The stories themselves and their associated media items still remain the property of ABC and other rights holders where noted. Full details of copyright and more are listed on abc.net.au: http://www.abc.net.au/conditions.htm
Transparency in Keyword Faceted Search: a dataset of Google Shopping html...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Hoang Van Tien (2020). Transparency in Keyword Faceted Search: a dataset of Google Shopping html pages [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1491556
Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
Cozza Vittoria
Hoang Van Tien
De Nicola Rocco
Petrocchi Marinella
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains a collection of around 2,000 HTML pages. These web pages contain the search results returned for queries on different products, searched by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016.

Each file name in the collection indicates the location from which the search was performed, the userID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html

The locations are Philippines (PHI), United States (US), India (IN). The userIDs: 26 to 30 for users searching from Philippines, 1 to 5 from US, 11 to 15 from India.
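
A file name following this pattern could be split into its parts roughly as shown below. This is a sketch under stated assumptions: the function name `parse_result_filename` is hypothetical, the `#` placeholder is read as a decimal counter, PRODUCT is allowed to contain any characters, and the sample file name is made up to fit the pattern rather than taken from the dataset.

```python
import re

# User ID ranges per location code, as listed in the description.
USER_ID_RANGES = {
    "PHI": range(26, 31),  # Philippines: users 26 to 30
    "US": range(1, 6),     # United States: users 1 to 5
    "IN": range(11, 16),   # India: users 11 to 15
}

# Assumed reading of the pattern: "#" is a decimal counter and PRODUCT may
# contain any characters, including spaces or dots.
FILENAME_RE = re.compile(
    r"^no_email_(?P<location>[A-Z]+)_(?P<user_id>\d+)"
    r"\.(?P<product>.+)\.shopping_testing\.(?P<index>\d+)\.html$"
)


def parse_result_filename(name: str) -> dict:
    """Split a result file name into location, user ID, product and counter."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    location = match["location"]
    user_id = int(match["user_id"])
    # Cross-check the user ID against the range documented for the location.
    if user_id not in USER_ID_RANGES.get(location, ()):
        raise ValueError(f"user {user_id} is not listed for location {location!r}")
    return {
        "location": location,
        "user_id": user_id,
        "product": match["product"],
        "index": int(match["index"]),
    }


# Made-up file name that fits the documented pattern.
print(parse_result_filename("no_email_US_3.Television.shopping_testing.1.html"))
```

The range check is optional, but it catches files whose location code and user ID do not agree with the assignment given above.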

Products were chosen according to 130 keywords (e.g., MP3 player, MP4 Watch, Personal organizer, Television, etc.).

In the following, we describe how the search results have been collected.

Each user has a fresh profile. Creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.

To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.

A fully-fledged web browser is used to get the correct desktop version of the website under investigation. This is because websites could be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website.

The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).

Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM. OpenWPM is automatised with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each of them with their own associated cookies.

The experiments run, on average, 24 hours. In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (i.e., to India), via tunneling in SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. Also, for each query, we consider the first page of results, counting 40 products. Among them, the focus of the experiments is mostly on the top 10 and top 3 results.

Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas.

The search results have been analyzed in order to check whether there was evidence of price steering based on users' location.

One term of usage applies:

In any research product whose findings are based on this dataset, please cite

@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages     = {29--43},
  year      = {2019},
  crossref  = {DBLP:conf/ircdl/2019},
  url       = {https://doi.org/10.1007/978-3-030-11226-4_3},
  doi       = {10.1007/978-3-030-11226-4_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Datos Global Activity Feed (~20M Monthly Active Users Worldwide)
datarade.ai
.csv, .txt
Updated May 12, 2023
Cite
Datos, A Semrush Company (2023). Datos Global Activity Feed (~20M Monthly Active Users Worldwide) [Dataset]. https://datarade.ai/data-products/datos-global-activity-feed-20m-monthly-active-users-worldwide-datos
Explore at:
Available download formats: .csv, .txt
Dataset updated
May 12, 2023
Dataset authored and provided by
Datos, A Semrush Company
Area covered
Peru, Costa Rica, Malta, Tokelau, Guatemala, Andorra, Svalbard and Jan Mayen, Cyprus, Armenia, Korea (Republic of)
Description
Datos brings to market anonymized, at scale, consolidated privacy-secured datasets with a granularity rarely found in the market. Get access to the desktop and mobile browsing behavior for millions of users across the globe, packaged into clean, easy-to-understand data products and reports.

The Datos Activity Feed is an event-level accounting of all observed URL visits executed by devices which Datos has access to over a given period of time.

This feed can be delivered on a daily basis, delivering the previous day’s data. It can be filtered by any of the fields, so you can focus on what’s important for you, whether it be specific markets or domains.

Now available with the Datos Low-Latency Feed (DLLF). This add-on ensures delivery of approximately 99% of all devices before markets open in New York (the lowest latency product on the market). Our clickstream data is made up of an array of upstream sources. The DLLF makes the daily output of these sources available as they arrive and are processed, rather than as a once-daily batch.
Mobile Device Voice Recordings at King's College London (MDVR-KCL) from both...
data.niaid.nih.gov
explore.openaire.eu
Updated Jan 24, 2020
Cite
Michael Stadtschnitzer (2020). Mobile Device Voice Recordings at King's College London (MDVR-KCL) from both early and advanced Parkinson's disease patients and healthy controls [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2867215
Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
Dhaval Trivedi
Michael Stadtschnitzer
Hagen Jaeger
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset description

The dataset description will start with describing the local conditions and other metadata, then will continue with describing the recording procedure and annotation methodology. Finally, a brief description of the dataset deployment and publication will be given.

Meta Information

The dataset was recorded at King's College London (KCL) Hospital, Denmark Hill, Brixton, London SE5 9RS in the period from 26 to 29 September 2017. We used a typical examination room with an area of about ten square meters and a typical reverberation time of approx. 500 ms to perform the voice recordings. Because the voice recordings are performed in the realistic situation of making a phone call (i.e. the participant holds the phone to the preferred ear and the microphone is in direct proximity to the mouth), one can assume that all recordings were performed within the reverberation radius and thus can be considered “clean”.

Recording Procedure

We used a Motorola Moto G4 smartphone as the recording device. To perform the voice recordings on the device, we developed a “Toggle Recording App”, which uses the same functionality as the voice recording module in the i-PROGNOSIS smartphone application but is deployed as a standalone Android application. This means that the voice capturing service runs as a standalone background service on the recording device and triggers voice recordings via the on- and off-hook signals of the smartphone. Because we record the microphone signal directly, and not the GSM (“Global System for Mobile Communications”) compressed stream, we end up with high-quality recordings with a sample rate of 44.1 kHz and a bit depth of 16 bit (audio CD quality). The raw, uncompressed data is written directly to the external storage of the smartphone (SD card) using the well-known WAVE file format (.wav). We used the following workflow to perform a voice recording:

Ask the participant to relax a bit and then to make a phone call to the test executor (off-hook signal triggered).

Ask the participant to read out “The North Wind and the Sun”

Depending on the constitution of the participant either ask to read out “Tech. Engin. Computer applications in geography snippet”

Start a spontaneous dialog with the participant, the test executor starts asking random questions about places of interest, local traffic, or personal interests if acceptable.

Test executor ends call by farewell (on-hook signal triggered).
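
The recording format stated above (WAVE, 44.1 kHz, 16 bit) can be checked on a downloaded file with a short script. The following is only a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and the constants are illustrative assumptions and are not part of the dataset or of the recording app.

```python
import wave

# Values taken from the description above; the names are hypothetical.
EXPECTED_RATE_HZ = 44_100     # 44.1 kHz sample rate
EXPECTED_BIT_DEPTH = 16       # 16 bit per sample


def check_wav_format(path):
    """Return (ok, details) for a WAV file compared with the stated format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        details = {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }
    ok = (details["sample_rate_hz"] == EXPECTED_RATE_HZ
          and details["bit_depth"] == EXPECTED_BIT_DEPTH)
    return ok, details
```

Calling `check_wav_format` on a recording returns a flag plus the measured properties, so files that deviate from the stated format can be listed before any further processing.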

Annotation Scheme

For each healthy control (HC) and Parkinson's disease (PD) participant, we labeled the data with scores on the Hoehn & Yahr (H&Y) scale as well as on the UPDRS II part 5 and UPDRS III part 18 scales. The voice recordings are labeled according to the following scheme:

SI_HS_HYR_UPDRS II-5_UPDRS III-18

with

SI as subject identification in the form IDNN, N in [0, 9]

HS as the health status label (hc or pd accordingly)

HYR as the expert assessed H&Y scale rating

UPDRS II-5 as the corresponding expert peer-reviewed score

UPDRS III-18 as the corresponding expert assessed score

For example, an audio recording with the file name “ID02_pd_1_2_1.wav” represents a recording of the third participant (the first participant was anonymized as ID00), who has PD and an H&Y rating of 1, a UPDRS II-5 score of 2 and a UPDRS III-18 score of 1. Note that all healthy controls were also evaluated on the introduced scales, because Parkinson's disease and voice degradation correlate but do not match exactly. As a result, the data set includes one HC participant (ID31) with a UPDRS II-5 and III-18 rating of 1, and it also includes PD patients with UPDRS II-5 and III-18 ratings of 0. It should be emphasized that this does not mean the data set contains ambiguous information, but that an expert was not able to hear voice degradation that would lead to a UPDRS rating greater than zero. Machine learning approaches may nevertheless be able to classify such cases correctly, or at least learn that PD and voice degradation correlate but do not always coincide.
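
The labeling scheme above can be decoded from a file name with a few lines of code. This is a minimal sketch, assuming that every file follows the pattern of the example “ID02_pd_1_2_1.wav” exactly and that all three ratings are written as non-negative integers; the names `Labels` and `parse_labels` are hypothetical and not part of the dataset.

```python
import re
from typing import NamedTuple


class Labels(NamedTuple):
    subject_id: int      # NN from "IDNN"
    health_status: str   # "hc" or "pd"
    hoehn_yahr: int      # H&Y rating
    updrs_ii_5: int      # UPDRS II part 5 score
    updrs_iii_18: int    # UPDRS III part 18 score


# Assumed pattern: IDNN_<hc|pd>_<H&Y>_<UPDRS II-5>_<UPDRS III-18>.wav
_PATTERN = re.compile(r"^ID(\d{2})_(hc|pd)_(\d+)_(\d+)_(\d+)\.wav$")


def parse_labels(filename):
    """Split a recording file name into its label fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    sid, status, hy, u2, u3 = match.groups()
    return Labels(int(sid), status, int(hy), int(u2), int(u3))
```

For the example from the text, `parse_labels("ID02_pd_1_2_1.wav")` yields subject 2 with status `pd`, an H&Y rating of 1, a UPDRS II-5 score of 2 and a UPDRS III-18 score of 1. File names that deviate from the assumed pattern raise a `ValueError` instead of being silently mislabeled.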

Appendix

North Wind and the Sun (Orthographic Version):

“The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak. They agreed that the one who first succeeded in making the traveler take his cloak off should be considered stronger than the other. Then the North Wind blew as hard as he could, but the more he blew the more closely did the traveler fold his cloak around him; and at last the North Wind gave up the attempt. Then the Sun shone out warmly, and immediately the traveler took off his cloak. And so the North Wind was obliged to confess that the Sun was the stronger of the two.”

BNC – Tech. Engin. Computer applications in geography snippet:

“[...] This is because there is less scattering of blue light as the atmospheric path length and consequently the degree of scattering of the incoming radiation is reduced. For the same reason, the sun appears to be whiter and less orange-coloured as the observer's altitude increases; this is because a greater proportion of the sunlight comes directly to the observer's eye. Figure 5.7 is a schematic representation of the path of electromagnetic energy in the visible spectrum as it travels from the sun to the Earth and back again towards a sensor mounted on an orbiting satellite. The paths of waves representing energy prone to scattering (that is, the shorter wavelengths) as it travels from sun to Earth are shown. To the sensor it appears that all the energy has been reflected from point P on the ground whereas, in fact, it has not, because some has been scattered within the atmosphere and has never reached the ground at all. [...]”
Datos Domain Traffic Feed (~20M Monthly Active Users Worldwide)
datarade.ai
.csv, .txt
Updated Jul 22, 2023
Cite
Datos, A Semrush Company (2023). Datos Domain Traffic Feed (~20M Monthly Active Users Worldwide) [Dataset]. https://datarade.ai/data-products/datos-domain-traffic-feed-20m-monthly-active-users-worldwide-datos
Explore at:
Available download formats: .csv, .txt
Dataset updated
Jul 22, 2023
Dataset authored and provided by
Datos, A Semrush Company
Area covered
Cabo Verde, Saint Pierre and Miquelon, Curaçao, Morocco, Belarus, Colombia, Portugal, Uzbekistan, Togo, Egypt
Description
Datos brings to market anonymized, at scale, consolidated privacy-secured datasets with a granularity rarely found in the market. Get access to the desktop and mobile browsing behavior for millions of users across the globe, packaged into clean, easy-to-understand data products and reports.

The Datos Domain Traffic Feed reports on panelist visitation to domains, benchmarking the popularity of internet properties worldwide by country. Additionally, we offer the ability to track the availability of domains with respect to whether traffic is being sent to sites which are currently unregistered. Customers can elect to focus on specific domains, countries, or domain registration status.

Now available with the Datos Low-Latency Feed (DLLF): this add-on ensures delivery of approximately 99% of all devices before markets open in New York (the lowest latency product on the market). Our clickstream data is made up of an array of upstream sources. The DLLF makes the daily output of these sources available as they arrive and are processed, rather than as a once-daily batch.
Data from: Revealing QoE of Web Users from Encrypted Network Traffic
figshare.com
zip
Updated Jun 16, 2020
Cite
Alexis Huet; Antoine Saverimoutou; Zied Ben Houidi; Hao Shi; Shengming Cai; Jinchun Xu; Bertrand Mathieu; Dario Rossi (2020). Revealing QoE of Web Users from Encrypted Network Traffic [Dataset]. http://doi.org/10.6084/m9.figshare.12459293.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12459293.v1
Dataset updated
Jun 16, 2020
Dataset provided by
figshare
Authors
Alexis Huet; Antoine Saverimoutou; Zied Ben Houidi; Hao Shi; Shengming Cai; Jinchun Xu; Bertrand Mathieu; Dario Rossi
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
We present a dataset targeting a large set of popular pages (Alexa top-500), with probes from several ISP networks, browsers (Chrome, Firefox) and viewport combinations, for over 200,000 experiments carried out in 2019.

We purposely collected two distinct sets with two different tools, namely Web Page Test (WPT) and Web View (WV), varying a number of relevant parameters and conditions, for a total of 200K+ web sessions, roughly equally split between WV and WPT. Our dataset comprises variations in terms of geographical coverage, scale, diversity and representativeness (location, targets, protocol, browser, viewports, metrics).

For Web Page Test, we used the online service www.webpagetest.org at different locations worldwide (Europe, Asia, USA) and private WPT instances in three locations in China (Beijing, Shanghai, Dongguan). The list of target URLs comprised the main pages and five random subpages from the Alexa top-500 worldwide and China. We varied the network conditions: native connections and 4G, FIOS, 3GFast, DSL, and custom shaping/loss conditions. The other elements of the configuration were fixed: Chrome browser on desktop with a fixed screen resolution, HTTP/2 protocol and IPv4.

For Web View, we collected experiments from three machines located in France. We selected two versions of two browser families (Chrome 75/77, Firefox 63/68), two screen sizes (1920x1080, 1440x900), and different browser configurations (one half of the experiments activate the AdBlock plugin) from two different access technologies (fiber and ADSL). From a protocol standpoint, we used both IPv4 and IPv6, with HTTP/2 and QUIC, and performed repeated experiments with cached objects/DNS. Given the diversity of settings, we restricted the number of websites to about 50 among the Alexa top-500 websites, to ensure statistical relevance of the collected samples for each page.

The two archives IFIPNetworking2020_WebViewOrange.zip and IFIPNetworking2020_Webpagetest.zip correspond respectively to the Web View experiments and to the Web Page Test experiments.

Each archive contains three files and one folder:
- config.csv: Description of parameters and conditions for each run
- metrics.csv: Value of different metrics collected by the browser
- progressionCurves.csv: Progression curves of the bytes progress as seen by the network, from 0 to 10 seconds by steps of 100 milliseconds
- listUrl folder: Indexes the sets of urls

Regarding config.csv, the columns are:
- index: Index for this set of conditions
- location: Location of the machine
- listUrl: List of urls, located in the folder listUrl
- browserUsed: Internet browser and version
- terminal: Desktop or Mobile
- collectionEnvironment: Identification of the collection environment
- networkConditionsTrafficShaping (WPT only): Whether native condition or traffic shaping (4G, FIOS, 3GFast, DSL, or custom Emulator conditions)
- networkConditionsBandwidth (WPT only): Bandwidth of the network
- networkConditionsDelay (WPT only): Delay in the network
- networkConditions (WV only): network conditions
- ipMode (WV only): requested L3 protocol
- requestedProtocol (WV only): requested L7 protocol
- adBlocker (WV only): Whether adBlocker is used or not
- winSize (WV only): Window size

Regarding metrics.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- DOM Content Loaded Event End (ms): DOM time
- First Paint (ms) (WV only): First paint time
- Load Event End (ms): Page Load Time from W3C
- RUM Speed Index (ms) (WV only): RUM Speed Index
- Speed Index (ms) (WPT only): Speed Index
- Time for Full Visual Rendering (ms) (WV only): Time for Full Visual Rendering
- Visible portion (%) (WV only): Visible portion
- Time to First Byte (ms) (WPT only): Time to First Byte
- Visually Complete (ms) (WPT only): Visually Complete used to compute the Speed Index
- aatf: aatf using ATF-chrome-plugin
- bi_aatf: bi_aatf using ATF-chrome-plugin
- bi_plt: bi_plt using ATF-chrome-plugin
- dom: dom using ATF-chrome-plugin
- ii_aatf: ii_aatf using ATF-chrome-plugin
- ii_plt: ii_plt using ATF-chrome-plugin
- last_css: last_css using ATF-chrome-plugin
- last_img: last_img using ATF-chrome-plugin
- last_js: last_js using ATF-chrome-plugin
- nb_ress_css: nb_ress_css using ATF-chrome-plugin
- nb_ress_img: nb_ress_img using ATF-chrome-plugin
- nb_ress_js: nb_ress_js using ATF-chrome-plugin
- num_origins: num_origins using ATF-chrome-plugin
- num_ressources: num_ressources using ATF-chrome-plugin
- oi_aatf: oi_aatf using ATF-chrome-plugin
- oi_plt: oi_plt using ATF-chrome-plugin
- plt: plt using ATF-chrome-plugin

Regarding progressionCurves.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- url: Url of the current page. SUBPAGE stands for a path.
- run: Current run (linked with index of the config for WPT)
- filename: Filename of the pcap
- fullname: Fullname of the pcap
- har_size: Size of the HAR for this experiment
- pagedata_size: Size of the page data report
- pcap_size: Size of the pcap
- App Byte Index (ms): Application Byte Index as computed from the har file (in the browser)
- bytesIn_APP: Total bytes in as seen in the browser
- bytesIn_NET: Total bytes in as seen in the network
- X_BI_net: Network Byte Index computed from the pcap file (in the network)
- X_bin_0_for_B_completion to X_bin_99_for_B_completion: X_bin_k_for_B_completion is the bytes progress reached after k*100 milliseconds

If you use these datasets in your research, you can reference the appropriate paper:

@inproceedings{qoeNetworking2020,
  title={Revealing QoE of Web Users from Encrypted Network Traffic},
  author={Huet, Alexis and Saverimoutou, Antoine and Ben Houidi, Zied and Shi, Hao and Cai, Shengming and Xu, Jinchun and Mathieu, Bertrand and Rossi, Dario},
  booktitle={2020 IFIP Networking Conference (IFIP Networking)},
  year={2020},
  organization={IEEE}
}
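
The X_bin_k_for_B_completion columns described above encode one progression curve per experiment, with bin k corresponding to k*100 milliseconds. The sketch below shows one way to turn a row of progressionCurves.csv into a list of (time, progress) points. It assumes the file is a comma-separated CSV whose header uses exactly the column names listed in the description; the function names `progression_curve` and `load_curves` are hypothetical, and the unit of the progress values (fraction or percentage) is not stated in the description, so the values are passed through unchanged.

```python
import csv

N_BINS = 100    # X_bin_0 ... X_bin_99, as listed in the description
STEP_MS = 100   # each bin covers 100 milliseconds


def progression_curve(row):
    """Return [(time_ms, bytes_progress), ...] for one CSV row (a dict)."""
    curve = []
    for k in range(N_BINS):
        value = float(row[f"X_bin_{k}_for_B_completion"])
        curve.append((k * STEP_MS, value))
    return curve


def load_curves(path):
    """Map each experiment id to its progression curve (assumed CSV layout)."""
    with open(path, newline="") as handle:
        return {row["id"]: progression_curve(row)
                for row in csv.DictReader(handle)}
```

Keeping the time axis explicit makes it straightforward to plot a curve or to join it with the matching row of metrics.csv through the shared id column.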