The global number of internet users was forecast to increase continuously between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After the fifteenth consecutive year of growth, the number of users is estimated to reach 7 billion and therefore a new peak in 2029. Notably, the number of internet users has increased continuously over the past years.
Depicted is the estimated number of individuals in the country or region at hand who use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects that are not taken into account here.
The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Find more key insights for the number of internet users in regions like the Americas and Asia.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Internet use in the UK annual estimates by age, sex, disability, ethnic group, economic activity and geographical location, including confidence intervals.
When asked about "Attitudes towards the internet", the most common answer among Japanese respondents was "I'm concerned that my data is being misused on the internet". 35 percent chose it in our online survey in 2025. Looking to gain valuable insights about users of internet providers worldwide? Check out our reports on consumers who use internet providers. These reports give readers a thorough picture of these customers, including their identities, preferences, opinions, and methods of communication.
By Amber Thomas [source]
This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.
According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This data set aims to address the gap between estimated availability and actual use by providing more accurate usage numbers downscaled to county and zip code levels. Who gets counted as having access is vastly important: it determines who gets included in public funding opportunities dedicated solely to closing this digital divide. The implications can be huge: millions around the country could remain invisible if these numbers aren't accurately reported or used properly in decision-making processes.
This dataset includes aggregated information about these locations with fewer than 20 devices for increased accuracy when estimating broadband usage in the United States, allowing others to use it to develop solutions that improve internet access or to accurately label problem areas where no real or reliable connectivity exists within communities large and small throughout the US mainland. Please review the license terms before using these data so that you adhere to the stipulations of Microsoft's Open Use of Data Agreement v1.0, for both professional and educational use.
How to Use the US Broadband Usage Dataset
This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.
The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected within that county or metropolitan area/micro area/divisions within states as reported by the US Census Bureau in 2018[2].
- Population (Households) – Estimated number of households defined according to [3] based on data from the US Census Bureau American Community Survey's 5 Year Estimates[4].
- Average Throughput (Mbps) – Average Mbps download speed derived from a combination of data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps calculated using [7].
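As a minimal sketch of how the six columns described above might be used, the following reads zip-level rows, averages throughput per county, and lists zip codes with a large share of slow machines. The header names (`county`, `zip_code`, `households`, `avg_throughput_mbps`, `percent_fast`, `percent_slow`), the use of fractions rather than percentages, the 0.25 threshold, and the sample rows are all assumptions made for illustration; the real file may label and scale its columns differently.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only: header names and values are made up for this sketch.
SAMPLE_CSV = """county,zip_code,households,avg_throughput_mbps,percent_fast,percent_slow
Adams,00001,1200,42.0,0.61,0.08
Adams,00002,300,6.5,0.10,0.35
Baker,00003,800,18.0,0.30,0.22
"""


def load_rows(handle):
    """Parse CSV rows and convert the numeric columns from text."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "county": raw["county"],
            "zip_code": raw["zip_code"],  # kept as text so leading zeros survive
            "households": int(raw["households"]),
            "avg_throughput_mbps": float(raw["avg_throughput_mbps"]),
            "percent_fast": float(raw["percent_fast"]),
            "percent_slow": float(raw["percent_slow"]),
        })
    return rows


def mean_throughput_by_county(rows):
    """Unweighted mean of the zip-level average throughput, per county."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["county"]].append(row["avg_throughput_mbps"])
    return {county: sum(values) / len(values) for county, values in grouped.items()}


def slow_zip_codes(rows, threshold=0.25):
    """Zip codes whose share of slow machines exceeds the (assumed) threshold."""
    return [row["zip_code"] for row in rows if row["percent_slow"] > threshold]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(mean_throughput_by_county(rows))
print(slow_zip_codes(rows))
```

To run this against the published file, the `io.StringIO(SAMPLE_CSV)` handle would be replaced by an opened file and the keys adjusted to the actual header row.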
- Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
- Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
- Establishing public-private partnerships. Local governments and telecom providers need better data about gaps in service coverage and usage levels in order to make decisions about investments in new infrastructure buildouts that improve connectivity options for rural communities.
If you use this dataset in your research, please credit the original authors.
Data Source
See the dataset description for more information.
File: broadband_data_2020October.csv
When asked about "Attitudes towards the internet", most Mexican respondents pick "It is important to me to have mobile internet access in any place" as an answer. 56 percent did so in our online survey in 2025.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual data on internet usage in Great Britain, including frequency of internet use, internet activities and internet purchasing.
When asked about "Attitudes towards the internet", most Chinese respondents pick "It is important to me to have mobile internet access in any place" as an answer. 50 percent did so in our online survey in 2025.
When asked about "Attitudes towards the internet", most Australian respondents pick "It is important to me to have mobile internet access in any place" as an answer. 55 percent did so in our online survey in 2025.
This layer shows computer ownership and internet access by age and race. This is shown by tract, county, and state boundaries. This service is updated annually to contain the most currently released American Community Survey (ACS) 5-year data, and contains estimates and margins of error. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis. This layer is symbolized to show the percent of population age 18 to 64 in households with no computer. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right.
Current Vintage: 2019-2023
ACS Table(s): B28005, B28003, B28009B, B28009C, B28009D, B28009E, B28009F, B28009G, B28009H, B28009I
Data downloaded from: Census Bureau's API for American Community Survey
Date of API call: December 12, 2024
National Figures: data.census.gov
The United States Census Bureau's American Community Survey (ACS):
- About the Survey
- Geography & ACS
- Technical Documentation
- News & Updates
This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. For more information about ACS layers, visit the FAQ. Please cite the Census and ACS when using this data.
Data Note from the Census: Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value.
In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.
Data Processing Notes:
This layer is updated automatically when the most current vintage of ACS data is released each year, usually in December. The layer always contains the latest available ACS 5-year estimates. It is updated annually within days of the Census Bureau's release schedule. Click here to learn more about ACS data releases.
Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2023 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes.
The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
The States layer contains 52 records: all US states, Washington D.C., and Puerto Rico.
Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.
Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.
Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
- The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
- Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
- The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
- The estimate is controlled. A statistical test for sampling variability is not appropriate.
- The data for this geographic area cannot be displayed because the number of sample cases is too small.
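The margin-of-error interpretation in the Census data note above can be sketched in a few lines. The bounds follow directly from the text (estimate minus and plus the margin of error). The conversion to a standard error via the factor 1.645 and the root-sum-of-squares approximation for the margin of error of a sum come from general Census Bureau guidance on ACS estimates, not from this layer's description, so treat them as assumptions here; the function names are hypothetical.

```python
import math

Z_90 = 1.645  # factor associated with a 90 percent margin of error (assumed from ACS guidance)


def confidence_bounds(estimate, moe):
    """Lower and upper 90 percent confidence bounds: estimate -/+ margin of error."""
    return estimate - moe, estimate + moe


def standard_error(moe):
    """Standard error implied by a 90 percent margin of error."""
    return moe / Z_90


def moe_of_sum(moes):
    """Approximate margin of error of a sum of estimates (root sum of squares)."""
    return math.sqrt(sum(m * m for m in moes))


# Illustrative numbers only, not taken from the layer.
print(confidence_bounds(12.0, 3.0))
print(moe_of_sum([3.0, 4.0]))
```

The root-sum-of-squares rule is an approximation that ignores correlation between the component estimates, which is why derived margins of error should be read as rough guides.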
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This archive contains the files submitted to the 4th International Workshop on Data: Acquisition To Analysis (DATA) at SenSys. Files provided in this package are associated with the paper titled "Dataset: Analysis of IFTTT Recipes to Study How Humans Use Internet-of-Things (IoT) Devices"
With the rapid development and usage of Internet-of-Things (IoT) and smart-home devices, researchers continue efforts to improve the "smartness" of those devices to address daily needs in people's lives. Such efforts usually begin with understanding evolving user behaviors on how humans utilize the devices and what they expect in terms of their behavior. However, while research efforts abound, there is a very limited number of datasets that researchers can use to both understand how people use IoT devices and to evaluate algorithms or systems for smart spaces. In this paper, we collect and characterize more than 50,000 recipes from the online If-This-Then-That (IFTTT) service to understand a seemingly straightforward but complicated question: "What kinds of behaviors do humans expect from their IoT devices?" The dataset we collected contains the basic information of the IFTTT rules, trigger and action event, and how many people are using each rule.
For more detail about this dataset, please refer to the paper listed above.
The global number of smartphone users was forecast to increase continuously between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After the ninth consecutive year of growth, the smartphone user base is estimated to reach 6.1 billion users and therefore a new peak in 2029. Notably, the number of smartphone users has increased continuously over the past years.
Smartphone users here are limited to internet users of any age using a smartphone. The figures shown have been derived from survey data that has been processed to estimate missing demographics.
The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Find more key insights for the number of smartphone users in regions like Australia & Oceania and Asia.
MyDigitalFootprint (MDF) is a novel large-scale dataset composed of smartphone embedded sensors data, physical proximity information, and Online Social Networks interactions aimed at supporting multimodal context-recognition and social relationships modelling in mobile environments. The dataset includes two months of measurements and information collected from the personal mobile devices of 31 volunteer users by following the in-the-wild data collection approach: the data has been collected in the users' natural environment, without limiting their usual behaviour. Existing public datasets generally consist of a limited set of context data, aimed at optimising specific application domains (human activity recognition is the most common example). On the contrary, the dataset contains a comprehensive set of information describing the user context in the mobile environment.
The complete analysis of the data contained in MDF has been presented in the following publication:
https://www.sciencedirect.com/science/article/abs/pii/S1574119220301383?via%3Dihub
The full anonymised dataset is contained in the folder MDF. Moreover, in order to demonstrate the efficacy of MDF, there are three proof of concept context-aware applications based on different machine learning tasks:
For the sake of reproducibility, the data used to evaluate the proof-of-concept applications are contained in the folders link-prediction, context-recognition, and cars, respectively.
https://cdla.io/sharing-1-0/
Context: This dataset offers insights into the usage patterns of social media apps for 1,000 users across seven popular platforms: Facebook, Instagram, Twitter, Snapchat, TikTok, LinkedIn, and Pinterest. It tracks various metrics such as daily time spent on the app, number of posts made, likes received, and new followers gained.
Dataset Features:
- User_ID: Unique identifier for each user.
- App: The social media platform being used.
- Daily_Minutes_Spent: Total time a user spends on the app each day, ranging from 5 to 500 minutes.
- Posts_Per_Day: Number of posts a user creates per day, ranging from 0 to 20.
- Likes_Per_Day: Total number of likes a user receives on their posts each day, ranging from 0 to 200.
- Follows_Per_Day: The number of new followers a user gains daily, ranging from 0 to 50.
Context & Use Cases: This dataset could be particularly useful for social media analysts, digital marketers, or researchers interested in understanding user engagement trends across different platforms. It provides insights into how much time users spend, how actively they post, and the level of engagement they receive (in terms of likes and followers).
Conclusion & Outcome: Analyzing this dataset could yield several outcomes:
- Engagement Patterns: Identifying which platforms have higher engagement in terms of time spent or likes received.
- Active Users: Determining which users are the most active across various platforms based on the number of posts and followers gained.
- User Retention: Studying the correlation between time spent and follower growth, providing insight into user retention strategies for different platforms.
Overall, the dataset allows for exploration of social media usage trends and helps drive decision-making for marketing strategies, content creation, and platform engagement.
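Two of the outcomes listed above (time spent per platform, and the correlation between time spent and follower growth) can be sketched with the standard library alone. The column names are the ones given in the feature list; the sample rows are invented for illustration and the helper names are hypothetical.

```python
import math
from collections import defaultdict

# Invented rows that only mimic the documented columns.
SAMPLE = [
    {"User_ID": 1, "App": "TikTok", "Daily_Minutes_Spent": 120, "Follows_Per_Day": 12},
    {"User_ID": 2, "App": "TikTok", "Daily_Minutes_Spent": 60, "Follows_Per_Day": 6},
    {"User_ID": 3, "App": "LinkedIn", "Daily_Minutes_Spent": 30, "Follows_Per_Day": 3},
]


def mean_minutes_by_app(rows):
    """Average Daily_Minutes_Spent for each platform."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["App"]].append(row["Daily_Minutes_Spent"])
    return {app: sum(values) / len(values) for app, values in grouped.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


minutes = [row["Daily_Minutes_Spent"] for row in SAMPLE]
follows = [row["Follows_Per_Day"] for row in SAMPLE]
print(mean_minutes_by_app(SAMPLE))
print(pearson(minutes, follows))
```

A correlation computed this way describes association only; it does not by itself show that more time spent causes follower growth.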
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This data is used for a broadband mapping initiative conducted by the Washington State Broadband Office.
This dataset provides global fixed broadband and mobile (cellular) network performance metrics in zoom level 16 web mercator tiles (approximately 610.8 meters by 610.8 meters at the equator). Data is projected in EPSG:4326. Download speed, upload speed, and latency are collected via the Speedtest by Ookla applications for Android and iOS and averaged for each tile. Measurements are filtered to results containing GPS-quality location accuracy. The data was processed and published to ArcGIS Living Atlas by Esri.
About
Speedtest data is used today by commercial fixed and mobile network operators around the world to inform network buildout, improve global Internet quality, and increase Internet accessibility. Government regulators such as the United States Federal Communications Commission and the Malaysian Communications and Multimedia Commission use Speedtest data to hold telecommunications entities accountable and direct funds for rural and urban connectivity development. Ookla licenses data to NGOs and educational institutions to fulfill its mission: to help make the internet better, faster and more accessible for everyone. Ookla hopes to further this mission by distributing the data to make it easier for individuals and organizations to use it for the purposes of bridging the social and economic gaps between those with and without modern Internet access.
Data
Tiles
Hundreds of millions of Speedtests are taken on the Ookla platform each month. In order to create a manageable dataset, we aggregate raw data into tiles. The size of a data tile is defined as a function of "zoom level" (or "z"). At z=0, the size of a tile is the size of the whole world. At z=1, the tile is split in half vertically and horizontally, creating 4 tiles that cover the globe.
This tile-splitting continues as zoom level increases, causing tiles to become exponentially smaller as we zoom into a given region. By this definition, tile sizes are actually some fraction of the width/height of Earth according to Web Mercator projection (EPSG:3857). As such, tile size varies slightly depending on latitude, but tile sizes can be estimated in meters.
For the purposes of these layers, a zoom level of 16 (z=16) is used for the tiling. This equates to a tile that is approximately 610.8 meters by 610.8 meters at the equator (18 arcsecond blocks). The geometry of each tile is represented in WGS 84 (EPSG:4326) in the tile field.
The data can be found at: https://github.com/teamookla/ookla-open-data
Update Cadence
The tile aggregates start in Q1 2019 and go through the most recent quarter. They will be updated shortly after the conclusion of the quarter.
Esri Processing
This layer is a best available aggregation of the original Ookla dataset. This means that for each tile for which data is available, the most recent data is used. So for instance, if data is available for a tile for Q2 2019 and for Q4 2020, the Q4 2020 data is awarded to the tile. The default visualization for the layer is the "broadband index". The broadband index is a bivariate index based on both the average download speed and the average upload speed. For Mobile, the score is indexed to a standard of 25 megabits per second (Mbps) download and 3 Mbps upload. A tile with average Speedtest results of 25/3 Mbps is awarded 100 points. Tiles with average speeds above 25/3 are shown in green, tiles with average speeds below this are shown in fuchsia. For Fixed, the score is indexed to a standard of 100 Mbps download and 20 Mbps upload. A tile with average Speedtest results of 100/20 Mbps is awarded 100 points.
Tiles with average speeds above 100/20 are shown in green, tiles with average speeds below this are shown in fuchsia.
Tile Attributes
Each tile contains the following adjoining attributes:
- The year and the quarter that the tests were performed.
- The average download speed of all tests performed in the tile, represented in megabits per second.
- The average upload speed of all tests performed in the tile, represented in megabits per second.
- The average latency of all tests performed in the tile, represented in milliseconds.
- The number of tests taken in the tile.
- The number of unique devices contributing tests in the tile.
- The quadkey representing the tile.
Quadkeys
Quadkeys can act as a unique identifier for the tile. This can be useful for joining data spatially from multiple periods (quarters), creating coarser spatial aggregations without using geospatial functions, spatial indexing, partitioning, and an alternative for storing and deriving the tile geometry.
Layers
There are two layers:
- Ookla_Mobile_Tiles - Tiles containing tests taken from mobile devices with GPS-quality location and a cellular connection type (e.g. 4G LTE, 5G NR).
- Ookla_Fixed_Tiles - Tiles containing tests taken from mobile devices with GPS-quality location and a non-cellular connection type (e.g. WiFi, ethernet).
The layers are set to draw at scales 1:3,000,000 and larger.
Time Period and Update Frequency
Layers are generated based on a quarter year of data (three months) and files will be updated and added on a quarterly basis. A /year=2020/quarter=1/ period, the first quarter of the year 2020, would include all data generated on or after 2020-01-01 and before 2020-04-01.
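The tiling and quadkey ideas described above can be sketched as follows. The quadkey encoding is the standard bit-interleaving scheme used by web map tile systems; a parent tile is obtained by truncating the quadkey, which is what allows coarser aggregation without geospatial functions. Several things here are assumptions for illustration: the field names `quadkey`, `avg_download_mbps` and `tests`, the choice to weight averages by test count, and the spherical Earth radius of 6,371 km, which is used only because it reproduces the quoted 610.8 m tile width (the Web Mercator radius of 6,378,137 m gives roughly 611.5 m instead).

```python
import math
from collections import defaultdict


def tile_to_quadkey(x, y, zoom):
    """Interleave the bits of tile column x and row y into a base-4 string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def parent_quadkey(quadkey, zoom):
    """Ancestor tile at a coarser zoom level: simply the first `zoom` digits."""
    if zoom > len(quadkey):
        raise ValueError("requested zoom is finer than the tile itself")
    return quadkey[:zoom]


def tile_width_m(zoom, radius_m=6_371_000.0):
    """Approximate tile width at the equator for an assumed spherical radius."""
    return 2 * math.pi * radius_m / 2 ** zoom


def aggregate_download(tiles, zoom):
    """Test-weighted mean download speed per ancestor tile (weighting is an assumption)."""
    weighted = defaultdict(float)
    counts = defaultdict(int)
    for tile in tiles:
        key = parent_quadkey(tile["quadkey"], zoom)
        weighted[key] += tile["avg_download_mbps"] * tile["tests"]
        counts[key] += tile["tests"]
    return {key: weighted[key] / counts[key] for key in weighted}


# Short, made-up quadkeys keep the example readable; real z=16 keys have 16 digits.
tiles = [
    {"quadkey": "0231", "avg_download_mbps": 10.0, "tests": 1},
    {"quadkey": "0232", "avg_download_mbps": 40.0, "tests": 3},
]
print(tile_to_quadkey(3, 5, 3))
print(round(tile_width_m(16), 1))
print(aggregate_download(tiles, 3))
```

Weighting by test count keeps a tile with a single measurement from counting as much as a heavily tested one; an unweighted mean of tile averages would be an equally defensible choice for some analyses.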
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SUST14 - Individuals who used the internet in the last 3 months by the factors they considered when they most recently bought a device. Published by Central Statistics Office. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0).
This data set contains internet traffic data captured by an Internet Service Provider (ISP) using Mikrotik SDN Controller and packet sniffer tools. The data set includes traffic from over 2000 customers who use Fibre to the Home (FTTH) and GPON internet connections. The data was collected over a period of several months and contains all traffic in its original format with headers and packets.
The data set contains information on inbound and outbound traffic, including web browsing, email, file transfers, and more. The data set can be used for research in areas such as network security, traffic analysis, and machine learning.
**Data Collection Method:** The data was captured using Mikrotik SDN Controller and packet sniffer tools. These tools capture traffic data by monitoring network traffic in real-time. The data set contains all traffic data in its original format, including headers and packets.
**Data Set Content:** The data set is provided in a CSV format and includes the following fields:
MAC Protocol Examples
- 802.2 - 802.2 Frames (0x0004)
- arp - Address Resolution Protocol (0x0806)
- homeplug-av - HomePlug AV MME (0x88E1)
- ip - Internet Protocol version 4 (0x0800)
- ipv6 - Internet Protocol Version 6 (0x86DD)
- ipx - Internetwork Packet Exchange (0x8137)
- lldp - Link Layer Discovery Protocol (0x88CC)
- loop-protect - Loop Protect Protocol (0x9003)
- mpls-multicast - MPLS multicast (0x8848)
- mpls-unicast - MPLS unicast (0x8847)
- packing-compr - Encapsulated packets with compressed IP packing (0x9001)
- packing-simple - Encapsulated packets with simple IP packing (0x9000)
- pppoe - PPPoE Session Stage (0x8864)
- pppoe-discovery - PPPoE Discovery Stage (0x8863)
- rarp - Reverse Address Resolution Protocol (0x8035)
- service-vlan - Provider Bridging (IEEE 802.1ad) & Shortest Path Bridging IEEE 802.1aq (0x88A8)
- vlan - VLAN-tagged frame (IEEE 802.1Q) and Shortest Path Bridging IEEE 802.1aq with NNI compatibility (0x8100)
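The MAC protocol codes listed above can be turned into a small lookup table for labelling captured frames. The names and hexadecimal values below are copied from that list; the helper functions are hypothetical, and reading the type from bytes 12 and 13 assumes an untagged Ethernet II frame (6-byte destination, 6-byte source, then the 2-byte type field).

```python
# Protocol names keyed by the hexadecimal codes given in the list above.
MAC_PROTOCOLS = {
    0x0004: "802.2",
    0x0806: "arp",
    0x88E1: "homeplug-av",
    0x0800: "ip",
    0x86DD: "ipv6",
    0x8137: "ipx",
    0x88CC: "lldp",
    0x9003: "loop-protect",
    0x8848: "mpls-multicast",
    0x8847: "mpls-unicast",
    0x9001: "packing-compr",
    0x9000: "packing-simple",
    0x8864: "pppoe",
    0x8863: "pppoe-discovery",
    0x8035: "rarp",
    0x88A8: "service-vlan",
    0x8100: "vlan",
}


def protocol_name(code):
    """Return the listed protocol name for a code, or 'unknown' if it is not listed."""
    return MAC_PROTOCOLS.get(code, "unknown")


def frame_protocol(frame):
    """Read the 2-byte type field of an untagged Ethernet II frame and name it."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header is 14 bytes long")
    code = int.from_bytes(frame[12:14], "big")
    return protocol_name(code)


# A dummy header: zeroed addresses followed by the IPv4 type code.
print(frame_protocol(bytes(12) + bytes([0x08, 0x00])))
```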
**Data Usage:** The data set can be used for research in areas such as network security, traffic analysis, and machine learning. Researchers can use the data to develop new algorithms for detecting and preventing cyber attacks, analyzing internet traffic patterns, and more.
**Data Availability:** If you are interested in using this data set for research purposes, please contact us at asfandyar250@gmail.com for more information and references. The data set is available for download on Kaggle and can be accessed by researchers who have obtained permission from the ISP.
We hope this data set will be useful for researchers in the field of network security and traffic analysis. If you have any questions or need further information, please do not hesitate to contact us.
You can use Wireshark or other software to view the files.
Distributes the NIST estimate of official U.S. time over the Internet in real time, using Network Time Protocol (NTP) and other time data formats to automatically synchronize clocks in computers and network devices to official U.S. time as realized by NIST, several billion times per day. This official U.S. time is the NIST estimate of Coordinated Universal Time (UTC), and is called UTC(NIST). The accuracy of UTC(NIST) as distributed through the Internet Time Service (ITS) is on the order of 0.001 seconds (one millisecond), although accuracy can vary depending on network conditions and other parameters.
Note that unlike most traditional datasets, time is intrinsically a transient, ever-changing quantity. As soon as UTC(NIST) is transmitted to a client, that particular value of UTC(NIST) no longer reflects the current time, which is constantly changing. There is thus no static storage of any time data, apart from internal diagnostic information not released to the public which ensures that UTC(NIST) as disseminated through the Internet Time Service (ITS) is commensurate with the official UTC(NIST) realization within the uncertainties of the system.
The vast majority of UTC(NIST) information distributed through ITS is provided freely, anonymously and automatically to the public. Any IP address can request UTC(NIST) through the ITS and the information is automatically and anonymously provided at no cost to the user. Full documentation of the ITS including all the source code is available to the public through the web site http://www.nist.gov/pml/div688/.
NIST provides an authenticated version of ITS to a limited number of users (approximately 500 users near the end of calendar year 2015) who for various reasons want to ensure they are receiving UTC(NIST) without spoofing or interference with the information. This service uses public key encryption for the set of registered users to provide authenticated UTC(NIST).
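As a rough sketch of how a client could read time over NTP as described above, the following builds a minimal one-shot (SNTP-style) request and decodes the server's transmit timestamp. The packet layout (48 bytes, timestamp in bytes 40 to 47, seconds counted from 1900) follows the public NTP specification. The hostname `time.nist.gov`, the timeout, and the function names are assumptions not stated in the text, and this simplified form ignores network delay, so it will not reach the millisecond accuracy quoted above.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX_OFFSET = 2_208_988_800


def build_request():
    """48-byte client request: leap indicator 0, version 3, mode 3 (client)."""
    return bytes([0x1B]) + bytes(47)


def parse_transmit_time(packet):
    """Decode the transmit timestamp of a server reply as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP packets are at least 48 bytes long")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX_OFFSET + fraction / 2 ** 32


def query(server="time.nist.gov", timeout=5.0):
    """Send one request over UDP port 123 and return the server's time (needs network)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet)


# Offline demonstration with a hand-built reply: 10.5 seconds after the Unix epoch.
fake_reply = bytes(40) + struct.pack("!II", NTP_TO_UNIX_OFFSET + 10, 2 ** 31)
print(parse_transmit_time(fake_reply))
```

Calling `query()` contacts a live server, so it is left out of the demonstration; a production client would normally rely on an operating-system NTP daemon rather than a hand-rolled query.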
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.
Please do cite the aforementioned article when using this dataset.
The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a Machine Learning Model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and fed it into our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.
The ZIP folder comprises two main components: Captures and Datasets. The captures folder contains all the captures used in this project, organized into separate folders by type of network analysis: BLE or IP-Based. The datasets folder is organized the same way, with datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.
To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format can be read by most data analysis tools, while the pickle format is a Python serialization that preserves the object structure and data types of the dataset.
This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
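As a minimal sketch of how the two formats might be read, the helper below picks a reader from the file extension using only the Python standard library. The function name `load_dataset` and the file extensions `.csv`, `.pkl`, and `.pickle` are assumptions for illustration; the actual file names inside the ZIP folder are not stated here. If the pickle files hold pandas objects, pandas must also be installed for unpickling to succeed.

```python
import csv
import pickle
from pathlib import Path


def load_dataset(path):
    """Load one dataset file, choosing the reader from the file extension.

    Returns a list of dict rows for CSV, or the stored object for pickle.
    The extensions handled here are assumptions, not taken from the dataset.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # CSV values are returned as strings; convert types as needed.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix in {".pkl", ".pickle"}:
        # Unpickling can execute arbitrary code: only load trusted files.
        with path.open("rb") as handle:
            return pickle.load(handle)
    raise ValueError(f"Unsupported dataset format: {suffix!r}")
```

Note that CSV values arrive as strings and need explicit type conversion, whereas a pickle file restores the original Python types directly.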
Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. The tables below list the features selected for each sub-dataset and used in the evaluation model in the accompanying article.
Identified Key Features Within Bluetooth Dataset
Feature | Meaning |
btle.advertising_header | BLE Advertising Packet Header |
btle.advertising_header.ch_sel | BLE Advertising Channel Selection Algorithm |
btle.advertising_header.length | BLE Advertising Length |
btle.advertising_header.pdu_type | BLE Advertising PDU Type |
btle.advertising_header.randomized_rx | BLE Advertising Rx Address |
btle.advertising_header.randomized_tx | BLE Advertising Tx Address |
btle.advertising_header.rfu.1 | Reserved For Future 1 |
btle.advertising_header.rfu.2 | Reserved For Future 2 |
btle.advertising_header.rfu.3 | Reserved For Future 3 |
btle.advertising_header.rfu.4 | Reserved For Future 4 |
btle.control.instant | Instant Value Within a BLE Control Packet |
btle.crc.incorrect | Incorrect CRC |
btle.extended_advertising | Advertiser Data Information |
btle.extended_advertising.did | Advertiser Data Identifier |
btle.extended_advertising.sid | Advertiser Set Identifier |
btle.length | BLE Length |
frame.cap_len | Frame Length Stored Into the Capture File |
frame.interface_id | Interface ID |
frame.len | Frame Length on the Wire |
nordic_ble.board_id | Board ID |
nordic_ble.channel | Channel Index |
nordic_ble.crcok | Indicates if CRC is Correct |
nordic_ble.flags | Flags |
nordic_ble.packet_counter | Packet Counter |
nordic_ble.packet_time | Packet time (start to end) |
nordic_ble.phy | PHY |
nordic_ble.protover | Protocol Version |
Identified Key Features Within IP-Based Packets Dataset
Feature | Meaning |
http.content_length | Length of content in an HTTP response |
http.request | HTTP request being made |
http.response.code | Status code of an HTTP response |
http.response_number | Sequential number of an HTTP response |
http.time | Time taken for an HTTP transaction |
tcp.analysis.initial_rtt | Initial round-trip time for TCP connection |
tcp.connection.fin | TCP connection termination with a FIN flag |
tcp.connection.syn | TCP connection initiation with SYN flag |
tcp.connection.synack | TCP connection establishment with SYN-ACK flags |
tcp.flags.cwr | Congestion Window Reduced flag in TCP |
tcp.flags.ecn | Explicit Congestion Notification flag in TCP |
tcp.flags.fin | FIN flag in TCP |
tcp.flags.ns | Nonce Sum flag in TCP |
tcp.flags.res | Reserved flags in TCP |
tcp.flags.syn | SYN flag in TCP |
tcp.flags.urg | Urgent flag in TCP |
tcp.urgent_pointer | Pointer to urgent data in TCP |
ip.frag_offset | Fragment offset in IP packets |
eth.dst.ig | Individual/Group (IG) bit of the Ethernet destination address (unicast vs. multicast/broadcast) |
eth.src.ig | Individual/Group (IG) bit of the Ethernet source address |
eth.src.lg | Locally administered/Globally unique (LG) bit of the Ethernet source address |
eth.src_not_group | Warning raised when the Ethernet source address is a group address |
arp.isannouncement | Indicates if an ARP message is an announcement |
Identified Key Features Within IP-Based Flows Dataset
Feature | Meaning |
proto | Transport layer protocol of the connection |
service | Identification of an application protocol |
orig_bytes | Originator payload bytes |
resp_bytes | Responder payload bytes |
history | Connection state history |
orig_pkts | Originator sent packets |
resp_pkts | Responder sent packets |
flow_duration | Length of the flow in seconds |
fwd_pkts_tot | Forward packets total |
bwd_pkts_tot | Backward packets total |
fwd_data_pkts_tot | Forward data packets total |
bwd_data_pkts_tot | Backward data packets total |
fwd_pkts_per_sec | Forward packets per second |
bwd_pkts_per_sec | Backward packets per second |
flow_pkts_per_sec | Flow packets per second |
fwd_header_size | Forward header bytes |
bwd_header_size | Backward header bytes |
fwd_pkts_payload | Forward payload bytes |
bwd_pkts_payload | Backward payload bytes |
flow_pkts_payload | Flow payload bytes |
fwd_iat | Forward inter-arrival time |
bwd_iat | Backward inter-arrival time |
flow_iat | Flow inter-arrival time |
active | Flow active duration |
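The flow features listed above can be used to narrow a loaded table down to the selected columns. The sketch below is one hedged way to do this on rows represented as dictionaries; the helper name `select_features` is hypothetical, and it assumes the column names in the files match the table exactly, which should be checked against the actual data.

```python
# Feature names copied from the IP-Based Flows table above.
FLOW_FEATURES = [
    "proto", "service", "orig_bytes", "resp_bytes", "history",
    "orig_pkts", "resp_pkts", "flow_duration",
    "fwd_pkts_tot", "bwd_pkts_tot",
    "fwd_data_pkts_tot", "bwd_data_pkts_tot",
    "fwd_pkts_per_sec", "bwd_pkts_per_sec", "flow_pkts_per_sec",
    "fwd_header_size", "bwd_header_size",
    "fwd_pkts_payload", "bwd_pkts_payload", "flow_pkts_payload",
    "fwd_iat", "bwd_iat", "flow_iat", "active",
]


def select_features(rows, features=FLOW_FEATURES):
    """Keep only the listed features in each row.

    Returns (selected_rows, missing), where missing lists every requested
    feature that is absent from at least one row.
    """
    rows = list(rows)
    missing = [name for name in features
               if any(name not in row for row in rows)]
    selected = [{name: row[name] for name in features if name in row}
                for row in rows]
    return selected, missing
```

Reporting the missing names instead of raising an error makes it easier to spot a mismatch between the table and the file headers before training a model.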
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
There are a lot of unknowns when running an e-commerce store, even when you have analytics to guide your decisions.
Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).
This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try to understand what you can expect of your users and estimate in advance how your store may grow.
If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.
This dataset is part of a preview of a much larger dataset. Please contact me for more.
The data was scraped from a successful online C2C fashion store with over 9M registered users. The store was first launched in Europe around 2009 then expanded worldwide.
Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.
Questions you might want to answer using this dataset:
For other licensing options, contact me.
The population share with mobile internet access in North America was forecast to increase by a total of 2.9 percentage points between 2024 and 2029. This overall increase is not continuous, notably not in 2028 and 2029. Mobile internet penetration is estimated to amount to 84.21 percent in 2029. Notably, the population share with mobile internet access in North America has increased continuously over the past years. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the population share with mobile internet access in regions like the Caribbean and Europe.
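A percentage-point increase is not the same as a percent increase. The short sketch below works through the two figures quoted for mobile internet access in North America (84.21 percent in 2029, a gain of 2.9 percentage points since 2024). The 2024 starting share is not stated in the text; it is derived here by subtraction and may differ slightly from the published value because of rounding. The function names are illustrative only.

```python
def implied_baseline(final_share, increase_points):
    """Starting share implied by a final share and a gain in percentage points."""
    return final_share - increase_points


def relative_growth(final_share, increase_points):
    """The same gain expressed as a percent of the starting share."""
    start = implied_baseline(final_share, increase_points)
    return increase_points / start * 100


# Figures quoted in the text: 84.21 percent in 2029, +2.9 percentage points.
start_2024 = implied_baseline(84.21, 2.9)
print(f"implied 2024 share: {start_2024:.2f} percent")
print(f"relative growth 2024-2029: {relative_growth(84.21, 2.9):.2f} percent")
```

Under these assumptions the implied 2024 share is about 81.31 percent, so the 2.9 percentage-point gain corresponds to relative growth of roughly 3.6 percent.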