31 datasets found

Open Data Website Traffic
data.lacity.org
s.cnmilf.com
csv, xlsx, xml
Updated Sep 11, 2018
Cite
(2018). Open Data Website Traffic [Dataset]. https://data.lacity.org/Community-Economic-Development/Open-Data-Website-Traffic/d4kt-8j3n
Explore at:
xlsx, csv, xml (available download formats)
Dataset updated
Sep 11, 2018
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly
Network Traffic Dataset
kaggle.com
Updated Oct 31, 2023
Cite
Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 31, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ravikumar Gattu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The data presented here was captured on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures for 1 hour during the evening of Oct 9th, 2023 using Wireshark. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

Content :

This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric; however, there are also nominal and date types due to the Timestamp.

The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

Dataset Columns:

No: Number of the instance.
Timestamp: Timestamp of the network traffic instance.
Source IP: IP address of the source.
Destination IP: IP address of the destination.
Protocol: Protocol used by the instance.
Length: Length of the instance.
Info: Information about the traffic instance.
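As a rough illustration of working with the seven columns described above, the following sketch counts captured packets per protocol. It assumes the CSV header uses the Wireshark field names quoted in the description (No., Time, Source, Destination, Protocol, Length, Info); the actual header in the published file may differ, and the sample rows are made up for the example.

```python
import csv
import io
from collections import Counter


def protocol_counts(csv_text: str) -> Counter:
    """Count rows per protocol in a Wireshark-style CSV export.

    Assumes a header row containing a "Protocol" column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        counts[row["Protocol"]] += 1
    return counts


# Tiny made-up sample in the assumed column layout (not real capture data).
SAMPLE = "\n".join([
    "No.,Time,Source,Destination,Protocol,Length,Info",
    "1,0.000000,10.0.0.1,10.0.0.2,TCP,66,example",
    "2,0.000120,10.0.0.2,10.0.0.1,TCP,54,example",
    "3,0.000300,10.0.0.1,8.8.8.8,DNS,74,example",
])

if __name__ == "__main__":
    print(protocol_counts(SAMPLE).most_common())
```

For the full file, the same function could be fed the file contents read from disk; the csv module handles quoted Info values that contain commas.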

Acknowledgements :

I would like to thank the University of Cincinnati for providing the infrastructure used to generate the network traffic dataset.

Ravikumar Gattu , Susmitha Choppadandi

Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it can be used to build machine learning models that identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).

Dataset License: CC0: Public Domain

Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

ML techniques that benefit from this dataset:

This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also includes important features that can be used for most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:

1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify network patterns. This helps in designing network security algorithms that minimise network problems.

2. Anomaly Detection: A large network traffic dataset can be used to train machine learning models to find irregularities in the traffic, which could help identify cyber attacks.

3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks, and DoS attacks.
Traffic Exchange Analysis Dataset 2024
sparktraffic.com
Updated Jun 10, 2024
Cite
SparkTraffic (2024). Traffic Exchange Analysis Dataset 2024 [Dataset]. https://www.sparktraffic.com/blog/reason-not-to-use-traffic-exchanges
Explore at:
Dataset updated
Jun 10, 2024
Dataset authored and provided by
SparkTraffic
Description
Research data on traffic exchange limitations including low-quality traffic characteristics, search engine penalty risks, and comparison with effective alternatives like SEO and content marketing strategies.
City of Pittsburgh Traffic Count
data.wprdc.org
datasets.ai
csv, geojson
Updated Jun 9, 2024
Cite
City of Pittsburgh (2024). City of Pittsburgh Traffic Count [Dataset]. https://data.wprdc.org/dataset/traffic-count-data-city-of-pittsburgh
Explore at:
csv, geojson (421434) (available download formats)
Dataset updated
Jun 9, 2024
Dataset authored and provided by
City of Pittsburgh
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Pittsburgh
Description
This traffic-count data is provided by the City of Pittsburgh's Department of Mobility & Infrastructure (DOMI). Counters were deployed as part of traffic studies, including intersection studies, and studies covering where or whether to install speed humps. In some cases, data may have been collected by the Southwestern Pennsylvania Commission (SPC) or BikePGH.

Data is currently available for only the most-recent count at each location.

Traffic count data is important to the process for deciding where to install speed humps. According to DOMI, they may only be legally installed on streets where traffic counts fall below a minimum threshold. Residents can request an evaluation of their street as part of DOMI's Neighborhood Traffic Calming Program. The City has also shared data on the impact of the Neighborhood Traffic Calming Program in reducing speeds.

Different studies may collect different data. Speed hump studies capture counts and speeds. SPC and BikePGH conduct counts of cyclists. Intersection studies included in this dataset may not include traffic counts, but reports of individual studies may be requested from the City. Despite the lack of count data, intersection studies are included to facilitate data requests.

Data captured by different types of counting devices are included in this data. StatTrak counters are in use by the City, and capture data on counts and speeds. More information about these devices may be found on the company's website. Data includes traffic counts and average speeds, and may also include separate counts of bicycles.

Tubes are deployed by both SPC and BikePGH and used to count cyclists. SPC may also deploy video counters to collect data.

NOTE: The data in this dataset has not been updated since 2021 because of a broken data feed. We're working to fix it.
Walmart.com Daily Traffic Statistics 2025
redstagfulfillment.com
html
Updated May 19, 2025
Cite
Red Stag Fulfillment (2025). Walmart.com Daily Traffic Statistics 2025 [Dataset]. https://redstagfulfillment.com/how-many-daily-visits-does-walmart-receive/
Explore at:
html (available download formats)
Dataset updated
May 19, 2025
Dataset authored and provided by
Red Stag Fulfillment
Time period covered
2020 - 2025
Area covered
United States
Variables measured
Daily website visits, Session duration metrics, Traffic source breakdown, Geographic traffic patterns, Seasonal traffic variations, Mobile vs desktop traffic distribution
Description
Comprehensive dataset analyzing Walmart.com's daily website traffic, including 16.7 million daily visits, device distribution, geographic patterns, and competitive benchmarking data.
Comparison of Top Sites to Buy Website Traffic 2025
sparktraffic.com
Updated Jan 3, 2025
Cite
Cecilien Dambon (2025). Comparison of Top Sites to Buy Website Traffic 2025 [Dataset]. https://www.sparktraffic.com/blog/best-sites-to-buy-website-traffic-in-2025
Explore at:
Dataset updated
Jan 3, 2025
Authors
Cecilien Dambon
Description
A dataset comparing features, pricing, and ratings of the top sites to buy website traffic in 2025: Google Ads, Facebook Ads, PropellerAds, and SparkTraffic.
Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant
datarade.ai
.csv, .xls
Updated Jun 27, 2023
Cite
Swash (2023). Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/swash-blockchain-bitcoin-and-web3-enthusiasts-swash
Explore at:
.csv, .xls (available download formats)
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Swash
Area covered
Latvia, Jordan, Monaco, Uzbekistan, Saint Vincent and the Grenadines, India, Liechtenstein, Russian Federation, Belarus, Jamaica
Description
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
Explore at:
zip (0 bytes) (available download formats)
Dataset updated
Sep 19, 2019
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
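The "real bounce rate" question above defines the metric as the percentage of visits with a single pageview. A minimal sketch of that arithmetic follows, using plain Python records rather than the BigQuery table itself. The field names `source` and `pageviews` are hypothetical stand-ins for whatever the session export provides, and the sample rows are invented.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions):
    """Return {source: percent of visits with exactly one pageview}.

    `sessions` is an iterable of dicts with hypothetical keys
    "source" (traffic source name) and "pageviews" (int).
    """
    visits = defaultdict(int)
    bounces = defaultdict(int)
    for session in sessions:
        visits[session["source"]] += 1
        if session["pageviews"] == 1:
            bounces[session["source"]] += 1
    return {src: 100.0 * bounces[src] / visits[src] for src in visits}


# Invented sample sessions, for illustration only.
SESSIONS = [
    {"source": "google", "pageviews": 1},
    {"source": "google", "pageviews": 3},
    {"source": "google", "pageviews": 1},
    {"source": "google", "pageviews": 5},
    {"source": "(direct)", "pageviews": 1},
]

if __name__ == "__main__":
    for source, rate in bounce_rate_by_source(SESSIONS).items():
        print(f"{source}: {rate:.1f}%")
```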
Organic Traffic Analysis
sparktraffic.com
Updated Jan 1, 2023
Cite
Cecilien Dambon (2023). Organic Traffic Analysis [Dataset]. https://www.sparktraffic.com/blog/what-is-organic-traffic
Explore at:
Dataset updated
Jan 1, 2023
Authors
Cecilien Dambon
Description
A dataset explaining organic traffic, its importance for SEO, and methods to track it in Google Analytics 4.
Daily website visitors (time series regression)
kaggle.com
Updated Aug 20, 2020
Cite
The citation is currently not available for this dataset.
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 20, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Bob Nau
Description
Context

This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.
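A common baseline for the forecasting exercises mentioned above is the seasonal naive forecast, which predicts each future day with the value observed on the same weekday one week earlier. The sketch below is one possible starting point and is not taken from the dataset author's materials; the function name and the sample numbers are illustrative.

```python
def seasonal_naive_forecast(history, horizon=7, season=7):
    """Forecast `horizon` future values by repeating the last full season.

    history: list of daily values, oldest first.
    season:  length of the seasonal cycle (7 for day-of-week effects).
    """
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    start = len(history) - season
    # h % season wraps around when the horizon is longer than one season.
    return [history[start + (h % season)] for h in range(horizon)]


# Two invented weeks of daily unique-visitor counts.
HISTORY = [10, 20, 30, 40, 50, 60, 70, 11, 21, 31, 41, 51, 61, 71]

if __name__ == "__main__":
    print("1-day-ahead:", seasonal_naive_forecast(HISTORY, horizon=1))
    print("next week:  ", seasonal_naive_forecast(HISTORY, horizon=7))
```

A model worth keeping should beat this baseline on held-out days. It does not capture the academic-calendar effects the description mentions.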

Content

The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.

Inspiration

This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.
Data from: Traffic Volumes
data.sandiego.gov
Updated Jul 29, 2016
Cite
(2016). Traffic Volumes [Dataset]. https://data.sandiego.gov/datasets/traffic-volumes/
Explore at:
csv (available download formats)
Dataset updated
Jul 29, 2016
Description
The census count of vehicles on city streets is normally reported in the form of Average Daily Traffic (ADT) counts. These counts provide a good estimate for the actual number of vehicles on an average weekday at select street segments. Specific block segments are selected for a count because they are deemed as representative of a larger segment on the same roadway. ADT counts are used by transportation engineers, economists, real estate agents, planners, and other professionals for planning and operational analysis. The frequency for each count varies depending on City staff’s needs for analysis in any given area. This report covers the counts taken in our City during approximately the past 12 years.
Jefferson County KY Traffic Web Cameras
catalog.data.gov
data.lojic.org
Updated Jul 30, 2025
Cite
Louisville/Jefferson County Information Consortium (2025). Jefferson County KY Traffic Web Cameras [Dataset]. https://catalog.data.gov/dataset/jefferson-county-ky-traffic-web-cameras-2b335
Explore at:
Dataset updated
Jul 30, 2025
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Kentucky, Jefferson County
Description
TRIMARC (Traffic Response and Incident Management Assisting the River City) camera locations in Louisville Metro Kentucky. This feature layer was created from TRIMARC JSON files of camera locations. This item includes description, direction, and video links and is used in the Louisville Metro Snow Map. The cameras are used to monitor the roadways and verify incidents to assist in freeway and incident management. This feature is a static extract and will be reviewed before each snow season for updates. For more information on this feature layer and its use please contact Louisville Metro GIS or LOJIC. To learn more about TRIMARC please visit the following website: http://www.trimarc.org.
Free Website Traffic Distribution Metrics 2025
sparktraffic.com
Updated Jan 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cecilien Dambon (2024). Free Website Traffic Distribution Metrics 2025 [Dataset]. https://www.sparktraffic.com/blog/how-to-get-free-traffic
Explore at:
Dataset updated
Jan 1, 2024
Authors
Cecilien Dambon
Variables measured
Renewal methodology, Project creation limits, Credit system parameters, Domain eligibility criteria, Monthly free hits allocation
Measurement technique
Automated traffic distribution system
Description
Dataset containing metrics and parameters for free website traffic distribution, including Nano credit system details, eligibility criteria (6000 hits/month, domain restrictions), and manual renewal requirements.
Network Traffic Analysis: Data and Code
data.niaid.nih.gov
zenodo.org
Updated Jun 12, 2024
Cite
Honig, Joshua (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
Explore at:
Dataset updated
Jun 12, 2024
Dataset provided by
Homan, Sophia
Honig, Joshua
Ferrell, Nathan
Moran, Madeline
Chan-Tin, Eric
Soni, Shreena
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help    show this help message and exit
-i TXTFILE    input text file
-x X          Add first X number of total packets as features.
-y Y          Add first Y number of negative packets as features.
-z Z          Add first Z number of positive packets as features.
-ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S          Generate samples using size s.
-j

Purpose:

Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associative features.

Uses Features.py to calculate the features.

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

Options (to be edited within this file):

--evaluate-only to test 5 fold cross validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

First number is a classification number to denote what website, query, or vr action is taking place.

The remaining numbers in each line denote:

The size of a packet,

and the direction it is traveling.

negative numbers denote incoming packets

positive numbers denote outgoing packets
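The record layout described above (a classification number followed by signed packet sizes) can be read with a few lines of Python. This is a sketch under stated assumptions and is not the authors' Packet_Features_Generator.py: the separator between numbers is not specified in the description, so the code accepts whitespace or commas, and it assumes all values are integers.

```python
import re


def parse_capture_line(line):
    """Split one record into (label, incoming_sizes, outgoing_sizes).

    Assumption: values are separated by whitespace and/or commas.
    """
    tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
    label = int(tokens[0])          # first number: website/query/VR action class
    values = [int(t) for t in tokens[1:]]
    incoming = [-v for v in values if v < 0]  # negative numbers: incoming packets
    outgoing = [v for v in values if v > 0]   # positive numbers: outgoing packets
    return label, incoming, outgoing


if __name__ == "__main__":
    # Invented example line, not taken from the published files.
    print(parse_capture_line("3 -1500 200 -64 512"))
```

Sizes are returned as positive magnitudes in both lists, so direction is carried by which list a value lands in.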

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv files are identical.

Each file includes (from right to left):

The original packet data,

each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
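The processing steps listed above (sort the packet sizes, compute mean and standard deviation, then compute a CDF) can be sketched as follows. The description does not say which CDF was used for Figure 4, so this example swaps in the empirical CDF as an assumption, and it uses the population standard deviation, which may not match the spreadsheet.

```python
from statistics import mean, pstdev


def empirical_cdf(packet_sizes):
    """Return (cdf_points, mean, std) for one packet capture.

    cdf_points is a list of (size, cumulative_fraction) pairs over the
    sizes sorted from smallest to largest. This is the empirical CDF,
    which is an assumption; the original figure may use another form.
    """
    ordered = sorted(packet_sizes)
    n = len(ordered)
    cdf_points = [(size, (i + 1) / n) for i, size in enumerate(ordered)]
    return cdf_points, mean(ordered), pstdev(ordered)


if __name__ == "__main__":
    # Invented packet sizes, for illustration only.
    points, mu, sigma = empirical_cdf([300, 100, 200, 400])
    print(points)
    print(mu, sigma)
```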
Data from: CESNET-QUIC22: A large one-month QUIC network traffic dataset...
data.niaid.nih.gov
zenodo.org
Updated Feb 29, 2024
Cite
Lukačovič, Andrej (2024). CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7409923
Explore at:
Dataset updated
Feb 29, 2024
Dataset provided by
Čejka, Tomáš
Hynek, Karel
Lukačovič, Andrej
Šiška, Pavel
Luxemburk, Jan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888.

We recommend using the CESNET DataZoo Python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

Data capture

The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done for four weeks between 31.10.2022 and 27.11.2022. The following list provides per-week flow count, capture period, and uncompressed size:

W-2022-44
Uncompressed Size: 19 GB
Capture Period: 31.10.2022 - 6.11.2022
Number of flows: 32.6M

W-2022-45
Uncompressed Size: 25 GB
Capture Period: 7.11.2022 - 13.11.2022
Number of flows: 42.6M

W-2022-46
Uncompressed Size: 20 GB
Capture Period: 14.11.2022 - 20.11.2022
Number of flows: 33.7M

W-2022-47
Uncompressed Size: 25 GB
Capture Period: 21.11.2022 - 27.11.2022
Number of flows: 44.1M

CESNET-QUIC22
Uncompressed Size: 89 GB
Capture Period: 31.10.2022 - 27.11.2022
Number of flows: 153M

Data description

The dataset consists of network flows describing encrypted QUIC communications. Flows were created using ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and with fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.

Packet Sequences

Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1, +1 meaning a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip.

Flow statistics

Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information is in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason - either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections.

Dataset structure

The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service and total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts of a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes SNI domains used for ground truth labeling. The following list describes flow data fields in CSV files:

ID: Unique identifier
SRC_IP: Source IP address
DST_IP: Destination IP address
DST_ASN: Destination Autonomous System number
SRC_PORT: Source port
DST_PORT: Destination port
PROTOCOL: Transport protocol
QUIC_VERSION: QUIC protocol version
QUIC_SNI: Server Name Indication domain
QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet
TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
DURATION: Duration of the flow in seconds
BYTES: Number of transmitted bytes from client to server
BYTES_REV: Number of transmitted bytes from server to client
PACKETS: Number of packets transmitted from client to server
PACKETS_REV: Number of packets transmitted from server to client
PPI: Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]]
PPI_LEN: Number of packets in the PPI sequence
PPI_DURATION: Duration of the PPI sequence in seconds
PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
PHIST_SRC_SIZES: Histogram of packet sizes from client to server
PHIST_DST_SIZES: Histogram of packet sizes from server to client
PHIST_SRC_IPT: Histogram of inter-packet times from client to server
PHIST_DST_IPT: Histogram of inter-packet times from server to client
APP: Web service label
CATEGORY: Service category
FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
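To make the PHIST_* fields more concrete, the sketch below bins values into the eight intervals stated in the description (0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024). It only illustrates the binning rule and is not the ipfixprobe implementation. The function names are hypothetical, and the input is assumed to be non-negative integers (bytes for sizes, milliseconds for inter-packet times).

```python
from bisect import bisect_left

# Inclusive upper bounds of the first seven bins; the eighth bin is >1024.
BIN_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]


def bin_index(value):
    """Return the 0-based histogram bin for a non-negative value."""
    # bisect_left finds the first bound that is >= value, so a value equal
    # to an upper bound (e.g. 15 or 1024) stays in that bound's bin.
    return bisect_left(BIN_UPPER_BOUNDS, value)


def histogram(values):
    """Count values per bin, returning a list of eight counters."""
    counts = [0] * (len(BIN_UPPER_BOUNDS) + 1)
    for value in values:
        counts[bin_index(value)] += 1
    return counts


if __name__ == "__main__":
    # Invented packet sizes in bytes, for illustration only.
    print(histogram([10, 20, 1024, 1500]))
```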

Link to other CESNET datasets

https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/
https://github.com/CESNET/cesnet-datazoo

Please cite the original data article:

@article{CESNETQUIC22,
  author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
  title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
  journal = {Data in Brief},
  pages = {108888},
  year = {2023},
  issn = {2352-3409},
  doi = {https://doi.org/10.1016/j.dib.2023.108888},
  url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
}
Traffic signals and SCATS sites locations DCC - Dataset - data.gov.ie
data.gov.ie
Updated Jun 19, 2025
data.gov.ie (2025). Traffic signals and SCATS sites locations DCC - Dataset - data.gov.ie [Dataset]. https://data.gov.ie/dataset/traffic-signals-and-scats-sites-locations-dcc
Explore at:
Dataset updated
Jun 19, 2025
Dataset provided by
data.gov.ie
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SiteID: Site (signal and SCATS site) identifier
Site_Description_Cap: Site description in capital letters
Site_Description_Lower: Site description in lower case letters
Region: Refers to SCATS regional servers
Lat: Geographic location (latitude)
Long: Geographic location (longitude)
Site_Type: Site type; it has two values, SCATS or Signal Site. SCATS means that both SCATS detectors and traffic signals (traffic lights) are present. Signal Site means that only traffic signals are present.
Passive Operating System Fingerprinting Revisited - Network Flows Dataset
zenodo.org
data.niaid.nih.gov
zip
Updated Feb 14, 2023
Martin Laštovička; Martin Husák; Petr Velan; Tomáš Jirsík; Pavel Čeleda (2023). Passive Operating System Fingerprinting Revisited - Network Flows Dataset [Dataset]. http://doi.org/10.5281/zenodo.7635138
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7635138
Dataset updated
Feb 14, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Martin Laštovička; Martin Husák; Petr Velan; Tomáš Jirsík; Pavel Čeleda
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
For the evaluation of OS fingerprinting methods, we need a dataset with the following requirements:

First, the dataset needs to be big enough to capture the variability of the data. In this case, we need many connections from different operating systems.

Second, the dataset needs to be annotated, which means that the corresponding operating system needs to be known for each network connection captured in the dataset. Therefore, we cannot just capture any network traffic for our dataset; we need to be able to determine the OS reliably.

To overcome these issues, we have decided to create the dataset from the traffic of several web servers at our university. This allows us to address the first issue by collecting traces from thousands of devices ranging from user computers and mobile phones to web crawlers and other servers. The ground truth values are obtained from the HTTP User-Agent, which resolves the second of the presented issues. Even though most traffic is encrypted, the User-Agent can be recovered from the web server logs that record every connection’s details. By correlating the IP address and timestamp of each log record to the captured traffic, we can add the ground truth to the dataset.

For this dataset, we have selected a cluster of five web servers that host 475 unique university domains for public websites. The monitoring point recording the traffic was placed at the backbone network connecting the university to the Internet.

The dataset used in this paper was collected from approximately 8 hours of university web traffic throughout a single workday. The logs were collected from Microsoft IIS web servers and converted from W3C extended logging format to JSON. The logs are referred to as web logs and are used to annotate the records generated from packet capture obtained by using a network probe tapped into the link to the Internet.

The entire dataset creation process consists of seven steps:

1. The packet capture was processed by the Flowmon flow exporter (https://www.flowmon.com) to obtain primary flow data containing information from TLS and HTTP protocols.

2. Additional statistical features were extracted using GoFlows flow exporter (https://github.com/CN-TU/go-flows).

3. The primary flows were filtered to remove incomplete records and network scans.

4. The flows from both exporters were merged together into records containing fields from both sources.

5. Web logs were filtered to cover the same time frame as the flow records.

6. Web logs were paired with the flow records based on shared properties (IP address, port, time).

7. The last step was to convert the User-Agent values into the operating system using a Python version of the open-source tool ua-parser (https://github.com/ua-parser/uap-python). We replaced the unstructured User-Agent string in the records with the resulting OS.
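The pairing of web logs with flow records on shared properties (IP address, port, time) can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the record keys (`src_ip`, `src_port`, `start`, `end`, `ip`, `port`, `time`, `user_agent`), the one-second tolerance, and the first-match rule are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def pair_logs_with_flows(flows, logs, tolerance=timedelta(seconds=1)):
    """Attach the User-Agent of a matching web-log record to each flow.

    A log matches a flow when the client IP and port are equal and the
    log timestamp lies within [start - tolerance, end + tolerance].
    Flows without a matching log are dropped.
    """
    # Index logs by (ip, port) so each flow only scans its own candidates.
    index = {}
    for log in logs:
        index.setdefault((log["ip"], log["port"]), []).append(log)

    paired = []
    for flow in flows:
        candidates = index.get((flow["src_ip"], flow["src_port"]), [])
        for log in candidates:
            if flow["start"] - tolerance <= log["time"] <= flow["end"] + tolerance:
                # Copy the flow and add the ground-truth source field.
                paired.append({**flow, "user_agent": log["user_agent"]})
                break  # keep only the first matching log record
    return paired
```

Indexing the logs by (IP, port) keeps the time comparison limited to a handful of candidates per flow. The tolerance absorbs small clock differences between the probe and the web servers; a suitable value would have to be chosen from the actual data.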

The collected and enriched flows contain 111 data fields that can be used as features for OS fingerprinting or any other data analyses. The fields grouped by their area are listed below:

basic flow properties - flow_ID;start;end;L3 PROTO;L4 PROTO;BYTES A;PACKETS A;SRC IP;DST IP;TCP flags A;SRC port;DST port;packetTotalCountforward;packetTotalCountbackward;flowDirection;flowEndReason;

IP parameters - IP ToS;maximumTTLforward;maximumTTLbackward;IPv4DontFragmentforward;IPv4DontFragmentbackward;

TCP parameters - TCP SYN Size;TCP Win Size;TCP SYN TTL;tcpTimestampFirstPacketbackward;tcpOptionWindowScaleforward;tcpOptionWindowScalebackward;tcpOptionSelectiveAckPermittedforward;tcpOptionSelectiveAckPermittedbackward;tcpOptionMaximumSegmentSizeforward;tcpOptionMaximumSegmentSizebackward;tcpOptionNoOperationforward;tcpOptionNoOperationbackward;synAckFlag;tcpTimestampFirstPacketforward;

HTTP - HTTP Request Host;URL;

User-agent - UA OS family;UA OS major;UA OS minor;UA OS patch;UA OS patch minor;

TLS - TLS_CONTENT_TYPE;TLS_HANDSHAKE_TYPE;TLS_SETUP_TIME;TLS_SERVER_VERSION;TLS_SERVER_RANDOM;TLS_SERVER_SESSION_ID;TLS_CIPHER_SUITE;TLS_ALPN;TLS_SNI;TLS_SNI_LENGTH;TLS_CLIENT_VERSION;TLS_CIPHER_SUITES;TLS_CLIENT_RANDOM;TLS_CLIENT_SESSION_ID;TLS_EXTENSION_TYPES;TLS_EXTENSION_LENGTHS;TLS_ELLIPTIC_CURVES;TLS_EC_POINT_FORMATS;TLS_CLIENT_KEY_LENGTH;TLS_ISSUER_CN;TLS_SUBJECT_CN;TLS_SUBJECT_ON;TLS_VALIDITY_NOT_BEFORE;TLS_VALIDITY_NOT_AFTER;TLS_SIGNATURE_ALG;TLS_PUBLIC_KEY_ALG;TLS_PUBLIC_KEY_LENGTH;TLS_JA3_FINGERPRINT;

Packet timings - NPM_CLIENT_NETWORK_TIME;NPM_SERVER_NETWORK_TIME;NPM_SERVER_RESPONSE_TIME;NPM_ROUND_TRIP_TIME;NPM_RESPONSE_TIMEOUTS_A;NPM_RESPONSE_TIMEOUTS_B;NPM_TCP_RETRANSMISSION_A;NPM_TCP_RETRANSMISSION_B;NPM_TCP_OUT_OF_ORDER_A;NPM_TCP_OUT_OF_ORDER_B;NPM_JITTER_DEV_A;NPM_JITTER_AVG_A;NPM_JITTER_MIN_A;NPM_JITTER_MAX_A;NPM_DELAY_DEV_A;NPM_DELAY_AVG_A;NPM_DELAY_MIN_A;NPM_DELAY_MAX_A;NPM_DELAY_HISTOGRAM_1_A;NPM_DELAY_HISTOGRAM_2_A;NPM_DELAY_HISTOGRAM_3_A;NPM_DELAY_HISTOGRAM_4_A;NPM_DELAY_HISTOGRAM_5_A;NPM_DELAY_HISTOGRAM_6_A;NPM_DELAY_HISTOGRAM_7_A;NPM_JITTER_DEV_B;NPM_JITTER_AVG_B;NPM_JITTER_MIN_B;NPM_JITTER_MAX_B;NPM_DELAY_DEV_B;NPM_DELAY_AVG_B;NPM_DELAY_MIN_B;NPM_DELAY_MAX_B;NPM_DELAY_HISTOGRAM_1_B;NPM_DELAY_HISTOGRAM_2_B;NPM_DELAY_HISTOGRAM_3_B;NPM_DELAY_HISTOGRAM_4_B;NPM_DELAY_HISTOGRAM_5_B;NPM_DELAY_HISTOGRAM_6_B;NPM_DELAY_HISTOGRAM_7_B;

ICMP - ICMP TYPE;

The details of OS distribution grouped by the OS family are summarized in the table below. The Other OS family contains records generated by web crawling bots that do not include OS information in the User-Agent.

OS Family     Number of flows
Other         42474
Windows       40349
Android       10290
iOS           8840
Mac OS X      5324
Linux         1589
Ubuntu        653
Fedora        88
Chrome OS     53
Symbian OS    1
Slackware     1
Linux Mint    1
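
The class distribution is heavily skewed, which matters when the OS family is used as a classification label. As a small worked example, the shares can be computed directly from the counts in the table; the variable and function names are hypothetical.

```python
# Flow counts per OS family, copied from the table.
OS_FLOW_COUNTS = {
    "Other": 42474,
    "Windows": 40349,
    "Android": 10290,
    "iOS": 8840,
    "Mac OS X": 5324,
    "Linux": 1589,
    "Ubuntu": 653,
    "Fedora": 88,
    "Chrome OS": 53,
    "Symbian OS": 1,
    "Slackware": 1,
    "Linux Mint": 1,
}

total_flows = sum(OS_FLOW_COUNTS.values())

# Fraction of all annotated flows that belongs to each OS family.
os_shares = {family: count / total_flows for family, count in OS_FLOW_COUNTS.items()}


def top_k_share(k):
    """Return the combined share of the k most frequent OS families."""
    largest = sorted(OS_FLOW_COUNTS.values(), reverse=True)[:k]
    return sum(largest) / total_flows
```

The counts add up to 109663 flows, and the two largest classes (Other and Windows) together account for roughly three quarters of them, so per-class metrics are likely more informative than overall accuracy.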
The Groves - Low Traffic Neighbourhood Trial - Dataset - York Open Data
data.yorkopendata.org
Updated Jul 22, 2021
(2021). The Groves - Low Traffic Neighbourhood Trial - Dataset - York Open Data [Dataset]. https://data.yorkopendata.org/dataset/the-groves-low-traffic-neighbourhood-trial
Explore at:
Dataset updated
Jul 22, 2021
License
Open Government Licence 2.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
License information was derived automatically
Area covered
York
Description
In June 2020, the decision was taken to implement a low traffic neighbourhood trial in The Groves. For more information on the trial, please visit City of York Council's website. Independent monitoring and evaluation work has been commissioned by CYC to assess the impact of the trial and inform future decisions on the experimental road closures in The Groves. Part of this work uses traffic surveys, which are available in this dataset. It includes baseline surveys for:
• the week before the start of the trial (week 1)
• the first two weeks of the trial (weeks 2 and 3)
• approximately a year after the start of the trial (included in The Groves Traffic Analysis)
• bus journey time data before and during the trial (The Groves Bus Analysis)
Traffic Signal Sites - Datasets - data.wa.gov.au
catalogue.data.wa.gov.au
Updated Jun 12, 2018
(2018). Traffic Signal Sites - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/mrwa-traffic-signal-sites
Explore at:
Dataset updated
Jun 12, 2018
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The location of electronic traffic signals, designed, owned and or controlled and maintained by Main Roads Western Australia, that control vehicle and pedestrian traffic at an intersection or on a road are identified in this data set. The signal can be red, yellow, green or white light displays, and can include circular and arrow signals, pedestrian signals, bicycle crossing signals, B (bus) signals, overhead lane control signals, and twin red or yellow signals.

This dataset was developed to identify the location of Main Roads' controlled electronic signals across Western Australia and assist in the management of this asset. Additionally, it records attribute information which includes the LM No (Asset ID), Service Status, Signal Type, Intersection Name and Intersection Description.

Note that you are accessing this data pursuant to a Creative Commons (Attribution) Licence which has a disclaimer of warranties and limitation of liability. You accept that the data provided pursuant to the Licence is subject to changes. Pursuant to section 3 of the Licence you are provided with the following notice to be included when you Share the Licenced Material: The Commissioner of Main Roads is the creator and owner of the data and Licenced Material, which is accessed pursuant to a Creative Commons (Attribution) Licence, which has a disclaimer of warranties and limitation of liability.

Creative Commons CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
SDOT GIS Datasets
catalog.data.gov
data.seattle.gov
data.seattle.gov (2025). SDOT GIS Datasets [Dataset]. https://catalog.data.gov/dataset/sdot-gis-datasets-e011c
Explore at:
Dataset updated
Jan 31, 2025
Dataset provided by
data.seattle.gov
Description
The City of Seattle Transportation GIS Datasets | https://data-seattlecitygis.opendata.arcgis.com/datasets?t=transportation | Lifecycle status: Production | Purpose: to enable open access to SDOT GIS data. This website includes over 60 transportation-related GIS datasets from categories such as parking, transit, pedestrian, bicycle, and roadway assets. | PDDL: https://opendatacommons.org/licenses/pddl/ | The City of Seattle makes no representation or warranty as to its accuracy. The City of Seattle has created this service for our GIS Open Data website. We do reserve the right to alter, suspend, re-host, or retire this service at any time and without notice. | Datasets: 2007 Traffic Flow Counts, 2008 Traffic Flow Counts, 2009 Traffic Flow Counts, 2010 Traffic Flow Counts, 2011 Traffic Flow Counts, 2012 Traffic Flow Counts, 2013 Traffic Flow Counts, 2014 Traffic Flow Counts, 2015 Traffic Flow Counts, 2016 Traffic Flow Counts, 2017 Traffic Flow Counts, 2018 Traffic Flow Counts, Areaways, Bike Racks, Blockface, Bridges, Channelization File Geodatabase, Collisions, Crash Cushions, Curb Ramps, dotMaps Active Projects, Dynamic Message Signs, Existing Bike Facilities, Freight Network, Greater Downtown Alleys, Guardrails, High Impact Areas, Intersections, Marked Crosswalks, One-Way Streets, Paid Area Curbspaces, Pavement Moratoriums, Pay Stations, Peak Hour Parking Restrictions, Planned Bike Facilities, Public Garages or Parking Lots, Radar Speed Signs, Restricted Parking Zone (RPZ) Program, Retaining Walls, SDOT Capital Projects Input, Seattle On Street Paid Parking-Daytime Rates, Seattle On Street Paid Parking-Evening Rates, Seattle On Street Paid Parking-Morning Rates, Seattle Streets, SidewalkObservations, Sidewalks, Snow Ice Routes, Stairways, Street Design Concept Plans, Street Ends (Shoreline), Street Furnishings, Street Signs, Street Use Permits Use Addresses, Streetcar Lines, Streetcar Stations, Traffic Beacons, Traffic Cameras, Traffic Circles, Traffic Detectors, 
Traffic Lanes, Traffic Signals, Transit Classification, Trees.