License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Channel-specific and industry-specific Cost Per Visit benchmarks for traffic acquisition optimization
License: MIT, https://opensource.org/licenses/MIT
Full paper is available on arXiv
The city-traffic-M and city-traffic-L datasets are large-scale benchmarks for fine-grained urban traffic forecasting. Unlike prior datasets that rely on sparse fixed-point sensors located on highways, our datasets are derived from GPS traces of vehicles aggregated at the level of road segments across two major metropolitan areas.
city-traffic-L covers 94K road segments, while city-traffic-M covers 53K. The datasets contain both dynamic traffic variables (traffic speed and volume at 5-minute intervals) and static road attributes (26 per segment). Road connectivity is provided as the actual directed graph of urban streets rather than via distance-based heuristics.
The temporal coverage spans July 1st, 2024 – November 1st, 2024 at 5-minute resolution, resulting in over 35,000 high-frequency timesteps. city-traffic is therefore 1–2 orders of magnitude larger than previous public benchmarks, and it enables more realistic modeling of urban traffic dynamics, which are significantly richer and more complex than highway traffic.
These datasets are designed to support research in spatiotemporal graph learning, traffic forecasting, urban computing, and smart city applications. In the paper, we also benchmarked a wide range of neural forecasting models, demonstrating the scalability challenges posed by data of this magnitude and motivating the need for efficient architectures.
We provide the data in two formats:
The first format is raw data without any preprocessing. These files have the prefix city_[M|L]*.parquet. For each city, four dataframes are available:
- Edges (road connectivity as an edge list, city_[M|L]_raw_edges.parquet) - contains E rows and 2 columns (source, destination), where E is the number of edges. The node index here corresponds to the node index in the speed and volume dataframes and to the row index for static features;
- Static features - contains 26 attributes per road segment, city_[M|L]_static_features.parquet;
- Volume - contains traffic flow at 5-min intervals, city_[M|L]_raw_volume.parquet;
- Speed - contains traffic speed at 5-min intervals, city_[M|L]_raw_speed.parquet.
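As an illustrative sketch of how the four frames fit together, the documented layout can be mirrored with small synthetic dataframes (synthetic stand-ins only; the real files would be read with pandas.read_parquet on the filenames above):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the documented layout; the real data would be loaded
# with pd.read_parquet("city_M_raw_edges.parquet"), etc.
n_nodes, n_steps = 4, 3

# Edges: E rows, 2 columns; node ids index the other dataframes.
edges = pd.DataFrame({"source": [0, 1, 2], "destination": [1, 2, 3]})

# Static features: one row per road segment (26 columns in the real data).
static = pd.DataFrame(np.zeros((n_nodes, 3)), columns=["f0", "f1", "speed_limit"])

# Speed / volume: one row per 5-minute interval, one column per road segment.
speed = pd.DataFrame(np.random.rand(n_steps, n_nodes))

# Node indices in `edges` align with `speed` columns and `static` rows.
assert int(edges[["source", "destination"]].to_numpy().max()) < len(static)
assert len(static) == speed.shape[1]
```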
Note that we apply ordinal encoding to the speed_limit feature. The correspondence between feature values and their ordinal codes is as follows:
| Speed limit | Encoding |
|---|---|
| NaN | 0 |
| 5 km/h | 1 |
| 20 km/h | 2 |
| 30 km/h | 3 |
| 40 km/h | 4 |
| 50 km/h | 5 |
| 60 km/h | 6 |
| 70 km/h | 7 |
| 80 km/h | 8 |
| 90 km/h | 9 |
| 100 km/h | 10 |
| 110 km/h | 11 |
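For convenience, the mapping above can be inverted in a few lines (a sketch; code 0 denotes a missing value):

```python
# Inverse of the documented ordinal encoding for the `speed_limit` feature
# (code 0 is the NaN/missing code; codes 1..11 map to km/h as in the table).
SPEED_LIMIT_KMH = {0: None, 1: 5, 2: 20, 3: 30, 4: 40, 5: 50,
                   6: 60, 7: 70, 8: 80, 9: 90, 10: 100, 11: 110}

def decode_speed_limit(code: int):
    """Return the speed limit in km/h, or None for the missing-value code."""
    return SPEED_LIMIT_KMH[code]

print(decode_speed_limit(6))  # → 60
```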
The second format is preprocessed data, suitable for direct use with our released framework (see the next section) and for reproducibility. Each .npz file corresponds to one city–target pair (City-M / City-L × speed / volume). These files can be loaded directly in our codebase for benchmarking. A full description of the fields is available in the "Datasets Specification" section of the official repository README.
The table below lists the filename for each combination of city and target:
|Target\City | city-traffic-L | city-traffic-M |
| --- | --- | --- |
| Traffic Speed | city_traffic_l_speed.npz |city_traffic_m_speed.npz |
| Traffic Volume | city_traffic_l_volume.npz |city_traffic_m_volume.npz |
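A minimal sketch of working with one of these archives. The array name "speed" used below is a stand-in, not the real field name; the actual field names are documented in the "Datasets Specification" section of the repository README, and a synthetic file is created here so the sketch runs:

```python
import numpy as np

# Create a synthetic stand-in file so the example is self-contained;
# in practice you would load the downloaded city_traffic_m_speed.npz.
np.savez("city_traffic_m_speed.npz", speed=np.random.rand(10, 5))

with np.load("city_traffic_m_speed.npz") as data:
    print(data.files)        # names of the arrays stored in the archive
    speed = data["speed"]    # stand-in array, e.g. (timesteps, segments)
```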
Alongside the datasets, we provide an official repository. It includes benchmark implementations of all models evaluated in the paper, experiment templates for reproducibility, and complete instructions on how to run experiments with the preprocessed datasets and how to organize the data in the required format. With the provided preprocessed .npz files, all experiments can be launched by following the demonstration commands in the repository.
The repository is available at https://github.com/yandex-research/urban-traffic-benchmark.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The IoMT-TrafficData dataset has been developed to benchmark Machine Learning models for Intrusion Detection Systems (IDS) in the Internet of Medical Things (IoMT). The dataset simulates real-world attacks and normal network behavior in IoT and IoMT environments to enhance medical device security and patient data protection.
The dataset and its benchmarking methodology are detailed in the research article.
If you use this dataset, please credit the original authors:
Areia, J., Bispo, I. A., Santos, L., & Costa, R. L. (2024). IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things. IEEE Access. DOI: 10.1109/ACCESS.2024.3437214
Zenodo DOI: 10.5281/zenodo.8116338
Original Source: Zenodo (Creative Commons Attribution 4.0 International License)
| Feature | Meaning |
|---|---|
| btle.advertising_header | BLE Advertising Packet Header |
| btle.advertising_header.ch_sel | Channel Selection Algorithm |
| btle.advertising_header.length | Advertising Length |
| btle.advertising_header.pdu_type | Advertising PDU Type |
| nordic_ble.crcok | Indicates if CRC is Correct |
| nordic_ble.packet_time | Packet time (start to end) |
| nordic_ble.phy | PHY |
| ... | (see Zenodo for full feature list) |
| Feature | Meaning |
|---|---|
| http.content_length | Length of HTTP response content |
| tcp.analysis.initial_rtt | Initial round-trip time for TCP |
| tcp.flags.syn | SYN flag in TCP |
| arp.isannouncement | Indicates ARP announcement |
| ... | (see Zenodo for full list) |
| Feature | Meaning |
|---|---|
| proto | Transport layer protocol |
| service | Application protocol |
| orig_bytes | Originator payload bytes |
| resp_bytes | Responder payload bytes |
| flow_duration | Duration of the flow |
| fwd_pkts_per_sec | Forward packets per second |
| flow_iat | Flow inter-arrival time |
| ... | (see Zenodo for full list) |
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
A. SUMMARY
This dataset contains the underlying data for the Vision Zero Benchmarking website. Vision Zero is the collaborative, citywide effort to end traffic fatalities in San Francisco. The goal of this benchmarking effort is to provide context to San Francisco's work and progress on key Vision Zero metrics alongside its peers. The Controller's Office City Performance team collaborated with the San Francisco Municipal Transportation Agency, the San Francisco Department of Public Health, the San Francisco Police Department, and other stakeholders on this project.

B. HOW THE DATASET IS CREATED
The Vision Zero Benchmarking website has seven major metrics. The City Performance team collected the data for each metric separately, cleaned it, and visualized it on the website. This dataset has all seven metrics and some additional underlying data. The majority of the data is available through public sources, but a few data points came from the peer cities themselves.

C. UPDATE PROCESS
This dataset is for historical purposes only and will not be updated. To explore more recent data, visit the source website for the relevant metrics.

D. HOW TO USE THIS DATASET
This dataset contains all of the Vision Zero Benchmarking metrics. Filter for the metric of interest, then explore the data. Where applicable, datasets already include a total. For example, under the Fatalities metric, the "Total Fatalities" category within the metric shows the total fatalities in that city. Any calculations should be reviewed to not double-count data with this total.

E. RELATED DATASETS
N/A
License: MIT, https://opensource.org/licenses/MIT
Benchmark Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP
This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org, which contains AI agents' meta information such as the agent's name, website, and description, as well as monthly updated web performance metrics, including Google and Bing average search ranking positions, GitHub stars, arXiv references, etc. The dataset is helpful for AI… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/ai-agent-benchmark.
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers from relevant fields to participate: the competition is designed to allow for participation without special domain knowledge. Our benchmark has the following properties:
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
This dataset contains network traffic data collected from Internet of Things (IoT) devices operating in environments affected by botnet attacks. With over 700,000 rows, the dataset provides a comprehensive view of both benign and malicious network activity, making it ideal for building and evaluating Intrusion Detection Systems (IDS) and machine learning models for cybersecurity.
Each row represents a single network connection or flow and includes various statistical and behavioral features. The data is labeled to support both binary (benign vs attack) and multi-class (attack categories and subcategories) classification tasks.
Column Description:
- pkSeqID: Unique packet sequence ID
- proto: Protocol used (e.g., TCP, UDP, ICMP)
- saddr: Source IP address
- sport: Source port
- daddr: Destination IP address
- dport: Destination port
- seq: Sequence number of the packet
- stddev: Standard deviation of inter-arrival time
- N_IN_Conn_P_SrcIP: Number of inbound connections per source IP
- min: Minimum packet size or inter-arrival time
- state_number: Encoded representation of connection state
- mean: Mean packet size or inter-arrival time
- N_IN_Conn_P_DstIP: Number of inbound connections per destination IP
- drate: Data rate of the connection
- srate: Session rate or flow creation rate
- max: Maximum packet size or inter-arrival time
- attack: Binary label (1 = attack, 0 = benign)
- category: Broad category of the attack (e.g., DDoS, PortScan)
- subcategory: Specific method or type of attack (e.g., SYN Flood, Mirai-UDP)
Use Cases:
- Build and test machine learning models for botnet detection in IoT networks
- Conduct research on anomaly detection, attack classification, and network behavior modeling
- Develop and benchmark intrusion detection systems
- Explore traffic trends and feature engineering for cybersecurity solutions
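Since the data is labeled at two granularities, both classification targets can be derived directly from the columns described above. The sketch below uses a tiny synthetic stand-in frame (the real file would be read with pd.read_csv):

```python
import pandas as pd

# Tiny synthetic stand-in with the documented label columns.
df = pd.DataFrame({
    "attack":      [1, 0, 1],
    "category":    ["DDoS", "Benign", "PortScan"],
    "subcategory": ["SYN Flood", "Benign", "Mirai-UDP"],
})

binary_y = df["attack"]                                 # benign vs attack
multi_y = df["category"].astype("category").cat.codes   # attack categories

print(binary_y.tolist(), multi_y.tolist())  # → [1, 0, 1] [1, 0, 2]
```

The same pattern applies to the `subcategory` column for finer-grained multi-class tasks.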
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
This is the official website for downloading the CA sub-dataset of the LargeST benchmark dataset. There are a total of 7 files in this page. Among them, 5 files in .h5 format contain the traffic flow raw data from 2017 to 2021, 1 file in .csv format provides the metadata for sensors, and 1 file in .npy format represents the adjacency matrix constructed based on road network distances. Please refer to https://github.com/liuxu77/LargeST for more information.
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
This dataset is pre-processed from the German Traffic Sign Recognition Benchmark (GTSRB) dataset. The original data include street-view photos of 43 different German traffic signs. As the photo images capture scenes larger than the traffic signs, the original dataset also provides coordinates to locate the traffic sign within each image. This dataset is the result of cropping images with these provided coordinates; the process is described in the attached Jupyter notebook (german-traffic-signs-preprocessing.ipynb).
This dataset is built from the data made available at https://www.kaggle.com/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign. The original source is the INI Benchmark (http://benchmark.ini.rub.de/?section=gtsrb), whose website provides a detailed description of the dataset.
Online conversion rates of e-commerce sites were the highest in the skincare sector, at **** percent in the second quarter of 2025. Food & beverage followed, with a *** percent conversion rate. For comparison, the average conversion rate of e-commerce sites across all selected sectors stood at *** percent.

How does conversion vary by region and device? The conversion rate, which indicates the proportion of visits to e-commerce websites that result in purchases, varies by country and region. For instance, since at least 2023, e-commerce sites have consistently recorded higher conversion rates among shoppers in Great Britain compared to those in the United States and other global regions. Furthermore, despite the increasing prevalence of mobile shopping worldwide, conversions remain more pronounced on larger screens such as tablets and desktops.

Online shopping cart abandonment is on the rise: recently, the rate at which consumers abandon their online shopping carts has gradually risen to more than ** percent in 2025, indicating greater difficulty for e-commerce sites in converting website traffic into purchases. In 2024, food and beverage was one of the product categories with the lowest online cart abandonment rates, confirming the sector's relatively high conversion rate. In the United States, the primary reason customers abandoned their shopping carts was that extra costs such as shipping, tax, and service fees were too high at checkout.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The work involved in developing the dataset and benchmarking it with machine learning is set out in the article 'IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things'. DOI: 10.1109/ACCESS.2024.3437214.
Please cite the aforementioned article when using this dataset.
The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a Machine Learning Model for the IoMT to enhance the security of medical devices and protect patients' private data. To address this issue, we built a scenario that utilised Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned the data, pre-processed it, and fed it to our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.
The ZIP folder comprises two main components: Captures and Datasets. Within the captures folder, we have included all the captures used in this project. These captures are organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. Similarly, the datasets folder follows a similar organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.
To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.
This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
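A small round-trip sketch of the two provided formats, using a synthetic stand-in frame (column names borrowed from the BLE feature table below) rather than one of the real sub-datasets:

```python
import pandas as pd

# Synthetic stand-in for one of the sub-datasets.
df = pd.DataFrame({"btle.length": [12, 37], "nordic_ble.crcok": [1, 0]})

df.to_csv("sample.csv", index=False)   # portable, tool-agnostic text
df.to_pickle("sample.pkl")             # preserves dtypes and structure exactly

from_csv = pd.read_csv("sample.csv")
from_pickle = pd.read_pickle("sample.pkl")
assert from_csv.equals(from_pickle)
```

For simple numeric columns like these the two formats agree; pickle matters when the data contains nested structures or non-default dtypes that CSV would flatten to strings.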
Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.
Identified Key Features Within Bluetooth Dataset
| Feature | Meaning |
|---|---|
| btle.advertising_header | BLE Advertising Packet Header |
| btle.advertising_header.ch_sel | BLE Advertising Channel Selection Algorithm |
| btle.advertising_header.length | BLE Advertising Length |
| btle.advertising_header.pdu_type | BLE Advertising PDU Type |
| btle.advertising_header.randomized_rx | BLE Advertising Rx Address |
| btle.advertising_header.randomized_tx | BLE Advertising Tx Address |
| btle.advertising_header.rfu.1 | Reserved For Future 1 |
| btle.advertising_header.rfu.2 | Reserved For Future 2 |
| btle.advertising_header.rfu.3 | Reserved For Future 3 |
| btle.advertising_header.rfu.4 | Reserved For Future 4 |
| btle.control.instant | Instant Value Within a BLE Control Packet |
| btle.crc.incorrect | Incorrect CRC |
| btle.extended_advertising | Advertiser Data Information |
| btle.extended_advertising.did | Advertiser Data Identifier |
| btle.extended_advertising.sid | Advertiser Set Identifier |
| btle.length | BLE Length |
| frame.cap_len | Frame Length Stored Into the Capture File |
| frame.interface_id | Interface ID |
| frame.len | Frame Length Wire |
| nordic_ble.board_id | Board ID |
| nordic_ble.channel | Channel Index |
| nordic_ble.crcok | Indicates if CRC is Correct |
| nordic_ble.flags | Flags |
| nordic_ble.packet_counter | Packet Counter |
| nordic_ble.packet_time | Packet time (start to end) |
| nordic_ble.phy | PHY |
| nordic_ble.protover | Protocol Version |
Identified Key Features Within IP-Based Packets Dataset
| Feature | Meaning |
|---|---|
| http.content_length | Length of content in an HTTP response |
| http.request | HTTP request being made |
| http.response.code | HTTP response status code |
| http.response_number | Sequential number of an HTTP response |
| http.time | Time taken for an HTTP transaction |
| tcp.analysis.initial_rtt | Initial round-trip time for TCP connection |
| tcp.connection.fin | TCP connection termination with a FIN flag |
| tcp.connection.syn | TCP connection initiation with SYN flag |
| tcp.connection.synack | TCP connection establishment with SYN-ACK flags |
| tcp.flags.cwr | Congestion Window Reduced flag in TCP |
| tcp.flags.ecn | Explicit Congestion Notification flag in TCP |
| tcp.flags.fin | FIN flag in TCP |
| tcp.flags.ns | Nonce Sum flag in TCP |
| tcp.flags.res | Reserved flags in TCP |
| tcp.flags.syn | SYN flag in TCP |
| tcp.flags.urg | Urgent flag in TCP |
| tcp.urgent_pointer | Pointer to urgent data in TCP |
| ip.frag_offset | Fragment offset in IP packets |
| eth.dst.ig | Ethernet destination is in the internal network group |
| eth.src.ig | Ethernet source is in the internal network group |
| eth.src.lg | Ethernet source is in the local network group |
| eth.src_not_group | Ethernet source is not in any network group |
| arp.isannouncement | Indicates if an ARP message is an announcement |
Identified Key Features Within IP-Based Flows Dataset
| Feature | Meaning |
|---|---|
| proto | Transport layer protocol of the connection |
| service | Identification of an application protocol |
| orig_bytes | Originator payload bytes |
| resp_bytes | Responder payload bytes |
| history | Connection state history |
| orig_pkts | Originator sent packets |
| resp_pkts | Responder sent packets |
| flow_duration | Length of the flow in seconds |
| fwd_pkts_tot | Forward packets total |
| bwd_pkts_tot | Backward packets total |
| fwd_data_pkts_tot | Forward data packets total |
| bwd_data_pkts_tot | Backward data packets total |
| fwd_pkts_per_sec | Forward packets per second |
| bwd_pkts_per_sec | Backward packets per second |
| flow_pkts_per_sec | Flow packets per second |
| fwd_header_size | Forward header bytes |
| bwd_header_size | Backward header bytes |
| fwd_pkts_payload | Forward payload bytes |
| bwd_pkts_payload | Backward payload bytes |
| flow_pkts_payload | Flow payload bytes |
| fwd_iat | Forward inter-arrival time |
| bwd_iat | Backward inter-arrival time |
| flow_iat | Flow inter-arrival time |
| active | Flow active duration |
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
The dataset is taken from https://benchmark.ini.rub.de/gtsdb_dataset.html. For convenience, the raw data available on the website has been processed for easier understanding and usage.
The images are already converted from '.ppm' to '.jpg', so no extra work is required. The test images' labels are relatively hard to find and configure from the website, so I did all of that and packed it into this dataset.
There are a total 900 images in the dataset, out of which 600 are allotted for training and the remaining 300 for testing.
In case you're wondering why there are fewer label files than images: not all images contain traffic signs. For images with no traffic signs, the text file is not present. This is completely fine if you're training a YOLO model, as it is made to handle images without labels; it will simply ignore any image with no corresponding label file.
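If you want to sanity-check this yourself, a quick sketch like the following lists the images that have no matching label file. The directory layout here is hypothetical and built in a temp folder so the example runs standalone:

```python
import tempfile
from pathlib import Path

# In a YOLO-style layout, each image may have a same-named .txt label file;
# a missing file simply means the image contains no traffic sign.
def images_without_labels(img_dir: Path, lbl_dir: Path) -> list[str]:
    return sorted(p.name for p in img_dir.glob("*.jpg")
                  if not (lbl_dir / f"{p.stem}.txt").exists())

# Tiny self-contained demo with a hypothetical layout.
root = Path(tempfile.mkdtemp())
(root / "images").mkdir()
(root / "labels").mkdir()
for name in ("00001.jpg", "00002.jpg"):
    (root / "images" / name).touch()
(root / "labels" / "00001.txt").touch()   # only the first image has a sign

print(images_without_labels(root / "images", root / "labels"))  # → ['00002.jpg']
```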
HAPPY LEARNING!!
License: MIT, https://opensource.org/licenses/MIT
Introduced by Hong et al. in 'CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign Detection Under Extreme Conditions' (IJCNN 2024 Oral); the code is available on GitHub.
The CSUST Chinese Traffic Sign Detection Benchmark (CCTSDB) is an existing dataset for traffic sign detection. It consists of nearly 20,000 images of Chinese road traffic scenes, including around 40,000 annotated traffic signs. While most scenes in the dataset are captured under natural weather conditions, challenges include foggy, rainy, and blurry perspectives. To facilitate our research, we created a dataset called CCTSDB-AUG based on CCTSDB. This augmented dataset includes images with foggy, rainy, and blurry perspectives: we applied random haze, raindrop, and motion blur effects to generate the augmented images, simulating real-world extreme conditions. Image augmentation was performed using the Albumentations library in Python, allowing us to construct images with various weather effects. The resulting extreme conditions are proportionally represented throughout the dataset.
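To illustrate the kind of degradation applied, here is a minimal motion-blur sketch in plain NumPy. The authors used the Albumentations library; this hand-rolled horizontal average only demonstrates the idea of simulating extreme conditions on an image array, not their actual pipeline:

```python
import numpy as np

def motion_blur_horizontal(img: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Average each pixel with its horizontal neighbours (reflect padding)."""
    pad = kernel_size // 2
    padded = np.pad(img, ((0, 0), (pad, pad)), mode="reflect")
    out = np.zeros_like(img, dtype=np.float64)
    for k in range(kernel_size):
        out += padded[:, k:k + img.shape[1]]
    return out / kernel_size

img = np.arange(25, dtype=np.float64).reshape(5, 5)   # toy grayscale "image"
blurred = motion_blur_horizontal(img)
assert blurred.shape == img.shape
```

Albumentations wraps effects like this (plus haze and raindrops) behind randomized, composable transforms.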
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The objective of this task was to determine whether behavioral data on responses to notifications captured in Virtual Reality are similar to what is expected in real-world settings. For this purpose, a real-world benchmark experiment was designed to capture participants' response times to wearable watch alarms triggered by simulated traffic near the mobile work zone on the experiment site in an urban setting. The scope of data collection for the real-world study included external environmental factors (e.g., site accessibility, weather). The key parameters of the research are reaction time to received alarms and heart rate measures. Table 1 lists the parameters that were controlled and measured during the experiments.
Table 1. Key parameters measured and tracked during real-world experiments

| Variable name | Description |
|---|---|
| Key parameters captured | |
| Reaction time | The time from receiving the haptic or sound alarm on the wearable alarm device (here, the Apple Watch) to the moment the participant responds by stopping the alarm with a press on the smartwatch screen |
| Inter-beat interval (IBI, heart rate) | The time interval between individual heart beats; measured using the E4 application provided by Empatica |
| External factors tracked | |
| Ambient noise | The level of ambient noise in the area, a factor potentially influencing participants' reactions and considered in the experiment design |
| Temperature | Daytime temperature recorded at each experiment |
| Number of pedestrians on site | Number of pedestrians counted during each experiment to record varying factors in the external environment in real-world settings |
In the experiment, each participant took part in three trials. In each trial, data was recorded separately for each alarm sent to the smartwatch by the administrator at triggering events (precisely, every time the remote-controlled toy car reached the line 30 ft from the designated work area). Each alarm signal at each trial was recorded for all 31 participants. Timestamps for these events are automatically recorded in the server, in the format of Table 2:
Table 2. Format of raw data stored in the server, starting in December 2022.

| | Timestamp | From | Event |
|---|---|---|---|
| 0 | 2022-12-08 13:37:53.101391 | VR | Received car approaching alert, mode=3, id=1000 |
| 1 | 2022-12-08 15:53:05.098288 | Watch | Start Simulation |
| 2 | 2022-12-08 15:53:07.437488 | VR | Received car approaching alert, mode=4, id=1004 |
| 3 | 2022-12-08 15:53:13.064067 | Watch | Stop Simulation |
| 4 | 2022-12-08 15:53:13.163635 | Watch | Stop Simulation |
| ... | | | |
| 2417 | 2023-03-03 16:17:46.166644 | Watch | 1398 |
| 2418 | 2023-03-03 16:18:00.004425 | Watch | 1398 |
| 2419 | 2023-03-03 16:18:01.272071 | Watch | 1398 |
| 2420 | 2023-03-03 16:18:07.359187 | Watch | Stop Simulation |
| 2421 | 2023-03-03 16:18:07.388183 | Watch | Stop Simulation |
Some intervals used different timestamps as benchmarks to calibrate the vehicle speed and user response times to the alarm signals, including the following cases:
1) At the beginning of each trial, the vehicle travels 70 ft from the start point to the point 30 ft from the designated work area, where the first alarm is signaled; given this travel distance, the travel time of the toy vehicle's first trip is calculated by subtracting tn_start from tn_alarm1_sent.
2) Similarly, user response times to all alarms are computed by subtracting the timestamp at which the alarm is sent from the server from the timestamp at which the participant responds (tn_alarmn_received - tn_alarmn_sent).
Ambient noise level data were collected using a noise meter capable of logging noise levels at intervals from one second up to a minute (i.e., 5, 10, 30, or 60 seconds); all noise data in this study were recorded at one-second intervals. The collected data were processed to match the timestamps of the user response data from the experiment, enabling later comparison and correlation analysis; the matched events include: 1) worker responses; 2) sending of alarm signals; 3) start and stop of experiments. Missing data points were later filled by a moving average, using the rolling mean function of the pandas Python module.
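The per-alarm response-time calculation and the rolling-mean smoothing can be sketched as follows. The timestamps below are synthetic stand-ins shaped like the Table 2 records, not real experiment data:

```python
import pandas as pd

# Synthetic sent/received timestamp pairs, shaped like the server records.
events = pd.DataFrame({
    "alarm_sent":     pd.to_datetime(["2022-12-08 15:53:05.098288",
                                      "2022-12-08 15:53:20.500000"]),
    "alarm_received": pd.to_datetime(["2022-12-08 15:53:07.437488",
                                      "2022-12-08 15:53:22.100000"]),
})

# Response time per alarm: received minus sent, in seconds.
events["reaction_s"] = (events["alarm_received"]
                        - events["alarm_sent"]).dt.total_seconds()

# Rolling mean, as used to fill missing points by moving average.
smoothed = events["reaction_s"].rolling(window=2, min_periods=1).mean()

print(events["reaction_s"].round(3).tolist())  # → [2.339, 1.6]
```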