Context There's a story behind every dataset and here's your opportunity to share yours.
Content What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.
Acknowledgements We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.
Inspiration Your data will be in front of the world's largest data science community. What questions do you want to see answered?
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Online conversion rates of e-commerce sites were the highest in the beauty & skincare sector, at ***** percent in the first quarter of 2025. Food & beverage followed, with a *** percent conversion rate. For comparison, the average conversion rate of e-commerce sites across all selected sectors stood at *** percent.

How does conversion vary by region and device?
The conversion rate, which indicates the proportion of visits to e-commerce websites that result in purchases, varies by country and region. For instance, since at least 2023, e-commerce sites have consistently recorded higher conversion rates among shoppers in Great Britain compared to those in the United States and other global regions. Furthermore, despite the increasing prevalence of mobile shopping worldwide, conversions remain more pronounced on larger screens such as tablets and desktops.

Online shopping cart abandonment on the rise
Recently, the rate at which consumers abandon their online shopping carts has been gradually rising to more than ** percent in 2025, indicating that e-commerce sites find it increasingly difficult to convert website traffic into purchases. In 2024, food and beverage was one of the product categories with the lowest online cart abandonment rate, confirming the sector’s relatively high conversion rate. In the United States, the primary reason customers abandoned their shopping carts was that extra costs such as shipping, tax, and service fees were too high at checkout.
A. SUMMARY This dataset contains the underlying data for the Vision Zero Benchmarking website. Vision Zero is the collaborative, citywide effort to end traffic fatalities in San Francisco. The goal of this benchmarking effort is to provide context to San Francisco’s work and progress on key Vision Zero metrics alongside its peers. The Controller's Office City Performance team collaborated with the San Francisco Municipal Transportation Agency, the San Francisco Department of Public Health, the San Francisco Police Department, and other stakeholders on this project.

B. HOW THE DATASET IS CREATED The Vision Zero Benchmarking website has seven major metrics. The City Performance team collected the data for each metric separately, cleaned it, and visualized it on the website. This dataset has all seven metrics and some additional underlying data. The majority of the data is available through public sources, but a few data points came from the peer cities themselves.

C. UPDATE PROCESS This dataset is for historical purposes only and will not be updated. To explore more recent data, visit the source website for the relevant metrics.

D. HOW TO USE THIS DATASET This dataset contains all of the Vision Zero Benchmarking metrics. Filter for the metric of interest, then explore the data. Where applicable, datasets already include a total. For example, under the Fatalities metric, the "Total Fatalities" category within the metric shows the total fatalities in that city. Review any calculations to make sure they do not double-count data against this total.

E. RELATED DATASETS N/A
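The filtering advice in section D can be sketched with pandas. This is a minimal illustration only: the column names `city`, `metric`, `category`, and `value`, and all rows below, are hypothetical and may not match the published file.

```python
import pandas as pd

# Hypothetical rows; the real column names, cities, and categories may differ.
df = pd.DataFrame(
    {
        "city": ["San Francisco"] * 3 + ["Seattle"] * 3,
        "metric": ["Fatalities"] * 6,
        "category": ["Pedestrian", "Motorist", "Total Fatalities"] * 2,
        "value": [10, 5, 15, 8, 4, 12],
    }
)

# Step 1: filter for the metric of interest.
fatalities = df[df["metric"] == "Fatalities"]

# Step 2: keep the pre-computed total separate from the component rows,
# so that summing the "value" column never counts the total twice.
is_total = fatalities["category"] == "Total Fatalities"
reported_totals = fatalities[is_total].set_index("city")["value"]
summed_parts = fatalities[~is_total].groupby("city")["value"].sum()
```

Summing `value` over the unfiltered `fatalities` frame would include the total row alongside its components, which is the double-counting the description warns about.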
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Benchmark Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP
This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org. It contains AI agents' meta information, such as each agent's name, website, and description, as well as monthly updated web performance metrics, including Google and Bing average search ranking positions, GitHub stars, arXiv references, etc. The dataset is helpful for AI… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/ai-agent-benchmark.
Abstract: The task for this dataset is to forecast the spatio-temporal traffic volume based on the historical traffic volume and other features in neighboring locations.
Data Set Characteristics | Number of Instances | Area | Attribute Characteristics | Number of Attributes | Date Donated | Associated Tasks | Missing Values |
---|---|---|---|---|---|---|---|
Multivariate | 2101 | Computer | Real | 47 | 2020-11-17 | Regression | N/A |
Source: Liang Zhao, liang.zhao '@' emory.edu, Emory University.
Data Set Information: The task for this dataset is to forecast the spatio-temporal traffic volume based on the historical traffic volume and other features in neighboring locations. Specifically, the traffic volume is measured every 15 minutes at 36 sensor locations along two major highways in Northern Virginia/Washington D.C. capital region. The 47 features include: 1) the historical sequence of traffic volume sensed during the 10 most recent sample points (10 features), 2) week day (7 features), 3) hour of day (24 features), 4) road direction (4 features), 5) number of lanes (1 feature), and 6) name of the road (1 feature). The goal is to predict the traffic volume 15 minutes into the future for all sensor locations. With a given road network, we know the spatial connectivity between sensor locations. For the detailed data information, please refer to the file README.docx.
Attribute Information: The 47 features include: (1) the historical sequence of traffic volume sensed during the 10 most recent sample points (10 features), (2) week day (7 features), (3) hour of day (24 features), (4) road direction (4 features), (5) number of lanes (1 feature), and (6) name of the road (1 feature).
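As a rough illustration of how the six feature groups add up to 47 columns, the sketch below assembles one feature vector with NumPy. The one-hot encoding of week day, hour, and direction, the column order, and the numeric `road_id` are assumptions made for this example; README.docx is the authority on the actual layout.

```python
import numpy as np

# Group sizes taken from the attribute description above.
N_LAGS, N_WEEKDAYS, N_HOURS, N_DIRECTIONS = 10, 7, 24, 4


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at `index`."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def build_features(lags, weekday, hour, direction, n_lanes, road_id):
    """Assemble one 47-dimensional vector (assumed order and encoding)."""
    lags = np.asarray(lags, dtype=float)
    if lags.shape != (N_LAGS,):
        raise ValueError(f"expected {N_LAGS} lagged volumes, got shape {lags.shape}")
    return np.concatenate(
        [
            lags,                             # 10 most recent traffic volumes
            one_hot(weekday, N_WEEKDAYS),     # 7 week-day indicators
            one_hot(hour, N_HOURS),           # 24 hour-of-day indicators
            one_hot(direction, N_DIRECTIONS), # 4 road-direction indicators
            [float(n_lanes)],                 # number of lanes
            [float(road_id)],                 # road name as a hypothetical numeric id
        ]
    )


# Made-up inputs, for illustration only.
x = build_features(
    np.linspace(0.1, 1.0, N_LAGS), weekday=2, hour=17, direction=1, n_lanes=3, road_id=0
)
```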
Relevant Papers: Liang Zhao, Olga Gkountouna, and Dieter Pfoser. 2019. Spatial Auto-regressive Dependency Interpretable Learning Based on Spatial Topological Constraints. ACM Trans. Spatial Algorithms Syst. 5, 3, Article 19 (August 2019), 28 pages. DOI:[Web Link]
Citation Request: To use these datasets, please cite the papers:
Liang Zhao, Olga Gkountouna, and Dieter Pfoser. 2019. Spatial Auto-regressive Dependency Interpretable Learning Based on Spatial Topological Constraints. ACM Trans. Spatial Algorithms Syst. 5, 3, Article 19 (August 2019), 28 pages. DOI:[Web Link]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.
Please do cite the aforementioned article when using this dataset.
The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a machine learning model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and fed it into our machine learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.
The ZIP folder comprises two main components: Captures and Datasets. The captures folder includes all the captures used in this project, organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. The datasets folder follows the same organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.
To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.
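A minimal sketch of reading both formats with pandas follows. The file names and the three columns are hypothetical stand-ins written to a temporary directory, not the archive's actual contents. Pickle files can execute code when loaded, so they should only be opened when they come from a trusted source.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small stand-in frame; the real datasets have many more rows and columns.
frame = pd.DataFrame(
    {"proto": ["tcp", "udp"], "flow_duration": [0.25, 1.5], "orig_pkts": [4, 2]}
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "flows_example.csv"  # hypothetical file name
    pkl_path = Path(tmp) / "flows_example.pkl"  # hypothetical file name

    frame.to_csv(csv_path, index=False)
    frame.to_pickle(pkl_path)

    # CSV is plain text: column dtypes are inferred again on load.
    from_csv = pd.read_csv(csv_path)
    # Pickle restores the DataFrame object, including dtypes and index.
    from_pickle = pd.read_pickle(pkl_path)
```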
This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. The tables below list the features selected for each dataset and used in the evaluation model in the accompanying work.
Identified Key Features Within Bluetooth Dataset
Feature | Meaning |
---|---|
btle.advertising_header | BLE Advertising Packet Header |
btle.advertising_header.ch_sel | BLE Advertising Channel Selection Algorithm |
btle.advertising_header.length | BLE Advertising Length |
btle.advertising_header.pdu_type | BLE Advertising PDU Type |
btle.advertising_header.randomized_rx | BLE Advertising Rx Address |
btle.advertising_header.randomized_tx | BLE Advertising Tx Address |
btle.advertising_header.rfu.1 | Reserved For Future 1 |
btle.advertising_header.rfu.2 | Reserved For Future 2 |
btle.advertising_header.rfu.3 | Reserved For Future 3 |
btle.advertising_header.rfu.4 | Reserved For Future 4 |
btle.control.instant | Instant Value Within a BLE Control Packet |
btle.crc.incorrect | Incorrect CRC |
btle.extended_advertising | Advertiser Data Information |
btle.extended_advertising.did | Advertiser Data Identifier |
btle.extended_advertising.sid | Advertiser Set Identifier |
btle.length | BLE Length |
frame.cap_len | Frame Length Stored Into the Capture File |
frame.interface_id | Interface ID |
frame.len | Frame Length Wire |
nordic_ble.board_id | Board ID |
nordic_ble.channel | Channel Index |
nordic_ble.crcok | Indicates if CRC is Correct |
nordic_ble.flags | Flags |
nordic_ble.packet_counter | Packet Counter |
nordic_ble.packet_time | Packet time (start to end) |
nordic_ble.phy | PHY |
nordic_ble.protover | Protocol Version |
Identified Key Features Within IP-Based Packets Dataset
Feature | Meaning |
---|---|
http.content_length | Length of content in an HTTP response |
http.request | HTTP request being made |
http.response.code | Status code of an HTTP response |
http.response_number | Sequential number of an HTTP response |
http.time | Time taken for an HTTP transaction |
tcp.analysis.initial_rtt | Initial round-trip time for TCP connection |
tcp.connection.fin | TCP connection termination with a FIN flag |
tcp.connection.syn | TCP connection initiation with SYN flag |
tcp.connection.synack | TCP connection establishment with SYN-ACK flags |
tcp.flags.cwr | Congestion Window Reduced flag in TCP |
tcp.flags.ecn | Explicit Congestion Notification flag in TCP |
tcp.flags.fin | FIN flag in TCP |
tcp.flags.ns | Nonce Sum flag in TCP |
tcp.flags.res | Reserved flags in TCP |
tcp.flags.syn | SYN flag in TCP |
tcp.flags.urg | Urgent flag in TCP |
tcp.urgent_pointer | Pointer to urgent data in TCP |
ip.frag_offset | Fragment offset in IP packets |
eth.dst.ig | Individual/group (IG) bit of the Ethernet destination address |
eth.src.ig | Individual/group (IG) bit of the Ethernet source address |
eth.src.lg | Locally administered/globally unique (LG) bit of the Ethernet source address |
eth.src_not_group | Ethernet source is not in any network group |
arp.isannouncement | Indicates if an ARP message is an announcement |
Identified Key Features Within IP-Based Flows Dataset
Feature | Meaning |
---|---|
proto | Transport layer protocol of the connection |
service | Identification of an application protocol |
orig_bytes | Originator payload bytes |
resp_bytes | Responder payload bytes |
history | Connection state history |
orig_pkts | Originator sent packets |
resp_pkts | Responder sent packets |
flow_duration | Length of the flow in seconds |
fwd_pkts_tot | Forward packets total |
bwd_pkts_tot | Backward packets total |
fwd_data_pkts_tot | Forward data packets total |
bwd_data_pkts_tot | Backward data packets total |
fwd_pkts_per_sec | Forward packets per second |
bwd_pkts_per_sec | Backward packets per second |
flow_pkts_per_sec | Flow packets per second |
fwd_header_size | Forward header bytes |
bwd_header_size | Backward header bytes |
fwd_pkts_payload | Forward payload bytes |
bwd_pkts_payload | Backward payload bytes |
flow_pkts_payload | Flow payload bytes |
fwd_iat | Forward inter-arrival time |
bwd_iat | Backward inter-arrival time |
flow_iat | Flow inter-arrival time |
active | Flow active duration |
Among selected consumer electronics retailers worldwide, apple.com recorded the highest bounce rate in April 2024, at approximately 55.3 percent. Rival samsung.com had a slightly lower bounce rate of nearly 54 percent. Among selected consumer electronics e-tailers, huawei.com had the lowest bounce rate at 30.91 percent. Bounce rate is a marketing term used in web traffic analysis reflecting the percentage of visitors who enter the site and then leave without taking any further action like making a purchase or viewing other pages within the website ("bounce").

A sector with growth potential
With one of the lowest online shopping cart abandonment rates globally in 2022, consumer electronics is a burgeoning e-commerce segment that places itself at the crossroads between technological progress and digital transformation. Boosted by the pandemic-induced surge in online shopping, the global market size of consumer electronics e-commerce was estimated at more than 340 billion U.S. dollars in 2021 and forecast to nearly double less than five years later.

Amazon and Apple lead the charts in electronics e-commerce
With more than 59 billion U.S. dollars in e-commerce net sales in the consumer electronics segment in 2022, apple.com was the uncontested industry leader. The global powerhouse surpassed e-commerce giants amazon.com and jd.com with more than ten billion U.S. dollars difference in online sales in the consumer electronics category.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The objective of this task was to determine whether behavioral data on responses to notifications captured in Virtual Reality are similar to what is expected in real-world settings. For this purpose, a real-world benchmark experiment was designed to capture participant response times to wearable watch alarms triggered by simulated traffic near a mobile work zone at an urban experiment site. The proposed scope of data collection for the real-world study included external environmental factors (e.g., site accessibility, weather). The key research parameters are the reaction time to received alarms and the heart rate measures. Table 1 lists the parameters that were controlled and measured during the experiments.
Table 1. Key parameters measured and tracked during real-world experiments
Variable name | Description |
---|---|
Key parameters captured | |
Reaction time | The time from when the participant receives the haptic or sound alarm on the wearable alarm device (here, the Apple Watch) to when the participant responds by stopping the alarm by pressing the smartwatch screen |
Inter-beat interval (IBI, heart rate) | The time interval between individual beats of the heart, measured using the E4 application provided by Empatica |
External factors tracked | |
Ambient noise | The level of ambient noise in the area, a factor that potentially influences participants’ reactions and is considered in the experiment design |
Temperature | Daytime temperature recorded at each experiment |
Number of pedestrians on site | Number of pedestrians counted during the experiment, recorded to capture the varying factors in the external environment in real-world settings |
Each participant was asked to complete the experiment three times. In each trial, data was recorded separately for each alarm sent from the administrator to the smartwatch at triggering events (precisely, every time the remote-controlled toy car reached the line 30 ft from the designated work area). Each alarm signal in each trial was recorded for all 31 participants in the experiment. Event timestamps are automatically recorded on the server in the format shown in Table 2:
Table 2. Format of raw data stored in the server, starting in December 2022.
Index | Timestamp | From | Event |
---|---|---|---|
0 | 2022-12-08 13:37:53.101391 | VR | Received car approaching alert, mode=3, id=1000 |
1 | 2022-12-08 15:53:05.098288 | Watch | Start Simulation |
2 | 2022-12-08 15:53:07.437488 | VR | Received car approaching alert, mode=4, id=1004 |
3 | 2022-12-08 15:53:13.064067 | Watch | Stop Simulation |
4 | 2022-12-08 15:53:13.163635 | Watch | Stop Simulation |
... | ... | ... | ... |
2417 | 2023-03-03 16:17:46.166644 | Watch | 1398 |
2418 | 2023-03-03 16:18:00.004425 | Watch | 1398 |
2419 | 2023-03-03 16:18:01.272071 | Watch | 1398 |
2420 | 2023-03-03 16:18:07.359187 | Watch | Stop Simulation |
2421 | 2023-03-03 16:18:07.388183 | Watch | Stop Simulation |
Some intervals use different timestamps as reference points to calibrate the vehicle speed and the user response time to the alarm signals, including the following cases:
1) At the beginning of each trial, the vehicle travels 70 ft from the start point to the line 30 ft from the work area, where the first alarm is signaled; given this travel distance, the travel time of the toy vehicle's first trip is calculated by subtracting tn_start from tn_alarm1_sent.
2) Similarly, user response times to all alarms are calculated by subtracting the timestamp when the alarm is sent from the server from the timestamp when the alarm is received by the participant (tn_alarmn_received - tn_alarmn_sent).
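The two interval calculations can be sketched with Python's standard library, using the timestamp format shown in Table 2. The three timestamps below are made up for illustration, and the variable names only loosely mirror the tn_start, tn_alarm1_sent, and tn_alarmn_received labels above.

```python
from datetime import datetime

# Timestamp format of the server log, as shown in Table 2.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(earlier, later):
    """Seconds elapsed between two log timestamps (later minus earlier)."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


# Made-up timestamps, not taken from the dataset.
t_start = "2022-12-08 10:00:00.000000"
t_alarm1_sent = "2022-12-08 10:00:05.500000"
t_alarm1_received = "2022-12-08 10:00:07.250000"

# Case 1: travel time of the first 70 ft trip, and the implied vehicle speed.
travel_time = elapsed_seconds(t_start, t_alarm1_sent)
speed_ft_per_s = 70.0 / travel_time

# Case 2: user response time to the first alarm.
response_time = elapsed_seconds(t_alarm1_sent, t_alarm1_received)
```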
Ambient noise level data were collected using a noise meter that can save noise levels at intervals from one second to multiple seconds (i.e., 5, 10, 30, 60 seconds). All noise data were recorded at one-second intervals. The collected data was then aligned with the timestamps of the user response time data to allow comparisons and correlation analysis later on; these timestamps include: 1) worker response; 2) sending of alarm signals; 3) start and stop of experiments. Missing data points were later replaced with a moving average, computed using the rolling mean function of the pandas Python module.
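One way to fill missing one-second readings with a moving average in pandas is sketched below. The window of three samples, the trailing (rather than centered) window, and the noise values are assumptions for illustration; the source does not state which settings were used.

```python
import numpy as np
import pandas as pd

# Made-up one-second noise readings in dB, with two missing samples.
index = pd.date_range("2022-12-08 15:53:00", periods=6, freq=pd.Timedelta(seconds=1))
noise = pd.Series([60.0, 62.0, np.nan, 64.0, 66.0, np.nan], index=index)

# Trailing 3-sample rolling mean; min_periods=1 lets a window that contains
# NaN still produce a value from the readings that are present.
rolling = noise.rolling(window=3, min_periods=1).mean()

# Replace only the missing points; observed readings stay unchanged.
filled = noise.fillna(rolling)
```

With these values the gap at the third sample is filled from the two preceding readings, and the final gap from the two readings before it.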
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Web Bench: A real-world benchmark for Browser Agents
WebBench is an open, task-oriented benchmark that measures how well browser agents handle realistic web workflows. It contains 2,454 tasks spread across 452 live websites selected from the global top-1000 by traffic. Last updated: May 28, 2025
Dataset Composition
Category Description Example Count (% of dataset)
READ Tasks that require searching and extracting information “Navigate to the news section and… See the full description on the dataset page: https://huggingface.co/datasets/Halluminate/WebBench.
As of September 2023, the health and beauty industry recorded the highest bounce rate compared to other e-commerce sectors. That month, health and beauty sites had a bounce rate of around 51.6 percent. The overall bounce rate for e-commerce was approximately 38.7 percent. The term "bounce rate" refers to the percentage of website visitors who leave the site after viewing a single page.