16 datasets found

Network Traffic Dataset
kaggle.com
Updated Oct 31, 2023
Cite
Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
Dataset updated
Oct 31, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ravikumar Gattu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures for 1 hour during the evening of Oct 9th, 2023, using Wireshark. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

Content :

This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.

The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

Dataset Columns:

- No: Number of the instance.
- Timestamp: Timestamp of the network traffic instance.
- Source IP: IP address of the source.
- Destination IP: IP address of the destination.
- Protocol: Protocol used by the instance.
- Length: Length of the instance.
- Info: Information about the traffic instance.
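
A minimal sketch of how a CSV with these columns might be summarised, using only the Python standard library. It assumes the file keeps the Wireshark header names quoted above (No., Time, Source, Destination, Protocol, Length, Info); the sample rows and the `summarize` helper are illustrative and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Illustrative rows only; they are NOT taken from the dataset.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.0.2.10","192.0.2.1","DNS","74","Standard query A example.com"
"2","0.012345","192.0.2.1","192.0.2.10","DNS","90","Standard query response A 198.51.100.7"
"3","0.020000","192.0.2.10","198.51.100.7","TCP","66","50514 > 443 [SYN]"
'''


def summarize(csv_file):
    """Return (packets per protocol, bytes per protocol) for a capture CSV."""
    packets = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(csv_file):
        protocol = row["Protocol"]
        packets[protocol] += 1
        total_bytes[protocol] += int(row["Length"])  # Length is in bytes
    return packets, total_bytes


packets, total_bytes = summarize(io.StringIO(SAMPLE_CSV))
print(dict(packets))
print(dict(total_bytes))
```

For the real file, the same function could be called with `open(path, newline="")` in place of the in-memory sample; the actual header spelling should be checked first.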

Acknowledgements :

I would like to thank the University of Cincinnati for providing the infrastructure for generating this network traffic dataset.

Ravikumar Gattu , Susmitha Choppadandi

Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it is meant for building machine learning models that can identify specific applications (such as TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).

**Dataset License:** CC0: Public Domain

Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

ML techniques that benefit from this dataset:

This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains important features that can be used for most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:

1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.

2. Anomaly Detection: This large network traffic dataset can be used to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.

3. Network Intrusion Detection: This large dataset can be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks, and DoS attacks.
WebBench
huggingface.co
Updated May 28, 2025
Cite
Halluminate (2025). WebBench [Dataset]. https://huggingface.co/datasets/Halluminate/WebBench
Dataset updated
May 28, 2025
Dataset provided by
Halluminate, Inc.
Authors
Halluminate
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Web Bench: A real-world benchmark for Browser Agents

WebBench is an open, task-oriented benchmark that measures how well browser agents handle realistic web workflows. It contains 2,454 tasks spread across 452 live websites selected from the global top-1000 by traffic. Last updated: May 28, 2025

Dataset Composition

Category Description Example Count (% of dataset)

READ Tasks that require searching and extracting information “Navigate to the news section and… See the full description on the dataset page: https://huggingface.co/datasets/Halluminate/WebBench.
Dataset used for detecting DNS over HTTPS by Machine Learning.
zenodo.org
zip
Updated Oct 28, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dmitrii Vekshin; Karel Hynek; Tomas Cejka (2020). Dataset used for detecting DNS over HTTPS by Machine Learning. [Dataset]. http://doi.org/10.5281/zenodo.3906526
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3906526
Dataset updated
Oct 28, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dmitrii Vekshin; Karel Hynek; Tomas Cejka
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The dataset consists of three different data sources:

DoH enabled Firefox

DoH enabled Google Chrome

Cloudflared DoH proxy

The capture of web browser data was made using the Selenium framework, which simulated classical user browsing. The browsers received commands to visit domains taken from Alexa's top 10K most visited websites. The capturing was performed on the host by listening to the network interface of the virtual machine. Overall, the dataset contains almost 5,000 web-page visits by Firefox and 1,000 pages visited by Chrome.

The Cloudflared DoH proxy was installed on a Raspberry Pi, and the IP address of the Raspberry Pi was set as the default DNS resolver in two separate offices in our university. It continuously captured the DNS/DoH traffic created by up to 20 devices for around three months.

The dataset contains 1,128,904 flows, of which around 33,000 are labeled as DoH. We provide raw pcap data, a CSV with flow data, and a CSV file with extracted features.

The CSV with extracted features has the following data fields:

- Label (1 - DoH, 0 - regular HTTPS)
- Data source
- Duration
- Minimal Inter-Packet Delay
- Maximal Inter-Packet Delay
- Average Inter-Packet Delay
- A variance of Incoming Packet Sizes
- A variance of Outgoing Packet Sizes
- A ratio of the number of Incoming and outgoing bytes
- A ratio of the number of Incoming and outgoing packets
- Average of Incoming Packet sizes
- Average of Outgoing Packet sizes
- The median value of Incoming Packet sizes
- The median value of outgoing Packet sizes
- The ratio of bursts and pauses
- Number of bursts
- Number of pauses
- Autocorrelation
- Transmission symmetry in the 1st third of connection
- Transmission symmetry in the 2nd third of connection
- Transmission symmetry in the last third of connection
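
To make a few of the listed fields concrete, here is a rough sketch of how the delay, size, and ratio features could be computed for a single flow. The input layout (a list of `(timestamp, size, direction)` tuples), the function name `flow_features`, the dictionary keys, and the use of population variance are all assumptions made for illustration; this is not the authors' extraction code, and the burst, autocorrelation, and symmetry features are left out because the description does not define them.

```python
from statistics import mean, median, pvariance


def flow_features(packets):
    """Compute simple per-flow statistics.

    packets: list of (timestamp_seconds, size_bytes, direction) tuples,
    where direction is "in" or "out" (hypothetical input layout).
    Requires at least two packets and at least one packet per direction.
    """
    times = sorted(t for t, _, _ in packets)
    # Inter-packet delays: gaps between consecutive packet timestamps.
    delays = [later - earlier for earlier, later in zip(times, times[1:])]
    incoming = [size for _, size, d in packets if d == "in"]
    outgoing = [size for _, size, d in packets if d == "out"]
    return {
        "duration": times[-1] - times[0],
        "min_ipd": min(delays),
        "max_ipd": max(delays),
        "avg_ipd": mean(delays),
        "var_in_size": pvariance(incoming),   # population variance (assumed)
        "var_out_size": pvariance(outgoing),
        "bytes_in_out_ratio": sum(incoming) / sum(outgoing),
        "packets_in_out_ratio": len(incoming) / len(outgoing),
        "avg_in_size": mean(incoming),
        "avg_out_size": mean(outgoing),
        "median_in_size": median(incoming),
        "median_out_size": median(outgoing),
    }


# Made-up flow with two outgoing and two incoming packets.
example = [
    (0.00, 100, "out"),
    (0.10, 500, "in"),
    (0.30, 120, "out"),
    (0.40, 700, "in"),
]
features = flow_features(example)
print(features)
```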

The observed network traffic does not contain privacy-sensitive information.

The zip file structure is:

|-- data
|   |-- extracted-features...extracted features used in ML for DoH recognition
|   |   |-- chrome
|   |   |-- cloudflared
|   |   `-- firefox
|   |-- flows...............................................exported flow data
|   |   |-- chrome
|   |   |-- cloudflared
|   |   `-- firefox
|   `-- pcaps....................................................raw PCAP data
|       |-- chrome
|       |-- cloudflared
|       `-- firefox
|-- LICENSE
`-- README.md

When using this dataset, please cite the original work as follows:

@inproceedings{vekshin2020,
  author = {Vekshin, Dmitrii and Hynek, Karel and Cejka, Tomas},
  title = {DoH Insight: Detecting DNS over HTTPS by Machine Learning},
  year = {2020},
  isbn = {9781450388337},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3407023.3409192},
  doi = {10.1145/3407023.3409192},
  booktitle = {Proceedings of the 15th International Conference on Availability, Reliability and Security},
  articleno = {87},
  numpages = {8},
  keywords = {classification, DoH, DNS over HTTPS, machine learning, detection, datasets},
  location = {Virtual Event, Ireland},
  series = {ARES '20}
}
Vital Signs: Time in Congestion - Bay Area (updated October 2018)
data.bayareametro.gov
csv, xlsx, xml
Updated Oct 16, 2018
Cite
(2018). Vital Signs: Time in Congestion - Bay Area (updated October 2018) [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Time-in-Congestion-Bay-Area-updated-Oc/ja9p-vpfm
Available download formats: xml, csv, xlsx
Dataset updated
Oct 16, 2018
Area covered
San Francisco Bay Area
Description
VITAL SIGNS INDICATOR Time Spent in Congestion (T7)

FULL MEASURE NAME Time Spent in Congestion

LAST UPDATED October 2018

DATA SOURCE MTC/Iteris Congestion Analysis No link available

CA Department of Finance Forms E-8 and E-5 http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-8/ http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-5/

CA Employment Division Department: Labor Market Information http://www.labormarketinfo.edd.ca.gov/

CONTACT INFORMATION vitalsigns.info@bayareametro.gov

METHODOLOGY NOTES (across all datasets for this indicator) Time spent in congestion measures the hours drivers are in congestion on freeway facilities based on traffic data. In recent years, data for the Bay Area comes from INRIX, a company that collects real-time traffic information from a variety of sources including mobile phone data and other GPS locator devices. The data provides traffic speed on the region’s highways. Using historical INRIX data (and similar internal datasets for some of the earlier years), MTC calculates an annual time series for vehicle hours spent in congestion in the Bay Area. Time spent in congestion is defined as the average daily hours spent in congestion on Tuesdays, Wednesdays and Thursdays during peak traffic months on freeway facilities. This indicator focuses on weekdays given that traffic congestion is generally greater on these days; this indicator does not capture traffic congestion on local streets due to data unavailability.

This congestion indicator emphasizes recurring delay (as opposed to also including non-recurring delay), capturing the extent of delay caused by routine traffic volumes (rather than congestion caused by unusual circumstances). Recurring delay is identified by setting a threshold of consistent delay greater than 15 minutes on a specific freeway segment from vehicle speeds less than 35 mph. This definition is consistent with longstanding practices by MTC, Caltrans and the U.S. Department of Transportation as speeds less than 35 mph result in significantly less efficient traffic operations. 35 mph is the threshold at which vehicle throughput is greatest; speeds that are either greater than or less than 35 mph result in reduced vehicle throughput. This methodology focuses on the extra travel time experienced based on a differential between the congested speed and 35 mph, rather than the posted speed limit.

To provide a mathematical example of how the indicator is calculated on a segment basis, when it comes to time spent in congestion, 1,000 vehicles traveling on a congested segment for a 1/4 hour (15 minutes) each, [1,000 vehicles x ¼ hour congestion per vehicle= 250 hours congestion], is equivalent to 100 vehicles traveling on a congested segment for 2.5 hours each, [100 vehicles x 2.5 hour congestion per vehicle = 250 hours congestion]. In this way, the measure captures the impacts of both slow speeds and heavy traffic volumes.

MTC calculates two measures of delay – congested delay, or delay that occurs when speeds are below 35 miles per hour, and total delay, or delay that occurs when speeds are below the posted speed limit. To illustrate, if 1,000 vehicles are traveling at 30 miles per hour on a one mile long segment, this would represent 4.76 vehicle hours of congested delay [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 35 miles per hour) = 33.33 vehicle hours – 28.57 vehicle hours = 4.76 vehicle hours]. Considering that the posted speed limit on the segment is 60 miles per hour, total delay would be calculated as 16.67 vehicle hours [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 60 miles per hour) = 33.33 vehicle hours – 16.67 vehicle hours = 16.67 vehicle hours].
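
The two worked examples above can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation as described in the text, not MTC's implementation; the function names are hypothetical, and returning zero when speeds are at or above the reference speed is an assumed reading of "delay that occurs when speeds are below" that speed.

```python
CONGESTION_THRESHOLD_MPH = 35.0  # threshold stated in the methodology notes


def vehicle_hours(vehicles, miles, mph):
    """Total travel time, in vehicle hours, to cover `miles` at `mph`."""
    return vehicles * miles / mph


def congested_delay(vehicles, miles, observed_mph):
    """Extra vehicle hours relative to travelling at 35 mph."""
    if observed_mph >= CONGESTION_THRESHOLD_MPH:
        return 0.0  # assumed: no congested delay at or above the threshold
    return (vehicle_hours(vehicles, miles, observed_mph)
            - vehicle_hours(vehicles, miles, CONGESTION_THRESHOLD_MPH))


def total_delay(vehicles, miles, observed_mph, posted_mph):
    """Extra vehicle hours relative to travelling at the posted limit."""
    if observed_mph >= posted_mph:
        return 0.0  # assumed: no delay at or above the posted limit
    return (vehicle_hours(vehicles, miles, observed_mph)
            - vehicle_hours(vehicles, miles, posted_mph))


# 1,000 vehicles at 30 mph on a one-mile segment with a 60 mph limit.
congested = round(congested_delay(1000, 1, 30), 2)
total = round(total_delay(1000, 1, 30, 60), 2)
print(congested, total)

# Equivalence from the segment example: both cases give 250 hours.
hours_many_short = 1000 * 0.25
hours_few_long = 100 * 2.5
print(hours_many_short, hours_few_long)
```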

Data sources listed above were used to calculate per-capita and per-worker statistics. Top congested corridors are ranked by total vehicle hours of delay, meaning that the highlighted corridors reflect a combination of slow speeds and heavy traffic volumes (consistent with longstanding regional methodologies used to generate the “top 10” list of congested segments). Historical Bay Area data was estimated by MTC Operations staff using a combination of internal datasets to develop an approximate trend back to 1998.

To explore how 2017 congestion trends compare to real-time congestion on the region’s freeways, visit 511.org.
Define Best Tariff for a Telecom Company
kaggle.com
Updated Aug 9, 2024
Cite
Roman Nikiforov (2024). Define Best Tariff for a Telecom Company [Dataset]. https://www.kaggle.com/datasets/romanniki/prospective-tariff-for-a-telecom-company/discussion
Dataset updated
Aug 9, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Roman Nikiforov
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Determining the Prospective Tariff for a Telecom Company

Project Description

You are an analyst at "Megaline," a federal mobile operator. The company offers two tariff plans to customers: "Smart" and "Ultra." To adjust the advertising budget, the commercial department wants to understand which tariff generates more revenue.

You need to conduct a preliminary analysis of the tariffs on a small sample of customers. You have data on 500 users of "Megaline": who they are, where they are from, which tariff they use, how many calls and messages they sent in 2018. You need to analyze customer behavior and conclude which tariff is better.

Tariff Descriptions

"Smart" Tariff: - Monthly fee: 550 rubles - Included: 500 minutes of calls, 50 messages, and 15 GB of internet traffic - Cost of services beyond the tariff package: 1. Call minute: 3 rubles (Megaline always rounds up minutes and megabytes. If the user talked for just 1 second, it counts as a whole minute); 2. Message: 3 rubles; 3. 1 GB of internet traffic: 200 rubles.

"Ultra" Tariff: - Monthly fee: 1950 rubles - Included: 3000 minutes of calls, 1000 messages, and 30 GB of internet traffic - Cost of services beyond the tariff package: 1. Call minute: 1 ruble; 2. Message: 1 ruble; 3. 1 GB of internet traffic: 150 rubles.

Note: Megaline always rounds up seconds to minutes and megabytes to gigabytes. Each call is rounded up individually: even if it lasted just 1 second, it is counted as 1 minute. For web traffic, separate sessions are not counted. Instead, the total amount for the month is rounded up. If a subscriber uses 1025 megabytes in a month, they are charged for 2 gigabytes.

Project Steps

Step 1: Open the file with data and study the general information File paths: - /datasets/calls.csv - /datasets/internet.csv - /datasets/messages.csv - /datasets/tariffs.csv - /datasets/users.csv

Step 2: Prepare the data - Convert data to the required types; - Find and fix errors in the data, if any. Explain what errors you found and how you fixed them. You will find calls with zero duration in the data. This is not an error: missed calls are indicated by zeros, so they do not need to be deleted.

For each user, calculate: - Number of calls made and minutes spent per month; - Number of messages sent per month; - Amount of internet traffic used per month; - Monthly revenue from each user (subtract the free limit from the total number of calls, messages, and internet traffic; multiply the remainder by the value from the tariff plan; add the corresponding tariff plan's subscription fee).
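
The monthly revenue rule can be sketched as follows, using the tariff figures and rounding rules given in the tariff descriptions. The `Tariff` class, its field names, and the helper functions are hypothetical and chosen for illustration; 1 GB is taken as 1024 MB, which matches the example of 1025 megabytes being charged as 2 gigabytes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tariff:
    monthly_fee: int        # rubles
    minutes_included: int
    messages_included: int
    gb_included: int
    rub_per_minute: int
    rub_per_message: int
    rub_per_gb: int


# Values copied from the tariff descriptions.
SMART = Tariff(550, 500, 50, 15, 3, 3, 200)
ULTRA = Tariff(1950, 3000, 1000, 30, 1, 1, 150)


def billed_minutes(call_durations_min):
    """Round every call up to a whole minute, then sum (0-minute calls stay 0)."""
    return sum(math.ceil(d) for d in call_durations_min)


def billed_gb(total_mb):
    """Round the monthly traffic total up to whole gigabytes (1 GB = 1024 MB)."""
    return math.ceil(total_mb / 1024)


def monthly_revenue(tariff, call_durations_min, messages, total_mb):
    """Subscription fee plus charges for usage beyond the included package."""
    extra_min = max(0, billed_minutes(call_durations_min) - tariff.minutes_included)
    extra_msg = max(0, messages - tariff.messages_included)
    extra_gb = max(0, billed_gb(total_mb) - tariff.gb_included)
    return (tariff.monthly_fee
            + extra_min * tariff.rub_per_minute
            + extra_msg * tariff.rub_per_message
            + extra_gb * tariff.rub_per_gb)


# Made-up month: 50 calls of 10.5 minutes, 60 messages, 15 GB plus 1 MB.
calls = [10.5] * 50
smart_revenue = monthly_revenue(SMART, calls, 60, 15 * 1024 + 1)
ultra_revenue = monthly_revenue(ULTRA, calls, 60, 15 * 1024 + 1)
print(smart_revenue, ultra_revenue)
```

In this made-up month the same usage exceeds all three "Smart" limits but none of the "Ultra" limits, so the "Ultra" user pays only the subscription fee.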

Step 3: Analyze the data Describe the behavior of the operator's customers based on the sample. How many minutes of calls, how many messages, and how much internet traffic do users of each tariff need per month? Calculate the average, variance, and standard deviation. Create histograms. Describe the distributions.

Step 4: Test hypotheses - The average revenue of users of the "Ultra" and "Smart" tariffs is different; - The average revenue of users from Moscow differs from the revenue of users from other regions. Moscow is written as 'Москва' in the data; you can use this value when testing the hypothesis.

Set the threshold alpha value yourself.

Explain: - How you formulated the null and alternative hypotheses; - Which criterion you used to test the hypotheses and why.

Step 5: Write a general conclusion

Formatting: Perform the task in Jupyter Notebook. Fill the program code in the cells of type code, and the textual explanations in the cells of type markdown. Apply formatting and headers.

Data Description

Table users (user information): - user_id: unique user identifier - first_name: user's first name - last_name: user's last name - age: user's age (years) - reg_date: date of tariff connection (day, month, year) - churn_date: date of tariff discontinuation (if the value is missing, the tariff was still active at the time of data extraction) - city: user's city of residence - tariff: name of the tariff plan

Table calls (call information): - id: unique call number - call_date: call date - duration: call duration in minutes - user_id: identifier of the user who made the call

Table messages (message information): - id: unique message number - message_date: message date - user_id: identifier of the user who sent the message

Table internet (internet session information): - id: unique session number - mb_used: amount of internet traffic used during the session (in megabytes) - session_date: internet session date - user_id: user identifier

Table tariffs (tariff information): - tariff_name: tariff name - rub_monthly_fee: monthly subscription fee in rubles - minutes_included: number of call minutes included per month - `messages_included...
Uavdet Small Gvba Dataset
universe.roboflow.com
zip
Updated Mar 13, 2025
Cite
Roboflow100VL Full (2025). Uavdet Small Gvba Dataset [Dataset]. https://universe.roboflow.com/roboflow100vl-full/uavdet-small-gvba/dataset/1
Available download formats: zip
Dataset updated
Mar 13, 2025
Dataset authored and provided by
Roboflow100VL Full
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Variables measured
Uavdet Small Gvba Gvba Bounding Boxes
Description
Overview

Introduction

Object Classes

Bicycle

Bus

Car

Human

Motorbike

Truck

Van

Introduction

This dataset aims to annotate various types of vehicles and pedestrians in urban environments using aerial images. The goal is to create a comprehensive object detection dataset for applications such as traffic analysis and city planning. The dataset includes the following classes: bicycle, bus, car, human, motorbike, truck, and van.

Object Classes

Bicycle

Description

A bicycle is a two-wheeled, human-powered vehicle. From an aerial view, it appears narrow with two wheels in line. It can often be seen alongside pedestrians or in bike lanes.

Instructions

Annotate the entire structure, including both wheels and the frame.

Do not include the rider as part of the bicycle annotation; the rider should be annotated as a separate human if visible.

Ensure clear visibility of both wheels and the frame; do not annotate if mostly obscured by other objects.

Bus

Description

Buses are large public transport vehicles with a box-like structure. They are larger than cars and have a distinct length and width, noticeable from above.

Instructions

Annotate the full rectangular structure, including any visible wheels.

Exclude overlapping vehicles or structures on top of the bus.

Annotate only if more than 50% of the bus is visible.

Car

Description

Cars are smaller than buses and have a compact rectangular shape with visible wheels and a roof from an aerial perspective.

Instructions

Outline the car’s body, including visible wheels.

Do not include shadows or reflections in the annotation.

Ensure the car is not overly occluded or indistinguishable from other vehicles.

Human

Description

Humans appear as small, elongated shapes from an aerial view and are often seen on sidewalks or pedestrian crossings.

Instructions

Annotate individual human figures only when clearly visible and distinct.

Exclude groups where individuals cannot be distinguished.

Avoid annotating if the figure is too small (less than 10 pixels).

Motorbike

Description

Motorbikes are narrower than cars and have a distinct two-wheel alignment. They might appear alongside or near cars and can sometimes be accompanied by a rider.

Instructions

Encompass the entire structure, but do not include riders in the motorbike annotation.

Ensure both wheels are visible; avoid annotating if mostly hidden.

Differentiate from bicycles by their typically larger size and engine presence.

Truck

Description

Trucks are large, elongated vehicles often used for transport or delivery. They are similar in shape to buses but generally have distinct cargo sections.

Instructions

Annotate the whole truck, including cab and cargo sections.

Ignore small trailers or attached equipment.

Annotate only when over half of the truck is visible.

Van

Description

Vans are mid-sized, larger than cars but smaller than trucks and buses. They have a box-like structure distinct enough to notice from above.

Instructions

Outline the van including visible wheels and roof.

Avoid overlapping annotations with nearby vehicles.

Ensure the van's shape is clear and not obscured by large objects.
A new large-scale index (AcED) for assessing traffic noise disturbance on...
pre.iepnb.es
iepnb.es
Updated May 23, 2025
Cite
(2025). A new large-scale index (AcED) for assessing traffic noise disturbance on wildlife: stress response in a roe deer (Capreolus capreolus) population. - Dataset - CKAN [Dataset]. https://pre.iepnb.es/catalogo/dataset/a-new-large-scale-index-aced-for-assessing-traffic-noise-disturbance-on-wildlife-stress-respons1
Dataset updated
May 23, 2025
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Anthropogenic noise is a growing ubiquitous and pervasive pollutant as well as a recognised stressor that spreads throughout natural ecosystems. However, there is still an urgent need for the assessment of noise impact on natural ecosystems. This article presents a multidisciplinary study which made it possible to isolate noise due to road traffic to evaluate it as a major driver of detrimental effects on wildlife populations. A new indicator has been defined: AcED (the acoustic escape distance) and faecal cortisol metabolites (FCM) were extracted from roe deer faecal samples as a validated indicator of physiological stress in animals moving around in two low-traffic roads that cross a National Park in Spain. Two key findings turned out to be relevant in this study: (i) road identity (i.e. road type defined by traffic volume and average speed) and AcED were the variables that best explained the FCM values observed in roe deer, and (ii) FCM concentration was positively related to increasing traffic volume (road type) and AcED values. Our results suggest that FCM analysis and noise mapping have shown themselves to be useful tools in multidisciplinary approaches and environmental monitoring. Furthermore, our findings aroused the suspicion that low-traffic roads (< 1000 vehicles per day) could be capable of causing higher habitat degradation than has been deemed until now. Keywords: Disturbance, Noise, Wild boar
Vital Signs: Time in Congestion - Corridor Shapefile (Updated October 2018)
splitgraph.com
data.bayareametro.gov
Updated Oct 24, 2018
Cite
bayareametro-gov (2018). Vital Signs: Time in Congestion - Corridor Shapefile (Updated October 2018) [Dataset]. https://www.splitgraph.com/bayareametro-gov/vital-signs-time-in-congestion-corridor-shapefile-j4ig-7vv6/
Available download formats: application/vnd.splitgraph.image, application/openapi+json, json
Dataset updated
Oct 24, 2018
Authors
bayareametro-gov
Description
VITAL SIGNS INDICATOR

Time Spent in Congestion (T7)

FULL MEASURE NAME

Time Spent in Congestion

LAST UPDATED

October 2018

DATA SOURCE

MTC/Iteris Congestion Analysis

No link available

CA Department of Finance Forms E-8 and E-5

http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-8/

http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-5/

CA Employment Division Department: Labor Market Information

http://www.labormarketinfo.edd.ca.gov/

CONTACT INFORMATION

vitalsigns.info@bayareametro.gov

METHODOLOGY NOTES (across all datasets for this indicator)

Time spent in congestion measures the hours drivers are in congestion on freeway facilities based on traffic data. In recent years, data for the Bay Area comes from INRIX, a company that collects real-time traffic information from a variety of sources including mobile phone data and other GPS locator devices. The data provides traffic speed on the region’s highways. Using historical INRIX data (and similar internal datasets for some of the earlier years), MTC calculates an annual time series for vehicle hours spent in congestion in the Bay Area. Time spent in congestion is defined as the average daily hours spent in congestion on Tuesdays, Wednesdays and Thursdays during peak traffic months on freeway facilities. This indicator focuses on weekdays given that traffic congestion is generally greater on these days; this indicator does not capture traffic congestion on local streets due to data unavailability.

This congestion indicator emphasizes recurring delay (as opposed to also including non-recurring delay), capturing the extent of delay caused by routine traffic volumes (rather than congestion caused by unusual circumstances). Recurring delay is identified by setting a threshold of consistent delay greater than 15 minutes on a specific freeway segment from vehicle speeds less than 35 mph. This definition is consistent with longstanding practices by MTC, Caltrans and the U.S. Department of Transportation as speeds less than 35 mph result in significantly less efficient traffic operations. 35 mph is the threshold at which vehicle throughput is greatest; speeds that are either greater than or less than 35 mph result in reduced vehicle throughput. This methodology focuses on the extra travel time experienced based on a differential between the congested speed and 35 mph, rather than the posted speed limit.

To provide a mathematical example of how the indicator is calculated on a segment basis, when it comes to time spent in congestion, 1,000 vehicles traveling on a congested segment for a 1/4 hour (15 minutes) each, [1,000 vehicles x ¼ hour congestion per vehicle= 250 hours congestion], is equivalent to 100 vehicles traveling on a congested segment for 2.5 hours each, [100 vehicles x 2.5 hour congestion per vehicle = 250 hours congestion]. In this way, the measure captures the impacts of both slow speeds and heavy traffic volumes.

MTC calculates two measures of delay – congested delay, or delay that occurs when speeds are below 35 miles per hour, and total delay, or delay that occurs when speeds are below the posted speed limit. To illustrate, if 1,000 vehicles are traveling at 30 miles per hour on a one mile long segment, this would represent 4.76 vehicle hours of congested delay [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 35 miles per hour) = 33.33 vehicle hours – 28.57 vehicle hours = 4.76 vehicle hours]. Considering that the posted speed limit on the segment is 60 miles per hour, total delay would be calculated as 16.67 vehicle hours [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 60 miles per hour) = 33.33 vehicle hours – 16.67 vehicle hours = 16.67 vehicle hours].

Data sources listed above were used to calculate per-capita and per-worker statistics. Top congested corridors are ranked by total vehicle hours of delay, meaning that the highlighted corridors reflect a combination of slow speeds and heavy t

Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

See the Splitgraph documentation for more information.
Bird Strikes in Aviation: Aircraft Collisions
kaggle.com
Updated Nov 15, 2024
Cite
Tapendu Karmakar (2024). Bird Strikes in Aviation: Aircraft Collisions [Dataset]. https://www.kaggle.com/datasets/iamtapendu/bird-strike-by-aircafts-data/code
Dataset updated
Nov 15, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Tapendu Karmakar
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Transport and communication are vital domains within the field of analytics, particularly in addressing safety and environmental concerns linked to the rapid growth of urban areas and increasing air traffic. Among the many risks aviation faces, bird strikes—collisions between aircraft and birds or other wildlife—pose a significant threat. These strikes can cause serious damage to aircraft, particularly jet engines, and have been responsible for some fatal accidents. Bird strikes are most likely to occur during critical flight phases such as take-off, climb, approach, and landing, when aircraft are at lower altitudes and bird activity is higher.

The dataset provided by the FAA, covering incidents from 2000 to 2011, offers a comprehensive overview of bird strikes in the U.S. It includes detailed visualizations and analyses across several key areas:

Trends Over Time: Yearly distribution of bird strike incidents.

Airline Impact: Analysis of the top 10 U.S. airlines affected by bird strikes.

Airport Incidents: Identification of the 50 U.S. airports with the highest frequency of bird strike incidents.

Economic Impact: Yearly costs incurred by airlines and the aviation industry due to bird strikes.

Timing and Altitude: When and at what altitude most bird strikes occur.

Flight Phase: The phase of flight during which strikes are most likely to happen.

Impact Analysis: How bird strikes affect flight operations, including aircraft damage.

Pilot Awareness: Correlation between pilot knowledge of potential bird strike risks and the severity of the incidents.

This dataset offers valuable insights into bird strike patterns, focusing on factors such as aircraft type, location, flight phase, and the specific species involved. By analyzing these variables, it helps identify risk factors and trends, supporting the development of strategies to reduce the frequency and impact of bird strikes, ultimately enhancing aviation safety and risk mitigation.

Features:

AircraftType: The type of aircraft involved in the bird strike incident (e.g., "Airplane").

AirportName: The name of the airport where the bird strike occurred (e.g., "LAGUARDIA NY", "DALLAS/FORT WORTH INTL ARPT").

AltitudeBin: The altitude range (in feet) at which the bird strike occurred, divided into bins (e.g., "(1000, 2000]", "(30, 50]").

MakeModel: The specific make and model of the aircraft involved (e.g., "B-737-400", "MD-80", "A-300").

NumberStruck: The number of birds that were struck during the incident (e.g., "Over 100", "1", "26").

NumberStruckActual: The actual number of birds that were struck during the incident (e.g., 859, 424, 261).

Effect: The effect of the bird strike on the aircraft, indicating whether it caused any damage or not (e.g., "Engine Shut Down", "No damage", "Caused damage").

FlightDate: The date of the bird strike incident (e.g., "11/23/00 0:00").

Damage: A description of the damage caused by the bird strike (e.g., "Caused damage", "No damage").

Engines: The number of engines on the aircraft involved in the bird strike (e.g., 2 engines).

Operator: The airline or operator of the aircraft involved in the bird strike (e.g., "US AIRWAYS", "AMERICAN AIRLINES", "ALASKA AIRLINES").

OriginState: The U.S. state where the aircraft originated (e.g., "New York", "Texas", "Washington").

FlightPhase: The phase of flight during which the bird strike occurred (e.g., "Climb", "Landing Roll", "Approach", "Take-off run").

ConditionsPrecipitation: The weather condition related to precipitation at the time of the bird strike (e.g., "None", "Some Cloud").

RemainsCollected?: Indicates whether bird remains were collected after the strike (e.g., "True" or "False").

RemainsSentToSmithsonian: Indicates whether the bird remains were sent to the Smithsonian Institution for study (e.g., "True" or "False").

Remarks: Additional comments or notes related to the incident, including specific details like the number of birds involved, actions taken, or other observations (e.g., "FLYING UNDER A VERY LARGE FLOCK OF BIRDS", "BIRD REMAINS ON F/O WINDSCREEN").

WildlifeSize: The size of the bird or wildlife involved in the strike (e.g., "Small", "Medium").

ConditionsSky: The sky condition at the time of the bird strike (e.g., "No Cloud", "Some Cloud").

WildlifeSpecies: The species of the bird or wildlife involved in the strike (e.g., "European starling", "Rock pigeon", "Unknown bird - medium").

PilotWarned: Indicates whether the pilot was warned about the potential for a bird strike (e.g., "Y" for Yes, "N" for No).

Cost: The cost incurred as a result of the bird strike (e.g., financial cost to repair damage or related expenses, usually in monetary value like 30,736).

Altitude: The specific alt...
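
As an illustration of how two of the string-valued features above might be parsed, here is a minimal Python sketch. It assumes that AltitudeBin labels always use the half-open interval notation shown in the examples (such as "(1000, 2000]") and that FlightDate follows a month/day/two-digit-year layout as in "11/23/00 0:00". The helper names parse_altitude_bin and parse_flight_date are hypothetical and are not part of the dataset.

```python
import re
from datetime import datetime
from typing import NamedTuple


class AltitudeBin(NamedTuple):
    low_ft: float   # lower bound in feet, exclusive: "("
    high_ft: float  # upper bound in feet, inclusive: "]"

    def contains(self, altitude_ft: float) -> bool:
        # Half-open interval: low < altitude <= high.
        return self.low_ft < altitude_ft <= self.high_ft


# Matches labels such as "(1000, 2000]" or "(30, 50]".
_BIN_RE = re.compile(r"^\(\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\]$")


def parse_altitude_bin(label: str) -> AltitudeBin:
    """Parse an AltitudeBin label into numeric bounds (assumed format)."""
    match = _BIN_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised altitude bin: {label!r}")
    return AltitudeBin(float(match.group(1)), float(match.group(2)))


def parse_flight_date(text: str) -> datetime:
    """Parse a FlightDate value such as "11/23/00 0:00" (assumed month/day/year)."""
    return datetime.strptime(text.strip(), "%m/%d/%y %H:%M")


altitude_bin = parse_altitude_bin("(1000, 2000]")
flight_date = parse_flight_date("11/23/00 0:00")
```

Keeping the bounds as numbers makes it possible to test whether a raw altitude falls into a given bin, for example altitude_bin.contains(1500).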
Data from: S1 Dataset -
plos.figshare.com
xlsx
Updated Mar 3, 2025
Cite
Binyam Gebrehiwet Tesfay; Tensay Kahsay Welegebriel; Desta Hailu Aregawi; Mamush Gidey Abrha; Berhe Gebrehiwot Tewele; Fissha Brhane Mesele; Fiseha Abadi Gebreanenia; Kelali Goitom Weldu (2025). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0308584.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0308584.s001
Dataset updated
Mar 3, 2025
Dataset provided by
PLOS ONE
Authors
Binyam Gebrehiwet Tesfay; Tensay Kahsay Welegebriel; Desta Hailu Aregawi; Mamush Gidey Abrha; Berhe Gebrehiwot Tewele; Fissha Brhane Mesele; Fiseha Abadi Gebreanenia; Kelali Goitom Weldu
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Background

Globally, road traffic accidents (RTAs) cause over 1.35 million deaths each year, with an additional 50 million people suffering disabilities. Ethiopia has the highest number of road traffic accidents, with over 14,000 people killed and over 45,000 injured annually. This study aimed to assess survival status and predictors of mortality among adult road traffic accident patients admitted to intensive care units of Referral Hospitals in Tigray, 2024.

Methods

An institution-based retrospective follow-up study was conducted from January 8, 2019, to December 11, 2023, on 333 patient charts. A bivariable Cox regression analysis was performed to estimate crude hazard ratios (CHR). Subsequently, a multivariable Cox regression analysis was performed to estimate the adjusted hazard ratios (AHR). Finally, AHRs with a p-value less than 0.05 were used to measure the association between dependent and independent variables.

Result

The incidence of mortality for road traffic accident victims was 21 per 1000 person-days of observation (95% CI: 16, 27.6), and the median survival time was 14 days. The predictors of mortality in this study were oxygen saturation on admission ≤ 89% (AHR = 4.9; 95% CI: 1.4–17.2), intracranial hemorrhage (AHR = 3.3; 95% CI: 1.02–11), chest injury (AHR = 3.2; 95% CI: 1.38–7.59), and the age categories 31–45 years (AHR = 0.3; 95% CI: 0.1–0.88) and 46–60 years (AHR = 0.22; 95% CI: 0.06–0.89).

Conclusion

A concerningly high mortality rate from car accidents was found in Referral Hospitals of Tigray. To improve survival rates, healthcare providers should focus on victims with very low oxygen levels, head injuries, chest injuries, and older victims.
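
The adjusted hazard ratios quoted in this description can be read with a simple rule of thumb: a 95% confidence interval that excludes 1 is consistent with significance at the 0.05 level, and an AHR above or below 1 indicates a higher or lower hazard relative to the reference category. The Python sketch below applies that rule to the reported figures. The HazardRatio type and its method names are hypothetical, and the reference categories are not stated in the description, so the direction labels are only relative.

```python
from typing import NamedTuple


class HazardRatio(NamedTuple):
    predictor: str
    ahr: float      # adjusted hazard ratio as reported
    ci_low: float   # lower bound of the 95% CI
    ci_high: float  # upper bound of the 95% CI

    def excludes_one(self) -> bool:
        # A 95% CI that does not contain 1 is consistent with p < 0.05.
        return self.ci_low > 1.0 or self.ci_high < 1.0

    def direction(self) -> str:
        # Relative to the (unstated) reference category.
        return "higher hazard" if self.ahr > 1.0 else "lower hazard"


# Values copied from the description above.
REPORTED = [
    HazardRatio("oxygen saturation <= 89%", 4.9, 1.4, 17.2),
    HazardRatio("intracranial hemorrhage", 3.3, 1.02, 11.0),
    HazardRatio("chest injury", 3.2, 1.38, 7.59),
    HazardRatio("age 31-45 years", 0.3, 0.1, 0.88),
    HazardRatio("age 46-60 years", 0.22, 0.06, 0.89),
]

for hr in REPORTED:
    print(f"{hr.predictor}: {hr.direction()}, CI excludes 1: {hr.excludes_one()}")
```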
Road traffic fatalities per one million inhabitants in the United States...
statista.com
Updated Dec 18, 2023
Cite
Statista Research Department (2023). Road traffic fatalities per one million inhabitants in the United States 2014-2029 [Dataset]. https://www.statista.com/topics/3708/road-accidents-in-the-us/
Dataset updated
Dec 18, 2023
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
United States
Description
The number of road traffic fatalities per one million inhabitants in the United States was forecast to increase continuously between 2024 and 2029, by 18.5 deaths in total (+13.81 percent). After the tenth consecutive year of increase, the number is estimated to reach 152.46 deaths in 2029, a new peak. Depicted here is the estimated number of deaths that occurred in relation to road traffic, set in relation to population size and expressed as deaths per one million inhabitants.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of road traffic fatalities per one million inhabitants in countries like Mexico and Canada.
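
The forecast figures in this description can be cross-checked with simple arithmetic: subtracting the total change from the 2029 value gives the implied 2024 baseline, and dividing the change by that baseline should reproduce the quoted percentage. The following Python sketch does this. The function names are hypothetical, and it assumes the percentage is computed relative to the 2024 value, which the description does not state explicitly.

```python
def implied_baseline(end_value: float, total_change: float) -> float:
    """Starting value implied by an end value and the total change."""
    return end_value - total_change


def percent_change(end_value: float, total_change: float) -> float:
    """Total change as a percentage of the implied starting value."""
    return 100.0 * total_change / implied_baseline(end_value, total_change)


# Figures quoted in the description: 152.46 deaths per one million
# inhabitants in 2029, after a total increase of 18.5 deaths.
baseline_2024 = implied_baseline(152.46, 18.5)
change_pct = percent_change(152.46, 18.5)

print(f"implied 2024 value: {baseline_2024:.2f}")
print(f"implied change: {change_pct:+.2f} percent")
```

Under that assumption the implied change rounds to the same percentage as quoted in the description, so the three numbers are mutually consistent.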
Crash data from Queensland roads
data.qld.gov.au
data.wu.ac.at
csv
Updated Jun 20, 2025
Cite
Transport and Main Roads (2025). Crash data from Queensland roads [Dataset]. https://www.data.qld.gov.au/dataset/crash-data-from-queensland-roads
Explore at:
Available download formats: csv (3 MiB), csv (2 MiB), csv (1 MiB), csv (303 KiB), csv (196.5 MiB), csv (196.5 KiB)
Dataset updated
Jun 20, 2025
Dataset provided by
Department of Transport and Main Roads (http://tmr.qld.gov.au/)
Authors
Transport and Main Roads
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Queensland
Description
Overview:

Information on the location and characteristics of all reported road traffic crashes in Queensland that occurred from 1 January 2001 to 30 June 2024.

Fatal, Hospitalisation, Medical treatment and Minor injury:

This dataset contains information on crashes reported to the police which resulted from the movement of at least 1 road vehicle on a road or road related area. Crashes listed in this resource have occurred on a public road and meet one of the following criteria:

a person is killed or injured, or

at least 1 vehicle was towed away, or

the value of the property damage meets the appropriate criteria listed below.

Property damage:

$2500 or more damage to property other than vehicles (after 1 December 1999)

$2500 or more damage to vehicle and/or other property (after 1 December 1991 and before 1 December 1999)

value of property damage is greater than $1000 (before December 1991).
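
The date-dependent property damage thresholds listed above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name is hypothetical, the boundary dates of 1 December 1991 and 1 December 1999 are treated as the first day of each newer rule (the description leaves those exact days unassigned), and "vehicle and/or other property" is interpreted as the combined damage value.

```python
from datetime import date

# Assumed first days of each reporting rule.
RULE_1999 = date(1999, 12, 1)
RULE_1991 = date(1991, 12, 1)


def meets_property_damage_criterion(
    crash_date: date,
    vehicle_damage: float,
    other_property_damage: float,
) -> bool:
    """Return True if the damage values (in dollars) meet the listed threshold."""
    if crash_date >= RULE_1999:
        # $2500 or more damage to property other than vehicles.
        return other_property_damage >= 2500
    combined = vehicle_damage + other_property_damage  # assumed interpretation
    if crash_date >= RULE_1991:
        # $2500 or more damage to vehicle and/or other property.
        return combined >= 2500
    # Before December 1991: value of property damage greater than $1000.
    return combined > 1000


recent = meets_property_damage_criterion(date(2005, 6, 1), 9000.0, 0.0)
older = meets_property_damage_criterion(date(1995, 6, 1), 2500.0, 0.0)
```

In this sketch a 2005 crash with vehicle damage only does not meet the threshold, while the same kind of crash in 1995 does, which reflects the change in wording between the two periods.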

Please note:

This data has been extracted from the Queensland Road Crash Database.

Information held in the Road Crash Database on events occurring within the last 12 months is considered preliminary as investigations into crashes can take up to 1 year to finalise.

Property damage only crashes ceased to be reported/recorded by Queensland Police Service after 31 December 2010.

The crash location coordinates reference the current Australian geodetic datum, GDA2020 (previously GDA94).
Road safety statistics: data tables
gov.uk
Updated Jul 31, 2025
Cite
Department for Transport (2025). Road safety statistics: data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/reported-road-accidents-vehicles-and-casualties-tables-for-great-britain
Dataset updated
Jul 31, 2025
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Department for Transport
Description

These tables present high-level breakdowns and time series. A list of all tables, including those discontinued, is available in the table index. More detailed data is available in our data tools, or by downloading the open dataset.

Latest data and table index

The tables below are the latest final annual statistics for 2023. The latest data currently available are provisional figures for 2024. These are available from the latest provisional statistics.

A list of all reported road collisions and casualties data tables and variables in our data download tool is available in the Tables index (ODS, 30.1 KB) <https://assets.publishing.service.gov.uk/media/683709928ade4d13a63236df/reported-road-casualties-gb-index-of-tables.ods>.

All collision, casualty and vehicle tables

Reported road collisions and casualties data tables (zip file) (ZIP, 16.6 MB) <https://assets.publishing.service.gov.uk/media/66f44e29c71e42688b65ec43/ras-all-tables-excel.zip>

Historic trends (RAS01)

RAS0101: Collisions, casualties and vehicles involved by road user type since 1926 (ODS, 52.1 KB) <https://assets.publishing.service.gov.uk/media/66f44bd130536cb927482733/ras0101.ods>

RAS0102: Casualties and casualty rates, by road user type and age group, since 1979 (ODS, 142 KB) <https://assets.publishing.service.gov.uk/media/66f44bd1080bdf716392e8ec/ras0102.ods>

Road user type (RAS02)

RAS0201: Numbers and rates (ODS, 60.7 KB) <https://assets.publishing.service.gov.uk/media/66f44bd1a31f45a9c765ec1f/ras0201.ods>

RAS0202: Sex and age group (ODS, 167 KB) <https://assets.publishing.service.gov.uk/media/66f44bd1e84ae1fd8592e8f0/ras0202.ods>

RAS0203: Rates by mode, including air, water and rail modes (ODS, 24.2 KB) <https://assets.publishing.service.gov.uk/media/67600227b745d5f7a053ef74/ras0203.ods>

Road type (RAS03)

RAS0301: Speed limit, built-up and non-built-up roads (ODS, 49.3 KB) <https://assets.publishing.service.gov.uk/media/66f44bd1c71e42688b65ec3e/ras0301.ods>

RAS0302: Urban and rural roa... <https://assets.publishing.service.gov.uk/media/66f44bd1080bdf716392e8ee/ras0302.ods>
Air passenger traffic at Canadian airports, annual
www150.statcan.gc.ca
open.canada.ca
+2 more
Updated Jul 29, 2025
Cite
Government of Canada, Statistics Canada (2025). Air passenger traffic at Canadian airports, annual [Dataset]. http://doi.org/10.25318/2310025301-eng
Unique identifier
https://doi.org/10.25318/2310025301-eng
Dataset updated
Jul 29, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Passengers enplaned and deplaned at Canadian airports, annual.
Number of road accidents per one million inhabitants in the United States...
statista.com
Updated Dec 18, 2023
Cite
Statista Research Department (2023). Number of road accidents per one million inhabitants in the United States 2014-2029 [Dataset]. https://www.statista.com/topics/3708/road-accidents-in-the-us/
Dataset updated
Dec 18, 2023
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
United States
Description
The number of road accidents per one million inhabitants in the United States was forecast to decrease continuously between 2024 and 2029, by 2,490.4 accidents in total (-14.99 percent). After the eighth consecutive year of decrease, the number is estimated to reach 14,118.78 accidents in 2029, a new minimum. Depicted here is the estimated number of accidents that occurred in relation to road traffic, set in relation to population size and expressed as accidents per one million inhabitants.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of road accidents per one million inhabitants in countries like Mexico and Canada.
Number of households with internet access in Indonesia 2014-2029
statista.com
Updated Feb 4, 2025
Cite
Statista Research Department (2025). Number of households with internet access in Indonesia 2014-2029 [Dataset]. https://www.statista.com/topics/2431/internet-usage-in-indonesia/
Dataset updated
Feb 4, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
Indonesia
Description
The number of households with internet access in Indonesia was forecast to increase continuously between 2024 and 2029, by 3.8 million households in total (+6.49 percent). After the fifteenth consecutive year of increase, the number of households is estimated to reach 62.36 million in 2029, a new peak. Notably, the number of households with internet access has increased continuously over the past years.

Depicted is the number of households with internet access in the country or region at hand.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of households with internet access in countries like Singapore and Vietnam.

Cite

Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset

Network Traffic Dataset

Use this dataset to analyse network traffic and to design applications

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Oct 31, 2023

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Ravikumar Gattu

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

The data presented here were obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures for 1 hour during the evening of Oct 9th, 2023 using Wireshark. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection and anomaly detection.

The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

Content :

This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date kinds due to the Timestamp.

The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

Dataset Columns:

No: Number of the instance.

Timestamp: Timestamp of the network traffic instance.

Source IP: IP address of the source.

Destination IP: IP address of the destination.

Protocol: Protocol used by the instance.

Length: Length of the instance.

Info: Information about the traffic instance.
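
To illustrate how a file with these columns might be summarised, here is a minimal Python sketch using only the standard library. It assumes the CSV header uses the Wireshark export names listed above (No., Time, Source, Destination, Protocol, Length, Info); the actual header in the published file may differ. The sample rows, the IP addresses (taken from documentation-only ranges) and the function name summarize_by_protocol are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import TextIO, Tuple

# Hypothetical sample in the assumed Wireshark export layout.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.0.2.10","198.51.100.7","TCP","66","hypothetical row"
"2","0.000412","198.51.100.7","192.0.2.10","TCP","54","hypothetical row"
"3","0.013508","192.0.2.10","203.0.113.5","DNS","74","hypothetical row"
'''


def summarize_by_protocol(csv_file: TextIO) -> Tuple[Counter, Counter]:
    """Count packets and sum the Length column per protocol."""
    packets = Counter()
    total_length = Counter()
    for row in csv.DictReader(csv_file):
        protocol = row["Protocol"]
        packets[protocol] += 1
        total_length[protocol] += int(row["Length"])
    return packets, total_length


packets_by_protocol, length_by_protocol = summarize_by_protocol(io.StringIO(SAMPLE_CSV))
print(packets_by_protocol.most_common())
```

For the real file, the io.StringIO object would be replaced by an open file handle, for example one created with open(path, newline="").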

Acknowledgements :

I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.

Ravikumar Gattu , Susmitha Choppadandi

Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it is meant for building machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).

**Dataset License:** CC0: Public Domain

Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection and anomaly detection.

ML techniques that benefit from this dataset:

This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private and enterprise networks. The dataset also contains important features that can be used for most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:

1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.

2. Anomaly Detection: A large network traffic dataset can be used to train machine learning models to find irregularities in the traffic, which could help identify cyber attacks.

3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks and DoS attacks.
