16 datasets found
  1. Network Traffic Dataset

    • kaggle.com
    Updated Oct 31, 2023
    Cite
    Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ravikumar Gattu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by capturing packets with Wireshark for 1 hour during the evening of Oct 9th, 2023. The dataset consists of 394,137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

    The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

    Content:

    This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal properties and a date type due to the Timestamp.

    The network traffic flow statistics (No., Time, Source, Destination, Protocol, Length, Info) were obtained using Wireshark (https://www.wireshark.org/).

    Dataset Columns:

    - No: number of the instance
    - Timestamp: timestamp of the network traffic instance
    - Source IP: IP address of the source
    - Destination IP: IP address of the destination
    - Protocol: protocol used by the instance
    - Length: length of the instance
    - Info: information about the traffic instance
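
    A minimal sketch of loading the CSV and tallying packets per protocol, using only Python's standard library. It assumes the file uses Wireshark's default CSV export headers (`No.`, `Time`, `Source`, `Destination`, `Protocol`, `Length`, `Info`) rather than the column names listed above, so check the real header row first. The sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in Wireshark's default CSV export layout;
# the real file's header names and values may differ.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.5","142.250.72.14","TCP","66","51514 > 443 [ACK]"
"2","0.000412","10.0.0.5","8.8.8.8","DNS","74","Standard query A example.com"
"3","0.010233","8.8.8.8","10.0.0.5","DNS","90","Standard query response A"
"4","0.020871","142.250.72.14","10.0.0.5","TLSv1.3","1434","Application Data"
'''


def load_packets(text):
    """Parse CSV text into dicts, converting Time and Length to numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Time"] = float(row["Time"])
        row["Length"] = int(row["Length"])
        rows.append(row)
    return rows


def protocol_counts(rows):
    """Count how many captured packets belong to each protocol."""
    return Counter(row["Protocol"] for row in rows)


if __name__ == "__main__":
    packets = load_packets(SAMPLE)
    print(protocol_counts(packets).most_common())
```

    To read the downloaded file instead of the sample, pass the file's text (for example from `open(path, newline="").read()`) to `load_packets`.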

    Acknowledgements:

    I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.

    Ravikumar Gattu, Susmitha Choppadandi

    Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it supports machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).

    Dataset License: CC0: Public Domain

    Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

    ML techniques that benefit from this dataset:

    This dataset is highly useful because it consists of 394,137 instances of network traffic data obtained by using 25 applications on public, private, and enterprise networks. The dataset also contains important features that can be used for most machine learning applications in cybersecurity. Here are a few of the potential machine learning applications that could benefit from this dataset:

    1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.

    2. Anomaly Detection: This large network traffic dataset can be used to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.

    3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models that detect traffic issues, malicious traffic, network attacks, and DoS attacks.
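
    As a rough illustration of the anomaly detection use case, the sketch below flags seconds whose packet count deviates strongly from the capture's average (a z-score test). This is a simple statistical baseline, not a trained machine learning model, and the threshold of 3 standard deviations is an illustrative assumption rather than something the dataset prescribes. Timestamps are assumed to be seconds since the start of the capture, as in Wireshark's time column.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def packets_per_second(timestamps):
    """Bucket capture timestamps (seconds since capture start) into 1-second bins."""
    counts = Counter(math.floor(t) for t in timestamps)
    last = max(counts) if counts else -1
    # Include seconds with no packets so quiet periods count toward the baseline.
    return [counts.get(s, 0) for s in range(last + 1)]


def flag_anomalies(series, threshold=3.0):
    """Return indices whose absolute z-score exceeds `threshold` (assumed default)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return []  # a perfectly flat series has no outliers
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]


if __name__ == "__main__":
    series = [10] * 20 + [200] + [10] * 9  # one burst at second 20
    print(flag_anomalies(series))  # [20]
```

    A real detector would likely work per host or per flow and use a learned model, but the same idea applies: establish a baseline, then score deviations from it.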

  2. WebBench

    • huggingface.co
    Updated May 28, 2025
    Cite
    Halluminate (2025). WebBench [Dataset]. https://huggingface.co/datasets/Halluminate/WebBench
    Dataset updated
    May 28, 2025
    Dataset provided by
    Halluminate, Inc.
    Authors
    Halluminate
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Web Bench: A real-world benchmark for Browser Agents

    WebBench is an open, task-oriented benchmark that measures how well browser agents handle realistic web workflows. It contains 2,454 tasks spread across 452 live websites selected from the global top-1000 by traffic. Last updated: May 28, 2025.

    Dataset Composition

    Category | Description | Example | Count (% of dataset)
    READ | Tasks that require searching and extracting information | “Navigate to the news section and…

    See the full description on the dataset page: https://huggingface.co/datasets/Halluminate/WebBench.

  3. Dataset used for detecting DNS over HTTPS by Machine Learning.

    • zenodo.org
    zip
    Updated Oct 28, 2020
    Cite
    Dmitrii Vekshin; Karel Hynek; Tomas Cejka (2020). Dataset used for detecting DNS over HTTPS by Machine Learning [Dataset]. http://doi.org/10.5281/zenodo.3906526
    Available download formats: zip
    Dataset updated
    Oct 28, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dmitrii Vekshin; Karel Hynek; Tomas Cejka
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The dataset consists of three different data sources:

    1. DoH enabled Firefox
    2. DoH enabled Google Chrome
    3. Cloudflared DoH proxy

    The web browser data was captured using the Selenium framework, which simulated classic user browsing. The browsers received commands to visit domains taken from Alexa's top 10K most visited websites. The capture was performed on the host by listening on the network interface of the virtual machine. Overall, the dataset contains almost 5,000 web-page visits by Mozilla Firefox and 1,000 by Google Chrome.

    The Cloudflared DoH proxy was installed on a Raspberry Pi, and the IP address of the Raspberry Pi was set as the default DNS resolver in two separate offices at our university. It continuously captured the DNS/DoH traffic created by up to 20 devices for around three months.

    The dataset contains 1,128,904 flows, of which around 33,000 are labeled as DoH. We provide raw pcap data, a CSV file with flow data, and a CSV file with extracted features.

    The CSV with extracted features has the following data fields:

    - Label (1 - DoH, 0 - regular HTTPS)
    - Data source
    - Duration
    - Minimal Inter-Packet Delay
    - Maximal Inter-Packet Delay
    - Average Inter-Packet Delay
    - Variance of Incoming Packet Sizes
    - Variance of Outgoing Packet Sizes
    - Ratio of the number of incoming and outgoing bytes
    - Ratio of the number of incoming and outgoing packets
    - Average of Incoming Packet sizes
    - Average of Outgoing Packet sizes
    - Median value of Incoming Packet sizes
    - Median value of Outgoing Packet sizes
    - Ratio of bursts and pauses
    - Number of bursts
    - Number of pauses
    - Autocorrelation
    - Transmission symmetry in the 1st third of the connection
    - Transmission symmetry in the 2nd third of the connection
    - Transmission symmetry in the last third of the connection
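
    To make the feature list concrete, here is a sketch that computes a subset of these fields (duration, inter-packet delays, and the size statistics and ratios) for one flow. The `Packet` record and the exact definitions used here (population variance, incoming divided by outgoing) are assumptions for illustration; the original work may define them differently, and the burst, pause, autocorrelation and symmetry features are not reproduced.

```python
from dataclasses import dataclass
from statistics import mean, median, pvariance


@dataclass
class Packet:
    time: float      # seconds since capture start
    size: int        # bytes
    incoming: bool   # True if sent from the server to the client (assumed convention)


def flow_features(packets):
    """Compute a subset of the listed per-flow features.

    Needs at least two packets and at least one packet in each direction.
    """
    packets = sorted(packets, key=lambda p: p.time)
    times = [p.time for p in packets]
    delays = [later - earlier for earlier, later in zip(times, times[1:])]
    inc = [p.size for p in packets if p.incoming]
    out = [p.size for p in packets if not p.incoming]
    return {
        "duration": times[-1] - times[0],
        "min_ipd": min(delays),
        "max_ipd": max(delays),
        "avg_ipd": mean(delays),
        "var_in_size": pvariance(inc),    # population variance (assumed)
        "var_out_size": pvariance(out),
        "in_out_bytes_ratio": sum(inc) / sum(out),
        "in_out_packets_ratio": len(inc) / len(out),
        "avg_in_size": mean(inc),
        "avg_out_size": mean(out),
        "median_in_size": median(inc),
        "median_out_size": median(out),
    }


if __name__ == "__main__":
    flow = [Packet(0.0, 100, False), Packet(0.1, 300, True),
            Packet(0.3, 100, False), Packet(0.6, 500, True)]
    print(flow_features(flow))
```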

    The observed network traffic does not contain privacy-sensitive information.

    The zip file structure is:

    |-- data
    |  |-- extracted-features...extracted features used in ML for DoH recognition
    |  |  |-- chrome
    |  |  |-- cloudflared
    |  |  `-- firefox
    |  |-- flows...............................................exported flow data
    |  |  |-- chrome
    |  |  |-- cloudflared
    |  |  `-- firefox
    |  `-- pcaps....................................................raw PCAP data
    |    |-- chrome
    |    |-- cloudflared
    |    `-- firefox
    |-- LICENSE
    `-- README.md


    When using this dataset, please cite the original work as follows:

    @inproceedings{vekshin2020,
    author = {Vekshin, Dmitrii and Hynek, Karel and Cejka, Tomas},
    title = {DoH Insight: Detecting DNS over HTTPS by Machine Learning},
    year = {2020},
    isbn = {9781450388337},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3407023.3409192},
    doi = {10.1145/3407023.3409192},
    booktitle = {Proceedings of the 15th International Conference on Availability, Reliability and Security},
    articleno = {87},
    numpages = {8},
    keywords = {classification, DoH, DNS over HTTPS, machine learning, detection, datasets},
    location = {Virtual Event, Ireland},
    series = {ARES '20}
    }
    

  4. Vital Signs: Time in Congestion - Bay Area (updated October 2018)

    • data.bayareametro.gov
    csv, xlsx, xml
    Updated Oct 16, 2018
    Cite
    (2018). Vital Signs: Time in Congestion - Bay Area (updated October 2018) [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Time-in-Congestion-Bay-Area-updated-Oc/ja9p-vpfm
    Available download formats: xml, csv, xlsx
    Dataset updated
    Oct 16, 2018
    Area covered
    San Francisco Bay Area
    Description

    VITAL SIGNS INDICATOR Time Spent in Congestion (T7)

    FULL MEASURE NAME Time Spent in Congestion

    LAST UPDATED October 2018

    DATA SOURCE MTC/Iteris Congestion Analysis (no link available)

    CA Department of Finance Forms E-8 and E-5: http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-8/ http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-5/

    CA Employment Development Department: Labor Market Information http://www.labormarketinfo.edd.ca.gov/

    CONTACT INFORMATION vitalsigns.info@bayareametro.gov

    METHODOLOGY NOTES (across all datasets for this indicator) Time spent in congestion measures the hours drivers are in congestion on freeway facilities based on traffic data. In recent years, data for the Bay Area comes from INRIX, a company that collects real-time traffic information from a variety of sources including mobile phone data and other GPS locator devices. The data provides traffic speed on the region’s highways. Using historical INRIX data (and similar internal datasets for some of the earlier years), MTC calculates an annual time series for vehicle hours spent in congestion in the Bay Area. Time spent in congestion is defined as the average daily hours spent in congestion on Tuesdays, Wednesdays and Thursdays during peak traffic months on freeway facilities. This indicator focuses on weekdays given that traffic congestion is generally greater on these days; this indicator does not capture traffic congestion on local streets due to data unavailability.
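
    The Tuesday-to-Thursday averaging described above can be sketched as follows. The input (a mapping from calendar date to that day's hours of congestion) is a hypothetical format, and the sketch assumes it has already been restricted to the peak traffic months, which the notes do not specify.

```python
from datetime import date
from statistics import mean

# date.weekday() numbers Monday as 0, so 1, 2, 3 are Tuesday, Wednesday, Thursday.
MIDWEEK_DAYS = {1, 2, 3}


def midweek_average(daily_hours):
    """Average daily congestion hours over Tuesdays, Wednesdays and Thursdays only."""
    values = [hours for day, hours in daily_hours.items() if day.weekday() in MIDWEEK_DAYS]
    return mean(values)


if __name__ == "__main__":
    sample = {
        date(2018, 10, 1): 100,  # Monday, excluded
        date(2018, 10, 2): 10,   # Tuesday
        date(2018, 10, 3): 20,   # Wednesday
        date(2018, 10, 4): 30,   # Thursday
        date(2018, 10, 5): 100,  # Friday, excluded
    }
    print(midweek_average(sample))  # 20
```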

    This congestion indicator emphasizes recurring delay (as opposed to also including non-recurring delay), capturing the extent of delay caused by routine traffic volumes (rather than congestion caused by unusual circumstances). Recurring delay is identified by setting a threshold of consistent delay greater than 15 minutes on a specific freeway segment from vehicle speeds less than 35 mph. This definition is consistent with longstanding practices by MTC, Caltrans and the U.S. Department of Transportation as speeds less than 35 mph result in significantly less efficient traffic operations. 35 mph is the threshold at which vehicle throughput is greatest; speeds that are either greater than or less than 35 mph result in reduced vehicle throughput. This methodology focuses on the extra travel time experienced based on a differential between the congested speed and 35 mph, rather than the posted speed limit.
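
    One possible reading of the recurring-delay threshold is sketched below: given one speed reading per minute for a single freeway segment, it reports runs where speeds stay below 35 mph for more than 15 consecutive minutes. The per-minute sampling and the requirement that readings be strictly consecutive are assumptions; the notes do not say how INRIX segment data is resampled or how brief recoveries within a run are handled.

```python
CONGESTED_MPH = 35.0   # speed threshold from the methodology notes
MIN_RUN_MINUTES = 15   # delay must last more than this many minutes


def recurring_congestion_runs(speeds_by_minute):
    """Return (start, end) minute indices of runs where speed < 35 mph for
    more than 15 consecutive one-minute readings (end index is exclusive)."""
    runs = []
    start = None
    for i, mph in enumerate(speeds_by_minute):
        if mph < CONGESTED_MPH:
            if start is None:
                start = i  # a slow run begins
        elif start is not None:
            if i - start > MIN_RUN_MINUTES:
                runs.append((start, i))
            start = None  # speed recovered, run ends
    # Close a run that lasts until the end of the series.
    if start is not None and len(speeds_by_minute) - start > MIN_RUN_MINUTES:
        runs.append((start, len(speeds_by_minute)))
    return runs


if __name__ == "__main__":
    speeds = [60] * 10 + [30] * 16 + [60] * 5 + [20] * 15
    print(recurring_congestion_runs(speeds))  # [(10, 26)]
```

    The 15-minute slow stretch at the end is not reported because the notes require delay greater than 15 minutes.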

    To provide a mathematical example of how the indicator is calculated on a segment basis: 1,000 vehicles each spending 1/4 hour (15 minutes) on a congested segment [1,000 vehicles x ¼ hour congestion per vehicle = 250 hours congestion] is equivalent to 100 vehicles each spending 2.5 hours on a congested segment [100 vehicles x 2.5 hours congestion per vehicle = 250 hours congestion]. In this way, the measure captures the impacts of both slow speeds and heavy traffic volumes.

    MTC calculates two measures of delay – congested delay, or delay that occurs when speeds are below 35 miles per hour, and total delay, or delay that occurs when speeds are below the posted speed limit. To illustrate, if 1,000 vehicles are traveling at 30 miles per hour on a one mile long segment, this would represent 4.76 vehicle hours of congested delay [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 35 miles per hour) = 33.33 vehicle hours – 28.57 vehicle hours = 4.76 vehicle hours]. Considering that the posted speed limit on the segment is 60 miles per hour, total delay would be calculated as 16.67 vehicle hours [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 60 miles per hour) = 33.33 vehicle hours – 16.67 vehicle hours = 16.67 vehicle hours].
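
    Both delay definitions reduce to a difference of travel times, so the worked example above can be checked with a few lines of Python. The function names are illustrative, and the sketch covers a single segment at a single uniform speed, the same simplification the example makes.

```python
def vehicle_hours(vehicles, miles, mph):
    """Vehicle hours needed for `vehicles` to traverse `miles` at `mph`."""
    return vehicles * miles / mph


def congested_delay(vehicles, miles, speed_mph, threshold_mph=35.0):
    """Extra vehicle hours compared with travelling at the 35 mph threshold."""
    if speed_mph >= threshold_mph:
        return 0.0
    return vehicle_hours(vehicles, miles, speed_mph) - vehicle_hours(vehicles, miles, threshold_mph)


def total_delay(vehicles, miles, speed_mph, posted_limit_mph):
    """Extra vehicle hours compared with travelling at the posted speed limit."""
    if speed_mph >= posted_limit_mph:
        return 0.0
    return vehicle_hours(vehicles, miles, speed_mph) - vehicle_hours(vehicles, miles, posted_limit_mph)


if __name__ == "__main__":
    print(round(congested_delay(1000, 1, 30), 2))  # 4.76
    print(round(total_delay(1000, 1, 30, 60), 2))  # 16.67
```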

    Data sources listed above were used to calculate per-capita and per-worker statistics. Top congested corridors are ranked by total vehicle hours of delay, meaning that the highlighted corridors reflect a combination of slow speeds and heavy traffic volumes (consistent with longstanding regional methodologies used to generate the “top 10” list of congested segments). Historical Bay Area data was estimated by MTC Operations staff using a combination of internal datasets to develop an approximate trend back to 1998.

    To explore how 2017 congestion trends compare to real-time congestion on the region’s freeways, visit 511.org.

  5. Define Best Tariff for a Telecom Company

    • kaggle.com
    Updated Aug 9, 2024
    Cite
    Roman Nikiforov (2024). Define Best Tariff for a Telecom Company [Dataset]. https://www.kaggle.com/datasets/romanniki/prospective-tariff-for-a-telecom-company/discussion
    Dataset updated
    Aug 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Roman Nikiforov
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Determining the Prospective Tariff for a Telecom Company

    Project Description

    You are an analyst at "Megaline," a federal mobile operator. The company offers two tariff plans to customers: "Smart" and "Ultra." To adjust the advertising budget, the commercial department wants to understand which tariff generates more revenue.

    You need to conduct a preliminary analysis of the tariffs on a small sample of customers. You have data on 500 users of "Megaline": who they are, where they are from, which tariff they use, how many calls and messages they sent in 2018. You need to analyze customer behavior and conclude which tariff is better.

    Tariff Descriptions

    "Smart" Tariff:
    - Monthly fee: 550 rubles
    - Included: 500 minutes of calls, 50 messages, and 15 GB of internet traffic
    - Cost of services beyond the tariff package:
      1. Call minute: 3 rubles (Megaline always rounds up minutes and megabytes; if the user talked for just 1 second, it counts as a whole minute);
      2. Message: 3 rubles;
      3. 1 GB of internet traffic: 200 rubles.

    "Ultra" Tariff:
    - Monthly fee: 1950 rubles
    - Included: 3000 minutes of calls, 1000 messages, and 30 GB of internet traffic
    - Cost of services beyond the tariff package:
      1. Call minute: 1 ruble;
      2. Message: 1 ruble;
      3. 1 GB of internet traffic: 150 rubles.

    Note: Megaline always rounds up seconds to minutes and megabytes to gigabytes. Each call is rounded up individually: even if it lasted just 1 second, it is counted as 1 minute. For web traffic, separate sessions are not counted. Instead, the total amount for the month is rounded up. If a subscriber uses 1025 megabytes in a month, they are charged for 2 gigabytes.
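
    The rounding rules translate directly into code. This sketch assumes 1 GB = 1024 MB, which the 1025 MB example suggests but does not strictly pin down (1,000 MB per GB would also give 2 GB there); the function names are hypothetical.

```python
import math

MB_PER_GB = 1024  # assumption; see the note above


def billed_minutes(call_durations):
    """Round every call up to whole minutes, then sum. Missed calls (0.0) stay 0."""
    return sum(math.ceil(d) for d in call_durations)


def billed_gb(session_mb):
    """Sum the month's traffic first, then round the total up to whole gigabytes."""
    return math.ceil(sum(session_mb) / MB_PER_GB)


if __name__ == "__main__":
    print(billed_minutes([0.0, 0.1, 2.0, 2.01]))  # 6
    print(billed_gb([1000, 25]))  # 2
```

    Rounding each call before summing is what makes many short calls cost more than the same total talk time in a single call, whereas internet sessions are rounded only once per month.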

    Project Steps

    Step 1: Open the file with data and study the general information. File paths:
    - /datasets/calls.csv
    - /datasets/internet.csv
    - /datasets/messages.csv
    - /datasets/tariffs.csv
    - /datasets/users.csv

    Step 2: Prepare the data:
    - Convert data to the required types;
    - Find and fix errors in the data, if any. Explain what errors you found and how you fixed them. You will find calls with zero duration in the data. This is not an error: missed calls are indicated by zeros, so they do not need to be deleted.

    For each user, calculate:
    - Number of calls made and minutes spent per month;
    - Number of messages sent per month;
    - Amount of internet traffic used per month;
    - Monthly revenue from each user (subtract the free limit from the total number of calls, messages, and internet traffic; multiply the remainder by the value from the tariff plan; add the corresponding tariff plan's subscription fee).
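
    A sketch of the revenue formula for one user-month, with the tariff values copied from the descriptions above. It takes monthly totals that have already been rounded as Megaline requires and treats usage below the included allowance as zero overage; the `Tariff` class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tariff:
    monthly_fee: int        # rubles
    minutes_included: int
    messages_included: int
    gb_included: int
    rub_per_minute: int     # price beyond the package
    rub_per_message: int
    rub_per_gb: int


# Values taken from the "Smart" and "Ultra" descriptions.
SMART = Tariff(550, 500, 50, 15, 3, 3, 200)
ULTRA = Tariff(1950, 3000, 1000, 30, 1, 1, 150)


def monthly_revenue(tariff, minutes, messages, gb):
    """Subscription fee plus overage charges for one user-month."""
    extra_minutes = max(0, minutes - tariff.minutes_included)
    extra_messages = max(0, messages - tariff.messages_included)
    extra_gb = max(0, gb - tariff.gb_included)
    return (tariff.monthly_fee
            + extra_minutes * tariff.rub_per_minute
            + extra_messages * tariff.rub_per_message
            + extra_gb * tariff.rub_per_gb)


if __name__ == "__main__":
    # 50 extra minutes, 10 extra messages and 1 extra GB on "Smart".
    print(monthly_revenue(SMART, 550, 60, 16))  # 930
```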

    Step 3: Analyze the data Describe the behavior of the operator's customers based on the sample. How many minutes of calls, how many messages, and how much internet traffic do users of each tariff need per month? Calculate the average, variance, and standard deviation. Create histograms. Describe the distributions.
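
    For the descriptive statistics in Step 3, a minimal standard-library version might look like this. It assumes one record per user-month with hypothetical keys such as `tariff` and `minutes`, and it uses the sample variance (dividing by n - 1); the task does not say whether sample or population variance is expected, so that is a choice to state in the notebook.

```python
from statistics import mean, stdev, variance


def describe(values):
    """Mean, sample variance (n - 1 denominator) and sample standard deviation."""
    return {"mean": mean(values), "variance": variance(values), "std": stdev(values)}


def describe_by_tariff(records, field):
    """Group user-month records by tariff and describe one usage column."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["tariff"], []).append(rec[field])
    return {tariff: describe(vals) for tariff, vals in groups.items()}


if __name__ == "__main__":
    rows = [
        {"tariff": "smart", "minutes": 300},
        {"tariff": "smart", "minutes": 500},
        {"tariff": "ultra", "minutes": 1000},
        {"tariff": "ultra", "minutes": 600},
    ]
    print(describe_by_tariff(rows, "minutes"))
```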

    Step 4: Test hypotheses:
    - The average revenue of users of the "Ultra" and "Smart" tariffs is different;
    - The average revenue of users from Moscow differs from the revenue of users from other regions. In the data, Moscow is written as 'Москва'; you can use this value when checking the hypothesis.

    Set the threshold alpha value yourself.

    Explain:
    - How you formulated the null and alternative hypotheses;
    - Which criterion you used to test the hypotheses and why.
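
    One common way to answer Step 4 is Welch's t-test, which compares two means without assuming equal variances: the null hypothesis is that mean revenue is equal in both groups, the alternative that it differs, and the null is rejected when the p-value falls below the chosen alpha. The sketch computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only; to avoid a SciPy dependency, its p-value uses the normal approximation, which is close to the exact t-based value when the degrees of freedom are large (as with hundreds of user-months). In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the exact t-based p-value.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's t statistic, Welch-Satterthwaite degrees of freedom and an
    approximate two-sided p-value (normal approximation, not the t distribution)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


if __name__ == "__main__":
    alpha = 0.05  # chosen by the analyst; the task leaves this open
    t, df, p = welch_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
    print(t, df, p < alpha)
```

    With only five values per group, as in the toy call above, the normal approximation understates the p-value; for small samples, use the exact t-based test.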

    Step 5: Write a general conclusion

    Formatting: Perform the task in Jupyter Notebook. Fill the program code in the cells of type code, and the textual explanations in the cells of type markdown. Apply formatting and headers.

    Data Description

    Table users (user information):
    - user_id: unique user identifier
    - first_name: user's first name
    - last_name: user's last name
    - age: user's age (years)
    - reg_date: date of tariff connection (day, month, year)
    - churn_date: date of tariff discontinuation (if the value is missing, the tariff was still active at the time of data extraction)
    - city: user's city of residence
    - tariff: name of the tariff plan

    Table calls (call information):
    - id: unique call number
    - call_date: call date
    - duration: call duration in minutes
    - user_id: identifier of the user who made the call

    Table messages (message information):
    - id: unique message number
    - message_date: message date
    - user_id: identifier of the user who sent the message

    Table internet (internet session information):
    - id: unique session number
    - mb_used: amount of internet traffic used during the session (in megabytes)
    - session_date: internet session date
    - user_id: user identifier

    Table tariffs (tariff information):
    - tariff_name: tariff name
    - rub_monthly_fee: monthly subscription fee in rubles
    - minutes_included: number of call minutes included per month
    - messages_included...

  6. Uavdet Small Gvba Dataset

    • universe.roboflow.com
    zip
    Updated Mar 13, 2025
    Cite
    Roboflow100VL Full (2025). Uavdet Small Gvba Dataset [Dataset]. https://universe.roboflow.com/roboflow100vl-full/uavdet-small-gvba/dataset/1
    Available download formats: zip
    Dataset updated
    Mar 13, 2025
    Dataset authored and provided by
    Roboflow100VL Full
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Variables measured
    Uavdet Small Gvba Gvba Bounding Boxes
    Description

    Overview

    Introduction

    This dataset aims to annotate various types of vehicles and pedestrians in urban environments using aerial images. The goal is to create a comprehensive object detection dataset for applications such as traffic analysis and city planning. The dataset includes the following classes: bicycle, bus, car, human, motorbike, truck, and van.

    Object Classes

    Bicycle

    Description

    A bicycle is a two-wheeled, human-powered vehicle. From an aerial view, it appears narrow with two wheels in line. It can often be seen alongside pedestrians or in bike lanes.

    Instructions

    • Annotate the entire structure, including both wheels and the frame.
    • Do not include the rider as part of the bicycle annotation; the rider should be annotated as a separate human if visible.
    • Ensure clear visibility of both wheels and the frame; do not annotate if mostly obscured by other objects.

    Bus

    Description

    Buses are large public transport vehicles with a box-like structure. They are larger than cars and have a distinct length and width, noticeable from above.

    Instructions

    • Annotate the full rectangular structure, including any visible wheels.
    • Exclude overlapping vehicles or structures on top of the bus.
    • Annotate only if more than 50% of the bus is visible.

    Car

    Description

    Cars are smaller than buses and have a compact rectangular shape with visible wheels and a roof from an aerial perspective.

    Instructions

    • Outline the car’s body, including visible wheels.
    • Do not include shadows or reflections in the annotation.
    • Ensure the car is not overly occluded or indistinguishable from other vehicles.

    Human

    Description

    Humans appear as small, elongated shapes from an aerial view and are often seen on sidewalks or pedestrian crossings.

    Instructions

    • Annotate individual human figures only when clearly visible and distinct.
    • Exclude groups where individuals cannot be distinguished.
    • Avoid annotating if the figure is too small (less than 10 pixels).

    Motorbike

    Description

    Motorbikes are narrower than cars and have a distinct two-wheel alignment. They might appear alongside or near cars and can sometimes be accompanied by a rider.

    Instructions

    • Encompass the entire structure, but do not include riders in the motorbike annotation.
    • Ensure both wheels are visible; avoid annotating if mostly hidden.
    • Differentiate from bicycles by their typically larger size and engine presence.

    Truck

    Description

    Trucks are large, elongated vehicles often used for transport or delivery. They are similar in shape to buses but generally have distinct cargo sections.

    Instructions

    • Annotate the whole truck, including cab and cargo sections.
    • Ignore small trailers or attached equipment.
    • Annotate only when over half of the truck is visible.

    Van

    Description

    Vans are mid-sized, larger than cars but smaller than trucks and buses. They have a box-like structure distinct enough to notice from above.

    Instructions

    • Outline the van including visible wheels and roof.
    • Avoid overlapping annotations with nearby vehicles.
    • Ensure the van's shape is clear and not obscured by large objects.
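
    Some of these guidelines are numeric and can be checked automatically during annotation review. The sketch below encodes two of them: the 10-pixel minimum for humans and the more-than-half-visible rule for buses and trucks. The `Box` record and its `visible_fraction` field are hypothetical (a bounding-box export does not necessarily carry occlusion data), and measuring a human's size by the longer side of the box is an assumption, since the guideline does not say which dimension is meant.

```python
from dataclasses import dataclass


@dataclass
class Box:
    label: str
    width_px: float
    height_px: float
    visible_fraction: float  # hypothetical: share of the object that is not occluded


MIN_HUMAN_PX = 10                        # "too small (less than 10 pixels)"
HALF_VISIBLE_LABELS = {"bus", "truck"}   # "more than 50%" / "over half" rules


def passes_guidelines(box):
    """Check only the numeric rules above; qualitative rules still need a human reviewer."""
    if box.label == "human" and max(box.width_px, box.height_px) < MIN_HUMAN_PX:
        return False
    if box.label in HALF_VISIBLE_LABELS and box.visible_fraction <= 0.5:
        return False
    return True


if __name__ == "__main__":
    boxes = [Box("human", 6, 9, 1.0), Box("bus", 120, 40, 0.8)]
    print([passes_guidelines(b) for b in boxes])  # [False, True]
```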

  7. A new large-scale index (AcED) for assessing traffic noise disturbance on...

    • pre.iepnb.es
    • iepnb.es
    Updated May 23, 2025
    Cite
    (2025). A new large-scale index (AcED) for assessing traffic noise disturbance on wildlife: stress response in a roe deer (Capreolus capreolus) population. - Dataset - CKAN [Dataset]. https://pre.iepnb.es/catalogo/dataset/a-new-large-scale-index-aced-for-assessing-traffic-noise-disturbance-on-wildlife-stress-respons1
    Dataset updated
    May 23, 2025
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Anthropogenic noise is a growing, ubiquitous and pervasive pollutant as well as a recognised stressor that spreads throughout natural ecosystems. However, there is still an urgent need to assess the impact of noise on natural ecosystems. This article presents a multidisciplinary study that made it possible to isolate road traffic noise and evaluate it as a major driver of detrimental effects on wildlife populations. A new indicator, AcED (the acoustic escape distance), was defined, and faecal cortisol metabolites (FCM) were extracted from roe deer faecal samples as a validated indicator of physiological stress in animals moving around two low-traffic roads that cross a National Park in Spain. Two key findings turned out to be relevant in this study: (i) road identity (i.e. road type defined by traffic volume and average speed) and AcED were the variables that best explained the FCM values observed in roe deer, and (ii) FCM concentration was positively related to increasing traffic volume (road type) and AcED values. Our results suggest that FCM analysis and noise mapping are useful tools in multidisciplinary approaches and environmental monitoring. Furthermore, our findings raise the suspicion that low-traffic roads (< 1000 vehicles per day) could cause greater habitat degradation than has been assumed until now. Keywords: Disturbance, Noise, Wild boar

  8. Vital Signs: Time in Congestion - Corridor Shapefile (Updated October 2018)

    • splitgraph.com
    • data.bayareametro.gov
    Updated Oct 24, 2018
    Cite
    bayareametro-gov (2018). Vital Signs: Time in Congestion - Corridor Shapefile (Updated October 2018) [Dataset]. https://www.splitgraph.com/bayareametro-gov/vital-signs-time-in-congestion-corridor-shapefile-j4ig-7vv6/
    Available download formats: application/vnd.splitgraph.image, application/openapi+json, json
    Dataset updated
    Oct 24, 2018
    Authors
    bayareametro-gov
    Description

    VITAL SIGNS INDICATOR

    Time Spent in Congestion (T7)

    FULL MEASURE NAME

    Time Spent in Congestion

    LAST UPDATED

    October 2018

    DATA SOURCE

    MTC/Iteris Congestion Analysis

    No link available

    CA Department of Finance Forms E-8 and E-5

    http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-8/

    http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-5/

    CA Employment Development Department: Labor Market Information

    http://www.labormarketinfo.edd.ca.gov/

    CONTACT INFORMATION

    vitalsigns.info@bayareametro.gov

    METHODOLOGY NOTES (across all datasets for this indicator)

    Time spent in congestion measures the hours drivers are in congestion on freeway facilities based on traffic data. In recent years, data for the Bay Area comes from INRIX, a company that collects real-time traffic information from a variety of sources including mobile phone data and other GPS locator devices. The data provides traffic speed on the region’s highways. Using historical INRIX data (and similar internal datasets for some of the earlier years), MTC calculates an annual time series for vehicle hours spent in congestion in the Bay Area. Time spent in congestion is defined as the average daily hours spent in congestion on Tuesdays, Wednesdays and Thursdays during peak traffic months on freeway facilities. This indicator focuses on weekdays given that traffic congestion is generally greater on these days; this indicator does not capture traffic congestion on local streets due to data unavailability.

    This congestion indicator emphasizes recurring delay (as opposed to also including non-recurring delay), capturing the extent of delay caused by routine traffic volumes (rather than congestion caused by unusual circumstances). Recurring delay is identified by setting a threshold of consistent delay greater than 15 minutes on a specific freeway segment from vehicle speeds less than 35 mph. This definition is consistent with longstanding practices by MTC, Caltrans and the U.S. Department of Transportation as speeds less than 35 mph result in significantly less efficient traffic operations. 35 mph is the threshold at which vehicle throughput is greatest; speeds that are either greater than or less than 35 mph result in reduced vehicle throughput. This methodology focuses on the extra travel time experienced based on a differential between the congested speed and 35 mph, rather than the posted speed limit.

    To provide a mathematical example of how the indicator is calculated on a segment basis: 1,000 vehicles each spending 1/4 hour (15 minutes) on a congested segment [1,000 vehicles x ¼ hour congestion per vehicle = 250 hours congestion] is equivalent to 100 vehicles each spending 2.5 hours on a congested segment [100 vehicles x 2.5 hours congestion per vehicle = 250 hours congestion]. In this way, the measure captures the impacts of both slow speeds and heavy traffic volumes.

    MTC calculates two measures of delay – congested delay, or delay that occurs when speeds are below 35 miles per hour, and total delay, or delay that occurs when speeds are below the posted speed limit. To illustrate, if 1,000 vehicles are traveling at 30 miles per hour on a one mile long segment, this would represent 4.76 vehicle hours of congested delay [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 35 miles per hour) = 33.33 vehicle hours – 28.57 vehicle hours = 4.76 vehicle hours]. Considering that the posted speed limit on the segment is 60 miles per hour, total delay would be calculated as 16.67 vehicle hours [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 60 miles per hour) = 33.33 vehicle hours – 16.67 vehicle hours = 16.67 vehicle hours].

    Data sources listed above were used to calculate per-capita and per-worker statistics. Top congested corridors are ranked by total vehicle hours of delay, meaning that the highlighted corridors reflect a combination of slow speeds and heavy traffic volumes…

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power web applications. See the Splitgraph documentation for more information.

  9. Bird Strikes in Aviation: Aircraft Collisions

    • kaggle.com
    Updated Nov 15, 2024
    Cite
    Tapendu Karmakar (2024). Bird Strikes in Aviation: Aircraft Collisions [Dataset]. https://www.kaggle.com/datasets/iamtapendu/bird-strike-by-aircafts-data/code
    Dataset updated
    Nov 15, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tapendu Karmakar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Transport and communication are vital domains within the field of analytics, particularly in addressing safety and environmental concerns linked to the rapid growth of urban areas and increasing air traffic. Among the many risks aviation faces, bird strikes—collisions between aircraft and birds or other wildlife—pose a significant threat. These strikes can cause serious damage to aircraft, particularly jet engines, and have been responsible for some fatal accidents. Bird strikes are most likely to occur during critical flight phases such as take-off, climb, approach, and landing, when aircraft are at lower altitudes and bird activity is higher.

    The dataset provided by the FAA, covering incidents from 2000 to 2011, offers a comprehensive overview of bird strikes in the U.S. It includes detailed visualizations and analyses across several key areas:

    • Trends Over Time: Yearly distribution of bird strike incidents.
    • Airline Impact: Analysis of the top 10 U.S. airlines affected by bird strikes.
    • Airport Incidents: Identification of the 50 U.S. airports with the highest frequency of bird strike incidents.
    • Economic Impact: Yearly costs incurred by airlines and the aviation industry due to bird strikes.
    • Timing and Altitude: When and at what altitude most bird strikes occur.
    • Flight Phase: The phase of flight during which strikes are most likely to happen.
    • Impact Analysis: How bird strikes affect flight operations, including aircraft damage.
    • Pilot Awareness: Correlation between pilot knowledge of potential bird strike risks and the severity of the incidents.

    This dataset offers valuable insights into bird strike patterns, focusing on factors such as aircraft type, location, flight phase, and the specific species involved. By analyzing these variables, it helps identify risk factors and trends, supporting the development of strategies to reduce the frequency and impact of bird strikes, ultimately enhancing aviation safety and risk mitigation.

    Features:

    • AircraftType: The type of aircraft involved in the bird strike incident (e.g., "Airplane").
    • AirportName: The name of the airport where the bird strike occurred (e.g., "LAGUARDIA NY", "DALLAS/FORT WORTH INTL ARPT").
    • AltitudeBin: The altitude range (in feet) at which the bird strike occurred, divided into bins (e.g., "(1000, 2000]", "(30, 50]").
    • MakeModel: The specific make and model of the aircraft involved (e.g., "B-737-400", "MD-80", "A-300").
    • NumberStruck: The number of birds that were struck during the incident (e.g., "Over 100", "1", "26").
    • NumberStruckActual: The actual number of birds that were struck during the incident (e.g., 859, 424, 261).
    • Effect: The effect of the bird strike on the aircraft, indicating whether it caused any damage or not (e.g., "Engine Shut Down", "No damage", "Caused damage").
    • FlightDate: The date of the bird strike incident (e.g., "11/23/00 0:00").
    • Damage: A description of the damage caused by the bird strike (e.g., "Caused damage", "No damage").
    • Engines: The number of engines on the aircraft involved in the bird strike (e.g., 2 engines).
    • Operator: The airline or operator of the aircraft involved in the bird strike (e.g., "US AIRWAYS", "AMERICAN AIRLINES", "ALASKA AIRLINES").
    • OriginState: The U.S. state where the aircraft originated (e.g., "New York", "Texas", "Washington").
    • FlightPhase: The phase of flight during which the bird strike occurred (e.g., "Climb", "Landing Roll", "Approach", "Take-off run").
    • ConditionsPrecipitation: The weather condition related to precipitation at the time of the bird strike (e.g., "None", "Some Cloud").
    • RemainsCollected?: Indicates whether bird remains were collected after the strike (e.g., "True" or "False").
    • RemainsSentToSmithsonian: Indicates whether the bird remains were sent to the Smithsonian Institution for study (e.g., "True" or "False").
    • Remarks: Additional comments or notes related to the incident, including specific details like the number of birds involved, actions taken, or other observations (e.g., "FLYING UNDER A VERY LARGE FLOCK OF BIRDS", "BIRD REMAINS ON F/O WINDSCREEN").
    • WildlifeSize: The size of the bird or wildlife involved in the strike (e.g., "Small", "Medium").
    • ConditionsSky: The sky condition at the time of the bird strike (e.g., "No Cloud", "Some Cloud").
    • WildlifeSpecies: The species of the bird or wildlife involved in the strike (e.g., "European starling", "Rock pigeon", "Unknown bird - medium").
    • PilotWarned: Indicates whether the pilot was warned about the potential for a bird strike (e.g., "Y" for Yes, "N" for No).
    • Cost: The cost incurred as a result of the bird strike (e.g., financial cost to repair damage or related expenses, usually in monetary value like 30,736).
    • Altitude: The specific alt...
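
    Two of the analyses listed for this dataset, the yearly distribution of incidents and the most affected airlines, reduce to parsing `FlightDate` and counting records. The sketch below assumes each record is a dict keyed by the feature names above and that `FlightDate` follows the "11/23/00 0:00" pattern shown (month/day/two-digit year, hour:minute); the sample records are invented for illustration and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def strikes_per_year(records):
    """Count incidents per calendar year, parsing FlightDate like '11/23/00 0:00'."""
    years = (datetime.strptime(r["FlightDate"], "%m/%d/%y %H:%M").year for r in records)
    return dict(sorted(Counter(years).items()))


def top_operators(records, n=10):
    """Return the n operators with the most recorded strikes, most frequent first."""
    return Counter(r["Operator"] for r in records).most_common(n)


# Invented records that only mimic the documented field formats.
sample = [
    {"FlightDate": "11/23/00 0:00", "Operator": "US AIRWAYS"},
    {"FlightDate": "7/25/01 0:00", "Operator": "AMERICAN AIRLINES"},
    {"FlightDate": "9/14/01 0:00", "Operator": "US AIRWAYS"},
]
print(strikes_per_year(sample))    # {2000: 1, 2001: 2}
print(top_operators(sample, n=1))  # [('US AIRWAYS', 2)]
```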

  10. Data from: S1 Dataset

    • plos.figshare.com
    xlsx
    Updated Mar 3, 2025
    Binyam Gebrehiwet Tesfay; Tensay Kahsay Welegebriel; Desta Hailu Aregawi; Mamush Gidey Abrha; Berhe Gebrehiwot Tewele; Fissha Brhane Mesele; Fiseha Abadi Gebreanenia; Kelali Goitom Weldu (2025). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0308584.s001
    Available download formats: xlsx
    Dataset updated
    Mar 3, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Binyam Gebrehiwet Tesfay; Tensay Kahsay Welegebriel; Desta Hailu Aregawi; Mamush Gidey Abrha; Berhe Gebrehiwot Tewele; Fissha Brhane Mesele; Fiseha Abadi Gebreanenia; Kelali Goitom Weldu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Globally, road traffic accidents (RTAs) cause over 1.35 million deaths each year, and an additional 50 million people suffer disabilities. Ethiopia has the highest number of road traffic accidents, with over 14,000 people killed and over 45,000 injured annually. This study aimed to assess the survival status and predictors of mortality among adult road traffic accident patients admitted to the intensive care units of Referral Hospitals in Tigray in 2024.

    Methods: An institution-based retrospective follow-up study was conducted from January 8, 2019, to December 11, 2023, on 333 patient charts. A bivariable Cox regression analysis was performed to estimate crude hazard ratios (CHR), followed by a multivariable Cox regression analysis to estimate adjusted hazard ratios (AHR). Finally, AHRs with a p-value less than 0.05 were used to measure the association between the dependent and independent variables.

    Result: The incidence of mortality among road traffic accident victims was 21 per 1,000 person-days of observation (95% CI: 16, 27.6), and the median survival time was 14 days. The predictors of mortality in this study were oxygen saturation on admission ≤ 89% (AHR = 4.9; 95% CI: 1.4–17.2), intracranial hemorrhage (AHR = 3.3; 95% CI: 1.02–11), chest injury (AHR = 3.2; 95% CI: 1.38–7.59), and the age categories 31–45 years (AHR = 0.3; 95% CI: 0.1–0.88) and 46–60 years (AHR = 0.22; 95% CI: 0.06–0.89).

    Conclusion: A concerningly high mortality rate from car accidents was found in the Referral Hospitals of Tigray. To improve survival rates, healthcare providers should focus on victims with very low oxygen levels, head injuries, chest injuries, and older victims.
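
    The headline figure is an incidence rate: deaths divided by total person-days of follow-up, scaled to 1,000 person-days. A minimal sketch follows. The confidence interval uses the common log-scale approximation rate x exp(±1.96/√deaths), which is an assumption for illustration and not necessarily the method the authors used; the counts in the usage lines are hypothetical, not taken from the study, and the function names are invented.

```python
import math


def incidence_rate(events: int, person_time: float, per: float = 1000.0) -> float:
    """Events per `per` units of person-time (e.g. deaths per 1,000 person-days)."""
    return events / person_time * per


def approx_rate_ci(events: int, person_time: float, per: float = 1000.0, z: float = 1.96):
    """Approximate CI assuming a Poisson count: rate * exp(+/- z / sqrt(events))."""
    rate = incidence_rate(events, person_time, per)
    half_width = z / math.sqrt(events)
    return rate * math.exp(-half_width), rate * math.exp(half_width)


# Hypothetical follow-up: 40 deaths over 2,000 person-days.
rate = incidence_rate(40, 2_000)
low, high = approx_rate_ci(40, 2_000)
print(f"{rate:.1f} per 1,000 person-days (approx. 95% CI {low:.1f}-{high:.1f})")
```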

  11. Road traffic fatalities per one million inhabitants in the United States...

    • statista.com
    Updated Dec 18, 2023
    Statista Research Department (2023). Road traffic fatalities per one million inhabitants in the United States 2014-2029 [Dataset]. https://www.statista.com/topics/3708/road-accidents-in-the-us/
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    United States
    Description

    The number of road traffic fatalities per one million inhabitants in the United States was forecast to increase continuously between 2024 and 2029, by a total of 18.5 deaths (+13.81 percent). After the tenth consecutive year of increase, the number is estimated to reach 152.46 deaths in 2029, a new peak. Depicted here is the estimated number of deaths that occurred in relation to road traffic, set in relation to population size and expressed as deaths per one million inhabitants. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of road traffic fatalities per one million inhabitants in countries like Mexico and Canada.
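
    The forecast figures are internally consistent: subtracting the total change from the 2029 value gives the implied 2024 baseline, and dividing the change by that baseline reproduces the stated percentage. A quick check in Python (the helper name is hypothetical; a negative change handles forecast decreases):

```python
def implied_baseline(final_value: float, total_change: float) -> tuple[float, float]:
    """Return (baseline, percent change) given an end value and the absolute change to it."""
    baseline = final_value - total_change
    return baseline, total_change / baseline * 100


# Fatalities per one million inhabitants: 152.46 in 2029 after a +18.5 change.
baseline, pct = implied_baseline(152.46, 18.5)
print(f"{baseline:.2f} {pct:.2f}%")  # 133.96 13.81%
```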

  12. Crash data from Queensland roads

    • data.qld.gov.au
    • data.wu.ac.at
    csv
    Updated Jun 20, 2025
    Transport and Main Roads (2025). Crash data from Queensland roads [Dataset]. https://www.data.qld.gov.au/dataset/crash-data-from-queensland-roads
    Available download formats: csv (3 MiB), csv (2 MiB), csv (1 MiB), csv (303 KiB), csv (196.5 MiB), csv (196.5 KiB)
    Dataset updated
    Jun 20, 2025
    Dataset provided by
    Department of Transport and Main Roads (http://tmr.qld.gov.au/)
    Authors
    Transport and Main Roads
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Queensland
    Description

    Overview:

    Information on the location and characteristics of crashes in Queensland for all reported road traffic crashes that occurred from 1 January 2001 to 30 June 2024.

    Fatal, Hospitalisation, Medical treatment and Minor injury:

    This dataset contains information on crashes reported to the police which resulted from the movement of at least 1 road vehicle on a road or road related area. Crashes listed in this resource have occurred on a public road and meet one of the following criteria:

    • a person is killed or injured, or
    • at least 1 vehicle was towed away, or
    • the value of the property damage meets the appropriate criteria listed below.

    Property damage:

    1. $2500 or more damage to property other than vehicles (after 1 December 1999)
    2. $2500 or more damage to vehicle and/or other property (after 1 December 1991 and before 1 December 1999)
    3. value of property damage is greater than $1000 (before December 1991).
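
    The inclusion rules above can be expressed as a single predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, dates falling exactly on 1 December 1991 or 1 December 1999 are assumed to follow the later rule, and "damage to vehicle and/or other property" is read as total damage across vehicles and other property.

```python
from datetime import date

RULE_1999 = date(1999, 12, 1)  # assumed start of the "other property only" rule
RULE_1991 = date(1991, 12, 1)  # assumed start of the $2500 total-damage rule


def meets_property_damage_criteria(crash_date: date,
                                   other_property_damage: float,
                                   total_property_damage: float) -> bool:
    """Apply the dated property-damage thresholds (amounts in dollars)."""
    if crash_date >= RULE_1999:
        # $2500 or more damage to property other than vehicles.
        return other_property_damage >= 2500
    if crash_date >= RULE_1991:
        # $2500 or more damage to vehicles and/or other property.
        return total_property_damage >= 2500
    # Before December 1991: property damage greater than $1000.
    return total_property_damage > 1000


def is_listed_crash(crash_date: date, on_public_road: bool, killed_or_injured: bool,
                    vehicle_towed: bool, other_property_damage: float = 0.0,
                    total_property_damage: float = 0.0) -> bool:
    """True if a police-reported crash meets at least one of the listed criteria."""
    if not on_public_road:
        return False
    return (killed_or_injured
            or vehicle_towed
            or meets_property_damage_criteria(crash_date, other_property_damage,
                                              total_property_damage))


print(is_listed_crash(date(2005, 3, 1), True, False, False, other_property_damage=3000))  # True
```

    Because property-damage-only crashes stopped being recorded after 31 December 2010, the damage branch only matters for earlier records.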

    Please note:

    • This data has been extracted from the Queensland Road Crash Database.
    • Information held in the Road Crash Database on events occurring within the last 12 months is considered preliminary as investigations into crashes can take up to 1 year to finalise.
    • Property damage only crashes ceased to be reported/recorded by Queensland Police Service after 31 December 2010.
    • Crash location coordinates reference the current Australian geodetic datum, GDA2020 (previously GDA94).

  13. Road safety statistics: data tables

    • gov.uk
    Updated Jul 31, 2025
    Department for Transport (2025). Road safety statistics: data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/reported-road-accidents-vehicles-and-casualties-tables-for-great-britain
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Transport
    Description

    These tables present high-level breakdowns and time series. A list of all tables, including those discontinued, is available in the table index. More detailed data is available in our data tools, or by downloading the open dataset.

    Latest data and table index

    The tables below are the latest final annual statistics for 2023. The latest data currently available are provisional figures for 2024. These are available from the latest provisional statistics.

    A list of all reported road collisions and casualties data tables and variables in our data download tool is available in the Tables index (ODS, 30.1 KB): https://assets.publishing.service.gov.uk/media/683709928ade4d13a63236df/reported-road-casualties-gb-index-of-tables.ods

    All collision, casualty and vehicle tables

    Reported road collisions and casualties data tables (ZIP, 16.6 MB): https://assets.publishing.service.gov.uk/media/66f44e29c71e42688b65ec43/ras-all-tables-excel.zip

    Historic trends (RAS01)

    RAS0101 (https://assets.publishing.service.gov.uk/media/66f44bd130536cb927482733/ras0101.ods): Collisions, casualties and vehicles involved by road user type since 1926 (ODS, 52.1 KB)

    RAS0102 (https://assets.publishing.service.gov.uk/media/66f44bd1080bdf716392e8ec/ras0102.ods): Casualties and casualty rates, by road user type and age group, since 1979 (ODS, 142 KB)

    Road user type (RAS02)

    RAS0201 (https://assets.publishing.service.gov.uk/media/66f44bd1a31f45a9c765ec1f/ras0201.ods): Numbers and rates (ODS, 60.7 KB)

    RAS0202 (https://assets.publishing.service.gov.uk/media/66f44bd1e84ae1fd8592e8f0/ras0202.ods): Sex and age group (ODS, 167 KB)

    RAS0203 (https://assets.publishing.service.gov.uk/media/67600227b745d5f7a053ef74/ras0203.ods): Rates by mode, including air, water and rail modes (ODS, 24.2 KB)

    Road type (RAS03)

    RAS0301 (https://assets.publishing.service.gov.uk/media/66f44bd1c71e42688b65ec3e/ras0301.ods): Speed limit, built-up and non-built-up roads (ODS, 49.3 KB)

    RAS0302 (https://assets.publishing.service.gov.uk/media/66f44bd1080bdf716392e8ee/ras0302.ods): Urban and rural roa

  14. Air passenger traffic at Canadian airports, annual

    • www150.statcan.gc.ca
    • open.canada.ca
    Updated Jul 29, 2025
    Government of Canada, Statistics Canada (2025). Air passenger traffic at Canadian airports, annual [Dataset]. http://doi.org/10.25318/2310025301-eng
    Dataset updated
    Jul 29, 2025
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    Passengers enplaned and deplaned at Canadian airports, annual.

  15. Number of road accidents per one million inhabitants in the United States...

    • statista.com
    Updated Dec 18, 2023
    Statista Research Department (2023). Number of road accidents per one million inhabitants in the United States 2014-2029 [Dataset]. https://www.statista.com/topics/3708/road-accidents-in-the-us/
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    United States
    Description

    The number of road accidents per one million inhabitants in the United States was forecast to decrease continuously between 2024 and 2029, by a total of 2,490.4 accidents (-14.99 percent). After the eighth consecutive year of decrease, the number is estimated to reach 14,118.78 accidents in 2029, a new minimum. Depicted here is the estimated number of accidents that occurred in relation to road traffic, set in relation to population size and expressed as accidents per one million inhabitants. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of road accidents per one million inhabitants in countries like Mexico and Canada.
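
    A rate "per one million inhabitants" is a count divided by population and multiplied by 1,000,000; expressing it on another base only rescales the figure. The sketch below uses hypothetical helper names; for example, the 2029 forecast of 14,118.78 accidents per one million inhabitants corresponds to about 1,411.88 per 100,000.

```python
def rate_per(count: float, population: float, base: float = 1_000_000) -> float:
    """Events per `base` inhabitants."""
    return count / population * base


def rebase_rate(rate: float, from_base: float, to_base: float) -> float:
    """Express a rate given per `from_base` inhabitants per `to_base` inhabitants instead."""
    return rate * to_base / from_base


per_100k = rebase_rate(14_118.78, 1_000_000, 100_000)
print(f"{per_100k:.2f}")  # 1411.88
```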

  16. Number of households with internet access in Indonesia 2014-2029

    • statista.com
    Updated Feb 4, 2025
    Statista Research Department (2025). Number of households with internet access in Indonesia 2014-2029 [Dataset]. https://www.statista.com/topics/2431/internet-usage-in-indonesia/
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    Indonesia
    Description

    The number of households with internet access in Indonesia was forecast to increase continuously between 2024 and 2029, by a total of 3.8 million households (+6.49 percent). After the fifteenth consecutive year of increase, the number of households is estimated to reach 62.36 million in 2029, a new peak. Notably, the number of households with internet access has been increasing continuously over the past years. Depicted is the number of households with internet access in the country or region at hand. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of households with internet access in countries like Singapore and Vietnam.

Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset

Network Traffic Dataset

Use this dataset to analyze network traffic and design applications.

2 scholarly articles cite this dataset.
Dataset updated
Oct 31, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ravikumar Gattu
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

The data were obtained on a Kali Linux machine at the University of Cincinnati, Cincinnati, Ohio, by running packet captures with Wireshark for one hour on the evening of October 9, 2023. The resulting 394,137 instances are stored in a CSV (comma-separated values) file. This large dataset can be used for a variety of machine learning applications, such as network traffic classification, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

Content:

This network traffic dataset has 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric, but there are also nominal attributes and a date-type attribute (Timestamp).

The network traffic flow statistics (No., Time, Source, Destination, Protocol, Length, Info) were obtained using Wireshark (https://www.wireshark.org/).

Dataset Columns:

• No: Instance number.
• Timestamp: Timestamp of the network traffic instance.
• Source IP: IP address of the source.
• Destination IP: IP address of the destination.
• Protocol: Protocol used by the instance.
• Length: Length of the instance.
• Info: Information about the traffic instance.
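
As a starting point for analysis, the sketch below reads a Wireshark-style CSV export and summarizes packets and bytes per protocol. It assumes the header row uses Wireshark's default export names (No., Time, Source, Destination, Protocol, Length, Info) rather than the descriptive names in the column list, and the sample rows are invented; the function name is hypothetical, so adjust the keys if the actual file differs.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented rows in the layout of a Wireshark CSV export (not taken from the dataset).
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.168.1.10","203.0.113.5","TLSv1.3","583","Application Data"
"2","0.001200","203.0.113.5","192.168.1.10","TCP","66","443 > 51514 [ACK]"
"3","0.350000","192.168.1.10","198.51.100.53","DNS","74","Standard query A example.org"
"4","1.020000","198.51.100.53","192.168.1.10","DNS","90","Standard query response A example.org"
'''


def summarize_by_protocol(csv_text: str) -> dict[str, tuple[int, int]]:
    """Map each protocol to (packet count, total frame bytes from the Length column)."""
    packets = Counter()
    total_bytes = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        protocol = row["Protocol"]
        packets[protocol] += 1
        total_bytes[protocol] += int(row["Length"])
    return {proto: (packets[proto], total_bytes[proto]) for proto in packets}


print(summarize_by_protocol(SAMPLE_CSV))
# {'TLSv1.3': (1, 583), 'TCP': (1, 66), 'DNS': (2, 164)}
```

For the real file, read it with `open(path, newline="")` and pass its contents, or adapt the loop to iterate over the open file directly.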

Acknowledgements:

I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.

Ravikumar Gattu, Susmitha Choppadandi

Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it can be used to build machine learning models that identify specific applications (such as TikTok, Wikipedia, Instagram, YouTube, websites and blogs) from IP flow statistics; there are currently 25 applications in total.

Dataset License: CC0: Public Domain

Dataset Usages: This dataset can be used for machine learning applications in cybersecurity such as network traffic classification, network performance monitoring, network security management, network traffic management, network intrusion detection and anomaly detection.

How ML techniques benefit from this dataset:

This dataset is highly useful because it contains 394,137 instances of network traffic data obtained by using 25 applications on public, private and enterprise networks. It also includes important features that can be used in most machine learning applications in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:

  1. Network Performance Monitoring: This large network traffic dataset can be used to analyze network traffic and identify patterns in the network. This helps in designing network security algorithms that minimize network problems.

  2. Anomaly Detection: The dataset can be used to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.

  3. Network Intrusion Detection: The dataset can be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks and DoS attacks.
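
For the anomaly-detection use case, a deliberately simple statistical baseline can be run before training any ML model: bin packets into one-second counts and flag bins far above the mean. This z-score screen is a swapped-in technique, not the authors' method; the function names, the one-second bin width and the threshold of 3 are assumptions, and the Time column is assumed to hold seconds since the start of the capture, as in Wireshark's default CSV export.

```python
import statistics
from collections import Counter


def packets_per_second(times: list[float]) -> list[int]:
    """Bucket packet timestamps (seconds since capture start) into one-second bins."""
    counts = Counter(int(t) for t in times)
    if not counts:
        return []
    return [counts.get(second, 0) for second in range(max(counts) + 1)]


def spike_seconds(series: list[int], threshold: float = 3.0) -> list[int]:
    """Indices of bins whose count lies more than `threshold` std devs above the mean."""
    if len(series) < 2:
        return []
    mean = statistics.mean(series)
    spread = statistics.pstdev(series)
    if spread == 0:
        return []
    return [i for i, value in enumerate(series) if (value - mean) / spread > threshold]


# Invented traffic: a steady 10 packets/s for 20 s, then a burst of 200 packets in second 20.
times = [s + k / 10 for s in range(20) for k in range(10)] + [20 + k / 200 for k in range(200)]
series = packets_per_second(times)
print(spike_seconds(series))  # [20]
```

Only upward spikes are flagged, which suits flood-style attacks; a drop in traffic would need a two-sided test.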
