https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here were obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures with Wireshark for 1 hour during the evening of Oct 9th, 2023. The dataset consists of 394,137 instances stored in a CSV (Comma-Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.
Content :
This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types, the latter due to the Timestamp.
The network traffic flow statistics (No., Time, Source, Destination, Protocol, Length, Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
- No: Number of the instance
- Timestamp: Timestamp of the network traffic instance
- Source IP: IP address of the source
- Destination IP: IP address of the destination
- Protocol: Protocol used by the instance
- Length: Length of the instance
- Info: Information about the traffic instance
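As a rough illustration of working with these columns, the sketch below tallies packets and bytes per protocol from a Wireshark-style CSV export. The header names follow the Wireshark export fields listed above (No., Time, Source, Destination, Protocol, Length, Info). The sample rows and the `summarize_by_protocol` helper are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the Wireshark CSV export layout; not real data.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.5","10.0.0.1","DNS","74","Standard query A example.org"
"2","0.012345","10.0.0.1","10.0.0.5","DNS","90","Standard query response A example.org"
"3","0.020000","10.0.0.5","93.184.216.34","TCP","66","50000 > 443 [SYN]"
'''


def summarize_by_protocol(csv_file):
    """Return (packet counts, byte totals), both keyed by the Protocol column."""
    packets = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(csv_file):
        protocol = row["Protocol"]
        packets[protocol] += 1
        total_bytes[protocol] += int(row["Length"])
    return packets, total_bytes


packets, total_bytes = summarize_by_protocol(io.StringIO(SAMPLE_CSV))
for protocol in sorted(packets):
    print(protocol, packets[protocol], total_bytes[protocol])
```

To run the same summary on the real file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory sample.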
Acknowledgements :
I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration : This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it can be used to build machine learning models that identify specific applications (such as TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
**Dataset License:** CC0: Public Domain
Dataset Usages : This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
ML techniques that benefit from this dataset :
This dataset is highly useful because it consists of 394,137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains features that are important for most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:
1. Network Performance Monitoring : This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection : The dataset can be used to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection : The dataset can be used to train machine learning algorithms and design models that detect traffic issues, malicious traffic, network attacks, and DoS attacks.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Web Bench: A real-world benchmark for Browser Agents
WebBench is an open, task-oriented benchmark that measures how well browser agents handle realistic web workflows. It contains 2,454 tasks spread across 452 live websites selected from the global top-1000 by traffic. Last updated: May 28, 2025
Dataset Composition
Category Description Example Count (% of dataset)
READ Tasks that require searching and extracting information “Navigate to the news section and… See the full description on the dataset page: https://huggingface.co/datasets/Halluminate/WebBench.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset consists of three different data sources:
The web browser data was captured using the Selenium framework, which simulated classical user browsing. The browsers received commands to visit domains taken from Alexa's top 10K most visited websites. The capture was performed on the host by listening on the network interface of the virtual machine. Overall, the dataset contains almost 5,000 web-page visits by Mozilla and 1,000 pages visited by Chrome.
The Cloudflared DoH proxy was installed on a Raspberry Pi, and the IP address of the Raspberry Pi was set as the default DNS resolver in two separate offices in our university. It continuously captured the DNS/DoH traffic created by up to 20 devices for around three months.
The dataset contains 1,128,904 flows, of which around 33,000 are labeled as DoH. We provide raw pcap data, a CSV file with flow data, and a CSV file with extracted features.
The CSV with extracted features has the following data fields:
- Label (1 - DoH, 0 - regular HTTPS)
- Data source
- Duration
- Minimal Inter-Packet Delay
- Maximal Inter-Packet Delay
- Average Inter-Packet Delay
- Variance of Incoming Packet Sizes
- Variance of Outgoing Packet Sizes
- Ratio of the number of Incoming and Outgoing bytes
- Ratio of the number of Incoming and Outgoing packets
- Average of Incoming Packet sizes
- Average of Outgoing Packet sizes
- The median value of Incoming Packet sizes
- The median value of outgoing Packet sizes
- The ratio of bursts and pauses
- Number of bursts
- Number of pauses
- Autocorrelation
- Transmission symmetry in the 1st third of connection
- Transmission symmetry in the 2nd third of connection
- Transmission symmetry in the last third of connection
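To make a few of these fields concrete, here is a minimal sketch that computes the duration, inter-packet delay statistics, packet-size variances, and incoming/outgoing ratios for a single flow. The source only names the features, so the exact definitions used here (population variance, ratios taken as incoming divided by outgoing) are assumptions, and the packet list and `flow_features` helper are invented for illustration.

```python
from statistics import mean, pvariance

# Each packet is (timestamp in seconds, size in bytes, direction),
# where direction is "in" or "out". The values are made up for illustration.
packets = [
    (0.00, 120, "out"),
    (0.05, 480, "in"),
    (0.07, 100, "out"),
    (0.30, 900, "in"),
]


def flow_features(packets):
    """Compute a handful of per-flow statistics from a time-ordered packet list."""
    times = [t for t, _, _ in packets]
    # Delay between each pair of consecutive packets.
    delays = [later - earlier for earlier, later in zip(times, times[1:])]
    incoming = [size for _, size, direction in packets if direction == "in"]
    outgoing = [size for _, size, direction in packets if direction == "out"]
    return {
        "duration": times[-1] - times[0],
        "min_delay": min(delays),
        "max_delay": max(delays),
        "avg_delay": mean(delays),
        # Assumption: population variance; the source does not say which variant is used.
        "var_in": pvariance(incoming),
        "var_out": pvariance(outgoing),
        # Assumption: ratios are incoming divided by outgoing.
        "byte_ratio": sum(incoming) / sum(outgoing),
        "packet_ratio": len(incoming) / len(outgoing),
    }


features = flow_features(packets)
for name, value in features.items():
    print(name, value)
```

The burst, pause, autocorrelation, and symmetry features are left out because their definitions cannot be recovered from the field names alone.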
The observed network traffic does not contain privacy-sensitive information.
The zip file structure is:
|-- data
|   |-- extracted-features...extracted features used in ML for DoH recognition
|   |   |-- chrome
|   |   |-- cloudflared
|   |   `-- firefox
|   |-- flows...............................................exported flow data
|   |   |-- chrome
|   |   |-- cloudflared
|   |   `-- firefox
|   `-- pcaps....................................................raw PCAP data
|       |-- chrome
|       |-- cloudflared
|       `-- firefox
|-- LICENSE
`-- README.md
When using this dataset, please cite the original work as follows:
@inproceedings{vekshin2020,
author = {Vekshin, Dmitrii and Hynek, Karel and Cejka, Tomas},
title = {DoH Insight: Detecting DNS over HTTPS by Machine Learning},
year = {2020},
isbn = {9781450388337},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3407023.3409192},
doi = {10.1145/3407023.3409192},
booktitle = {Proceedings of the 15th International Conference on Availability, Reliability and Security},
articleno = {87},
numpages = {8},
keywords = {classification, DoH, DNS over HTTPS, machine learning, detection, datasets},
location = {Virtual Event, Ireland},
series = {ARES '20}
}
VITAL SIGNS INDICATOR
Time Spent in Congestion (T7)
FULL MEASURE NAME
Time Spent in Congestion
LAST UPDATED
October 2018
DATA SOURCE
MTC/Iteris Congestion Analysis
No link available
CA Department of Finance Forms E-8 and E-5
http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-8/
http://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-5/
CA Employment Division Department: Labor Market Information
http://www.labormarketinfo.edd.ca.gov/
CONTACT INFORMATION
vitalsigns.info@bayareametro.gov
METHODOLOGY NOTES (across all datasets for this indicator)
Time spent in congestion measures the hours drivers are in congestion on freeway facilities based on traffic data. In recent years, data for the Bay Area comes from INRIX, a company that collects real-time traffic information from a variety of sources including mobile phone data and other GPS locator devices. The data provides traffic speed on the region’s highways. Using historical INRIX data (and similar internal datasets for some of the earlier years), MTC calculates an annual time series for vehicle hours spent in congestion in the Bay Area. Time spent in congestion is defined as the average daily hours spent in congestion on Tuesdays, Wednesdays and Thursdays during peak traffic months on freeway facilities. This indicator focuses on weekdays given that traffic congestion is generally greater on these days; this indicator does not capture traffic congestion on local streets due to data unavailability.
This congestion indicator emphasizes recurring delay (as opposed to also including non-recurring delay), capturing the extent of delay caused by routine traffic volumes (rather than congestion caused by unusual circumstances). Recurring delay is identified by setting a threshold of consistent delay greater than 15 minutes on a specific freeway segment from vehicle speeds less than 35 mph. This definition is consistent with longstanding practices by MTC, Caltrans and the U.S. Department of Transportation as speeds less than 35 mph result in significantly less efficient traffic operations. 35 mph is the threshold at which vehicle throughput is greatest; speeds that are either greater than or less than 35 mph result in reduced vehicle throughput. This methodology focuses on the extra travel time experienced based on a differential between the congested speed and 35 mph, rather than the posted speed limit.
To provide a mathematical example of how the indicator is calculated on a segment basis, when it comes to time spent in congestion, 1,000 vehicles traveling on a congested segment for a 1/4 hour (15 minutes) each, [1,000 vehicles x ¼ hour congestion per vehicle= 250 hours congestion], is equivalent to 100 vehicles traveling on a congested segment for 2.5 hours each, [100 vehicles x 2.5 hour congestion per vehicle = 250 hours congestion]. In this way, the measure captures the impacts of both slow speeds and heavy traffic volumes.
MTC calculates two measures of delay – congested delay, or delay that occurs when speeds are below 35 miles per hour, and total delay, or delay that occurs when speeds are below the posted speed limit. To illustrate, if 1,000 vehicles are traveling at 30 miles per hour on a one mile long segment, this would represent 4.76 vehicle hours of congested delay [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 35 miles per hour) = 33.33 vehicle hours – 28.57 vehicle hours = 4.76 vehicle hours]. Considering that the posted speed limit on the segment is 60 miles per hour, total delay would be calculated as 16.67 vehicle hours [(1,000 vehicles x 1 mile / 30 miles per hour) - (1,000 vehicles x 1 mile / 60 miles per hour) = 33.33 vehicle hours – 16.67 vehicle hours = 16.67 vehicle hours].
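The two delay measures can be reproduced with a few lines of arithmetic. The sketch below recomputes the worked example from the text (1,000 vehicles at 30 mph on a one-mile segment, with 35 mph and 60 mph as reference speeds); the function names are illustrative and are not part of MTC's tooling.

```python
def vehicle_hours(vehicles, miles, mph):
    """Total vehicle hours needed for `vehicles` to cover `miles` at `mph`."""
    return vehicles * miles / mph


def delay_vehicle_hours(vehicles, miles, observed_mph, reference_mph):
    """Extra vehicle hours at the observed speed compared with the reference speed."""
    if observed_mph >= reference_mph:
        return 0.0  # No delay when traffic moves at or above the reference speed.
    return vehicle_hours(vehicles, miles, observed_mph) - vehicle_hours(
        vehicles, miles, reference_mph
    )


# Congested delay uses the 35 mph threshold; total delay uses the posted limit.
congested = delay_vehicle_hours(1000, 1, observed_mph=30, reference_mph=35)
total = delay_vehicle_hours(1000, 1, observed_mph=30, reference_mph=60)
print(round(congested, 2), round(total, 2))  # 4.76 16.67
```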
Data sources listed above were used to calculate per-capita and per-worker statistics. Top congested corridors are ranked by total vehicle hours of delay, meaning that the highlighted corridors reflect a combination of slow speeds and heavy traffic volumes (consistent with longstanding regional methodologies used to generate the “top 10” list of congested segments). Historical Bay Area data was estimated by MTC Operations staff using a combination of internal datasets to develop an approximate trend back to 1998.
To explore how 2017 congestion trends compare to real-time congestion on the region’s freeways, visit 511.org.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
You are an analyst at "Megaline," a federal mobile operator. The company offers two tariff plans to customers: "Smart" and "Ultra." To adjust the advertising budget, the commercial department wants to understand which tariff generates more revenue.
You need to conduct a preliminary analysis of the tariffs on a small sample of customers. You have data on 500 users of "Megaline": who they are, where they are from, which tariff they use, and how many calls they made and messages they sent in 2018. You need to analyze customer behavior and conclude which tariff is better.
"Smart" Tariff:
- Monthly fee: 550 rubles
- Included: 500 minutes of calls, 50 messages, and 15 GB of internet traffic
- Cost of services beyond the tariff package:
  1. Call minute: 3 rubles (Megaline always rounds up minutes and megabytes. If the user talked for just 1 second, it counts as a whole minute);
  2. Message: 3 rubles;
  3. 1 GB of internet traffic: 200 rubles.
"Ultra" Tariff:
- Monthly fee: 1950 rubles
- Included: 3000 minutes of calls, 1000 messages, and 30 GB of internet traffic
- Cost of services beyond the tariff package:
  1. Call minute: 1 ruble;
  2. Message: 1 ruble;
  3. 1 GB of internet traffic: 150 rubles.
Note: Megaline always rounds up seconds to minutes and megabytes to gigabytes. Each call is rounded up individually: even if it lasted just 1 second, it is counted as 1 minute. For web traffic, separate sessions are not counted. Instead, the total amount for the month is rounded up. If a subscriber uses 1025 megabytes in a month, they are charged for 2 gigabytes.
Step 1: Open the file with data and study the general information
File paths:
- /datasets/calls.csv
- /datasets/internet.csv
- /datasets/messages.csv
- /datasets/tariffs.csv
- /datasets/users.csv
Step 2: Prepare the data
- Convert data to the required types;
- Find and fix errors in the data, if any. Explain what errors you found and how you fixed them. You will find calls with zero duration in the data. This is not an error: missed calls are indicated by zeros, so they do not need to be deleted.
For each user, calculate:
- Number of calls made and minutes spent per month;
- Number of messages sent per month;
- Amount of internet traffic used per month;
- Monthly revenue from each user (subtract the free limit from the total number of calls, messages, and internet traffic; multiply the remainder by the value from the tariff plan; add the corresponding tariff plan's subscription fee).
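The revenue rule can be sketched in plain Python as follows. The tariff values come from the descriptions above, and the rounding follows the stated rules (each call rounded up to a whole minute, monthly traffic rounded up to whole gigabytes, with 1 GB taken as 1024 MB as implied by the 1025 MB example). The dictionary layout and function names are illustrative assumptions, not the column names of `tariffs.csv`.

```python
import math

# Tariff parameters copied from the task description; the key names are hypothetical.
TARIFFS = {
    "smart": {"fee": 550, "minutes": 500, "messages": 50, "gb": 15,
              "per_minute": 3, "per_message": 3, "per_gb": 200},
    "ultra": {"fee": 1950, "minutes": 3000, "messages": 1000, "gb": 30,
              "per_minute": 1, "per_message": 1, "per_gb": 150},
}


def billed_minutes(call_durations):
    """Round each call up to a whole minute, then sum (zero-length calls stay zero)."""
    return sum(math.ceil(duration) for duration in call_durations)


def billed_gb(mb_used_total):
    """Round the monthly traffic total up to whole gigabytes (1 GB = 1024 MB)."""
    return math.ceil(mb_used_total / 1024)


def monthly_revenue(tariff, minutes, messages, gb):
    """Subscription fee plus charges for usage beyond the included package."""
    plan = TARIFFS[tariff]
    extra_minutes = max(0, minutes - plan["minutes"])
    extra_messages = max(0, messages - plan["messages"])
    extra_gb = max(0, gb - plan["gb"])
    return (plan["fee"]
            + extra_minutes * plan["per_minute"]
            + extra_messages * plan["per_message"]
            + extra_gb * plan["per_gb"])


print(billed_gb(1025))                          # 2
print(monthly_revenue("smart", 510, 60, 17))    # 1010
print(monthly_revenue("ultra", 1000, 100, 20))  # 1950
```

In a notebook, the same logic would typically be applied per user and month after grouping the calls, messages, and internet tables.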
Step 3: Analyze the data
Describe the behavior of the operator's customers based on the sample. How many minutes of calls, how many messages, and how much internet traffic do users of each tariff need per month? Calculate the average, variance, and standard deviation. Create histograms. Describe the distributions.
Step 4: Test hypotheses
- The average revenue of users of the "Ultra" and "Smart" tariffs is different;
- The average revenue of users from Moscow differs from the revenue of users from other regions.
Moscow is written as 'Москва' in the data. You can use this value when testing the hypothesis.
Set the threshold alpha value yourself.
Explain:
- How you formulated the null and alternative hypotheses;
- Which criterion you used to test the hypotheses and why.
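One possible criterion for comparing two mean revenues is Welch's t-test, which does not assume equal variances in the two groups. The task does not prescribe a test, so this choice is an assumption. The sketch below computes only the test statistic and the Welch-Satterthwaite degrees of freedom with the standard library; obtaining a p-value requires a t-distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The revenue samples are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical monthly revenues in rubles for a few users of each tariff.
smart_revenue = [550, 750, 1150, 950, 550, 1350]
ultra_revenue = [1950, 1950, 2100, 1950, 2250, 1950]

t_stat, dof = welch_t(smart_revenue, ultra_revenue)
print(t_stat, dof)
```

The null hypothesis here would be that the two mean revenues are equal, with the two-sided alternative that they differ; it is rejected when the p-value falls below the chosen alpha.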
Step 5: Write a general conclusion
Formatting: Perform the task in Jupyter Notebook. Put the program code in cells of type code, and the textual explanations in cells of type markdown. Apply formatting and headers.
Table users (user information):
- user_id: unique user identifier
- first_name: user's first name
- last_name: user's last name
- age: user's age (years)
- reg_date: date of tariff connection (day, month, year)
- churn_date: date of tariff discontinuation (if the value is missing, the tariff was still active at the time of data extraction)
- city: user's city of residence
- tariff: name of the tariff plan
Table calls (call information):
- id: unique call number
- call_date: call date
- duration: call duration in minutes
- user_id: identifier of the user who made the call
Table messages (message information):
- id: unique message number
- message_date: message date
- user_id: identifier of the user who sent the message
Table internet (internet session information):
- id: unique session number
- mb_used: amount of internet traffic used during the session (in megabytes)
- session_date: internet session date
- user_id: user identifier
Table tariffs (tariff information):
- tariff_name: tariff name
- rub_monthly_fee: monthly subscription fee in rubles
- minutes_included: number of call minutes included per month
- `messages_included...
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset aims to annotate various types of vehicles and pedestrians in urban environments using aerial images. The goal is to create a comprehensive object detection dataset for applications such as traffic analysis and city planning. The dataset includes the following classes: bicycle, bus, car, human, motorbike, truck, and van.
A bicycle is a two-wheeled, human-powered vehicle. From an aerial view, it appears narrow with two wheels in line. It can often be seen alongside pedestrians or in bike lanes.
Buses are large public transport vehicles with a box-like structure. They are larger than cars and have a distinct length and width, noticeable from above.
Cars are smaller than buses and have a compact rectangular shape with visible wheels and a roof from an aerial perspective.
Humans appear as small, elongated shapes from an aerial view and are often seen on sidewalks or pedestrian crossings.
Motorbikes are narrower than cars and have a distinct two-wheel alignment. They might appear alongside or near cars and can sometimes be accompanied by a rider.
Trucks are large, elongated vehicles often used for transport or delivery. They are similar in shape to buses but generally have distinct cargo sections.
Vans are mid-sized, larger than cars but smaller than trucks and buses. They have a box-like structure distinct enough to notice from above.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Anthropogenic noise is a growing, ubiquitous, and pervasive pollutant as well as a recognised stressor that spreads throughout natural ecosystems. However, there is still an urgent need for the assessment of noise impact on natural ecosystems. This article presents a multidisciplinary study which made it possible to isolate noise due to road traffic and evaluate it as a major driver of detrimental effects on wildlife populations. A new indicator has been defined, AcED (the acoustic escape distance), and faecal cortisol metabolites (FCM) were extracted from roe deer faecal samples as a validated indicator of physiological stress in animals moving around two low-traffic roads that cross a National Park in Spain. Two key findings turned out to be relevant in this study: (i) road identity (i.e. road type defined by traffic volume and average speed) and AcED were the variables that best explained the FCM values observed in roe deer, and (ii) FCM concentration was positively related to increasing traffic volume (road type) and AcED values. Our results suggest that FCM analysis and noise mapping are useful tools in multidisciplinary approaches and environmental monitoring. Furthermore, our findings raise the suspicion that low-traffic roads (< 1000 vehicles per day) could be capable of causing greater habitat degradation than has been assumed until now.
Keywords: Disturbance, Noise, Wild boar
VITAL SIGNS INDICATOR
Time Spent in Congestion (T7)
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.
See the Splitgraph documentation for more information.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Transport and communication are vital domains within the field of analytics, particularly in addressing safety and environmental concerns linked to the rapid growth of urban areas and increasing air traffic. Among the many risks aviation faces, bird strikes—collisions between aircraft and birds or other wildlife—pose a significant threat. These strikes can cause serious damage to aircraft, particularly jet engines, and have been responsible for some fatal accidents. Bird strikes are most likely to occur during critical flight phases such as take-off, climb, approach, and landing, when aircraft are at lower altitudes and bird activity is higher.
The dataset provided by the FAA, covering incidents from 2000 to 2011, offers a comprehensive overview of bird strikes in the U.S. It includes detailed visualizations and analyses across several key areas:
This dataset offers valuable insights into bird strike patterns, focusing on factors such as aircraft type, location, flight phase, and the specific species involved. By analyzing these variables, it helps identify risk factors and trends, supporting the development of strategies to reduce the frequency and impact of bird strikes, ultimately enhancing aviation safety and risk mitigation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Globally, road traffic accidents (RTAs) cause over 1.35 million deaths each year, with an additional 50 million people suffering disabilities. Ethiopia has the highest number of road traffic accidents, with over 14,000 people killed and over 45,000 injured annually. This study aimed to assess survival status and predictors of mortality among road traffic accident adult patients admitted to intensive care units of Referral Hospitals in Tigray, 2024.
Methods
An institution-based retrospective follow-up study was conducted from January 8, 2019, to December 11, 2023, on 333 patient charts. A bivariable Cox regression analysis was performed to estimate crude hazard ratios (CHR). Subsequently, a multivariable Cox regression analysis was performed to estimate the adjusted hazard ratios (AHR). Finally, AHR with a p-value less than 0.05 was used to measure the association between dependent and independent variables.
Result
The incidence of mortality for road traffic accident victims was 21 per 1000 person-days of observation (95% CI: 16, 27.6) and the median survival time was 14 days. The predictors of mortality in this study were oxygen saturation on admission ≤ 89% (AHR = 4.9; 95% CI: 1.4–17.2), intracranial hemorrhage (AHR = 3.3; 95% CI: 1.02–11), chest injury (AHR = 3.2; 95% CI: 1.38–7.59), and the age categories 31–45 years (AHR = 0.3; 95% CI: 0.1–0.88) and 46–60 years (AHR = 0.22; 95% CI: 0.06–0.89).
Conclusion
A concerningly high mortality rate from car accidents was found in Referral Hospitals of Tigray. To improve the survival rates, healthcare providers should focus on victims with very low oxygen levels, head injuries, chest injuries, and older victims.
The number of road traffic fatalities per one million inhabitants in the United States was forecast to continuously increase between 2024 and 2029 by in total 18.5 deaths (+13.81 percent). After the tenth consecutive increasing year, the number is estimated to reach 152.46 deaths and therefore a new peak in 2029. Depicted here is the estimated number of deaths which occurred in relation to road traffic. They are set in relation to the population size and depicted as deaths per one million inhabitants. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of road traffic fatalities per one million inhabitants in countries like Mexico and Canada.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview:
Information on the location and characteristics of crashes in Queensland for all reported road traffic crashes that occurred from 1 January 2001 to 30 June 2024.
Fatal, Hospitalisation, Medical treatment and Minor injury:
This dataset contains information on crashes reported to the police which resulted from the movement of at least 1 road vehicle on a road or road related area. Crashes listed in this resource have occurred on a public road and meet one of the following criteria:
Property damage:
Please note:
These tables present high-level breakdowns and time series. A list of all tables, including those discontinued, is available in the table index. More detailed data is available in our data tools, or by downloading the open dataset.
The tables below are the latest final annual statistics for 2023. The latest data currently available are provisional figures for 2024. These are available from the latest provisional statistics.
A list of all reported road collisions and casualties data tables and variables in our data download tool is available in the Tables index (ODS, 30.1 KB): https://assets.publishing.service.gov.uk/media/683709928ade4d13a63236df/reported-road-casualties-gb-index-of-tables.ods
https://assets.publishing.service.gov.uk/media/66f44e29c71e42688b65ec43/ras-all-tables-excel.zip Reported road collisions and casualties data tables (zip file) (ZIP, 16.6 MB)
RAS0101: https://assets.publishing.service.gov.uk/media/66f44bd130536cb927482733/ras0101.ods Collisions, casualties and vehicles involved by road user type since 1926 (ODS, 52.1 KB)
RAS0102: https://assets.publishing.service.gov.uk/media/66f44bd1080bdf716392e8ec/ras0102.ods Casualties and casualty rates, by road user type and age group, since 1979 (ODS, 142 KB)
RAS0201: https://assets.publishing.service.gov.uk/media/66f44bd1a31f45a9c765ec1f/ras0201.ods Numbers and rates (ODS, 60.7 KB)
RAS0202: https://assets.publishing.service.gov.uk/media/66f44bd1e84ae1fd8592e8f0/ras0202.ods Sex and age group (ODS, 167 KB)
RAS0203: https://assets.publishing.service.gov.uk/media/67600227b745d5f7a053ef74/ras0203.ods Rates by mode, including air, water and rail modes (ODS, 24.2 KB)
RAS0301: https://assets.publishing.service.gov.uk/media/66f44bd1c71e42688b65ec3e/ras0301.ods Speed limit, built-up and non-built-up roads (ODS, 49.3 KB)
RAS0302: https://assets.publishing.service.gov.uk/media/66f44bd1080bdf716392e8ee/ras0302.ods Urban and rural roa
Passengers enplaned and deplaned at Canadian airports, annual.
The number of road accidents per one million inhabitants in the United States was forecast to continuously decrease between 2024 and 2029 by in total 2,490.4 accidents (-14.99 percent). After the eighth consecutive decreasing year, the number is estimated to reach 14,118.78 accidents and therefore a new minimum in 2029. Depicted here is the estimated number of accidents which occurred in relation to road traffic. They are set in relation to the population size and depicted as accidents per one million inhabitants. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of road accidents per one million inhabitants in countries like Mexico and Canada.
The number of households with internet access in Indonesia was forecast to continuously increase between 2024 and 2029 by in total 3.8 million households (+6.49 percent). After the fifteenth consecutive increasing year, the number of households is estimated to reach 62.36 million households and therefore a new peak in 2029. Notably, the number of households with internet access was continuously increasing over the past years. Depicted is the number of households with internet access in the country or region at hand. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of households with internet access in countries like Singapore and Vietnam.
https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures with Wireshark for 1 hour during the evening of October 9, 2023. The dataset consists of 394,137 instances stored in a CSV (Comma Separated Values) file.
This large dataset can be used for a variety of machine learning tasks, such as network traffic classification, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
Content :
This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses, among other fields. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.
The network traffic flow statistics (No., Time, Source, Destination, Protocol, Length, Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
No: Number of the instance.
Timestamp: Timestamp of the network traffic instance.
Source IP: IP address of the source.
Destination IP: IP address of the destination.
Protocol: Protocol used by the instance.
Length: Length of the instance.
Info: Information about the traffic instance.
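The column layout above can be read with Python's standard `csv` module. The following is a minimal sketch, assuming the file uses Wireshark's default export headers (`No.`, `Time`, `Source`, `Destination`, `Protocol`, `Length`, `Info`) and that `Time` holds seconds as a decimal number. The sample rows and the `load_packets` helper are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the layout of a Wireshark CSV export.
# The rows are made up for illustration, not copied from the dataset.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.5","10.0.0.1","DNS","74","Standard query A example.org"
"2","0.000412","10.0.0.1","10.0.0.5","DNS","90","Standard query response A example.org"
"3","0.001020","10.0.0.5","93.184.216.34","TCP","66","443 [SYN] Seq=0"
'''


def load_packets(handle):
    """Parse each CSV row into a dict, converting the numeric columns."""
    packets = []
    for row in csv.DictReader(handle):
        packets.append({
            "no": int(row["No."]),
            "time": float(row["Time"]),  # assumed: seconds as a decimal number
            "source": row["Source"],
            "destination": row["Destination"],
            "protocol": row["Protocol"],
            "length": int(row["Length"]),
            "info": row["Info"],
        })
    return packets


packets = load_packets(io.StringIO(SAMPLE_CSV))

# Count how many captured packets use each protocol.
protocol_counts = Counter(p["protocol"] for p in packets)
```

To read the real file, pass an open file handle instead of the `io.StringIO` wrapper, and adjust the header names if the published CSV labels its columns differently.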
Acknowledgements :
I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it is intended for building machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
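As an illustration of what "IP flow statistics" could look like for this data, the sketch below groups packets into flows and computes simple per-flow features. It assumes a flow is identified by the (source, destination, protocol) triple, since the columns described here carry no separate port fields. The sample packets and the `flow_statistics` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical packets with the fields needed for flow grouping.
# The values are made up for illustration.
sample_packets = [
    {"source": "10.0.0.5", "destination": "10.0.0.1", "protocol": "DNS", "length": 74},
    {"source": "10.0.0.5", "destination": "10.0.0.1", "protocol": "DNS", "length": 80},
    {"source": "10.0.0.1", "destination": "10.0.0.5", "protocol": "DNS", "length": 90},
    {"source": "10.0.0.5", "destination": "93.184.216.34", "protocol": "TCP", "length": 66},
]


def flow_statistics(packets):
    """Aggregate packets into flows keyed by (source, destination, protocol)."""
    totals = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p["source"], p["destination"], p["protocol"])
        totals[key]["packets"] += 1
        totals[key]["bytes"] += p["length"]
    # Add the mean packet length as a derived per-flow feature.
    return {
        key: {**t, "mean_length": t["bytes"] / t["packets"]}
        for key, t in totals.items()
    }


stats = flow_statistics(sample_packets)
```

Per-flow features of this kind could serve as inputs to a classifier, provided application labels are available for training.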
**Dataset License:** CC0: Public Domain
Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as network traffic classification, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
ML techniques that benefit from this dataset:
This dataset is highly useful because it consists of 394,137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains important features that can be used for most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:
1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify traffic patterns. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection: A large network traffic dataset can be used to train machine learning models to find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models that detect traffic issues, malicious traffic, network attacks, and DoS attacks.