https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures for 1 hour during the evening of Oct 9th, 2023 using Wireshark. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.
Content :
This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.
The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
No: Number of the instance.
Timestamp: Timestamp of the network traffic instance.
Source IP: IP address of the source.
Destination IP: IP address of the destination.
Protocol: Protocol used by the instance.
Length: Length of the instance.
Info: Information about the traffic instance.
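As a rough illustration of how the columns above might be used, the sketch below counts packets and bytes per protocol. It assumes the CSV header follows the Wireshark export layout quoted earlier (No., Time, Source, Destination, Protocol, Length, Info); the sample rows and the `protocol_summary` helper are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the assumed Wireshark CSV layout; not real data.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.5","10.0.0.1","DNS","74","Standard query A example.org"
"2","0.012345","10.0.0.1","10.0.0.5","DNS","90","Standard query response A"
"3","0.020000","10.0.0.5","93.184.216.34","TCP","66","49152 > 443 [SYN]"
'''


def protocol_summary(csv_text):
    """Return {protocol: (packet count, total bytes)} for the given CSV text."""
    counts = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["Protocol"]] += 1
        total_bytes[row["Protocol"]] += int(row["Length"])
    return {proto: (counts[proto], total_bytes[proto]) for proto in counts}


summary = protocol_summary(SAMPLE_CSV)
print(summary)
```

For the real file, the same function could be fed the file contents instead of the sample string, provided the header names match.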
Acknowledgements :
I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it is meant for building machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
**Dataset License:** CC0: Public Domain
Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
ML techniques that benefit from this dataset:
This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains important features that can be used for most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:
1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection: A large network traffic dataset can be used to train machine learning models to find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks, and DoS attacks.
https://creativecommons.org/publicdomain/zero/1.0/
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Fork this kernel to get started.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
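The real bounce rate question above comes down to a simple ratio per traffic source: visits with exactly one pageview divided by all visits. A minimal sketch of that arithmetic follows; the session tuples and the `bounce_rate_by_source` helper are hypothetical and do not reflect the actual Google Analytics 360 schema.

```python
from collections import defaultdict

# Hypothetical sessions as (traffic_source, pageviews); illustrative values only.
sessions = [
    ("google", 1),
    ("google", 4),
    ("direct", 1),
    ("direct", 1),
    ("youtube", 3),
]


def bounce_rate_by_source(rows):
    """Percentage of visits with exactly one pageview, per traffic source."""
    visits = defaultdict(int)
    bounces = defaultdict(int)
    for source, pageviews in rows:
        visits[source] += 1
        if pageviews == 1:
            bounces[source] += 1
    return {source: 100.0 * bounces[source] / visits[source] for source in visits}


rates = bounce_rate_by_source(sessions)
print(rates)
```

The same grouping could be expressed as a query against the hosted data; this version only shows the definition at work on toy values.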
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We are publishing a dataset we created for the HTTPS traffic classification.
Since the data were captured mainly in the real backbone network, we omitted IP addresses and ports. The datasets consist of data calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.
During our research, we divided HTTPS traffic into five categories, labelled as follows: L -- Live Video Streaming, P -- Video Player, M -- Music Player, U -- File Upload and D -- File Download (treated as one category), and W -- Website and other traffic.
We have chosen the service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the most popular 500 websites for each category. We also used several popular websites that primarily focus on the audience in our country. The identified traffic classes and their representatives are provided below:
Live Video Stream: Twitch, Czech TV, YouTube Live
Video Player: DailyMotion, Stream.cz, Vimeo, YouTube
Music Player: AppleMusic, Spotify, SoundCloud
File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive
Website and Other Traffic: Websites from Alexa Top 1M list
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Linear network representing the estimated traffic flows for roads and highways managed by the Ministry of Transport and Sustainable Mobility (MTMD). These flows are obtained using a statistical estimation method applied to data from more than 4,500 collection sites spread over the main roads of Quebec. It includes DJMA (annual average daily flow), DJME (summer average daily flow: June, July, August, September) and DJMH (winter average daily flow: December, January, February, March), as well as other traffic data. It is important to note that these values are calculated for all traffic directions combined.
Interactive map: Some files are accessible by querying a traffic section on the map with a click (the file links are displayed in the descriptive table that is displayed when clicking):
• Historical aggregated data (PDF)
• Annual reports for permanent sites (PDF and Excel)
• Hourly data (hourly average per weekday per month) (Excel)
**This third party metadata element was translated using an automated translation tool (Amazon Translate).**
Data licence Germany – Attribution – Version 2.0 https://www.govdata.de/dl-de/by-2-0
License information was derived automatically
Construction site coordination in Hamburg
The preservation of the infrastructure is of fundamental importance for the development of Hamburg. Construction sites in the street space are therefore part of the normal picture, to the chagrin of local residents and road users. In many cases, however, it is not work on the road itself that causes obstructions, but work on the many supply and disposal lines in the road body or the construction projects of private individuals. The approximately 25,000 work sites per year on Hamburg's road network, of which over 3,700 are on major roads, therefore require careful coordination to minimise obstacles to traffic flow. This is the task of the Traffic Optimization Department at the Department of Transport and Mobility Transition. Here, the incoming information from all road construction departments, pipeline companies and private builders is collected and evaluated. The information for the most important construction sites is published with a 7-day preview on the Internet at www.hamburg.de/baustellen. When coordinating construction sites, the aim is to prevent simultaneous construction sites, e.g. on important parallel roads, so that traffic has trouble-free alternative routes. However, no coordination, however good, can completely prevent congestion. The Hamburg road network is partly busy and partly overloaded in the morning and evening rush hours. We therefore recommend that all road users check the current traffic situation before starting their journey and only then choose a suitable means of transport and route.
If you have any questions about construction sites in Hamburg, please contact the construction site hotline on 040 428 28 2020 or by post to
Free and Hanseatic City of Hamburg
Transport and Mobility Transition Authority
Alter Steinweg 4
20459 Hamburg
http://reference.data.gov.uk/id/open-government-licence
15 smart sensors were installed on Mill Road and surrounding streets to record numbers of pedestrians, bicycles, cars and other vehicles. The data being collated and analysed by the Smart Cambridge programme will help the Greater Cambridge Partnership understand how people use the road network.
Data will be released monthly for these locations until the end of 2020. Please note that due to the level of insight that can be gained from these sensors, additional sensors in more locations have been installed in Cambridge since the summer of 2019. Some sensors will remain beyond 2020 in strategic locations and the network is expected to grow. Data for those more permanent sites, outside of the Mill Road project will be published here: https://data.cambridgeshireinsight.org.uk/dataset/cambridge-city-smart-s...
Mill Road Bridge was closed for eight weeks from 1 July 2019 for crucial work being carried out to improve rail services. Pedestrians and cyclists will still be able to cross the railway for most of the working time.
A high concentration of sensors were installed for approximately 18 months to gather data before the closure, during the time when there is no vehicle traffic coming over Mill Road Bridge and then after the bridge is re-opened. This has allowed engineers to see the impact of the closure on surrounding roads, including on air quality. Keeping the sensors in place for this long has also allowed teams to make greater comparisons, by taking in to account daily, weekly, monthly and annual variations in traffic levels.
The data release below offers counts for each sensor over 1 hour periods. The current data covers the period 03/06/2019 to 13/12/2020.
Hourly counts are broken down by inbound and outbound journeys.
Counts are also broken down by vehicle type. This includes:
Pedestrians
Cyclists
Buses
LGV
OGV 1
OGV 2
The release also includes a full list of sensor sites with geographic point location data.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.
The data is typical of what an ecommerce website would see and includes the following information:
Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions on the Google Merchandise Store website.
Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.
This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
http://rightsstatements.org/vocab/InC/1.0/
This competition involves advertisement data provided by BuzzCity Pte. Ltd. BuzzCity is a global mobile advertising network that has millions of consumers around the world on mobile phones and devices. In Q1 2012, over 45 billion ad banners were delivered across the BuzzCity network, consisting of more than 10,000 publisher sites which reach an average of over 300 million unique users per month. The number of smartphones active on the network has also grown significantly. Smartphones now account for more than 32% of phones that are served advertisements across the BuzzCity network. The "raw" data used in this competition comes in two types: publisher database and click database, both provided in CSV format. The publisher database records the publisher's (aka partner's) profile and comprises several fields:
publisherid - Unique identifier of a publisher.
Bankaccount - Bank account associated with a publisher (may be empty).
address - Mailing address of a publisher (obfuscated; may be empty).
status - Label of a publisher, which can be the following:
"OK" - Publishers whom BuzzCity deems as having healthy traffic (or those who slipped their detection mechanisms).
"Observation" - Publishers who may have just started their traffic or whose traffic statistics deviate from the system-wide average. BuzzCity does not have any conclusive stand on these publishers yet.
"Fraud" - Publishers who are deemed fraudulent with clear proof. BuzzCity suspends their accounts and their earnings will not be paid.
On the other hand, the click database records the click traffic and has several fields:
id - Unique identifier of a particular click.
numericip - Public IP address of a clicker/visitor.
deviceua - Phone model used by a clicker/visitor.
publisherid - Unique identifier of a publisher.
adscampaignid - Unique identifier of a given advertisement campaign.
usercountry - Country from which the surfer is.
clicktime - Timestamp of a given click (in YYYY-MM-DD format).
publisherchannel - Publisher's channel type, which can be the following:
ad - Adult sites
co - Community
es - Entertainment and lifestyle
gd - Glamour and dating
in - Information
mc - Mobile content
pp - Premium portal
se - Search, portal, services
referredurl - URL where the ad banners were clicked (obfuscated; may be empty). More details about the HTTP Referer protocol can be found in this article.
Related Publication: R. J. Oentaryo, E.-P. Lim, M. Finegold, D. Lo, F.-D. Zhu, C. Phua, E.-Y. Cheu, G.-E. Yap, K. Sim, M. N. Nguyen, K. Perera, B. Neupane, M. Faisal, Z.-Y. Aung, W. L. Woon, W. Chen, D. Patel, and D. Berrar. (2014). Detecting click fraud in online advertising: A data mining approach, Journal of Machine Learning Research, 15, 99-140.
This dataset was created by DNS_dataset
AADT represents current (most recent) Annual Average Daily Traffic on sampled road systems. This information is displayed using the Traffic Count Locations Active feature class as of the annual HPMS freeze in January. Historical AADT is found in another table. Please note that updates to this dataset are on an annual basis, therefore the data may not match ground conditions or may not be available for new roadways. Resource Contact: Christy Prentice, Traffic Forecasting & Analysis (TFA), http://www.dot.state.mn.us/tda/contacts.html#TFA
Check other metadata records in this package for more information on Annual Average Daily Traffic Locations Information.
Link to ESRI Feature Service:
Annual Average Daily Traffic Locations in Minnesota: Annual Average Daily Traffic Locations
The Mill Road Sensor Project, which monitored the eight week closure of the Mill Road bridge by Govia Thameslink to carry out crucial work to improve rail services in 2019, has now completed. 15 smart sensors were installed on Mill Road and surrounding streets to record numbers of pedestrians, bicycles, cars and other vehicles using the network in this area. During the works, access to motorised traffic was not permitted; however, pedestrians and cyclists were still able to cross the railway for most of the working time. The data collated and analysed by the Smart Cambridge programme has helped the Greater Cambridge Partnership understand how people use the road network and allowed engineers to see the impact of the closure on surrounding roads, including on air quality (air quality work was completed by Cambridge City Council and information on this can be found on their website). Final reports on the learnings from the project, which completed in December 2020, can be found on the Smart Cambridge website. Data captured by the 15 sensors used during this trial can be found on this page for the period up to and including December 2020. Keeping the sensors in place for this long has also allowed teams to make greater comparisons, by taking into account daily, weekly, monthly and annual variations in traffic levels.
The data release below offers counts for each sensor over 1 hour periods. The current data covers the period 03/06/2019 to 13/12/2020.
Hourly counts are broken down by inbound and outbound journeys.
Counts are also broken down by vehicle type. This includes:
Pedestrians
Cyclists
Buses
LGV
OGV 1
OGV 2
The release also includes a full list of sensor sites with geographic point location data. Data collected by the sensors from 1st January 2021 is published separately and will be updated on a quarterly basis.
The Mill Road Project demonstrated the level of insight that can be gained from these sensors, leading to additional sensors in more locations being installed in Cambridge since the summer of 2019. Therefore the data on this page includes both the sensors originally installed for the Mill Road Project and additional sensors deployed at later dates.
https://opendata.vancouver.ca/pages/licence/
This dataset contains the locations of intersections with traffic counts and links to collected data. Information on traffic counts is collected by staff at intersections and includes detailed information by lane and direction. Traffic information is also collected by automated counters at mid-block locations and focuses on direction specifically. That is found in a separate dataset, Directional traffic count locations.
Data currency: This is a static dataset.
Data accuracy: The locations are approximate, either in the intersection of two or more streets or along a block between intersections.
Websites for further information: Traffic count data
This dataset shows the Alexa Top 100 International Websites and provides metrics on the volume of traffic that these sites were able to handle. The Alexa Top 100 lists the 100 most visited websites in the world and measures various statistical information. I looked up each headquarters, either through Alexa or a Whois lookup, to get a street address, which I was then able to geocode. I was only able to successfully geocode 85 of the top 100 sites throughout the world. Source of data was Alexa.com, source URL: http://www.alexa.com/site/ds/top_sites?ts_mode=global&lang=none Data was from October 12, 2007. Alexa is updated daily, so to get more up to date information visit their site directly. They don't have maps, though.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
You can also access a zipped csv file version of this dataset.
TMS (traffic monitoring system) daily-updated traffic counts CSV
Data reuse caveats: as per license.
Data quality statement: please read the accompanying user manual, explaining:
how this data is collected
identification of count stations
traffic monitoring technology
monitoring hierarchy and conventions
typical survey specification
data calculation
TMS operation.
Traffic monitoring for state highways: user manual [PDF 465 KB]
The data is at daily granularity. However, the actual update frequency of the data depends on the contract the site falls within. For telemetry sites it's once a week on a Wednesday. Some regional sites are fortnightly, and some monthly or quarterly. Some are only 4 weeks a year, with timing depending on contractors’ programme of work.
Data quality caveats: you must use this data in conjunction with the user manual and the following caveats.
The road sensors used in data collection are subject to both technical errors and environmental interference.
Data is compiled from a variety of sources. Accuracy may vary and the data should only be used as a guide.
As not all road sections are monitored, a direct calculation of Vehicle Kilometres Travelled (VKT) for a region is not possible.
Data is sourced from Waka Kotahi New Zealand Transport Agency TMS data.
For sites that use dual loops, classification is by length. Vehicles with a length of less than 5.5m are classed as light vehicles. Vehicles over 11m long are classed as heavy vehicles. Vehicles between 5.5 and 11m are split 50:50 into light and heavy.
In September 2022, the National Telemetry contract was handed to a new contractor. During the handover process, due to some missing documents and aged technology, 40 of the 96 national telemetry traffic count sites went offline. The current contractor has continued to upload data from all active sites and has gradually worked to bring most offline sites back online. Please note and account for possible gaps in data from National Telemetry Sites.
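The dual-loop length rule in the caveats above can be written out as a small calculation. The sketch below is only an illustration of that rule: the `split_light_heavy` helper and the sample lengths are hypothetical, and treating lengths of exactly 5.5m and 11m as part of the 50:50 band is an assumption, since the caveat does not say which side the boundaries fall on.

```python
def split_light_heavy(lengths_m):
    """Return (light, heavy) counts using the dual-loop length rule.

    Under 5.5 m counts as light, over 11 m as heavy, and anything from
    5.5 m to 11 m contributes half a vehicle to each class. Including the
    exact boundary values in the middle band is an assumption.
    """
    light = 0.0
    heavy = 0.0
    for length in lengths_m:
        if length < 5.5:
            light += 1
        elif length > 11:
            heavy += 1
        else:
            # Middle band: split 50:50 between light and heavy.
            light += 0.5
            heavy += 0.5
    return light, heavy


# Hypothetical vehicle lengths in metres; not taken from TMS data.
light, heavy = split_light_heavy([4.2, 4.8, 7.0, 9.5, 12.3, 18.0])
print(light, heavy)
```

Because of the 50:50 split, light and heavy counts derived this way can be fractional even though the underlying vehicle count is a whole number.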
The NZTA Vehicle Classification Relationships diagram below shows the length classification (typically dual loops) and axle classification (typically pneumatic tube counts), and how these map to the Monetised benefits and costs manual, table A37, page 254.
Monetised benefits and costs manual [PDF 9 MB]
For the full TMS classification schema see Appendix A of the traffic counting manual vehicle classification scheme (NZTA 2011), below.
Traffic monitoring for state highways: user manual [PDF 465 KB]
State highway traffic monitoring (map)
State highway traffic monitoring sites
TMS (traffic monitoring system) traffic – historic quarter hourly
This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.
The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
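As a possible starting point for the forecasting exercises mentioned above, a seasonal-naive baseline simply repeats the value from the same weekday one week earlier. This is a generic benchmark technique, not a method described by the dataset author; the `seasonal_naive_forecast` helper and the visitor counts are hypothetical.

```python
def seasonal_naive_forecast(history, horizon=7, season=7):
    """Forecast each future day with the value from the same weekday one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    # Cycle through the most recent season as far as the horizon requires.
    return [last_season[i % season] for i in range(horizon)]


# Hypothetical daily unique-visitor counts for two weeks; not real data.
unique_visitors = [
    3000, 3200, 3100, 3050, 2600, 1500, 1800,
    3100, 3300, 3150, 3000, 2700, 1550, 1900,
]

next_day = seasonal_naive_forecast(unique_visitors, horizon=1)
next_week = seasonal_naive_forecast(unique_visitors, horizon=7)
print(next_day, next_week)
```

Any model built for the exercises can then be compared against this baseline, since the day-of-week pattern alone already explains much of the variation in data like this.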
This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.
Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.
User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.
Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.
GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.
Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.
High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.
Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.
Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
Datasets available on the NDW website. For more information please refer directly to their website.
Update, Autumn 2024: We have now published an interactive dashboard which is designed to provide typical average daily flows by month or by site for the purposes of long-term trend monitoring. This approach to data provision will enable users to access data in a more timely fashion, as the dashboard refreshes on a daily basis. The data in this dashboard has also been cleaned to remove 'non-neutral' and erroneous days of data from average flow calculations. Please examine the front page of the dashboard for clarity on what this means. This dashboard is available at the following link: Cambridgeshire & Peterborough Insight – Roads, Transport and Active Travel – Traffic Flows – Traffic Flows Dashboard (cambridgeshireinsight.org.uk)
The background: In spring and summer 2019, a series of smart traffic sensors were installed in Cambridge to monitor the impact of the Mill Road bridge closure. These sensors were installed for approximately 18 months in order to gather data before the closure, during the time when there was no vehicle traffic coming over Mill Road Bridge and then after the bridge re-opened. Due to the success of the sensors and the level of insight it is possible to gain, additional sensors have since been installed in more locations across the county. A traffic count sites map showing the locations of the permanent and annually monitored sites across the county, including the Vivacity sensor locations, is available on Cambridgeshire Insight.
The data: Data from the longer-term Vivacity sensors from 2019-2022 is available to download from the bottom of this page. The Vivacity sensor network grew considerably during 2022 and as a result, manual uploading of the data is no longer feasible. Consideration is currently being given to methods to streamline and/or automate Vivacity data sharing. The data below provides traffic counts at one-hour intervals, broken down into 8 vehicle categories.
Data is provided (with caveats – see bottom of page) from the installation of the sensor up to 31/12/2022. The 8 vehicle categories are: 'Car', 'Pedestrian', 'Cyclist', 'Motorbike', 'Bus', 'OGV1', 'OGV2' and 'LGV'. The counts are broken down into inbound (In) and outbound (Out) journeys. Please see the 'Location List' below to establish which compass directions the 'In' and 'Out' are referring to for each sensor, as it differs by location. Some sensors record counts across multiple 'count-lines' which enables the sensor to provide more accurate counts at different points across the road, for example footways, cycle ways and the road. This is particularly useful for picking up pedestrians. Sensors with multiple count lines often present data for the road, the left-hand side footway (LHS) and the right-hand side footway (RHS) respectively. To determine the total flow, simply aggregate the centre, LHS and RHS count-lines. Please note that new countlines have been introduced over time for some sensors so care should be taken to make sure all necessary countlines are included when calculating a total flow. In some locations sensor hardware has been replaced and the sensor number has therefore changed (e.g. the Perne Road sensor was originaly named "16" but was subsequently replaced and renamed "44"). Please refer to the 'Location List' file which details the current and previous sensor numbers at each location. Caveats: 1. Data quality: A Vivacity sensor performance monitoring exercise was undertaken in 2022 to determine the level of accuracy of the Vivacity sensors. The findings of this exercise are documented in a technical note. The note helps to highlight data limitations and provides guidance on how best to work with the Vivacity data. A key finding within the note is that the v1 hardware Vivacity sensors (a small group of older hardware sensors) have been found to struggle to accurately count pedestrians and cyclists. 
As of December 2022, the only sensors that continue to use v1 hardware are on Milton Road (s13), Coleridge Road (s3), Vinery Road (s4), Coldham's Lane (s7), Devonshire Road cycle bridge (s12) and Hills Road (s14). Full details are provided within the technical note.
2. Data gaps: The sensors are designed to capture data 24 hours per day, 7 days per week; however, there are occasions when sensors go down and either capture no data or capture only partial data that is not representative. The Research Group make every effort to remove data believed to be misleading, but this cannot be guaranteed, and the user is responsible for sense-checking the data and excluding anything considered erroneous prior to use. The Research Group exclude days where very low or zero flows have been recorded for the day. Within the spreadsheets, these rows will simply appear blank when downloaded, indicating that the sensor was live and active during this time but the output is not deemed reliable enough for publication.
3. British Summer Time / clocks changing: The data is provided in hourly intervals in the local time zone. When the clocks go forward at the end of March and back at the end of October, there are therefore missing / duplicate hours in the data. On 27 October 2019, 25 October 2020, and 31 October 2021, all countlines show two separate values for 1am. This is because the clocks went back by one hour on these dates, so the 1am hour occurred twice. As these days were all 25 hours long, we have kept both instances in the data for full transparency. Similarly, all countlines on 29 March 2020, 28 March 2021, and 27 March 2022 show no values at all for 1-2am. This is because the clocks went forward by one hour on these dates, making them 23-hour days.
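The total-flow calculation described above (summing the centre, LHS and RHS countlines for each hour) can be sketched as follows. This is a minimal illustration only: the column names and the sample rows are assumptions made for the example, not the layout of the published spreadsheets. Blank cells are treated as missing rather than zero, in line with the data gaps caveat.

```python
import csv
import io

# Hypothetical hourly export for one sensor. The column names and values
# are illustrative assumptions, not the published file layout.
SAMPLE = """date,hour,countline,Car_In,Car_Out
2022-05-01,08,centre,120,95
2022-05-01,08,LHS,0,0
2022-05-01,08,RHS,0,1
2022-05-01,09,centre,,
2022-05-01,09,LHS,,
"""

def total_flow(rows, category="Car"):
    """Sum In and Out counts over all countlines for each (date, hour)."""
    totals = {}
    for row in rows:
        key = (row["date"], row["hour"])
        values = [row[f"{category}_In"], row[f"{category}_Out"]]
        if any(v == "" for v in values):
            # Blank cells mark withheld data: keep the hour as missing.
            totals[key] = None
            continue
        if key in totals and totals[key] is None:
            # An earlier countline for this hour was blank; stay missing.
            continue
        totals[key] = totals.get(key, 0) + sum(int(v) for v in values)
    return totals

flows = total_flow(csv.DictReader(io.StringIO(SAMPLE)))
print(flows)
```

Keeping an hour as missing when any countline is blank avoids reporting a partial total as if it were the full flow.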
The Motor Vehicle Collisions vehicle table contains details on each vehicle involved in the crash. Each row represents a motor vehicle involved in a crash. The data in this table goes back to April 2016 when crash reporting switched to an electronic system.
The Motor Vehicle Collisions data tables contain information from all police reported motor vehicle collisions in NYC. The police report (MV-104AN) is required to be filled out for collisions where someone is injured or killed, or where there is at least $1000 worth of damage (https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/documents/ny_overlay_mv-104an_rev05_2004.pdf). It should be noted that the data is preliminary and subject to change when the MV-104AN forms are amended based on revised crash details. Due to the success of the CompStat program, NYPD began to ask how to apply the CompStat principles to other problems. Other than homicides, the fatal incidents in which police have the most contact with the public are fatal traffic collisions. Therefore, in April 1998, the Department implemented TrafficStat, which uses the CompStat model to work towards improving traffic safety. Police officers complete form MV-104AN for all vehicle collisions. The MV-104AN is a New York State form that has all of the details of a traffic collision. Before TrafficStat was implemented, there was no uniform traffic safety data collection procedure for all of the NYPD precincts. Therefore, the Police Department implemented the Traffic Accident Management System (TAMS) in July 1999 in order to collect traffic data in a uniform way across the City. TAMS required the precincts to manually enter a few selected MV-104AN fields to collect very basic intersection traffic crash statistics, which included the number of accidents, injuries and fatalities. As the years progressed, there grew a need for additional traffic data so that more detailed analyses could be conducted. The Citywide traffic safety initiative, Vision Zero, started in 2014. Vision Zero further emphasized the need for the collection of more traffic data in order to work towards the Vision Zero goal, which is to eliminate traffic fatalities.
Therefore, in March 2016 the Department replaced TAMS with the new Finest Online Records Management System (FORMS). FORMS enables police officers to enter all of the MV-104AN data fields electronically, using a Department cellphone or computer, and stores all of those fields in the Department’s crime data warehouse. Since all of the MV-104AN data fields are now stored for each traffic collision, detailed traffic safety analyses can be conducted as applicable.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Overview
3DHD CityScenes is the most comprehensive, large-scale high-definition (HD) map dataset to date, annotated in the three spatial dimensions of globally referenced, high-density LiDAR point clouds collected in urban domains. Our HD map covers 127 km of road sections of the inner city of Hamburg, Germany including 467 km of individual lanes. In total, our map comprises 266,762 individual items.
Our corresponding paper (published at ITSC 2022) is available here.
Further, we have applied 3DHD CityScenes to map deviation detection here.
Moreover, we release code to facilitate the application of our dataset and the reproducibility of our research. Specifically, our 3DHD_DevKit comprises:
The DevKit is available here: https://github.com/volkswagen/3DHD_devkit.
The dataset and DevKit have been created by Christopher Plachetka as project lead during his PhD period at Volkswagen Group, Germany.
When using our dataset, you are welcome to cite:
@INPROCEEDINGS{9921866,
author={Plachetka, Christopher and Sertolli, Benjamin and Fricke, Jenny and Klingner, Marvin and
Fingscheidt, Tim},
booktitle={2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)},
title={3DHD CityScenes: High-Definition Maps in High-Density Point Clouds},
year={2022},
pages={627-634}}
Acknowledgements
We thank the following interns for their exceptional contributions to our work.
The European large-scale project Hi-Drive (www.Hi-Drive.eu) supports the publication of 3DHD CityScenes and encourages the general publication of information and databases facilitating the development of automated driving technologies.
The Dataset
After downloading, the 3DHD_CityScenes folder provides five subdirectories, which are explained briefly in the following.
1. Dataset
This directory contains the training, validation, and test set definition (train.json, val.json, test.json) used in our publications. Respective files contain samples that define a geolocation and the orientation of the ego vehicle in global coordinates on the map.
During dataset generation (done by our DevKit), samples are used to take crops from the larger point cloud. Also, map elements in reach of a sample are collected. Both modalities can then be used, e.g., as input to a neural network such as our 3DHDNet.
To read any JSON-encoded data provided by 3DHD CityScenes in Python, you can use the following code snippet as an example.
import json
# Path to one of the dataset's JSON files (adjust to your download location)
json_path = r"E:\3DHD_CityScenes\Dataset\train.json"
with open(json_path) as jf:
    data = json.load(jf)
print(data)
2. HD_Map
Map items are stored as lists of items in JSON format. In particular, we provide:
3. HD_Map_MetaData
Our high-density point cloud, used as the basis for annotating the HD map, is split into 648 tiles. This directory contains the geolocation of each tile as a polygon on the map. You can view the respective tile definition using QGIS. Alternatively, we also provide the respective polygons as lists of UTM coordinates in JSON.
Files with the ending .dbf, .prj, .qpj, .shp, and .shx belong to the tile definition as “shape file” (commonly used in geodesy) that can be viewed using QGIS. The JSON file contains the same information provided in a different format used in our Python API.
4. HD_PointCloud_Tiles
The high-density point cloud tiles are provided in global UTM32N coordinates and are encoded in a proprietary binary format. The first 4 bytes (integer) encode the number of points contained in that file. Subsequently, all point cloud values are provided as arrays. First all x-values, then all y-values, and so on. Specifically, the arrays are encoded as follows.
After reading, the respective values have to be unnormalized. As an example, you can use the following code snippet to read the point cloud data. For visualization, you can use the pptk package, for instance.
import numpy as np
import pptk
file_path = r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin"
pc_dict = {}
key_list = ['x', 'y', 'z', 'intensity', 'is_ground']
type_list = ['
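As a minimal sketch of the first step of the tile format described above, the following reads only the 4-byte header holding the number of points. The byte order and signedness (little-endian, signed 32-bit) are assumptions, as is the helper name `read_point_count`; the self-check uses a synthetic temporary file, not a real tile.

```python
import os
import struct
import tempfile

def read_point_count(file_path):
    """Return the point count stored in the first 4 bytes of a tile file."""
    with open(file_path, "rb") as f:
        header = f.read(4)
    if len(header) != 4:
        raise ValueError("file is shorter than the 4-byte header")
    # Assumption: little-endian signed 32-bit integer.
    (num_points,) = struct.unpack("<i", header)
    return num_points

# Self-check on a synthetic file (not a real tile): write a header, read it back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(struct.pack("<i", 12345))
demo_count = read_point_count(tmp.name)
os.remove(tmp.name)
print(demo_count)
```

Checking the header against the file size is a quick way to confirm that the assumed byte order matches a downloaded tile before decoding the arrays that follow it.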
5. Trajectories
We provide 15 real-world trajectories recorded during a measurement campaign covering the whole HD map. Trajectory samples are provided at approx. 30 Hz and are encoded in JSON.
These trajectories were used to provide the samples in train.json, val.json, and test.json with realistic geolocations and orientations of the ego vehicle.
- OP1 – OP5 cover the majority of the map with 5 trajectories.
- RH1 – RH10 cover the majority of the map with 10 trajectories.
Note that OP5 is split into three separate parts, a-c. RH9 is split into two parts, a-b. Moreover, OP4 mostly equals OP1 (thus, we speak of 14 trajectories in our paper). For completeness, however, we provide all recorded trajectories here.
https://creativecommons.org/publicdomain/zero/1.0/
Context
The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures for 1 hour during the evening of Oct 9th, 2023 using Wireshark. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file. This large dataset can be used for a variety of machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
Content :
This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses, among other fields. The majority of the properties are numeric in nature; however, there are also nominal and date types, the latter due to the Timestamp.
The network traffic flow statistics (No., Time, Source, Destination, Protocol, Length, Info) were obtained using Wireshark (https://www.wireshark.org/).
Dataset Columns:
- No: Number of the instance
- Timestamp: Timestamp of the network traffic instance
- Source IP: IP address of the source
- Destination IP: IP address of the destination
- Protocol: Protocol used by the instance
- Length: Length of the instance
- Info: Information about the traffic instance
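As a rough sketch of how the columns above might be read, the following counts packets and bytes per protocol using only the Python standard library. It assumes the CSV header matches the Wireshark column names given earlier (No., Time, Source, Destination, Protocol, Length, Info); the sample rows are synthetic and use documentation-range IP addresses, and `summarize` is a name chosen for this example.

```python
import csv
import io
from collections import Counter

# Synthetic rows for illustration only (documentation-range IP addresses).
# The header is assumed to follow the Wireshark column names listed above.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.0.2.10","198.51.100.5","TCP","66","Synthetic example row"
"2","0.000412","198.51.100.5","192.0.2.10","TCP","54","Synthetic example row"
"3","0.010233","192.0.2.10","203.0.113.7","DNS","74","Synthetic example row"
'''

def summarize(handle):
    """Count packets and total bytes per protocol from a Wireshark-style CSV."""
    packets = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(handle):
        packets[row["Protocol"]] += 1
        total_bytes[row["Protocol"]] += int(row["Length"])
    return packets, total_bytes

packets, total_bytes = summarize(io.StringIO(SAMPLE))
print(packets.most_common())
print(dict(total_bytes))
```

To run this on the real file, pass a handle opened with `open(path, newline="")` in place of the `io.StringIO` sample, and adjust the column names if the file's header differs.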
Acknowledgements :
I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.
Ravikumar Gattu , Susmitha Choppadandi
Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it can be used to build machine learning models that identify specific applications (such as TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).
**Dataset License:** CC0: Public Domain
Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.
ML techniques that benefit from this dataset:
This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains important features that can be used for most applications of machine learning in cybersecurity. A few of the potential machine learning applications that could benefit from this dataset are:
1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.
2. Anomaly Detection: A large network traffic dataset can be used to train machine learning models to find irregularities in the traffic, which could help identify cyber attacks.
3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks, and DoS attacks.
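To make the anomaly detection use case concrete, here is a deliberately simple statistical baseline rather than a trained machine learning model: it flags packets whose Length lies far from the mean. The z-score rule, the threshold of 2.0, the function name, and the sample lengths are all assumptions chosen for illustration.

```python
import statistics

def flag_length_outliers(lengths, threshold=2.0):
    """Return indices of packets whose length is far from the mean.

    A plain z-score rule used as a stand-in baseline; the threshold is
    an assumed value, not one derived from the dataset.
    """
    mean = statistics.fmean(lengths)
    spread = statistics.pstdev(lengths)
    if spread == 0:
        return []  # all lengths identical: nothing stands out
    return [i for i, n in enumerate(lengths)
            if abs(n - mean) / spread > threshold]

# Synthetic packet lengths in bytes: mostly small packets and one large frame.
lengths = [60, 66, 54, 60, 62, 58, 60, 1514]
flagged = flag_length_outliers(lengths)
print(flagged)
```

A baseline like this is mainly useful as a reference point against which a learned model's detections can be compared.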