34 datasets found
  1. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 12, 2024
    + more versions
    Cite
    Chan-Tin, Eric (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Honig, Joshua
    Chan-Tin, Eric
    Moran, Madeline
    Ferrell, Nathan
    Homan, Sophia
    Soni, Shreena
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help  show this help message and exit
    -i TXTFILE  input text file
    -x X        add first X number of total packets as features
    -y Y        add first Y number of negative packets as features
    -z Z        add first Z number of positive packets as features
    -ml         output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S        generate samples using size S
    -j
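The option list maps onto a standard argparse interface; a hedged sketch of how the script might define it (the flag names come from the usage string above, while the defaults, help strings, and the undocumented -j behavior are assumptions):

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the pkt_features.py interface;
    # flag names follow the usage string, help text is paraphrased.
    p = argparse.ArgumentParser(prog="pkt_features.py")
    p.add_argument("-i", dest="txtfile", required=True, help="input text file")
    p.add_argument("-x", type=int, default=0, help="first X total packets as features")
    p.add_argument("-y", type=int, default=0, help="first Y negative packets as features")
    p.add_argument("-z", type=int, default=0, help="first Z positive packets as features")
    p.add_argument("-ml", action="store_true",
                   help="write websiteNumber1,feature1,feature2,... lines")
    p.add_argument("-s", type=int, default=0, help="generate samples of size S")
    p.add_argument("-j", action="store_true")  # purpose not documented in the README
    return p

args = build_parser().parse_args(["-i", "traffic.txt", "-x", "10", "-ml", "-j"])
print(args.txtfile, args.x, args.ml)
```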

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

    Uses Features.py to calculate the features.

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This script then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to find the best grid-search hyperparameters. Note: the candidate hyperparameters must be added to train_model under 'if not evaluateOnly:'; once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid-search hyperparameters from the provided ranges.

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for this experiment was stored and analyzed as one text file per experiment, where each line contains:

    The first number: a classification number denoting which website, query, or VR action is taking place.

    The remaining numbers in each line denote:

    the size of a packet,

    and the direction it is traveling:

    negative numbers denote incoming packets

    positive numbers denote outgoing packets
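Assuming whitespace-separated fields as described (a class label followed by signed packet sizes), a minimal line parser could look like this; the function name and the choice to store incoming sizes as positive values are illustrative assumptions:

```python
def parse_capture_line(line):
    """Split one capture line into (label, incoming_sizes, outgoing_sizes).

    First field: classification number of the website/query/VR action.
    Remaining fields: signed packet sizes (negative = incoming,
    positive = outgoing), per the dataset description.
    """
    fields = [int(tok) for tok in line.split()]
    label, packets = fields[0], fields[1:]
    incoming = [-p for p in packets if p < 0]   # store sizes as positive values
    outgoing = [p for p in packets if p > 0]
    return label, incoming, outgoing

label, inc, out = parse_capture_line("3 -1500 600 -52 1400")
```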

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data sorted from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
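The per-capture statistics described above can be sketched in a few lines; this assumes an empirical CDF over the sorted packet sizes, which matches the column description but is not confirmed by the source:

```python
import statistics

def capture_stats(packet_sizes):
    """Sort one capture's packet sizes and return (mean, stdev, ecdf),
    where ecdf is a list of (size, cumulative_fraction) pairs."""
    ordered = sorted(packet_sizes)
    mean = statistics.mean(ordered)
    stdev = statistics.stdev(ordered)
    n = len(ordered)
    # Empirical CDF: fraction of packets at or below each sorted size.
    ecdf = [(s, (i + 1) / n) for i, s in enumerate(ordered)]
    return mean, stdev, ecdf

mean, stdev, ecdf = capture_stats([1500, 52, 600, 1400])
```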

  2. Data from: AppClassNet - A commercial-grade dataset for application...

    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    dario rossi (2023). AppClassNet - A commercial-grade dataset for application identification research [Dataset]. http://doi.org/10.6084/m9.figshare.20375580.v1
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    dario rossi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AppClassNet is a commercial-grade dataset that represents a realistic benchmark for the use case of traffic classification and management.

    The AppClassNet dataset is complemented by companion artifacts containing baseline code to train and test state-of-the-art baseline models for a quick bootstrap.

    A description of the dataset, the expected performance of the baseline models, the allowed and forbidden usages of the dataset, and more is available in a companion technical report [1].

    [1] https://dl.acm.org/doi/10.1145/3561954.3561958

  3. AIT Alert Data Set

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 14, 2024
    Cite
    Landauer, Max (2024). AIT Alert Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8263180
    Dataset updated
    Oct 14, 2024
    Dataset provided by
    Skopik, Florian
    Landauer, Max
    Wurzenberger, Markus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the AIT Alert Data Set (AIT-ADS), a collection of synthetic alerts suitable for evaluation of alert aggregation, alert correlation, alert filtering, and attack graph generation approaches. The alerts were forensically generated from the AIT Log Data Set V2 (AIT-LDSv2) and originate from three intrusion detection systems, namely Suricata, Wazuh, and AMiner. The data sets comprise eight scenarios, each of which has been targeted by a multi-step attack with attack steps such as scans, web application exploits, password cracking, remote command execution, privilege escalation, etc. Each scenario and attack chain has certain variations so that attack manifestations and resulting alert sequences vary in each scenario; this means that the data set allows researchers to develop and evaluate approaches that compute similarities of attack chains or merge them into meta-alerts.

    Since only a few benchmark alert data sets are publicly available, the AIT-ADS was developed to address common issues in the research domain of multi-step attack analysis. Specifically, the alert data set contains many false positives caused by normal user behavior (e.g., user login attempts or software updates), heterogeneous alert formats (although all alerts are in JSON format, their fields differ for each IDS), repeated executions of attacks according to an attack plan, collection of alerts from diverse log sources (application logs and network traffic) and all components in the network (mail server, web server, DNS, firewall, file share, etc.), and labels for attack phases. For more information on how this alert data set was generated, check out our paper accompanying this data set [1] or our GitHub repository. More information on the original log data set, including a detailed description of scenarios and attacks, can be found in [2].

    The alert data set contains two files for each of the eight scenarios, and a file for their labels:

    _aminer.json contains alerts from AMiner IDS

    _wazuh.json contains alerts from Wazuh IDS and Suricata IDS

    labels.csv contains the start and end times of attack phases in each scenario
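A minimal loading sketch for one scenario, assuming line-delimited JSON alert files named &lt;scenario&gt;_aminer.json / &lt;scenario&gt;_wazuh.json and a labels.csv with a 'scenario' column (the exact serialization and column names are assumptions, not confirmed formats):

```python
import csv
import json
from pathlib import Path

def load_scenario(data_dir, scenario):
    """Load AMiner and Wazuh/Suricata alerts plus attack-phase labels
    for one scenario of the AIT-ADS."""
    data_dir = Path(data_dir)

    def read_jsonl(path):
        # Assumed format: one JSON alert object per line.
        with open(path) as f:
            return [json.loads(line) for line in f if line.strip()]

    aminer = read_jsonl(data_dir / f"{scenario}_aminer.json")
    wazuh = read_jsonl(data_dir / f"{scenario}_wazuh.json")
    with open(data_dir / "labels.csv") as f:
        labels = [row for row in csv.DictReader(f)
                  if row.get("scenario") == scenario]
    return aminer, wazuh, labels
```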

    Beside false positive alerts, the alerts in the AIT-ADS correspond to the following attacks:

    Scans (nmap, WPScan, dirb)

    Webshell upload (CVE-2020-24186)

    Password cracking (John the Ripper)

    Privilege escalation

    Remote command execution

    Data exfiltration (DNSteal) and stopped service

    The total number of alerts in the data set is 2,655,821, of which 2,293,628 originate from Wazuh, 306,635 from Suricata, and 55,558 from AMiner. The numbers of alerts in each scenario are as follows: fox: 473,104; harrison: 593,948; russellmitchell: 45,544; santos: 130,779; shaw: 70,782; wardbeck: 91,257; wheeler: 616,161; wilson: 634,246.
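Both breakdowns are internally consistent with the stated total, which a quick check confirms:

```python
# Alert counts as stated in the dataset description.
per_ids = {"Wazuh": 2_293_628, "Suricata": 306_635, "AMiner": 55_558}
per_scenario = {
    "fox": 473_104, "harrison": 593_948, "russellmitchell": 45_544,
    "santos": 130_779, "shaw": 70_782, "wardbeck": 91_257,
    "wheeler": 616_161, "wilson": 634_246,
}
total = 2_655_821

# Both the per-IDS and per-scenario breakdowns sum to the stated total.
print(sum(per_ids.values()) == total, sum(per_scenario.values()) == total)
```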

    Acknowledgements: Partially funded by the European Defence Fund (EDF) projects AInception (101103385) and NEWSROOM (101121403), and the FFG project PRESENT (FO999899544). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. The European Union cannot be held responsible for them.

    If you use the AIT-ADS, please cite the following publications:

    [1] Landauer, M., Skopik, F., Wurzenberger, M. (2024): Introducing a New Alert Data Set for Multi-Step Attack Analysis. Proceedings of the 17th Cyber Security Experimentation and Test Workshop. [PDF]

    [2] Landauer M., Skopik F., Frank M., Hotwagner W., Wurzenberger M., Rauber A. (2023): Maintainable Log Datasets for Evaluation of Intrusion Detection Systems. IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482. [PDF]

  4. Traffic Dataset

    • universe.roboflow.com
    zip
    Updated Sep 19, 2022
    + more versions
    Cite
    joseva (2022). Traffic Dataset [Dataset]. https://universe.roboflow.com/joseva/traffic-kyalq/dataset/2
    Available download formats: zip
    Dataset updated
    Sep 19, 2022
    Dataset authored and provided by
    joseva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Traffic Signals Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Self-Driving Vehicles System: This model can be implemented in autonomous vehicles technology to identify traffic signs and signals, thus enabling the vehicle to make intelligent and safety-compliant decisions as per road conditions.

    2. Smart Traffic Management: The model can be used in urban planning and traffic management systems to analyze, comprehend, and report traffic indications in real-time, aiding in better road traffic control and congestion avoidance.

    3. Driving Assistance Applications: There is potential to integrate this model into GPS navigation systems or dedicated driving assistance applications. These apps could provide real-time traffic rule alerts to drivers, enhancing safety and rule adherence.

    4. Road Condition Analysis: Use the model to collect road condition data based on signs for construction, slippery road, uneven road, etc. This critical information could support road maintenance planning by relevant authorities.

    5. Traffic Rule Training Software: This model can be used in developing training software for beginner drivers or trucking companies. The software could explain and demonstrate various traffic rules, greatly improving the quality of road safety education.

  5. Traffic_train Dataset

    • universe.roboflow.com
    zip
    Updated Apr 12, 2021
    Cite
    ViWhiVN (2021). Traffic_train Dataset [Dataset]. https://universe.roboflow.com/viwhivn/traffic_train
    Available download formats: zip
    Dataset updated
    Apr 12, 2021
    Dataset authored and provided by
    ViWhiVN
    Variables measured
    TrafficTrain Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Traffic Monitoring Systems: The model could be leveraged by city planning departments or traffic control centers to automatically identify and monitor different types of traffic on roads in real-time. This could assist in efficient traffic management, congestion detection, and traffic light timing adjustment.

    2. Autonomous Vehicles: Companies developing self-driving cars or drones could utilize the model to improve their vehicle's ability to recognize different types of vehicles on the road, ensuring safer navigation.

    3. Security and Surveillance: The model could be used in CCTV camera systems to detect, classify, and track vehicles around sensitive areas like government buildings, airports, or high-security areas for security enhancement and crime prevention.

    4. Traffic Analysis for Urban Planning: Urban planners and researchers can use the model to study traffic patterns based on vehicle type over time, informing future infrastructure and transportation planning.

    5. Enhanced Vehicle-based Augmented Reality (AR): Game developers or AR app creators who focus on city or traffic scenarios can use the model to enhance their system's ability to accurately detect and interact with real-world vehicles, promoting a more immersive experience for users.

  6. USA Mobility & Foot traffic Enriched Data by Predik Data-Driven

    • app.mobito.io
    Updated Feb 3, 2023
    Cite
    (2023). USA Mobility & Foot traffic Enriched Data by Predik Data-Driven [Dataset]. https://app.mobito.io/data-product/usa-mobility-&-foot-traffic-enriched-data-by-predik-data-driven
    Dataset updated
    Feb 3, 2023
    Area covered
    United States
    Description

    This Mobility & Foot traffic dataset includes enriched mobility data and visitation at POIs to answer questions such as:

    - How often do people visit a location? (daily, monthly, absolute, and averages)
    - What type of places do they visit? (parks, schools, hospitals, etc.)
    - Which social characteristics do people have in a certain POI? Breakdown by type: residents, workers, visitors.
    - What's their mobility like during night hours & day hours?
    - What's the frequency of the visits by day of the week and hour of the day?

    Extra insights:

    - Visitors' relative income level.
    - Visitors' preferences as derived from their visits to shopping, parks, sports facilities, and churches, among others.
    - Footfall measurement in all types of establishments (shopping malls, stand-alone stores, etc.).
    - Origin/Destination matrix.
    - Vehicular traffic: measurement of speed, types of vehicles, among other insights.

    Overview & Key Concepts

    Each record corresponds to a ping from a mobile device at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. The whole process is compliant with applicable privacy laws. We clean, process, and enrich these massive datasets with a number of complex, compute-intensive calculations to make them easier to use in tailor-made solutions for companies as well as data science and machine learning applications, especially those related to understanding customer behavior.

    Featured attributes of the data

    - Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful to differentiate between vehicles, pedestrians, and stationary observations.
    - Night base of the device: the approximate location where the device spends the night, which is usually its home neighborhood.
    - Day base of the device: the most common daylight location during weekdays, which is usually the work location.
    - Income level: we intersect the device's night neighborhood with available socioeconomic data to infer the device's income level. Depending on the country and the availability of good census data, this figure ranges from a relative wealth index to a currency-denominated income.
    - POI visited: we intersect each observation with a number of POI databases to estimate check-ins to different locations. POI databases can vary significantly in scope and depth between countries.
    - Category of visited POI: for each observation attributable to a POI, we also include a standardized location category (park, hospital, among others).

    Delivery schemas

    We can deliver the data in three different formats:

    - Full dataset: one record per mobile ping. These datasets are very large and should only be consumed by experienced teams with large computing budgets.
    - Visitation stream: one record per attributable visit. Considerably smaller than the full dataset but retains most of its more valuable elements; helps understand who visited a specific POI and characterize consumer behavior.
    - Audience profiles: one record per mobile device in a given period of time (usually monthly), with the visitation stream aggregated by category. The most condensed version; very useful to quickly understand the types of consumers in a particular area and to create cohorts of users.
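The device-speed attribute described above can be approximated from consecutive pings; a sketch using the haversine great-circle distance (the provider's actual pipeline, thresholds, and noise handling are not disclosed):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def device_speed_kmh(ping_a, ping_b):
    """Estimate speed between two pings given as (ts_seconds, lat, lon).
    Useful to separate stationary, pedestrian, and vehicle observations."""
    (t1, la1, lo1), (t2, la2, lo2) = ping_a, ping_b
    hours = abs(t2 - t1) / 3600
    return haversine_km(la1, lo1, la2, lo2) / hours if hours else float("inf")
```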

  7. Datasets for Computational Methods and GIS Applications in Social Science

    • search.dataone.org
    Updated Sep 25, 2024
    Cite
    Fahui Wang; Lingbo Liu (2024). Datasets for Computational Methods and GIS Applications in Social Science [Dataset]. http://doi.org/10.7910/DVN/4CM7V4
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Fahui Wang; Lingbo Liu
    Description

    Dataset for the textbook Computational Methods and GIS Applications in Social Science (3rd Edition), 2023, Fahui Wang, Lingbo Liu.

    Main Book Citation: Wang, F., & Liu, L. (2023). Computational Methods and GIS Applications in Social Science (3rd ed.). CRC Press. https://doi.org/10.1201/9781003292302

    KNIME Lab Manual Citation: Liu, L., & Wang, F. (2023). Computational Methods and GIS Applications in Social Science - Lab Manual. CRC Press. https://doi.org/10.1201/9781003304357

    KNIME Hub: Dataset and Workflow for Computational Methods and GIS Applications in Social Science - Lab Manual

    Update Log

    If a Python package is not found in Package Management, use ArcGIS Pro's Python Command Prompt to install it, e.g., conda install -c conda-forge python-igraph leidenalg

    NetworkCommDetPro in CMGIS-V3-Tools was updated on July 10, 2024.

    Added spatial adjacency table into Florida on June 29, 2024.

    The dataset and tool for ABM Crime Simulation were updated on August 3, 2023.

    The toolkits in CMGIS-V3-Tools were updated on August 3, 2023.

    Report issues on GitHub: https://github.com/UrbanGISer/Computational-Methods-and-GIS-Applications-in-Social-Science

    Website of Fahui Wang: http://faculty.lsu.edu/fahui

    Contents

    Chapter 1. Getting Started with ArcGIS: Data Management and Basic Spatial Analysis Tools
      Case Study 1: Mapping and Analyzing Population Density Pattern in Baton Rouge, Louisiana
    Chapter 2. Measuring Distance and Travel Time and Analyzing Distance Decay Behavior
      Case Study 2A: Estimating Drive Time and Transit Time in Baton Rouge, Louisiana
      Case Study 2B: Analyzing Distance Decay Behavior for Hospitalization in Florida
    Chapter 3. Spatial Smoothing and Spatial Interpolation
      Case Study 3A: Mapping Place Names in Guangxi, China
      Case Study 3B: Area-Based Interpolations of Population in Baton Rouge, Louisiana
      Case Study 3C: Detecting Spatiotemporal Crime Hotspots in Baton Rouge, Louisiana
    Chapter 4. Delineating Functional Regions and Applications in Health Geography
      Case Study 4A: Defining Service Areas of Acute Hospitals in Baton Rouge, Louisiana
      Case Study 4B: Automated Delineation of Hospital Service Areas in Florida
    Chapter 5. GIS-Based Measures of Spatial Accessibility and Application in Examining Healthcare Disparity
      Case Study 5: Measuring Accessibility of Primary Care Physicians in Baton Rouge
    Chapter 6. Function Fittings by Regressions and Application in Analyzing Urban Density Patterns
      Case Study 6: Analyzing Population Density Patterns in Chicago Urban Area
    Chapter 7. Principal Components, Factor and Cluster Analyses and Application in Social Area Analysis
      Case Study 7: Social Area Analysis in Beijing
    Chapter 8. Spatial Statistics and Applications in Cultural and Crime Geography
      Case Study 8A: Spatial Distribution and Clusters of Place Names in Yunnan, China
      Case Study 8B: Detecting Colocation Between Crime Incidents and Facilities
      Case Study 8C: Spatial Cluster and Regression Analyses of Homicide Patterns in Chicago
    Chapter 9. Regionalization Methods and Application in Analysis of Cancer Data
      Case Study 9: Constructing Geographical Areas for Mapping Cancer Rates in Louisiana
    Chapter 10. System of Linear Equations and Application of Garin-Lowry in Simulating Urban Population and Employment Patterns
      Case Study 10: Simulating Population and Service Employment Distributions in a Hypothetical City
    Chapter 11. Linear and Quadratic Programming and Applications in Examining Wasteful Commuting and Allocating Healthcare Providers
      Case Study 11A: Measuring Wasteful Commuting in Columbus, Ohio
      Case Study 11B: Location-Allocation Analysis of Hospitals in Rural China
    Chapter 12. Monte Carlo Method and Applications in Urban Population and Traffic Simulations
      Case Study 12A: Examining Zonal Effect on Urban Population Density Functions in Chicago by Monte Carlo Simulation
      Case Study 12B: Monte Carlo-Based Traffic Simulation in Baton Rouge, Louisiana
    Chapter 13. Agent-Based Model and Application in Crime Simulation
      Case Study 13: Agent-Based Crime Simulation in Baton Rouge, Louisiana
    Chapter 14. Spatiotemporal Big Data Analytics and Application in Urban Studies
      Case Study 14A: Exploring Taxi Trajectory in ArcGIS
      Case Study 14B: Identifying High Traffic Corridors and Destinations in Shanghai

    Dataset File Structure

    1 BatonRouge: Census.gdb, BR.gdb
    2A BatonRouge: BR_Road.gdb, Hosp_Address.csv, TransitNetworkTemplate.xml, BR_GTFS, Google API Pro.tbx
    2B Florida: FL_HSA.gdb, R_ArcGIS_Tools.tbx (RegressionR)
    3A China_GX: GX.gdb
    3B BatonRouge: BR.gdb
    3C BatonRouge: BRcrime, R_ArcGIS_Tools.tbx (STKDE)
    4A BatonRouge: BRRoad.gdb
    4B Florida: FL_HSA.gdb, HSA Delineation Pro.tbx, Huff Model Pro.tbx, FLplgnAdjAppend.csv
    5 BRMSA: BRMSA.gdb, Accessibility Pro.tbx
    6 Chicago: ChiUrArea.gdb, R_ArcGIS_Tools.tbx (RegressionR)
    7 Beijing: BJSA.gdb, bjattr.csv, R_ArcGIS_Tools.tbx (PCAandFA, BasicClustering)
    8A Yunnan: YN.gdb, R_ArcGIS_Tools.tbx (SaTScanR)
    8B Jiangsu: JS.gdb
    8C Chicago: ChiCity.gdb, cityattr.csv ...

  8. LoRaWAN Traffic Analysis Dataset

    • data.niaid.nih.gov
    Updated Aug 28, 2023
    + more versions
    Cite
    Kral, Jan (2023). LoRaWAN Traffic Analysis Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7919212
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    Kral, Jan
    Povalac, Ales
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was created by a LoRaWAN sniffer and contains packets, which are thoroughly analyzed in the paper Exploring LoRaWAN Traffic: In-Depth Analysis of IoT Network Communications. Data from the LoRaWAN sniffer was collected in four cities: Liege (Belgium), Graz (Austria), Vienna (Austria), and Brno (Czechia).

    Gateway ID: b827ebafac000001

    Uplink reception (end-device => gateway)

    Only packets containing CRC, inverted IQ

    RX0: 867.1 MHz, 867.3 MHz, 867.5 MHz, 867.7 MHz, 867.9 MHz - BW 125 kHz and all SF

    RX1: 868.1 MHz, 868.3 MHz, 868.5 MHz - BW 125 kHz and all SF

    Gateway ID: b827ebafac000002

    Downlink reception (gateway => end-device)

    Includes packets without CRC, non-inverted IQ

    RX0: 867.1 MHz, 867.3 MHz, 867.5 MHz, 867.7 MHz, 867.9 MHz - BW 125 kHz and all SF

    RX1: 868.1 MHz, 868.3 MHz, 868.5 MHz - BW 125 kHz and all SF

    Gateway ID: b827ebafac000003

    Downlink reception (gateway => end-device) and Class-B beacon on 869.525 MHz

    Includes packets without CRC, non-inverted IQ

    RX0: 869.525 MHz - BW 125 kHz and all SF, BW 125 kHz and SF9 with implicit header, CR 4/5 and length 17 B

    To open the pcap files, you need Wireshark with current support for LoRaTap and LoRaWAN protocols. This support will be available in the official 4.1.0 release. A working version for Windows is accessible in the automated build system.

    The source data is available in the log.zip file, which contains the complete dataset obtained by the sniffer. A set of conversion tools for log processing is available on GitHub. The converted logs, in Wireshark format, are stored in pcap.zip. For the LoRaWAN decoder, you can use the attached root and session keys. The processed outputs are stored in csv.zip, and graphical statistics are available in png.zip.

    This data represents a unique, geographically identifiable selection from the full log, cleaned of any errors. The records from Brno include communication between the gateway and a node with known keys.

    Test file :: 00_Test

    short test file for parser verification

    comparison of LoRaTap version 0 and version 1 formats

    Brno, Czech Republic :: 01_Brno

    49.22685N, 16.57536E, ASL 306m

    lines 150873 to 529796

    time 1.8.2022 15:04:28 to 17.8.2022 13:05:32

    preliminary experiment

    experimental device

    Device EUI: 70b3d5cee0000042

    Application key: d494d49a7b4053302bdcf96f1defa65a

    Device address: 00d85395

    Network session key: c417540b8b2afad8930c82fcf7ea54bb

    Application session key: 421fea9bedd2cc497f63303edf5adf8e

    Liege, Belgium :: 02_Liege :: evaluated in the paper

    50.66445N, 5.59276E, ASL 151m

    lines 636205 to 886868

    time 25.8.2022 10:12:24 to 12.9.2022 06:20:48

    Brno, Czech Republic :: 03_Brno_join

    49.22685N, 16.57536E, ASL 306m

    lines 947787 to 979382

    time 30.9.2022 15:21:27 to 4.10.2022 10:46:31

    record contains OTAA activation (Join Request / Join Accept)

    experimental device:

    Device EUI: 70b3d5cee0000042

    Application key: d494d49a7b4053302bdcf96f1defa65a

    Device address: 01e65ddc

    Network session key: e2898779a03de59e2317b149abf00238

    Application session key: 59ca1ac91922887093bc7b236bd1b07f

    Graz, Austria :: 04_Graz :: evaluated in the paper

    47.07049N, 15.44506E, ASL 364m

    lines 1015139 to 1178855

    time 26.10.2022 06:21:07 to 29.11.2022 10:03:00

    Vienna, Austria :: 05_Wien :: evaluated in the paper

    48.19666N, 16.37101E, ASL 204m

    lines 1179308 to 3657105

    time 1.12.2022 10:42:19 to 4.1.2023 14:00:05

    contains a total of 14 short restarts (under 90 seconds)

    Brno, Czech Republic :: 07_Brno :: evaluated in the paper

    49.22685N, 16.57536E, ASL 306m

    lines 4969648 to 6919392

    time 16.2.2023 8:53:43 to 30.3.2023 9:00:11
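Since each capture is identified by line ranges in the full sniffer log, extracting one capture is a simple slice; the 1-based inclusive interpretation of the ranges (matching the listing above) is an assumption:

```python
# Line ranges per capture as documented above (assumed 1-based, inclusive).
CAPTURES = {
    "01_Brno": (150873, 529796),
    "02_Liege": (636205, 886868),
    "03_Brno_join": (947787, 979382),
    "04_Graz": (1015139, 1178855),
    "05_Wien": (1179308, 3657105),
    "07_Brno": (4969648, 6919392),
}

def extract_capture(lines, name):
    """Return the slice of sniffer log lines belonging to one capture."""
    first, last = CAPTURES[name]
    return lines[first - 1:last]
```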

  9. LATAM Mobility & Foot traffic Enriched Data by Predik Data-Driven

    • app.mobito.io
    Updated Feb 6, 2023
    Cite
    (2023). LATAM Mobility & Foot traffic Enriched Data by Predik Data-Driven [Dataset]. https://app.mobito.io/data-product/latam-mobility-&-foot-traffic-enriched-data-by-predik-data-driven
    Dataset updated
    Feb 6, 2023
    Area covered
    Latin America, SOUTH_AMERICA, NORTH_AMERICA
    Description

    This Mobility & Foot traffic dataset includes enriched mobility data and visitation at POIs to answer questions such as:

    - How often do people visit a location? (daily, monthly, absolute, and averages)
    - What type of places do they visit? (parks, schools, hospitals, etc.)
    - Which social characteristics do people have in a certain POI? Breakdown by type: residents, workers, visitors.
    - What's their mobility like during night hours & day hours?
    - What's the frequency of the visits by day of the week and hour of the day?

    Extra insights:

    - Visitors' relative income level.
    - Footfall measurement in all types of establishments (shopping malls, stand-alone stores, etc.).
    - Visitors' preferences as derived from their visits to shopping, parks, sports facilities, and churches, among others.
    - Origin/Destination matrix.
    - Vehicular traffic: measurement of speed, types of vehicles, among other insights.

    Overview & Key Concepts

    Each record corresponds to a ping from a mobile device at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. The whole process is compliant with GDPR and all applicable privacy laws. We clean, process, and enrich these massive datasets with a number of complex, compute-intensive calculations to make them easier to use in tailor-made solutions for companies as well as data science and machine learning applications, especially those related to understanding customer behavior.

    Featured attributes of the data

    - Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful to differentiate between vehicles, pedestrians, and stationary observations.
    - Night base of the device: the approximate location where the device spends the night, which is usually its home neighborhood.
    - Day base of the device: the most common daylight location during weekdays, which is usually the work location.
    - Income level: we intersect the device's night neighborhood with available socioeconomic data to infer the device's income level. Depending on the country and the availability of good census data, this figure ranges from a relative wealth index to a currency-denominated income.
    - POI visited: we intersect each observation with a number of POI databases to estimate check-ins to different locations. POI databases can vary significantly in scope and depth between countries.
    - Category of visited POI: for each observation attributable to a POI, we also include a standardized location category (park, hospital, among others).

    Delivery schemas

    We can deliver the data in three different formats:

    - Full dataset: one record per mobile ping. These datasets are very large and should only be consumed by experienced teams with large computing budgets.
    - Visitation stream: one record per attributable visit. Considerably smaller than the full dataset but retains most of its more valuable elements; helps understand who visited a specific POI and characterize consumer behavior.
    - Audience profiles: one record per mobile device in a given period of time (usually monthly), with the visitation stream aggregated by category. The most condensed version; very useful to quickly understand the types of consumers in a particular area and to create cohorts of users.

  10. Traffic Count Viewer

    • opendatacle-clevelandgis.hub.arcgis.com
    Updated Jun 14, 2023
    Cite
    Cite
    Cleveland | GIS (2023). Traffic Count Viewer [Dataset]. https://opendatacle-clevelandgis.hub.arcgis.com/datasets/traffic-count-viewer
    Explore at:
    Dataset updated
    Jun 14, 2023
    Dataset authored and provided by
    Cleveland | GIS
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This application provides an interactive experience to look up traffic count reports across the City of Cleveland. Traffic count reports are conducted using unmanned vehicle counter devices that detect the volume and speed of vehicular traffic.

    Instructions

    Each point represents a single traffic count observation conducted since 2019. Zoom into a point and click on it to generate a pop-up that presents summary statistics and a PDF link for each report. Use Filter or Search to narrow down to your area or time of interest.

    Data Glossary

    See: Cleveland Traffic Count Reports - Overview (arcgis.com)

    Update Frequency

    Monthly, at the end of each month

    This application uses the following dataset(s): Cleveland Traffic Count Reports

    Contacts

    City Planning Commission

  11. Cleveland Traffic Count Reports

    • data.clevelandohio.gov
    • hub.arcgis.com
    Updated May 19, 2022
    Cite
    Cleveland | GIS (2022). Cleveland Traffic Count Reports [Dataset]. https://data.clevelandohio.gov/datasets/ClevelandGIS::cleveland-traffic-count-reports/about
    Explore at:
    Dataset updated
    May 19, 2022
    Dataset authored and provided by
    Cleveland | GIS
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Area covered
    Description

    This traffic count layer is a collection of speed and traffic observations conducted by the Division of Streets, Public Works, City of Cleveland, and processed by the City Planning department. The resulting report documents are PDFs attached to each point location in this layer. They include the coordinates of the traffic counting device and various summaries of vehicle counts and clocked speeds in multiple directions. This layer includes counts from 2019 onwards, with an attachment related to each location. Speed values shown in the data pop-up are preliminary; consult the report PDF for authoritative results of the count.

    Documentation and Definitions

    Click Here For Table and Fields

    Update Frequency

    Monthly, at the end of each month

    This dataset is featured on the following app(s): Transportation Data Viewer

    Contacts

    Please note: Traffic counts presented are preliminary information from count devices and may have errors. Issues or questions regarding count data can be submitted to Cleveland's Division of Traffic Engineering.

  12. QLDTraffic GeoJSON API

    • data.qld.gov.au
    • researchdata.edu.au
    • +1more
    html, pdf
    Updated Nov 18, 2024
    Cite
    Transport and Main Roads (2024). QLDTraffic GeoJSON API [Dataset]. https://www.data.qld.gov.au/dataset/131940-traffic-and-travel-information-geojson-api
    Explore at:
    html(1 KiB), pdf(805 KiB)
    Dataset updated
    Nov 18, 2024
    Dataset authored and provided by
    Transport and Main Roads
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Traffic and road condition information captured in the QLDTraffic system is available for use by external developers via GeoJSON feeds.

    These feeds cover Hazards, Crashes, Congestion, Flooding, Roadworks and Special Events and Web Cameras details.

    The information provided in these feeds is on an 'as is' basis. Additional details about the available information on the QLDTraffic site can be viewed on our disclaimer page.

    Transport and Main Roads (TMR) is constantly striving to ensure that they provide accurate, reliable and timely traffic and road condition information.

    Given this, the previously available API underwent changes in August 2016 with the creation of the QLDTraffic public website.

    Details on accessing QLDTraffic’s traffic and road condition information via GeoJSON can be found in the QLDTraffic website application programming interface (API) specification.

    This API has been developed to allow improved integration of QLDTraffic traffic and road condition information with external systems.

    Please be aware that this specification may be subject to change.
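    As a sketch of consuming such a GeoJSON feed (the URL and the `event_type` property name below are placeholders; the real endpoint and schema are defined in the QLDTraffic API specification):

```python
import json
import urllib.request

# Placeholder URL -- substitute the endpoint from the QLDTraffic API specification.
FEED_URL = "https://example.org/qldtraffic/events.geojson"

def fetch_feed(url=FEED_URL):
    """Download and parse one GeoJSON feed."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def count_by_type(feature_collection, prop="event_type"):
    """Tally the features of a GeoJSON FeatureCollection by a property (name assumed)."""
    counts = {}
    for feature in feature_collection.get("features", []):
        key = feature.get("properties", {}).get(prop, "unknown")
        counts[key] = counts.get(key, 0) + 1
    return counts
```

    Since the feeds are plain GeoJSON, any standard GeoJSON tooling (or a few lines like the above) is enough to group events such as Hazards, Crashes, or Flooding by type.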

  13. Data from: Final Final Dataset

    • universe.roboflow.com
    zip
    Updated Jun 13, 2023
    Cite
    redneuro (2023). Final Final Dataset [Dataset]. https://universe.roboflow.com/redneuro/final-final-kqkol/model/2
    Explore at:
    zip
    Dataset updated
    Jun 13, 2023
    Dataset authored and provided by
    redneuro
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Traffic Signals Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Traffic Instruction Learning Application: This model could be used to develop an application that helps people study and learn traffic signals for driving exams. Users can simply upload a photo of a traffic signal, and the model can identify it, providing information about the rules associated with each signal.

    2. Autonomous Vehicle Systems: The model could be integrated into autonomous driving systems to enhance understanding of visual cues on the road. The system could then make proper navigational decisions based on the recognized signs.

    3. Traffic Management Systems: Traffic management authorities can use this model to monitor traffic signals in their city. They can detect potential issues or malfunctions when incorrect signals are identified.

    4. Virtual Reality Driving Simulators: These simulators can use this model to accurately represent real-world driving scenarios. They can use the model for generating accurate and diverse traffic signals, contributing to a more realistic driving experience.

    5. Safety Assessments: The model can be used for performing safety audits of cities, identifying areas where either important road signs are missing, or incorrect signs are placed. This can help reduce the chances of accidents due to misinterpretation of signals.

    Please note that the example image mentioned (room with chairs and table) is unrelated to traffic signals, and therefore, it seems that the data set might contain unrelated images. For better performance, a dataset consistent with traffic signs should be used.

  14. The t-Test results of speed series under different grades.

    • plos.figshare.com
    xls
    Updated Jun 7, 2023
    Cite
    Xinqiang Chen; Zhibin Li; Yinhai Wang; Zhiyong Cui; Chaojian Shi; Huafeng Wu (2023). The t-Test results of speed series under different grades. [Dataset]. http://doi.org/10.1371/journal.pone.0184142.t003
    Explore at:
    xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Xinqiang Chen; Zhibin Li; Yinhai Wang; Zhiyong Cui; Chaojian Shi; Huafeng Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The t-Test results of speed series under different grades.

  15. Ransomware and user samples for training and validating ML models

    • data.mendeley.com
    Updated Sep 17, 2021
    + more versions
    Cite
    Eduardo Berrueta (2021). Ransomware and user samples for training and validating ML models [Dataset]. http://doi.org/10.17632/yhg5wk39kf.2
    Explore at:
    Dataset updated
    Sep 17, 2021
    Authors
    Eduardo Berrueta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ransomware has been a significant threat to most enterprises for the past few years. In scenarios where users can access all files on a shared server, one infected host is capable of locking access to all shared files. In the article related to this repository, we detect ransomware infection based on file-sharing traffic analysis, even in the case of encrypted traffic. We compare three machine learning models and choose the best for validation. We train and test the detection model using more than 70 ransomware binaries from 26 different families and more than 2500 h of 'not infected' traffic from real users. The results reveal that the proposed tool can detect all ransomware binaries, including those not used in the training phase (zero-days). This paper provides a validation of the algorithm by studying the false positive rate and the amount of information from user files that the ransomware could encrypt before being detected.

    This dataset directory contains the 'infected' and 'not infected' samples and the models used for each T configuration, each in a separate folder.

    The folders are named NxSy, where x is the number of 1-second intervals per sample and y is the sliding step in seconds.

    Each folder (for example N10S10/) contains:

    • tree.py -> Python script with the Tree model.
    • ensemble.json -> JSON file with the information about the Ensemble model.
    • NN_XhiddenLayer.json -> JSON file with the information about the NN model with X hidden layers (1, 2, or 3).
    • N10S10.csv -> All samples used for training each model in this folder, in CSV format for use in the BigML application.
    • zeroDays.csv -> All zero-day samples used for testing each model in this folder, in CSV format for use in the BigML application.
    • userSamples_test -> All samples used for validating each model in this folder, in CSV format for use in the BigML application.
    • userSamples_train -> User samples used for training the models.
    • ransomware_train -> Ransomware samples used for training the models.
    • scaler.scaler -> Standard scaler from the Python library, used to scale the samples.
    • zeroDays_notFiltered -> Folder with the zero-day samples.

    In the case of the N30S30 folder, there is an additional folder (SMBv2SMBv3NFS) with the samples extracted from the SMBv2, SMBv3, and NFS traffic traces. There are more binaries than the ones presented in the article because some of them are not "unseen" binaries (their families are present in the training set).

    The files containing samples (NxSy.csv, zeroDays.csv, and userSamples_test.csv) are structured as follows:

    • Each line is one sample.
    • Each sample has 3*T features and the label (1 if it is an 'infected' sample, 0 if it is not).
    • The features are separated by ',' because it is a CSV file.
    • The last column is the label of the sample.
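    Given that layout, one row can be split into its feature vector and label as sketched below (an illustration of the stated CSV structure; the repository itself ships models rather than this loader):

```python
import csv

def parse_sample(row):
    """Split one CSV row into (features, label): 3*T floats followed by a 0/1 label."""
    return [float(v) for v in row[:-1]], int(row[-1])

def load_samples(csv_path):
    """Read every sample from an NxSy.csv-style file."""
    features, labels = [], []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if row:  # skip blank lines
                feats, label = parse_sample(row)
                features.append(feats)
                labels.append(label)
    return features, labels
```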

    Additionally, we have placed two pcap files in the root directory. These are the traces used to compare both versions of SMB.

  16. Traffic Crash Data

    • data.milwaukee.gov
    csv
    Updated Jul 18, 2025
    Cite
    Milwaukee Police Department (2025). Traffic Crash Data [Dataset]. https://data.milwaukee.gov/dataset/traffic_crash
    Explore at:
    csv(122571597)
    Dataset updated
    Jul 18, 2025
    Dataset authored and provided by
    Milwaukee Police Department: http://city.milwaukee.gov/police
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Update Frequency: Daily

    This dataset includes traffic crash information, including the case number, accident date, and location.

    • Reportable crash reports can take up to 10 business days to appear after the date of the crash if there are no issues with the report.

    • If you cannot find your crash report after 10 business days, please call the Milwaukee Police Department Open Records Section at (414) 935-7435 for further assistance.

    • Non-reportable crash reports can only be obtained by contacting the Open Records Section and will not show up in a search on this site. A non-reportable crash is any accident that does not:

    1) result in injury or death to any person

    2) damage government-owned non-vehicle property to an apparent extent of $200 or more

    3) result in total damage to property owned by any one person to an apparent extent of $1000 or more.

    • All MV4000 crash reports, completed by MPD officers, will be available from the Wisconsin Department of Transportation (WisDOT) Division of Motor Vehicles (DMV) Accident Records Unit, generally 10 days after the incident.

    Online Request: Request your Crash Report online at WisDOT-DMV website, https://app.wi.gov/crashreports.

    Mail: Wisconsin Department of Transportation Crash Records Unit P.O. Box 7919 Madison, WI 53707-7919

    Phone: (608) 266-8753

    To download XML and JSON files, click the CSV option below and click the down arrow next to the Download button in the upper right on its page.

  17. Data from: 3DHD CityScenes: High-Definition Maps in High-Density Point...

    • zenodo.org
    • data.niaid.nih.gov
    bin, pdf
    Updated Jul 16, 2024
    Cite
    Christopher Plachetka; Benjamin Sertolli; Jenny Fricke; Marvin Klingner; Tim Fingscheidt; Christopher Plachetka; Benjamin Sertolli; Jenny Fricke; Marvin Klingner; Tim Fingscheidt (2024). 3DHD CityScenes: High-Definition Maps in High-Density Point Clouds [Dataset]. http://doi.org/10.5281/zenodo.7085090
    Explore at:
    bin, pdf
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Christopher Plachetka; Benjamin Sertolli; Jenny Fricke; Marvin Klingner; Tim Fingscheidt; Christopher Plachetka; Benjamin Sertolli; Jenny Fricke; Marvin Klingner; Tim Fingscheidt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    3DHD CityScenes is the most comprehensive, large-scale high-definition (HD) map dataset to date, annotated in the three spatial dimensions of globally referenced, high-density LiDAR point clouds collected in urban domains. Our HD map covers 127 km of road sections of the inner city of Hamburg, Germany, including 467 km of individual lanes. In total, our map comprises 266,762 individual items.

    Our corresponding paper (published at ITSC 2022) is available here.
    Further, we have applied 3DHD CityScenes to map deviation detection here.

    Moreover, we release code to facilitate the application of our dataset and the reproducibility of our research. Specifically, our 3DHD_DevKit comprises:

    • Python tools to read, generate, and visualize the dataset,
    • 3DHDNet deep learning pipeline (training, inference, evaluation) for
      map deviation detection and 3D object detection.

    The DevKit is available here:

    https://github.com/volkswagen/3DHD_devkit.

    The dataset and DevKit have been created by Christopher Plachetka as project lead during his PhD period at Volkswagen Group, Germany.

    When using our dataset, you are welcome to cite:

    @INPROCEEDINGS{9921866,
      author={Plachetka, Christopher and Sertolli, Benjamin and Fricke, Jenny and Klingner, Marvin and 
      Fingscheidt, Tim},
      booktitle={2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)}, 
      title={3DHD CityScenes: High-Definition Maps in High-Density Point Clouds}, 
      year={2022},
      pages={627-634}}

    Acknowledgements

    We thank the following interns for their exceptional contributions to our work.

    • Benjamin Sertolli: Major contributions to our DevKit during his master thesis
    • Niels Maier: Measurement campaign for data collection and data preparation

    The European large-scale project Hi-Drive (www.Hi-Drive.eu) supports the publication of 3DHD CityScenes and encourages the general publication of information and databases facilitating the development of automated driving technologies.

    The Dataset

    After downloading, the 3DHD_CityScenes folder provides five subdirectories, which are explained briefly in the following.

    1. Dataset

    This directory contains the training, validation, and test set definition (train.json, val.json, test.json) used in our publications. Respective files contain samples that define a geolocation and the orientation of the ego vehicle in global coordinates on the map.

    During dataset generation (done by our DevKit), samples are used to take crops from the larger point cloud. Also, map elements in reach of a sample are collected. Both modalities can then be used, e.g., as input to a neural network such as our 3DHDNet.

    To read any JSON-encoded data provided by 3DHD CityScenes in Python, you can use the following code snippet as an example.

    import json
    
    json_path = r"E:\3DHD_CityScenes\Dataset\train.json"
    with open(json_path) as jf:
      data = json.load(jf)
    print(data)

    2. HD_Map

    Map items are stored as lists of items in JSON format. In particular, we provide:

    • traffic signs,
    • traffic lights,
    • pole-like objects,
    • construction site locations,
    • construction site obstacles (point-like such as cones, and line-like such as fences),
    • line-shaped markings (solid, dashed, etc.),
    • polygon-shaped markings (arrows, stop lines, symbols, etc.),
    • lanes (ordinary and temporary),
    • relations between elements (only for construction sites, e.g., sign to lane association).

    3. HD_Map_MetaData

    Our high-density point cloud used as the basis for annotating the HD map is split into 648 tiles. This directory contains the geolocation of each tile as a polygon on the map. You can view the respective tile definitions using QGIS. Alternatively, we also provide the respective polygons as lists of UTM coordinates in JSON.

    Files with the ending .dbf, .prj, .qpj, .shp, and .shx belong to the tile definition as “shape file” (commonly used in geodesy) that can be viewed using QGIS. The JSON file contains the same information provided in a different format used in our Python API.

    4. HD_PointCloud_Tiles

    The high-density point cloud tiles are provided in global UTM32N coordinates and are encoded in a proprietary binary format. The first 4 bytes (integer) encode the number of points contained in that file. Subsequently, all point cloud values are provided as arrays. First all x-values, then all y-values, and so on. Specifically, the arrays are encoded as follows.

    • x-coordinates: 4 byte integer
    • y-coordinates: 4 byte integer
    • z-coordinates: 4 byte integer
    • intensity of reflected beams: 2 byte unsigned integer
    • ground classification flag: 1 byte unsigned integer

    After reading, respective values have to be unnormalized. As an example, you can use the following code snippet to read the point cloud data. For visualization, you can use the pptk package, for instance.

    import numpy as np
    import pptk
    
    file_path = r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin"
    pc_dict = {}
    key_list = ['x', 'y', 'z', 'intensity', 'is_ground']
    # dtypes per the format described above (byte order assumed little-endian):
    # x/y/z as 4-byte integers, intensity as 2-byte unsigned, flag as 1-byte unsigned
    type_list = ['<i4', '<i4', '<i4', '<u2', '<u1']
    with open(file_path, 'rb') as f:
      num_points = np.fromfile(f, dtype='<i4', count=1)[0]
      for key, dtype in zip(key_list, type_list):
        pc_dict[key] = np.fromfile(f, dtype=dtype, count=num_points)
    # Note: per the description above, the values still need to be unnormalized.

    5. Trajectories

    We provide 15 real-world trajectories recorded during a measurement campaign covering the whole HD map. Trajectory samples are provided at approx. 30 Hz and are encoded in JSON.

    These trajectories were used to provide the samples in train.json, val.json, and test.json with realistic geolocations and orientations of the ego vehicle.

    • OP1 – OP5 cover the majority of the map with 5 trajectories.
    • RH1 – RH10 cover the majority of the map with 10 trajectories.

    Note that OP5 is split into three separate parts, a-c. RH9 is split into two parts, a-b. Moreover, OP4 mostly equals OP1 (thus, we speak of 14 trajectories in our paper). For completeness, however, we provide all recorded trajectories here.

  18. Frenchtrafficsign Dataset

    • universe.roboflow.com
    zip
    Updated Jan 18, 2022
    Cite
    Clem UTC (2022). Frenchtrafficsign Dataset [Dataset]. https://universe.roboflow.com/clem-utc/frenchtrafficsign
    Explore at:
    zip
    Dataset updated
    Jan 18, 2022
    Dataset authored and provided by
    Clem UTC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Traffic Sign Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Traffic Sign Detector for Autonomous Vehicles: The "FrenchTrafficSign" model can be used in autonomous driving software to identify and interpret French traffic signs, allowing the vehicle to understand and obey local traffic rules.

    2. Navigation and Map Creation: The model could be used to improve the quality of mapping data by spotting and noting the locations of various traffic signs. The software can identify different traffic signs from images captured by satellites or vehicles, enhancing the safety features of GPS navigation systems.

    3. Advanced Driver-Assistance Systems (ADAS): This model can be integrated into ADAS applications to alert the driver about upcoming traffic signs, helping prevent traffic rules violation and enhancing road safety.

    4. Traffic Infrastructure Audit: The model can be used by city or county public works departments to automatically audit the presence and condition of traffic signs across the urban landscape and schedule maintenance or replacements.

    5. Traffic Rules Learning Apps: The "FrenchTrafficSign" model can be employed in educational apps or driving test preparation apps designed to help learners familiarize themselves with French traffic signs.

  19. Brainport, Platooning, formation improvement by traffic light data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    + more versions
    Cite
    NEVS (National Electric Vehicle Sweden) (2020). Brainport, Platooning, formation improvement by traffic light data [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3606607
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    TNO
    TASS
    NXP
    NEVS (National Electric Vehicle Sweden)
    Technolution
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scenario description:

    Platoon formation with live traffic light data included in the planner.

    • Enabled live traffic light data included in the planner
    • Not using the Android app
    • Starting at default locations
    • This test was filmed, including the GUI

    Session description:

    Platoon formation improvement by traffic light data.

    Datasets descriptions:

    AUTOPILOT_BrainPort_Platooning_DriverVehicleInteraction: Data extracted from the CAN of the vehicle

    This dataset contains e.g. throttlestatus, clutchstatus, brakestatus, brakeforce, wipersstatus, steeringwheel for the vehicle

    AUTOPILOT_BrainPort_Platooning_EnvironmentSensorsAbsolute: Data extracted from the vehicle environment sensors

    This dataset contains information about detected objects, with absolute coordinates

    AUTOPILOT_BrainPort_Platooning_EnvironmentSensorsRelative: Data extracted from the vehicle environment sensors

    This dataset contains information about detected objects, with relative coordinates

    AUTOPILOT_BrainPort_Platooning_IotVehicleMessage: Data sent between all devices, vehicles and services

    Each sensor data submission is a Message. A Message has an Envelope, a Path, and optionally (but likely) Path Events and optionally Path Media. The envelope bears fundamental information about the individual sender (the vehicle) but not to a level that owner of the vehicle can be identified or different messages can be identified that originate from a single vehicle.

    AUTOPILOT_BrainPort_Platooning_PlatoonFormation: Data sent from PlatoonService to vehicle

    This dataset contains information about the route and speed for a specific vehicle for forming a platoon

    AUTOPILOT_BrainPort_Platooning_PlatooningAction: Data logged by vehicle

    This dataset contains information about the current status of the platooning

    AUTOPILOT_BrainPort_Platooning_PlatooningEvent: Data logged by vehicle

    This dataset contains information about the identifiers used for each specific platooning event

    AUTOPILOT_BrainPort_Platooning_PlatoonStatus: Data sent by vehicle to PlatoonService

    This dataset contains information about the current status of the platooning

    AUTOPILOT_BrainPort_Platooning_PositioningSystem: Data from GPS on the vehicle

    This dataset contains speed, longitude, latitude, heading from the GPS

    AUTOPILOT_BrainPort_Platooning_PositioningSystemResample: Data from GPS on the vehicle

    This dataset contains speed, longitude, latitude, and heading from the GPS, resampled to 100 milliseconds

    AUTOPILOT_BrainPort_Platooning_PSInfo: Data sent by PlatoonService to the vehicle

    This dataset contains speed and route information for the vehicle to create a platoon

    AUTOPILOT_BrainPort_Platooning_Target: Data from sensors on the vehicle

    Target detection in the vicinity of the host vehicle, by a vehicle sensor or virtual sensor

    AUTOPILOT_BrainPort_Platooning_Vehicle: Data from the CAN and sensors about the state of the vehicle

    This dataset contains, among others, the temperature and battery state of the vehicles

    AUTOPILOT_BrainPort_Platooning_VehicleDynamics: Data from the CAN and sensors about the state of the vehicle

    This dataset contains, among others, the accelerations and speed limit of the vehicle, as observed from the CAN and the external sensors

  20. Bicycle traffic volumes (DB Rad+) Hamburg

    • ckan.mobidatalab.eu
    gml, html, oaf, xsd
    Updated Jan 18, 2023
    + more versions
    Cite
    HMDKLGV (2023). Bicycle traffic volumes (DB Rad+) Hamburg [Dataset]. https://ckan.mobidatalab.eu/dataset/cyclist-volume-db-rad-hamburg98309
    Explore at:
    html(91771), gml(107432676), xsd(2559), oaf(3752)
    Dataset updated
    Jan 18, 2023
    Dataset provided by
    HMDKLGV
    License

    Data licence Germany – Attribution – Version 2.0: https://www.govdata.de/dl-de/by-2-0
    License information was derived automatically

    Description

    This dataset contains the volume of bicycle traffic and the average bicycle traffic speeds, which are recorded with the help of the DB Rad+ app in the Hamburg road network. The data is only used with the consent of the respective app users. For each road section, the accumulated bicycle traffic volumes per year (for a year that has just started, up to the previous day) and the accumulated volumes for the last seven days are provided.

    In the first year (2022), additional areas of the Hamburg road network were gradually added over the year, so there is not yet a reliable overall picture of the entire city for 2022. Only those road sections are shown on which eight or more journeys took place in the period under consideration. The network basis onto which the data is projected comes from OpenStreetMap. The data was processed for Hamburg, in particular for display in the municipal geoportals.

    The data is primarily used for a qualitative assessment of which roads are used by bicycle traffic and how much, and of whether there have been changes or shifts over the years, e.g. because bicycle traffic facilities have been renovated or newly built. The absolute figures, on the other hand, are not very meaningful, because they depend largely on the number of users of the DB Rad+ app. It should also be noted that the users of the DB Rad+ app, and thus the routes used, are not necessarily representative of the total population and of cycling traffic in the entire city.


Network Traffic Analysis: Data and Code


Description

Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help   show this help message and exit
-i TXTFILE   input text file
-x X         Add first X number of total packets as features.
-y Y         Add first Y number of negative packets as features.
-z Z         Add first Z number of positive packets as features.
-ml          Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S         Generate samples using size s.
-j

Purpose:

Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

Uses Features.py to calculate the features.

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This script then runs machineLearning.py in a tmux session with the necessary file paths and flags.

Options (to be edited within this file):

--evaluate-only to test 5-fold cross-validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to test the best grid search hyperparameters

Note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:'; once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross-validation. These results include the best scaling and normalization options for each data set, as well as the best grid search hyperparameters from the provided ranges.

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, submitting different Google search queries (collected in the form of their autocomplete results and their results pages), and performing different actions on a virtual reality headset.

Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

The first number is a classification number denoting which website, query, or VR action is taking place.

The remaining numbers in each line denote the size of a packet and the direction it is traveling:

• negative numbers denote incoming packets

• positive numbers denote outgoing packets
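A line in that layout can be parsed as sketched below (an illustration of the stated format; the field separator is assumed to be whitespace, which the actual scripts may handle differently):

```python
def parse_capture_line(line):
    """Split one capture line into (label, incoming, outgoing) packet-size lists."""
    values = line.split()
    label = int(values[0])          # classification number for the website/query/action
    sizes = [int(v) for v in values[1:]]
    incoming = [s for s in sizes if s < 0]   # negative sizes = incoming packets
    outgoing = [s for s in sizes if s > 0]   # positive sizes = outgoing packets
    return label, incoming, outgoing
```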

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv files are identical.

Each file includes (from right to left):

The original packet data,

each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
