64 datasets found
  1. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    Updated Jun 12, 2024
    Cite
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Explore at:
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Loyola University Chicago
    Authors
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help    show this help message and exit
    -i TXTFILE    input text file
    -x X          Add first X number of total packets as features.
    -y Y          Add first Y number of negative packets as features.
    -z Z          Add first Z number of positive packets as features.
    -ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S          Generate samples using size s.
    -j

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

    Uses Features.py to calculate the features.

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This script then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to search for the best hyperparameters. Note: the candidate hyperparameters must be added to train_model under 'if not evaluateOnly:'. Once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'.

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross-validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
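    The 5-fold cross-validation mentioned above can be pictured with a minimal sketch of how sample indices are partitioned. This is not the code in machineLearning.py, whose internals are not shown here; the function name k_fold_indices and the choice of contiguous, unshuffled folds are assumptions made for illustration only.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k (train, test) index pairs.

    Hypothetical helper: folds are contiguous and unshuffled, which is an
    assumption for illustration, not the behaviour of machineLearning.py.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds
```

    Each sample lands in exactly one test fold, so averaging the per-fold accuracy uses every sample once for evaluation.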

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for this experiment was stored and analyzed in the form of one txt file per experiment. Each line of a file contains:

    A first number, which is a classification number denoting which website, query, or VR action is taking place.

    The remaining numbers in each line denote:

    The size of a packet,

    and the direction it is traveling.

    Negative numbers denote incoming packets.

    Positive numbers denote outgoing packets.
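    The line format described above can be read with a short sketch like the following. The delimiter is not stated in the description, so whitespace-separated integers are assumed here; the function name parse_trace is hypothetical and is not part of the published code.

```python
def parse_trace(line, sep=None):
    """Parse one capture line into (label, incoming_sizes, outgoing_sizes).

    Assumption: fields are integers separated by whitespace (sep=None).
    Pass sep="," if the file turns out to be comma-separated.
    """
    fields = line.split(sep)
    label = int(fields[0])                 # classification number
    signed_sizes = [int(f) for f in fields[1:]]
    # Negative values are incoming packets, positive values are outgoing.
    incoming = [-s for s in signed_sizes if s < 0]
    outgoing = [s for s in signed_sizes if s > 0]
    return label, incoming, outgoing
```

    For instance, parse_trace("3 -1500 60 -1200 52") would yield label 3 with incoming sizes 1500 and 1200 and outgoing sizes 60 and 52.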

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
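    The spreadsheet steps listed above (sort, mean, standard deviation, CDF) can be approximated with the sketch below. The description does not say which CDF was used or whether the standard deviation is the population or sample form, so an empirical CDF and the population standard deviation are assumed; summarize_capture is a hypothetical name, and values are sorted as given without changing their sign.

```python
import statistics


def summarize_capture(sizes):
    """Return (mean, std, cdf) for one packet capture.

    Assumptions: population standard deviation and an empirical CDF,
    where cdf is a list of (size, fraction of packets <= that position).
    """
    ordered = sorted(sizes)                # smallest to largest
    n = len(ordered)
    mean = statistics.mean(ordered)
    std = statistics.pstdev(ordered)       # population form (assumed)
    cdf = [(size, (rank + 1) / n) for rank, size in enumerate(ordered)]
    return mean, std, cdf
```

    Plotting the second element of each pair against the first gives a step-shaped CDF curve of the kind a figure like this would show.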

  2. Traffic Route Stats

    • splitgraph.com
    • data.act.gov.au
    Updated Dec 20, 2023
    Cite
    data-act-gov-au (2023). Traffic Route Stats [Dataset]. https://www.splitgraph.com/data-act-gov-au/traffic-route-stats-mgzi-6f8j
    Explore at:
    Available download formats: json, application/vnd.splitgraph.image, application/openapi+json
    Dataset updated
    Dec 20, 2023
    Authors
    data-act-gov-au
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains historic Route Definitions and Statistics with Geometry of traffic flow. The detailed documentation is included at

    https://www.data.act.gov.au/dataset/realtime-traffic/cjkg-rvmu.

    Disclaimer: Although the real-time API updates the information every 30 seconds, we sample only every 5 minutes for historical archiving.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

    See the Splitgraph documentation for more information.

  3. Round Rock Traffic Counts Viewer by Year

    • geohub.roundrocktexas.gov
    Updated Jun 16, 2022
    Cite
    City of Round Rock (2022). Round Rock Traffic Counts Viewer by Year [Dataset]. https://geohub.roundrocktexas.gov/datasets/round-rock-traffic-counts-viewer-by-year
    Explore at:
    Dataset updated
    Jun 16, 2022
    Dataset authored and provided by
    City of Round Rock
    Area covered
    Round Rock
    Description

    This web app contains traffic count data for the years 2016 through 2022 from the Transportation Department of the City of Round Rock, located in Williamson County, Texas. This layer is part of an original dataset provided and maintained by the City of Round Rock GIS/IT Department and the Transportation Department. The data in this layer are represented as points and polygons. The web map connected to this web app can be found on Round Rock Traffic Counts. This time-enabled map shows the traffic counts for traffic zones within the City of Round Rock. The time slider within this map shows the traffic counts by AADT (annual average daily traffic) for each year between 2016 and 2022.

  4. U.S. Vessel Traffic App

    • ocean-and-coasts-information-system-esrioceans.hub.arcgis.com
    • oceans-esrioceans.hub.arcgis.com
    Updated Apr 8, 2021
    Cite
    Esri (2021). U.S. Vessel Traffic App [Dataset]. https://ocean-and-coasts-information-system-esrioceans.hub.arcgis.com/datasets/esri::u-s-vessel-traffic-app
    Explore at:
    Dataset updated
    Apr 8, 2021
    Dataset authored and provided by
    Esri http://esri.com/
    Description

    The U.S. Vessel Traffic application is a web-based visualization and data-access utility created by Esri. Explore U.S. maritime activity, look for patterns, and download manageable subsets of this massive data set. Vessel traffic data are an invaluable resource made available to our community by the US Coast Guard, NOAA and BOEM through Marine Cadastre. This information can help marine spatial planners better understand users of ocean space and identify potential space-use conflicts. To download this data for your own analysis, explore the Download Options, navigate to a NOAA Electronic Navigation Chart area of interest, and make your selection. This data was sourced from the Automatic Identification System (AIS) provided by USCG, NOAA, and BOEM through Marine Cadastre and aggregated for visualization and sharing in ArcGIS Pro. This application was built with the ArcGIS API for JavaScript. Access this data as an ArcGIS Online collection here. Learn more about AIS tracking here. Find more ocean and maritime resources in Living Atlas. Inquiries can be sent to Keith VanGraafeiland.

  5. Datasys | Clickstream Data (500M+ daily events | global coverage | updated...

    • datarade.ai
    .json
    Cite
    Datasys, Datasys | Clickstream Data (500M+ daily events | global coverage | updated daily) [Dataset]. https://datarade.ai/data-products/datastream-clickstream-browser-data-feed-datasys
    Explore at:
    Available download formats: .json
    Dataset authored and provided by
    Datasys
    Area covered
    Malaysia, Mongolia, Cuba, Aruba, United States of America, Guadeloupe, Cambodia, Argentina, Vietnam, Kyrgyzstan
    Description

    Our clickstream data offers unparalleled access to a vast array of global datasets, capturing user interactions across websites, apps, and digital platforms worldwide. With coverage spanning multiple industries and geographies, our data provides detailed insights into consumer behavior, online trends, and digital engagement patterns.

    Whether you're analyzing traffic flows, identifying audience interests, or tracking competitive performance, our clickstream datasets deliver the scale and granularity needed to inform strategic decisions. Updated regularly to ensure accuracy and relevance, this robust resource empowers businesses to uncover actionable insights and stay ahead in a dynamic digital landscape.

  6. HWID12 (Highway Incidents Detection Dataset)

    • kaggle.com
    Updated May 25, 2022
    Cite
    Landry KEZEBOU (2022). HWID12 (Highway Incidents Detection Dataset) [Dataset]. https://www.kaggle.com/datasets/landrykezebou/hwid12-highway-incidents-detection-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 25, 2022
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Landry KEZEBOU
    Description

    Context

    Action recognition in video is known to be more challenging than image recognition. Unlike image recognition models, which use 2D convolutional neural blocks, action classification models require additional dimensionality to capture the spatio-temporal information in video sequences. This intrinsically makes video action recognition models computationally intensive and significantly more data-hungry than their image recognition counterparts. Existing video datasets such as Kinetics, AVA, Charades, Something-Something, HMDB51, and UCF101 have had a tremendous impact on recently evolving video recognition technologies. Artificial Intelligence models trained on these datasets have largely benefited applications such as behavior monitoring in elderly people, video summarization, and content-based retrieval. However, action recognition has yet to be explored in Intelligent Transportation Systems (ITS), particularly in vital applications such as incident detection. This is partly due to the lack of annotated datasets adequate for training models suitable for such direct ITS use cases. In this paper, the concept of video action recognition is explored to tackle the problem of highway incident detection and classification from live surveillance footage. First, a novel dataset, HWID12 (Highway Incidents Detection), is introduced. HWID12 consists of 11 distinct highway incident categories and one additional category for negative samples representing normal traffic. The proposed dataset also includes 2780+ video segments of 3 to 8 seconds each on average, and 500k+ temporal frames. Next, the baseline for highway accident detection and classification is established with a state-of-the-art action recognition model trained on the proposed HWID12 dataset. Performance benchmarking for 12-class (normal traffic vs 11 accident categories) and 2-class (incident vs normal traffic) settings is performed. This benchmarking reveals a recognition accuracy of up to 88% and 98% for the 12-class and 2-class recognition settings, respectively.

    Data Acquisition

    The proposed Highway Incidents Detection Dataset (HWID12) is the first dataset of its kind, aimed at fostering experimentation with video action recognition technologies to solve the practical problem of real-time highway incident detection, which currently challenges intelligent transportation systems. The lack of such a dataset has limited the expansion of recent breakthroughs in video action classification to practical use cases in intelligent transportation systems. The proposed dataset contains more than 2780 video clips varying in length between 3 and 8 seconds. These video clips capture the moments leading up to an incident and continue until right after it occurred. The clips were manually segmented from accident compilation videos sourced from YouTube and other video data platforms.

    Content

    There is one main zip file available for download. The zip file contains 2780+ video clips, organized as follows:
    1) 12 folders
    2) each folder represents an incident category. One of the classes represents the negative sample class, which simulates normal traffic.

    Terms and Conditions

    • Videos provided in this dataset are freely available for research and education purposes only. Please be sure to properly credit the authors by citing the article below.
    • Be sure to upvote this dataset if you find it useful by scrolling up and clicking the up-Arrow ^ sign at the top banner of the page, next to "New Notebook" button.
    • Be sure to blur out all plate numbers before publishing any of the contents available in this dataset.

    Acknowledgements

    Any publication using this database must reference the following journal manuscript:

    • Landry Kezebou, Victor Oludare, Karen Panetta, James Intriligator, and Sos Agaian "Highway accident detection and classification from live traffic surveillance cameras: a comprehensive dataset and video action recognition benchmarking", Proc. SPIE 12100, Multimodal Image Exploitation and Learning 2022, 121000M (27 May 2022); https://doi.org/10.1117/12.2618943

    Note: if the link is broken, please use http instead of https.

    In Chrome, use the steps recommended in the following website to view the webpage if it appears to be broken https://www.technipages.com/chrome-enabledisable-not-secure-warning

    Other relevant datasets:
    VCoR dataset: https://www.kaggle.com/landrykezebou/vcor-vehicle-color-recognition-dataset
    VRiV dataset: https://www.kaggle.com/landrykezebou/vriv-vehicle-recognition-in-videos-dataset

    For any enquiries regarding the HWID12 dataset, contact: landrykezebou@gmail.com

  7. Traffic Collision Data from 2010 to Present

    • splitgraph.com
    • data.lacity.org
    • +4more
    Updated Oct 15, 2024
    Cite
    lacity (2024). Traffic Collision Data from 2010 to Present [Dataset]. https://www.splitgraph.com/lacity/traffic-collision-data-from-2010-to-present-d5tf-ez2w/
    Explore at:
    Available download formats: json, application/vnd.splitgraph.image, application/openapi+json
    Dataset updated
    Oct 15, 2024
    Authors
    lacity
    Description

    This dataset reflects traffic collision incidents in the City of Los Angeles dating back to 2010. This data is transcribed from original traffic reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0°, 0°). Address fields are only provided to the nearest hundred block in order to maintain privacy. This data is as accurate as the data in the database. Please note questions or concerns in the comments.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

    See the Splitgraph documentation for more information.

  8. Traffic Volume Counts

    • splitgraph.com
    • data.cityofnewyork.us
    • +3more
    Updated May 27, 2022
    Cite
    cityofnewyork-us (2022). Traffic Volume Counts [Dataset]. https://www.splitgraph.com/cityofnewyork-us/traffic-volume-counts-btm5-ppia/
    Explore at:
    Available download formats: application/vnd.splitgraph.image, json, application/openapi+json
    Dataset updated
    May 27, 2022
    Authors
    cityofnewyork-us
    Description

    New York City Department of Transportation (NYC DOT) uses Automated Traffic Recorders (ATR) to collect traffic sample volume counts at bridge crossings and roadways. These counts do not cover the entire year, and the number of days counted per location may vary from year to year.

    Also see Automated Traffic Volume Counts: https://data.cityofnewyork.us/Transportation/Automated-Traffic-Volume-Counts/7ym2-wayt

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

    See the Splitgraph documentation for more information.

  9. Visiting address for the computer hotel | gimi9.com

    • gimi9.com
    Cite
    Visiting address for the computer hotel | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-data-norge-no-node-2147
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Visitor numbers for the data hotel (hotel.difi.no), showing page views per dataset and, for each dataset, how many page views there were in the different formats (JSON, JSONP, XML, complete download, etc.). In addition, an approximate count of traffic (in bytes) per dataset. The source of the data is page-view data from AWStats. These figures are run through a program that sums up traffic per dataset and filters out irrelevant traffic. For an explanation of the various fields, including possible values, see the field definitions. Note: statistics before 2017 are incorrect. A technical problem caused traffic data to be missing for periods of varying length; for example, one of the years lacks data for over 100 days. Ideas for use: create a web app that shows statistics per dataset, with a graph of page views over time; sum up traffic per dataset. There may be errors in the dataset. Use the comments section if you have any questions or comments!

  10. Facebook users worldwide 2017-2027

    • statista.com
    • de.statista.com
    • +2more
    Cite
    Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista http://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of increase, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  11. Analysis of the route safety of abnormal vehicle from the perspective of...

    • repod.icm.edu.pl
    json, tsv, txt
    Updated Feb 14, 2023
    + more versions
    Cite
    Betkier, Igor (2023). Analysis of the route safety of abnormal vehicle from the perspective of traffic parameters and infrastructure characteristics with the use of web technologies and machine learning [Dataset]. http://doi.org/10.18150/U9NPVL
    Explore at:
    Available download formats: txt(1061), txt(135312), txt(36279), txt(1237), tsv(49700), txt(4657), txt(1274), txt(474), json(223876718), json(142231883), txt(42976), txt(364), json(16510649), json(176705), txt(1316), txt(4420), txt(8577220), json(220646926), json(259936249)
    Dataset updated
    Feb 14, 2023
    Dataset provided by
    RepOD
    Authors
    Betkier, Igor
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    Narodowe Centrum Nauki
    Description

    Dear Scientist!

    This database contains data collected in the course of the study "Analysis of the route safety of abnormal vehicle from the perspective of traffic parameters and infrastructure characteristics with the use of web technologies and machine learning", funded by the National Science Centre Poland (Grant reference 2021/05/X/ST8/01669). The structure of the files follows from the aims of the study and from the number of sources needed to prepare data suitable for use as the input layer of a neural network. You can find the following folders and files:

    1. Road_Parameters_Data (.csv) - data collected by the author before the study (2021). Here you can find information about the technical quality and types of main roads located in Mazovia province (Poland). The source of the data was the Polish General Directorate for National Roads and Motorways.
    2. Google_Maps_Data (.json) - data collected using the authors' web service, written in Python, which downloaded the data from the Distance Matrix API service on Google Maps at two-hour intervals from 25 May 2022 to 22 June 2022. The application retrieved the TRAFFIC FACTOR parameter, which is the ratio of the actual travel time to the historical travel time for particular roads.
    3. Geocoding_Roads_Data (.json) - data obtained by reverse geocoding, using geographical coordinates and the request parameter latlng. As a result, Google Maps returned a response containing the postal code for the field types defined as postal_code and the name of the lowest possible level of the territorial unit for the field administrative_area_level.
    4. Population_Density_Data (.csv) - data for the territorial units assigned to individual records, which were used to search the database of the Polish Postal Service using the authors' original web service written in Python. The records which contained a postal code were assigned the name of the municipality which corresponded to it. Finally, postal codes and names of territorial units were compared with the database of Statistics Poland (GUS) containing information on population density for individual municipalities, and the population density was assigned to existing records from the database.
    5. Roads_Incidents_Data (.json) - data collected by a web service, written in Python, used for analysing the reported obstructions available on the website of the General Directorate for National Roads and Motorways. When a traffic obstruction emerged in the Mazovia Province, the application could later associate it with the appropriate records, based on the road number and kilometre at which it occurred and on the links parameters. The data was collected from 26 May to 22 June 2022.
    6. Weather_For_Roads_Data (.json) - data concerning the weather conditions on the roads on the days of the study. To make this feasible, a web service was written in Python which retrieved selected items from the response returned by the www.timeanddate.com server for the corresponding input parameters: the geographical coordinates of the midpoint between the nodes of the particular roads. The data was collected for days between 27 May and 22 June 2022.
    7. data_v_1 (.csv) - collected data for road parameters only
    8. data_v_2 (.csv) - collected data for road parameters + population density
    9. data_v_3 (.json) - collected data for road parameters + population density + traffic
    10. data_v_4 (.json) - collected data for road parameters + population density + traffic + weather + road incidents
    11. data_v_5 (.csv) - collected VALIDATED and cleaned data for road parameters + population density + traffic + weather + road incidents. At this stage, the road sections for which the traffic factor parameter was assessed to have been estimated incorrectly were eliminated. These were sections for which the value of the traffic factor remained the same regardless of the time of day, or which took several of the same values during the course of the whole study. Moreover, it was also assumed that the final database should consist of road sections for which traffic factor values less than 1.2 constitute at least 10% of all results. Thus, the sections with no tendency to become congested and characterized by a small number of road traffic users were eliminated.

    Good luck with your research!
    Igor Betkier, PhD
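    The traffic factor defined above is a simple ratio, and the first validation rule (dropping sections whose factor never changes) is easy to sketch. The function names traffic_factor and is_constant are hypothetical and the units are assumed to be seconds; this is an illustration, not the author's web service code. The 1.2 threshold rule is left out because its exact definition is not clear from the description.

```python
def traffic_factor(actual_seconds, historical_seconds):
    """Ratio of actual travel time to historical travel time for one road."""
    if historical_seconds <= 0:
        raise ValueError("historical travel time must be positive")
    return actual_seconds / historical_seconds


def is_constant(factors):
    """True if every sampled traffic factor for a road section is identical.

    Such sections would be candidates for removal under the first
    validation rule described for data_v_5 (assumed interpretation).
    """
    return len(set(factors)) <= 1
```

    A value above 1 means the trip currently takes longer than it historically did, for example traffic_factor(900, 600) for a 15-minute trip that usually takes 10 minutes.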

  12. Camera Traffic Counts

    • splitgraph.com
    • datahub.austintexas.gov
    • +3more
    Updated Jul 9, 2024
    + more versions
    Cite
    datahub-austintexas-gov (2024). Camera Traffic Counts [Dataset]. https://www.splitgraph.com/datahub-austintexas-gov/camera-traffic-counts-sh59-i6y9/
    Explore at:
    Available download formats: application/openapi+json, application/vnd.splitgraph.image, json
    Dataset updated
    Jul 9, 2024
    Authors
    datahub-austintexas-gov
    Description

    Traffic count data collected from several GRIDSMART optical traffic detectors deployed by the City of Austin.

    This dataset is no longer updated because these devices are no longer maintained.

    The Travel Detectors dataset ( https://data.austintexas.gov/Transportation-and-Mobility/Traffic-Detectors/qpuw-8eeb ) is related to this dataset using the 'ATD Device ID' field. The Travel Detectors dataset provides more information on device location and status.

    The average speed measurements may not have been calibrated for all intersections. All measurements have been collected using automated machine vision processes and have not been validated.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

    See the Splitgraph documentation for more information.

  13. syntra-experiment-dataset

    • huggingface.co
    Updated Nov 16, 2023
    Cite
    NovelSense UG (2023). syntra-experiment-dataset [Dataset]. http://doi.org/10.57967/hf/1350
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 16, 2023
    Dataset authored and provided by
    NovelSense UG
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    About

    This is the SYNTRA Experiment Dataset. It is a sample dataset from the NovelSense SYNTRA EU Hubs 4 Data experiment (https://euhubs4data.eu/experiments/syntra/). The experiment supported the development of a web application reachable at https://syntra.app. The dataset is a synthetic traffic infrastructure dataset intended, for example, for the validation, training, and optimization of traffic AI models.

    Dataset description

    The dataset has been created by generating 14… See the full description on the dataset page: https://huggingface.co/datasets/NovelSense/syntra-experiment-dataset.

  14. AIT Alert Data Set

    • data.niaid.nih.gov
    Updated Oct 14, 2024
    Cite
    Landauer, Max; Skopik, Florian; Wurzenberger, Markus (2024). AIT Alert Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8263180
    Explore at:
    Dataset updated
    Oct 14, 2024
    Dataset provided by
    AIT Austrian Institute of Technology
    Authors
    Landauer, Max; Skopik, Florian; Wurzenberger, Markus
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the AIT Alert Data Set (AIT-ADS), a collection of synthetic alerts suitable for evaluation of alert aggregation, alert correlation, alert filtering, and attack graph generation approaches. The alerts were forensically generated from the AIT Log Data Set V2 (AIT-LDSv2) and originate from three intrusion detection systems, namely Suricata, Wazuh, and AMiner. The data sets comprise eight scenarios, each of which has been targeted by a multi-step attack with attack steps such as scans, web application exploits, password cracking, remote command execution, privilege escalation, etc. Each scenario and attack chain has certain variations so that attack manifestations and resulting alert sequences vary in each scenario; this means that the data set makes it possible to develop and evaluate approaches that compute similarities of attack chains or merge them into meta-alerts. Since only a few benchmark alert data sets are publicly available, the AIT-ADS was developed to address common issues in the research domain of multi-step attack analysis; specifically, the alert data set contains many false positives caused by normal user behavior (e.g., user login attempts or software updates), heterogeneous alert formats (although all alerts are in JSON format, their fields are different for each IDS), repeated executions of attacks according to an attack plan, collection of alerts from diverse log sources (application logs and network traffic) and all components in the network (mail server, web server, DNS, firewall, file share, etc.), and labels for attack phases. For more information on how this alert data set was generated, check out our paper accompanying this data set [1] or our GitHub repository. More information on the original log data set, including a detailed description of scenarios and attacks, can be found in [2].

    The alert data set contains two files for each of the eight scenarios, and a file for their labels:

    _aminer.json contains alerts from AMiner IDS

    _wazuh.json contains alerts from Wazuh IDS and Suricata IDS

    labels.csv contains the start and end times of attack phases in each scenario

    Besides false positive alerts, the alerts in the AIT-ADS correspond to the following attacks:

    Scans (nmap, WPScan, dirb)

    Webshell upload (CVE-2020-24186)

    Password cracking (John the Ripper)

    Privilege escalation

    Remote command execution

    Data exfiltration (DNSteal) and stopped service

    The total number of alerts in the data set is 2,655,821, of which 2,293,628 originate from Wazuh, 306,635 originate from Suricata, and 55,558 originate from AMiner. The numbers of alerts in each scenario are as follows. fox: 473,104; harrison: 593,948; russellmitchell: 45,544; santos: 130,779; shaw: 70,782; wardbeck: 91,257; wheeler: 616,161; wilson: 634,246.
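The two breakdowns above can be cross-checked against the stated total. This short sketch uses only the figures quoted in the description; the variable and function names are illustrative.

```python
# Alert counts copied from the description above.
TOTAL = 2_655_821
PER_IDS = {"Wazuh": 2_293_628, "Suricata": 306_635, "AMiner": 55_558}
PER_SCENARIO = {
    "fox": 473_104,
    "harrison": 593_948,
    "russellmitchell": 45_544,
    "santos": 130_779,
    "shaw": 70_782,
    "wardbeck": 91_257,
    "wheeler": 616_161,
    "wilson": 634_246,
}


def counts_consistent(total, *breakdowns):
    """Return True if every breakdown sums to the stated total."""
    return all(sum(b.values()) == total for b in breakdowns)


print(counts_consistent(TOTAL, PER_IDS, PER_SCENARIO))
```

Both the per-IDS and the per-scenario counts add up to 2,655,821.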

    Acknowledgements: Partially funded by the European Defence Fund (EDF) projects AInception (101103385) and NEWSROOM (101121403), and the FFG project PRESENT (FO999899544). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. The European Union cannot be held responsible for them.

    If you use the AIT-ADS, please cite the following publications:

    [1] Landauer, M., Skopik, F., Wurzenberger, M. (2024): Introducing a New Alert Data Set for Multi-Step Attack Analysis. Proceedings of the 17th Cyber Security Experimentation and Test Workshop. [PDF]

    [2] Landauer M., Skopik F., Frank M., Hotwagner W., Wurzenberger M., Rauber A. (2023): Maintainable Log Datasets for Evaluation of Intrusion Detection Systems. IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482. [PDF]

  15. Traffic Crash Data

    • data.milwaukee.gov
    csv
    Updated Oct 26, 2025
    Cite
    Milwaukee Police Department (2025). Traffic Crash Data [Dataset]. https://data.milwaukee.gov/dataset/traffic_crash
    Explore at:
    Available download formats: csv (122571597)
    Dataset updated
    Oct 26, 2025
    Dataset authored and provided by
    Milwaukee Police Department (http://city.milwaukee.gov/police)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Update Frequency: Daily

    This data set contains traffic crash information, including the case number, accident date, and location.

    • Reportable crash reports can take up to 10 business days to appear after the date of the crash if there are no issues with the report.

    • If you cannot find your crash report after 10 business days, please call the Milwaukee Police Department Open Records Section at (414) 935-7435 for further assistance.

    • Non-reportable crash reports can only be obtained by contacting the Open Records Section and will not show up in a search on this site. A non-reportable crash is any accident that does not:

    1) result in injury or death to any person

    2) damage government-owned non-vehicle property to an apparent extent of $200 or more

    3) result in total damage to property owned by any one person to an apparent extent of $1000 or more.
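The three criteria above can be read as a simple rule: a crash is reportable if at least one of them holds, and non-reportable otherwise. The sketch below encodes that reading; the function and parameter names are illustrative, and the thresholds are the ones quoted in the list.

```python
def is_reportable(injury_or_death, gov_property_damage_usd, max_single_owner_damage_usd):
    """Return True if a crash meets at least one of the three listed criteria.

    gov_property_damage_usd: apparent damage to government-owned non-vehicle property.
    max_single_owner_damage_usd: largest apparent total damage to property
    owned by any one person.
    """
    return bool(
        injury_or_death
        or gov_property_damage_usd >= 200
        or max_single_owner_damage_usd >= 1000
    )


print(is_reportable(False, 0, 450))    # no criterion met: non-reportable
print(is_reportable(False, 250, 0))    # government property damage of $200 or more
```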

    • All MV4000 crash reports, completed by MPD officers, will be available from the Wisconsin Department of Transportation (WisDOT) Division of Motor Vehicles (DMV) Accident Records Unit, generally 10 days after the incident.

    Online Request: Request your Crash Report online at WisDOT-DMV website, https://app.wi.gov/crashreports.

    Mail: Wisconsin Department of Transportation Crash Records Unit P.O. Box 7919 Madison, WI 53707-7919

    Phone: (608) 266-8753

    To download XML and JSON files, click the CSV option below and click the down arrow next to the Download button in the upper right on its page.

  16. Maryland Bicycle Level of Traffic Stress (LTS) Web Application

    • dev-maryland.opendata.arcgis.com
    Updated Mar 17, 2022
    Cite
    ArcGIS Online for Maryland (2022). Maryland Bicycle Level of Traffic Stress (LTS) Web Application [Dataset]. https://dev-maryland.opendata.arcgis.com/items/8f01552b8ff745d8902476a7c569f64c
    Explore at:
    Dataset updated
    Mar 17, 2022
    Dataset authored and provided by
    ArcGIS Online for Maryland
    Area covered
    Maryland
    Description

    This interactive web application features both the on-road Maryland Level of Bicycle Stress (LTS) feature layer for all road centerlines in Maryland as well as the Road-Separated feature layer of all road-separated bike routes throughout Maryland. An overview of the methodology and attribute data for the Maryland Level of Bicycle Stress (LTS) is provided below. For a detailed full report of the methodology, please view the PDF published by the Maryland Department of Transportation here.

    The Maryland Department of Transportation is transitioning from using the Bicycle Level of Comfort (BLOC) to using the Level of Traffic Stress (LTS) for measuring the “bikeability” of the roadway network. This transition is in coordination with the implementation of MDOT SHA’s Context Driven Design Guidelines and other national and departmental initiatives. LTS is preferred over BLOC because LTS requires fewer variables to calculate, including Average Annual Daily Traffic, Speed Limits, Presence of Bicycle Facilities, Shoulder, etc.

    Data Limitations

    As a principle of data governance, MDOT strives to provide the best possible data products. While the initial LTS analysis of Maryland’s bicycle network has many uses, it should be used with a clear understanding of the current limitations the data presents.

    Assumptions - As noted earlier in this document, some of the metrics used to determine the LTS score were estimated. Speed limits for many local roadways were not included in the original data and were assigned based on the functional classification of the roadway. Speed limits are also based on the posted speed limit, not the prevailing operating vehicle speeds, which can vary greatly. Such discrepancies between actual and assumed conditions could introduce margins of error in some cases. As data quality improves with future iterations, the LTS scoring accuracy will also improve.

    Generalizations - MDOT’s LTS methodology follows industry standards but needs to account for varying roadway conditions and data reliability from various sources. The LTS methodology aims to accurately capture Maryland’s bicycle conditions and infrastructure but must consider data maintenance requirements. To limit data maintenance, generalizations were made in the methodology so that a score could be assigned. Specifically, factors such as intersections, intersection approaches and bike lane blockages are not included in this initial analysis. LTS scores may be adjusted in the future based on MDOT review, updated industry standards, and additional LTS metrics being included in OMOC such as parking and buffer widths.

    Timestamped - As the LTS score is derived from a dynamic linear referencing system (LRS), any LTS analysis performed reflects the data available in OMOC. Each analysis must be considered ‘timestamped’ and becomes less reliable with age. As variables within OMOC change, whether through documented roadway construction, bikeway improvements or a speed limit reduction, LTS scores will also change. Fortunately, as this data is updated in the linear referencing system, the data becomes more reliable and LTS scores become more accurate.

    • Presence and type of bicycle facility

    • Speed limit

    • Number of Through Lanes/Traffic Volume

    Traditionally, the Level of Traffic Stress (LTS) (scale “1” to “4”) is a measure for assessing the quality of the roadway network for its comfort with various bicycle users. The lower the LTS score, the more inviting the bicycle facility is for more audiences.

    LTS Methodology (Overview)

    MDOT’s LTS methodology is based on the metrics established by the Mineta Transportation Institute (MTI) Report 11-19 “Low-Stress Bicycling and Network Connectivity” (May 2012), with additional criteria refined by Dr. Peter G. Furth (June 2017) below and Montgomery County's Revised Level of Traffic Stress.

    Shared-use Path Data Development and Complementary Road Separated Bike Routes Dataset

    A complementary dataset, Road Separated Bike Routes, was completed prior to the roadway dataset and is included in this application. It is also provided to the public via (https://maryland.maps.arcgis.com/home/item.html?id=1e12f2996e76447aba89099f41b14359). This first dataset is an inventory of all shared-use paths open to public, two-way bicycle access which contribute to the bicycle transportation network. Shared-use paths and sidepaths were assigned an LTS score of “0” to indicate minimal interaction with motor vehicle traffic. Many paved loop trails entirely within parks, which had no connection to the adjacent roadway network, were not included but may be included in future iterations. Sidepaths, where a shared-use path runs parallel to an adjacent roadway, are included in this complementary Road Separated Bike Routes Dataset. Sidepaths do not offer as inviting a biking environment as shared-use paths with an independent alignment, due to the proximity of motor vehicle traffic and the greater likelihood of intersections with more roadways and driveways. Future iterations of the LTS will assign an LTS score of “1” to sidepaths.

    On-street Bicycle Facility Data Development

    This second dataset includes all on-road bicycle facilities which have a designated roadway space for bicycle travel, including bike lanes and protected bike lanes. Marked shared lanes in which bicycle and motor vehicle traffic share travel lanes were not included. Shared lanes, whether sharrows, bike boulevards or signed routes, were inventoried but treated as mixed traffic for LTS analysis. The bicycle facilities included in the analysis include:

    • Standard Bike Lanes: A roadway lane designated for bicycle travel at least 5-feet-wide. Bike lanes may be located against the curb or between a parking lane and a motor vehicle travel lane. Buffered bike lanes without vertical separation from motor vehicle traffic are included in this category. Following AASHTO and MDOT SHA design standards, bike lanes are assumed to be at least 5-feet-wide even though some existing bike lanes are less than 5-feet-wide.

    • Protected Bike Lanes: Lanes located within the street but separated from motor vehicle travel lanes by a vertical buffer, whether by a row of parked cars, flex posts or concrete planters.

    • Shoulders: Roadway shoulders are commonly used by bicycle traffic. As such, roadways with shoulders open to bicycle traffic were identified and rated for LTS in relation to adjacent traffic speeds and volumes as well as the shoulder width. Shoulders less than 5-feet-wide, the standard bike lane width, were excluded from analysis and these roadway segments were treated as mixed traffic.

    The Office of Highway Development at MDOT SHA provided the on-street bicycle facility inventory data for state roadways. The shared-use path inventory and on-street bicycle facility inventory were compiled from local jurisdictions’ open-source downloads or shared from the GIS/IT departments. Before integrating into OMOC, these datasets were verified by conducting desktop surveys and site visits, and by consulting with local officials and residents.

    Inquiries? Contact Us!

    For Methodology: Contact Nate Evans (nevans1@mdot.maryland.gov)

    For GIS \ Data: Contact Andrew Bernish (abernish@mdot.maryland.gov)

  17. CTDOT Office of the State Traffic Administration (OSTA) Interactive Map

    • catalog.data.gov
    • data.ct.gov
    • +1more
    Updated Feb 12, 2025
    Cite
    Connecticut Department of Transportation (2025). CTDOT Office of the State Traffic Administration (OSTA) Interactive Map [Dataset]. https://catalog.data.gov/dataset/ctdot-office-of-the-state-traffic-administration-osta-interactive-map
    Explore at:
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    Connecticut Department of Transportation
    Description

    Web Mapping Application containing all layers from the Office of the State Traffic Administration that enables the user to visually review, search, and query OSTA data, including; State, Local, and School Zone Speed Limits, Major Traffic Generators at defined points within the approval process, No Thru Truck designations, and Parkway Restrictions on height, width, and weight.

  18. Police Data: Traffic Citations

    • splitgraph.com
    • data.somervillema.gov
    • +2more
    Updated Oct 15, 2024
    + more versions
    Cite
    somervillema-gov (2024). Police Data: Traffic Citations [Dataset]. https://www.splitgraph.com/somervillema-gov/police-data-traffic-citations-3mqx-eye9
    Explore at:
    Available download formats: application/openapi+json, json, application/vnd.splitgraph.image
    Dataset updated
    Oct 15, 2024
    Authors
    somervillema-gov
    Description

    This complete version of the dataset contains traffic citations issued in Somerville by Somerville police officers since 2017. Citations include both written warnings and those with a monetary fine. Every citation is composed of one or more violations. Each row in the dataset represents a violation.

    This data set should be refreshed daily with data appearing with a one-month delay (e.g. citations issued on 1/1 will appear on 2/1). If a daily update does not refresh, please email data@somervillema.gov.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

    See the Splitgraph documentation for more information.
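As a rough illustration of running SQL over HTTP against this listing, the sketch below builds a POST request without sending it. The endpoint URL, the JSON payload shape, and the table name are assumptions that are not stated in this listing; only the repository name comes from the dataset URL above. Confirm all of them in the Splitgraph documentation before use.

```python
import json
import urllib.request

# Assumption: a public SQL-over-HTTP endpoint; verify in the Splitgraph documentation.
SPLITGRAPH_SQL_URL = "https://data.splitgraph.com/sql/query/ddn"


def build_query_request(repository, table, limit=10):
    """Build (but do not send) a POST request carrying a SQL query as JSON."""
    sql = f'SELECT * FROM "{repository}"."{table}" LIMIT {int(limit)}'
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        SPLITGRAPH_SQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "somervillema-gov/police-data-traffic-citations-3mqx-eye9",  # from the dataset URL
    "police_data_traffic_citations",  # hypothetical table name
)
print(request.full_url)
print(request.data.decode("utf-8"))
# Sending the request would be urllib.request.urlopen(request); it needs network access.
```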

  19. Annual Average Daily Traffic (AADT): Beginning 1977

    • splitgraph.com
    • datasets.ai
    • +4more
    Updated Aug 29, 2024
    Cite
    New York State Department of Transportation (2024). Annual Average Daily Traffic (AADT): Beginning 1977 [Dataset]. https://www.splitgraph.com/ny-gov/annual-average-daily-traffic-aadt-beginning-1977-6amx-2pbv/
    Explore at:
    Available download formats: application/openapi+json, application/vnd.splitgraph.image, json
    Dataset updated
    Aug 29, 2024
    Dataset authored and provided by
    New York State Department of Transportation (http://www.dot.ny.gov/)
    Description

    Annual Average Daily Traffic (AADT) is an estimate of the average daily traffic along a defined segment of roadway. This value is calculated from short term counts taken along the same section which are then factored to produce the estimate of AADT. Because of this process, the most recent AADT for any given roadway will always be for the previous year. Data is available for all New York State Routes and roads that are part of the Federal Aid System.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

    See the Splitgraph documentation for more information.

  20. Crash Data

    • splitgraph.com
    • policedata.coloradosprings.gov
    Updated Oct 15, 2024
    Cite
    policedata-coloradosprings-gov (2024). Crash Data [Dataset]. https://www.splitgraph.com/policedata-coloradosprings-gov/crash-data-bjpt-tkzq/
    Explore at:
    Available download formats: application/openapi+json, json, application/vnd.splitgraph.image
    Dataset updated
    Oct 15, 2024
    Authors
    policedata-coloradosprings-gov
    Description

    This dataset contains all traffic crashes reported to CSPD. This dataset may be tied to the Tickets and Citations dataset by ticket number.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

    See the Splitgraph documentation for more information.

Cite
Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410

Network Traffic Analysis: Data and Code

Explore at:
Dataset updated
Jun 12, 2024
Dataset provided by
Loyola University Chicago
Authors
Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help show this help message and exit

-i TXTFILE input text file

-x X Add first X number of total packets as features.

-y Y Add first Y number of negative packets as features.

-z Z Add first Z number of positive packets as features.

-ml Output to text file all websites in the format of websiteNumber1,feature1,feature2,...

-s S Generate samples using size s.

-j

Purpose:

Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

Uses Features.py to calculate the features.

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This script then runs machineLearning.py in a tmux session with the necessary file paths and flags.

Options (to be edited within this file):

--evaluate-only to test 5 fold cross validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and reports results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
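To make the 5-fold cross validation mentioned above concrete, here is a dependency-free sketch of how k-fold splitting partitions sample indices into train and test sets. It is not taken from machineLearning.py, whose splitting method is not described here; the function name and the use of contiguous, unshuffled folds are assumptions for illustration.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(12, k=5))
for train, test in folds:
    print(len(train), test)
```

Each sample lands in exactly one test fold, so the accuracy averaged over the five folds uses every sample for testing once.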

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

The first number in each line is a classification number denoting which website, query, or VR action is taking place.

The remaining numbers in each line denote:

The size of a packet,

and the direction it is traveling.

negative numbers denote incoming packets

positive numbers denote outgoing packets
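The line format described above can be parsed as follows. This is a minimal sketch, not the authors' code: the delimiter is not stated in the description, so the parser accepts both whitespace and commas, and the function names and sample line are made up for illustration.

```python
def parse_capture_line(line):
    """Split one line into (class label, list of signed packet sizes)."""
    # Assumption: values are separated by whitespace and/or commas.
    tokens = line.replace(",", " ").split()
    label = int(tokens[0])
    packets = [int(token) for token in tokens[1:]]
    return label, packets


def split_by_direction(packets):
    """Return (incoming sizes, outgoing sizes) as positive magnitudes."""
    incoming = [-p for p in packets if p < 0]  # negative numbers: incoming
    outgoing = [p for p in packets if p > 0]   # positive numbers: outgoing
    return incoming, outgoing


# Synthetic line for illustration only.
label, packets = parse_capture_line("3 -1500 60 -1500 -420 52")
incoming, outgoing = split_by_direction(packets)
print(label, incoming, outgoing)
```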

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv files are identical.

Each file includes (from right to left):

The original packet data,

each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
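The steps above (sort the packet sizes, compute the mean and standard deviation, then derive the CDF) can be sketched as follows. This is an illustration rather than the spreadsheet's actual formulas: the sample sizes are synthetic, and the choice of the sample standard deviation is an assumption, since the description does not say which variant was used.

```python
import statistics


def empirical_cdf(sizes):
    """Sort sizes and pair each with its cumulative fraction in sorted order."""
    ordered = sorted(sizes)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


# Synthetic packet sizes for illustration only.
sizes = [60, 1500, 52, 420]
cdf = empirical_cdf(sizes)
mean = statistics.mean(sizes)
stdev = statistics.stdev(sizes)  # assumption: sample standard deviation

print(cdf)
print(mean, stdev)
```

Plotting the first element of each pair against the second gives a step curve of the kind a CDF figure shows.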
