24 datasets found
  1. Data from: Traffic and Log Data Captured During a Cyber Defense Exercise

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jun 12, 2020
    Cite
    Daniel Tovarňák; Stanislav Špaček; Jan Vykopal (2020). Traffic and Log Data Captured During a Cyber Defense Exercise [Dataset]. http://doi.org/10.5281/zenodo.3746129
    Available download formats: application/gzip
    Dataset updated
    Jun 12, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniel Tovarňák; Stanislav Špaček; Jan Vykopal
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was acquired during Cyber Czech – a hands-on cyber defense exercise (Red Team/Blue Team) held in March 2019 at Masaryk University, Brno, Czech Republic. Network traffic flows and a wide variety of event logs were captured in an exercise network deployed in the KYPO Cyber Range Platform.

    Contents

    The dataset covers two distinct time intervals, which correspond to the official schedule of the exercise. The timestamps provided below are in the ISO 8601 date format.

    • Day 1, March 19, 2019
      • Start: 2019-03-19T11:00:00.000000+01:00
      • End: 2019-03-19T18:00:00.000000+01:00
    • Day 2, March 20, 2019
      • Start: 2019-03-20T08:00:00.000000+01:00
      • End: 2019-03-20T15:30:00.000000+01:00
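
The interval boundaries above can be parsed with a standard ISO 8601 parser. The following is a minimal Python sketch and is not part of the dataset: `INTERVALS`, `in_exercise`, and `duration_hours` are hypothetical helper names introduced here to check whether an event timestamp falls inside the exercise schedule.

```python
from datetime import datetime

# Interval boundaries copied from the dataset description above.
INTERVALS = {
    "Day 1": ("2019-03-19T11:00:00.000000+01:00", "2019-03-19T18:00:00.000000+01:00"),
    "Day 2": ("2019-03-20T08:00:00.000000+01:00", "2019-03-20T15:30:00.000000+01:00"),
}


def in_exercise(timestamp: str) -> bool:
    """Return True if an ISO 8601 timestamp falls inside either interval."""
    moment = datetime.fromisoformat(timestamp)
    return any(
        datetime.fromisoformat(start) <= moment <= datetime.fromisoformat(end)
        for start, end in INTERVALS.values()
    )


def duration_hours(day: str) -> float:
    """Return the length of one exercise day in hours."""
    start, end = (datetime.fromisoformat(t) for t in INTERVALS[day])
    return (end - start).total_seconds() / 3600
```

Because the timestamps carry an explicit UTC offset, the comparison also works for events recorded in another time zone.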

    The captured and collected data were normalized into three distinct event types and they are stored as structured JSON. The data are sorted by a timestamp, which represents the time they were observed. Each event type includes a raw payload ready for further processing and analysis. The description of the respective event types and the corresponding data files follows.

    • cz.muni.csirt.IpfixEntry.tgz – an archive of IPFIX traffic flows enriched with an additional payload of parsed application protocols in raw JSON.
    • cz.muni.csirt.SyslogEntry.tgz – an archive of Linux Syslog entries with the payload of corresponding text-based log messages.
    • cz.muni.csirt.WinlogEntry.tgz – an archive of Windows Event Log entries with the payload of original events in raw XML.

    Each archive listed above includes a directory of the same name with the following four files, ready to be processed.

    • data.json.gz – the actual data entries in a single gzipped JSON file.
    • dictionary.yml – data dictionary for the entries.
    • schema.ddl – data schema for Apache Spark analytics engine.
    • schema.jsch – JSON schema for the entries.
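
As a rough illustration of how data.json.gz might be read, the sketch below streams entries from a gzipped file. It assumes one JSON object per line, which the description does not confirm; if the file holds a single JSON array instead, `json.load` on the opened handle would be the analogous call. The function name `iter_entries` is hypothetical.

```python
import gzip
import json
from typing import Iterator


def iter_entries(path: str) -> Iterator[dict]:
    """Yield one parsed entry per non-empty line of a gzipped JSON file.

    Assumption: the file is line-delimited JSON (one object per line).
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Streaming line by line keeps memory use flat, which matters if the archive is large.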

    Finally, the exercise network topology is described in the machine-readable NetJSON format. It is part of the auxiliary-material.tgz archive of auxiliary files, which includes the following.

    • global-gateway-config.json – the network configuration of the global gateway in the NetJSON format.
    • global-gateway-routing.json – the routing configuration of the global gateway in the NetJSON format.
    • redteam-attack-schedule.{csv,odt} – the schedule of the Red Team attacks in CSV and ODT format. Source for Table 2.
    • redteam-reserved-ip-ranges.{csv,odt} – the list of IP segments reserved for the Red Team in CSV and ODT format. Source for Table 1.
    • topology.{json,pdf,png} – the topology of the complete Cyber Czech exercise network in the NetJSON, PDF and PNG format.
    • topology-small.{pdf,png} – simplified topology in the PDF and PNG format. Source for Figure 1.

  2. Global Dataset of Cyber Incidents

    • zenodo.org
    bin, csv, pdf, txt
    Updated Mar 4, 2025
    Cite
    Kerstin Zettl-Schabath; Jakob Bund; Martin Müller; Camille Borrett; Jonas Hemmelskamp; Asaf Alibegovic; Enis Bajra; Alisa Jazxhi; Erik Kellenter; Annika Sachs; Callahan Shelley (2025). Global Dataset of Cyber Incidents [Dataset]. http://doi.org/10.5281/zenodo.14965395
    Available download formats: pdf, bin, txt, csv
    Dataset updated
    Mar 4, 2025
    Dataset provided by
    European Repository of Cyber Incidents
    Authors
    Kerstin Zettl-Schabath; Jakob Bund; Martin Müller; Camille Borrett; Jonas Hemmelskamp; Asaf Alibegovic; Enis Bajra; Alisa Jazxhi; Erik Kellenter; Annika Sachs; Callahan Shelley
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The European Repository of Cyber Incidents (EuRepoC) is releasing the Global Dataset of Cyber Incidents in Version 1.3 as an extract of our backend database. This official release contains fully consolidated cyber incident data reviewed by our interdisciplinary experts in the fields of politics, law and technology across all 60 variables covered by the European Repository. Version 1.3 covers the years 2000 – 2024 entirely. The Global Dataset is meant for reliable, evidence-based analysis. If you require real-time data, please refer to the download option in our TableView or contact us for special requirements (including API access).

    The dataset now contains data on 3416 cyber incidents which started between 01.01.2000 and 31.12.2024. The European Repository of Cyber Incidents (EuRepoC) gathers, codes, and analyses publicly available information from over 220 sources and 600 Twitter accounts daily to report on dynamic trends in the global, and particularly the European, cyber threat environment.

    For more information on the scope and data collection methodology see: https://eurepoc.eu/methodology

    Full Codebook available here

    Information about each file

    Please scroll down this page entirely to see all available files. Zenodo only displays the attribution dataset by default.

    Global Database (csv or xlsx):
    This file includes all variables coded for each incident, organised such that one row corresponds to one incident - our main unit of investigation. Where multiple codes are present for a single variable for a single incident, these are separated with semi-colons within the same cell.

    Receiver Dataset (csv or xlsx):
    In this file, the data of affected entities and individuals (receivers) is restructured to facilitate analysis. Each cell contains only a single code, with the data "unpacked" across multiple rows. Thus, a single incident can span several rows, identifiable through the unique identifier assigned to each incident (incident_id).
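
The relationship between the semicolon-separated cells of the Global Database and the "unpacked" rows of the Receiver Dataset can be sketched as follows. This is an illustrative Python example only: the helper `unpack` and the column `receiver_country` are hypothetical, and only `incident_id` is a name taken from the description.

```python
import csv
import io


def unpack(rows, column, sep=";"):
    """Yield one copy of each row per code found in `column`."""
    for row in rows:
        codes = [code.strip() for code in row[column].split(sep) if code.strip()]
        # Keep rows with an empty cell as a single row with an empty code.
        for code in codes or [""]:
            yield {**row, column: code}


# Hypothetical two-incident extract; "receiver_country" is an
# illustrative column name, not one taken from the codebook.
SAMPLE = "incident_id,receiver_country\n1,Germany; France\n2,Estonia\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
unpacked = list(unpack(rows, "receiver_country"))
```

Here incident 1 yields two rows and incident 2 yields one, and `incident_id` still links each row back to its incident.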

    Attribution Dataset (csv or xlsx):
    This file follows a similar approach to the receiver dataset. The attribution data is "unpacked" over several rows, allowing each cell to contain only one code. Here too, a single incident may occupy several rows, with the unique identifier enabling easy tracking of each incident (incident_id). In addition, some attributions have multiple possible codes for one variable; these are also "unpacked" over several rows, with the attribution_id making it possible to track each attribution.

    Dyadic Dataset (csv or xlsx):
    The dyadic dataset focuses on state dyads. Each row in the dataset represents one cyber incident in a specific dyad. Because incidents may affect multiple receivers, a single incident can appear in several rows when it affected multiple countries.

  3. Database for Cyber Security Agencies

    • whoisdatacenter.com
    csv
    Updated Mar 27, 2025
    Cite
    Database for Cyber Security Agencies [Dataset]. https://whoisdatacenter.com/whois-database-for-cyber-security-companies/
    Available download formats: csv
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Mar 27, 2025
    Description

    Strengthen your cyber defense with our extensive, daily-updated WHOIS database. Accessible in CSV, JSON, and XML, it's a crucial asset for any security strategy.

  4. Survey data from a survey about cybersecurity training and usability of...

    • snd.se
    doc
    Updated Jun 29, 2021
    Cite
    Joakim Kävrestad (2021). Survey data from a survey about cybersecurity training and usability of security functions [Dataset]. http://doi.org/10.5878/pv4m-s237
    Available download formats: doc (34816)
    Dataset updated
    Jun 29, 2021
    Dataset provided by
    University of Skövde
    Swedish National Data Service
    Authors
    Joakim Kävrestad
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Europe, Sweden, Northern Europe, Southern Europe, Italy, United Kingdom
    Dataset funded by
    Vinnova
    Description

    This data set was acquired using a survey intended to measure:

    • Participants' previous experience of cybersecurity training
    • Participants' perception of ideal cybersecurity training
    • Participants' perception of a specific cybersecurity training type called ContextBased MicroTraining
    • Which usability aspects the participants find most important for security features

    Data was acquired from Sweden, the UK, and Italy to allow for comparative analysis. Demographic data was collected to allow for further analysis based on it.

    The files included in this data set are:

    • Completesurvey: the full survey presented to the participants.
    • Dataset: the variables and data for the different questions (available as .sav (SPSS) and .csv).
    • Var_info: information about the variables in the dataset.
    • Overview: frequency tables for the survey questions (for the complete data set).
    • Sweden, UK, and Italy: frequency tables for the survey questions divided by national sample groups.

    See attached description

  5. Kitsune Network Attack Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Cite
    Yisroel Mirsky; Tomer Doitshman; Yuval Elovici; Asaf Shabtai (2023). Kitsune Network Attack Dataset [Dataset]. https://paperswithcode.com/dataset/kitsune-network-attack-dataset
    Dataset updated
    Oct 16, 2023
    Authors
    Yisroel Mirsky; Tomer Doitshman; Yuval Elovici; Asaf Shabtai
    Description

    Kitsune Network Attack Dataset

    This is a collection of nine network attack datasets captured from either an IP-based commercial surveillance system or a network full of IoT devices. Each dataset contains millions of network packets and a different cyber attack.

    For each attack, you are supplied with:

    • A preprocessed dataset in csv format (ready for machine learning)
    • The corresponding label vector in csv format
    • The original network capture in pcap format (in case you want to engineer your own features)

    We will now describe in detail what's in these datasets and how they were collected.

    The Network Attacks

    We have collected a wide variety of attacks which you would find in a real network intrusion. The following is a list of the cyber attack datasets available:

    [Image: table of the cyber attack datasets, https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F827271%2F79e305668553e521b0709a2413323c45%2Fkaggle_dataset_table.png?generation=1598461684070844&alt=media]

    For more details on the attacks themselves, please refer to our NDSS paper (citation below).

    The Data Collection

    The following figure presents the network topologies which we used to collect the data, and the corresponding attack vectors at which the attacks were performed. The network capture took place at point 1 and point X at the router (where a network intrusion detection system could feasibly be placed). For each dataset, clean network traffic was captured for the first 1 million packets, then the cyber attack was performed.

    The Dataset Format

    Each preprocessed dataset csv has m rows (packets) and 115 columns (features) with no header. The 115 features were extracted using our AfterImage feature extractor, described in our NDSS paper (see below) and available in Python here. In summary, the 115 features provide a statistical snapshot of the network (hosts and behaviors) in the context of the current packet traversing the network. The AfterImage feature extractor is unique in that it can efficiently process millions of streams (network channels) in real-time, incrementally, making it suitable for handling network traffic.
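
Since the preprocessed files have no header row, a loader has to rely on the stated column count. The Python sketch below is one possible way to read such a file and verify its width; it is not the authors' tooling, and `read_features` and `N_FEATURES` are hypothetical names. It assumes every cell is numeric.

```python
import csv

N_FEATURES = 115  # column count stated in the dataset description


def read_features(lines):
    """Parse headerless CSV lines into lists of floats, checking the width."""
    for number, row in enumerate(csv.reader(lines), start=1):
        if len(row) != N_FEATURES:
            raise ValueError(
                f"row {number}: expected {N_FEATURES} columns, got {len(row)}"
            )
        yield [float(value) for value in row]
```

The function accepts any iterable of text lines, so it can be fed an open file handle (for example one opened with `newline=""`) and will process the packets one at a time.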

    Citation

    If you use these datasets, please cite:

    @inproceedings{mirsky2018kitsune,
      title     = {Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection},
      author    = {Mirsky, Yisroel and Doitshman, Tomer and Elovici, Yuval and Shabtai, Asaf},
      booktitle = {The Network and Distributed System Security Symposium (NDSS) 2018},
      year      = {2018}
    }

  6. Dataset: What Are Cybersecurity Education Papers About? A Systematic...

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated Jul 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Valdemar Švábenský; Jan Vykopal; Pavel Čeleda (2023). Dataset: What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences [Dataset]. http://doi.org/10.5281/zenodo.3506640
    Available download formats: zip
    Dataset updated
    Jul 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Valdemar Švábenský; Jan Vykopal; Pavel Čeleda
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains supplementary materials for the following conference paper:

    Valdemar Švábenský, Jan Vykopal, Pavel Čeleda.
    What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences.
    In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE 2020).
    https://doi.org/10.1145/3328778.3366816

    Preprint available at: https://arxiv.org/abs/1911.11675

    How to cite

    If you use or build upon the materials, please use the BibTeX entry below to cite the original paper (not only this web link).

    @inproceedings{Svabensky2020what,
      author  = {\v{S}v\'{a}bensk\'{y}, Valdemar and Vykopal, Jan and \v{C}eleda, Pavel},
      title   = {{What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences}},
      booktitle = {Proceedings of the 51st ACM Technical Symposium on Computer Science Education},
      series  = {SIGCSE '20},
      location = {Portland, OR, USA},
      publisher = {Association for Computing Machinery},
      address  = {New York, NY, USA},
      month   = {03},
      year   = {2020},
      pages   = {2--8},
      numpages = {7},
      isbn   = {978-1-4503-6793-6},
      url    = {https://doi.org/10.1145/3328778.3366816},
      doi    = {10.1145/3328778.3366816},
    }

    Attached content

    The file "SIGCSE 2020 Literature Review.xlsx" is an Excel spreadsheet with three sheets corresponding to 1) all papers found by automated search, 2) manually excluded papers, and 3) papers included in the literature review. There are also three CSV files that correspond to the three individual sheets.

  7. Global Dataset of Cyber Incidents V.1.2

    • zenodo.org
    bin, csv, json
    Updated May 3, 2024
    Cite
    European Repository of Cyber Incidents (EuRepoC) (2024). Global Dataset of Cyber Incidents V.1.2 [Dataset]. http://doi.org/10.5281/zenodo.11108195
    Available download formats: csv, bin, json
    Dataset updated
    May 3, 2024
    Dataset provided by
    European Repository of Cyber Incidents (EuRepoC)
    Authors
    European Repository of Cyber Incidents (EuRepoC)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 2, 2024
    Description

    The dataset contains data on 2889 cyber incidents between 01.01.2000 and 02.05.2024 using 60 variables, including the start date, names and categories of receivers along with names and categories of initiators. The database was compiled as part of the European Repository of Cyber Incidents (EuRepoC) project.


    EuRepoC gathers, codes, and analyses publicly available information from over 200 sources and 600 Twitter accounts daily to report on dynamic trends in the global, and particularly the European, cyber threat environment.

    For more information on the scope and data collection methodology see: https://eurepoc.eu/methodology

    Codebook available here

    Information about each file:

    Global Database (csv or xlsx):
    This file includes all variables coded for each incident, organised such that one row corresponds to one incident - our main unit of investigation. Where multiple codes are present for a single variable for a single incident, these are separated with semi-colons within the same cell.

    Receiver Dataset (csv):
    In this file, the data of affected entities and individuals (receivers) is restructured to facilitate analysis. Each cell contains only a single code, with the data "unpacked" across multiple rows. Thus, a single incident can span several rows, identifiable through the unique identifier assigned to each incident (incident_id).

    Attribution Dataset (csv):
    This file follows a similar approach to the receiver dataset. The attribution data is "unpacked" over several rows, allowing each cell to contain only one code. Here too, a single incident may occupy several rows, with the unique identifier enabling easy tracking of each incident (incident_id). In addition, some attributions have multiple possible codes for one variable; these are also "unpacked" over several rows, with the attribution_id making it possible to track each attribution.

    eurepoc_global_database_1.2 (json):
    This file contains the whole database in JSON format.

  8. Global Domain Name Data | DNS and Risk Classification via Dataset & API |...

    • datarade.ai
    .csv, .json
    Updated Nov 2, 2024
    Cite
    Datazag (2024). Global Domain Name Data | DNS and Risk Classification via Dataset & API | 267M+ Domains Covering Over 1570 Domain Zones | Updated Daily [Dataset]. https://datarade.ai/data-products/datazag-global-domain-name-data-dns-and-risk-classificatio-datazag
    Available download formats: .csv, .json
    Dataset updated
    Nov 2, 2024
    Dataset authored and provided by
    Datazag
    Area covered
    Bahamas, Lesotho, Marshall Islands, Dominica, Kenya, State of, Norway, Niue, Gambia, Paraguay
    Description

    DomainIQ is a comprehensive global Domain Name dataset for organizations that want to build cyber security, data cleaning and email marketing applications. The dataset consists of the DNS records for over 267 million domains, updated daily, representing more than 90% of all public domains in the world.

    The data is enriched by over thirty unique data points, including identifying the mailbox provider for each domain and using AI based predictive analytics to identify elevated risk domains from both a cyber security and email sending reputation perspective.

    DomainIQ from Datazag offers layered intelligence through a highly flexible API and as a dataset, available for both cloud and on-premises applications. Standard formats include CSV, JSON, Parquet, and DuckDB.

    Custom options are available for any other file or database format. With daily updates and constant research from Datazag, organizations can develop their own market leading cyber security, data cleaning and email marketing applications supported by comprehensive and accurate data from Datazag. Data updates available on a daily, weekly and monthly basis. API data is updated on a daily basis.

  9. Data from: Hornet 40: Network Dataset of Geographically Placed Honeypots

    • data.mendeley.com
    • data.niaid.nih.gov
    • +1more
    Updated Jun 3, 2021
    Cite
    Veronica Valeros (2021). Hornet 40: Network Dataset of Geographically Placed Honeypots [Dataset]. http://doi.org/10.17632/tcfzkbpw46.1
    Dataset updated
    Jun 3, 2021
    Authors
    Veronica Valeros
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hornet 40 is a dataset of 40 days of network traffic attacks captured in cloud servers used as honeypots to help understand how geography may impact the inflow of network attacks. The honeypots are located in eight different cities: Amsterdam, London, Frankfurt, San Francisco, New York, Singapore, Toronto, Bangalore. The data was captured in April, May, and June 2021.

    The eight cloud servers were created and configured simultaneously following identical instructions. The network capture was performed using the Argus network monitoring tool in each cloud server. The cloud servers had only one service running (SSH on a non-standard port) and were fully dedicated as a honeypot. No honeypot software was used in this dataset.

    The dataset consists of eight scenarios, one for each geographically located cloud server. Each scenario contains bidirectional NetFlow files in the following formats:

    • hornet40-biargus.tar.gz: all scenarios with bidirectional NetFlow files in Argus binary format
    • hornet40-netflow-v5.tar.gz: all scenarios with bidirectional NetFlow v5 files in CSV format
    • hornet40-netflow-extended.tar.gz: all scenarios with bidirectional NetFlow files in CSV format containing all features provided by Argus
    • hornet40-full.tar.gz: all the data (biargus, NetFlow v5, and extended NetFlows)

  10. Encrypted Traffic Feature Dataset for Machine Learning and Deep Learning...

    • data.mendeley.com
    Updated Dec 6, 2022
    Cite
    Zihao Wang (2022). Encrypted Traffic Feature Dataset for Machine Learning and Deep Learning based Encrypted Traffic Analysis [Dataset]. http://doi.org/10.17632/xw7r4tt54g.1
    Dataset updated
    Dec 6, 2022
    Authors
    Zihao Wang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This traffic dataset contains balanced amounts of encrypted malicious and legitimate traffic for encrypted malicious traffic detection and analysis. It is a secondary dataset of CSV features composed from six public traffic datasets.

    Our dataset is curated based on two criteria. The first criterion is to combine widely used public datasets from existing works that contain enough encrypted malicious or encrypted legitimate traffic, such as the Malware Capture Facility Project datasets. The second criterion is to ensure that the final dataset balances encrypted malicious and legitimate network traffic.

    Based on these criteria, six public datasets were selected. After data pre-processing, the details of each selected public dataset and the size of the different types of encrypted traffic are shown in the “Dataset Statistic Analysis Document”. The document summarizes the malicious and legitimate traffic size selected from each public dataset, the traffic size of each malicious traffic type, and the total traffic size of the composed dataset. The table shows that encrypted malicious and legitimate traffic each contribute approximately 50% of the final composed dataset.

    The datasets were prepared for encrypted malicious traffic detection. Since the dataset is intended for machine learning or deep learning model training, sample train and test sets are also provided. The train and test datasets are split at a ratio of 1:4. They can be used for model training and testing based on selected features or after further data pre-processing.

  11. MAD (MAlicious Traffic Dataset) in home and commercial environments - Home...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jul 19, 2021
    Cite
    Carlos Alberto Martins de Sousa Teles; Felipe da R. Henriques (2021). MAD (MAlicious Traffic Dataset) in home and commercial environments - Home environment [Dataset]. http://doi.org/10.5281/zenodo.5094055
    Available download formats: application/gzip
    Dataset updated
    Jul 19, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Carlos Alberto Martins de Sousa Teles; Felipe da R. Henriques
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For the home environment we have: 01 Wifi Modem Router, 03 Smartphones, 01 server, 01 desktop, 01 Multifunction Printer, 01 network extender, 01 SmartTV, 01 Cable TV decoder and 01 firewall. This environment is a local network. The server has the Monitoring Environment and a network card, which provides connectivity and receives all network traffic for analysis.

    The results were obtained from Suricata and Telegraf collections from the TICK stack. All evidence was performed by queries via EveBox, which received data from Suricata, Grafana or graphics with information extracted from the InfluxDB (Grafana) and PostgreSQL (EveBox) databases.

    events.csv.gz - Suricata / Evebox collections

    net.csv.gz - Telegraf collections from the TICK stack

    netstat.csv.gz - Telegraf collections from the TICK stack

    For correlation purposes, use the events.csv.gz file as the basis. The correlation key is the 'timestamp' column in events.csv.gz, matched against the 'time' column in the net.csv.gz and netstat.csv.gz files.
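
The correlation described above amounts to a keyed join on the two time columns. The following Python sketch shows the idea under a strong assumption: that 'timestamp' and 'time' hold identically formatted strings, which the description does not confirm. If they differ in format or resolution, both would need to be normalized first. `load_rows` and `correlate` are hypothetical helper names.

```python
import csv
import gzip
from collections import defaultdict


def load_rows(path):
    """Read a gzipped CSV file with a header row into a list of dicts."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def correlate(events, metrics):
    """Pair each event with the metric rows whose 'time' equals its 'timestamp'."""
    by_time = defaultdict(list)
    for metric in metrics:
        by_time[metric["time"]].append(metric)
    # events drives the join, matching the advice to use events.csv.gz as the basis.
    return [(event, by_time.get(event["timestamp"], [])) for event in events]
```

A typical call would be `correlate(load_rows("events.csv.gz"), load_rows("net.csv.gz"))`; events without a matching metric row are kept and paired with an empty list.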

    The collections, which were not consecutive, took place between 2018-09-15 and 2019-02-04.

  12. Malware Repositories and Their Authors on GitHub

    • zenodo.org
    • data.niaid.nih.gov
    csv, txt
    Updated Mar 11, 2024
    Cite
    Nishat Ara Tania; Md Rayhanul Masud; Md Omar Faruk Rokon; Qian Zhang; Michalis Faloutsos (2024). Malware Repositories and Their Authors on GitHub [Dataset]. http://doi.org/10.5281/zenodo.10806593
    Available download formats: csv, txt
    Dataset updated
    Mar 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nishat Ara Tania; Md Rayhanul Masud; Md Omar Faruk Rokon; Qian Zhang; Michalis Faloutsos
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 4, 2024
    Description

    This dataset is rooted in a study aimed at unveiling the origins and motivations behind the creation of malware repositories on GitHub. Our research embarks on an innovative journey to dissect the profiles and intentions of GitHub users who have been involved in this dubious activity.

    Employing a robust methodology, we meticulously identified 14,000 GitHub users linked to malware repositories. By leveraging advanced large language model (LLM) analytics, we classified these individuals into distinct categories based on their perceived intent: 3,339 were deemed Malicious, 3,354 Likely Malicious, and 7,574 Benign, offering a nuanced perspective on the community behind these repositories.

    Our analysis penetrates the veil of anonymity and obscurity often associated with these GitHub profiles, revealing stark contrasts in their characteristics. Malicious authors were found to typically possess sparse profiles focused on nefarious activities, while Benign authors presented well-rounded profiles, actively contributing to cybersecurity education and research. Those labeled as Likely Malicious exhibited a spectrum of engagement levels, underlining the complexity and diversity within this digital ecosystem.

    We are offering two datasets in this paper. The first is a list of malware repositories: we collected and extended the malware repositories on GitHub in 2022, following the original papers. The second is a CSV file with GitHub user information and a maliciousness classification label for each user.

    1. malware_repos.txt

      • Purpose: This file contains a curated list of GitHub repositories identified as containing malware. These repositories were identified following the methodology outlined in the research paper "SourceFinder: Finding Malware Source-Code from Publicly Available Repositories in GitHub."
      • Contents: The file is structured as a simple text file, with each line representing a unique repository in the format username/reponame. This format allows for easy identification and access to each repository on GitHub for further analysis or review.
      • Usage: The list serves as a critical resource for researchers and cybersecurity professionals interested in studying malware, understanding its distribution on platforms like GitHub, or developing defense mechanisms against such malicious content.
    2. obfuscated_github_user_dataset.csv

      • Purpose: Accompanying the list of malware repositories, this CSV file contains detailed, albeit obfuscated, profile information of the GitHub users who authored these repositories. The obfuscation process has been applied to protect user privacy and comply with ethical standards, especially given the sensitive nature of associating individuals with potentially malicious activities.
      • Contents: The dataset includes several columns representing different aspects of user profiles, such as obfuscated identifiers (e.g., ID, login, name), contact information (e.g., email, blog), and GitHub-specific metrics (e.g., followers count, number of public repositories). Notably, sensitive information has been masked or replaced with generic placeholders to prevent user identification.
      • Usage: This dataset can be instrumental for researchers analyzing behaviors, patterns, or characteristics of users involved in creating malware repositories on GitHub. It provides a basis for statistical analysis, trend identification, or the development of predictive models, all while upholding the necessary ethical considerations.
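    Because each line of malware_repos.txt is described as username/reponame, turning the list into repository URLs takes only a few lines. The sketch below is illustrative and is not the authors' tooling: the function name parse_repo_list and the sample entries are hypothetical, and it assumes that blank or malformed lines should simply be skipped.

```python
from typing import Iterable, List, Tuple


def parse_repo_list(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (owner, repo, url) for each well-formed 'username/reponame' line."""
    repos = []
    for raw in lines:
        entry = raw.strip()
        # Skip blank lines and anything that is not exactly owner/repo.
        if entry.count("/") != 1:
            continue
        owner, repo = entry.split("/")
        if not owner or not repo:
            continue
        repos.append((owner, repo, f"https://github.com/{owner}/{repo}"))
    return repos


# Hypothetical sample lines; the real file would be read with open("malware_repos.txt").
sample = ["example-user/example-repo\n", "\n", "not-a-repo-line\n"]
parsed = parse_repo_list(sample)
```

    Keeping the owner and repository name as separate fields makes it straightforward to group repositories by author before consulting the user-level CSV.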
  13. DNP3 Intrusion Detection Dataset

    • zenodo.org
    • ieee-dataport.org
    • +1more
    bin, pdf
    Updated Jul 15, 2024
    Cite
    Panagiotis; Panagiotis; Vasiliki; Vasiliki; Thomas; Thomas; Vasileios; Vasileios; Panagiotis; Panagiotis (2024). DNP3 Intrusion Detection Dataset [Dataset]. http://doi.org/10.21227/s7h0-b081
    Explore at:
    bin, pdfAvailable download formats
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Panagiotis; Panagiotis; Vasiliki; Vasiliki; Thomas; Thomas; Vasileios; Vasileios; Panagiotis; Panagiotis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Introduction

    In the digital era of the Industrial Internet of Things (IIoT), conventional Critical Infrastructures (CIs) are being transformed into smart environments with multiple benefits, such as pervasive control, self-monitoring and self-healing. However, this evolution brings several cyberthreats due to the necessary presence of insecure technologies. DNP3 is an industrial communication protocol which is widely adopted in the CIs of the US. In particular, DNP3 allows remote communication between Industrial Control Systems (ICS) and Supervisory Control and Data Acquisition (SCADA) systems. It can support various topologies, such as Master-Slave, Multi-Drop, Hierarchical and Multiple-Server. Originally, the architectural model of DNP3 consisted of three layers: (a) Application Layer, (b) Transport Layer and (c) Data Link Layer. DNP3 can now also be incorporated into the Transmission Control Protocol/Internet Protocol (TCP/IP) stack as an application-layer protocol. Similarly to other industrial protocols (e.g., Modbus and IEC 60870-5-104), DNP3 is characterised by severe security issues since it does not include any authentication or authorisation mechanisms. More information about the DNP3 security issues is provided in [1-3]. This dataset contains labelled Transmission Control Protocol (TCP) / Internet Protocol (IP) network flow statistics (Comma-Separated Values - CSV format) and DNP3 flow statistics (CSV format) related to 9 DNP3 cyberattacks. These cyberattacks focus on DNP3 unauthorised commands and Denial of Service (DoS). The network traffic data are provided through Packet Capture (PCAP) files. Consequently, this dataset can be used to implement Artificial Intelligence (AI)-powered Intrusion Detection and Prevention Systems (IDPS) that rely on Machine Learning (ML) and Deep Learning (DL) techniques.

    2. Instructions

    This DNP3 Intrusion Detection Dataset was implemented following the methodological frameworks of A. Gharib et al. in [4] and S. Dadkhah et al. in [5], including eleven features: (a) Complete Network Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d) Complete Interaction, (e) Complete Capture, (f) Available Protocols, (g) Attack Diversity, (h) Heterogeneity, (i) Feature Set and (j) Metadata.

    A network topology consisting of (a) eight industrial entities, (b) one Human Machine Interface (HMI) and (c) three cyberattackers was used to implement this DNP3 Intrusion Detection Dataset. In particular, the following cyberattacks were implemented.

    • On Thursday, May 14, 2020, the DNP3 Disable Unsolicited Messages Attack was executed for 4 hours.
    • On Friday, May 15, 2020, the DNP3 Cold Restart Message Attack was executed for 4 hours.
    • On Friday, May 15, 2020, the DNP3 Warm Restart Message Attack was executed for 4 hours.
    • On Saturday, May 16, 2020, the DNP3 Enumerate Attack was executed for 4 hours.
    • On Saturday, May 16, 2020, the DNP3 Info Attack was executed for 4 hours.
    • On Monday, May 18, 2020, the DNP3 Initialisation Attack was executed for 4 hours.
    • On Monday, May 18, 2020, the Man In The Middle (MITM)-DoS Attack was executed for 4 hours.
    • On Monday, May 18, 2020, the DNP3 Replay Attack was executed for 4 hours.
    • On Tuesday, May 19, 2020, the DNP3 Stop Application Attack was executed for 4 hours.

    The aforementioned DNP3 cyberattacks were executed using penetration testing tools such as Nmap and Scapy. For each cyberattack, a folder is provided containing (a) the pcap files for each entity, (b) the Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics for a timeout of 120 seconds in CSV format and (c) the DNP3 flow statistics for each entity, computed with different timeout values (such as 45, 60, 75, 90, 120 and 240 seconds). The TCP/IP network flow statistics were produced with CICFlowMeter, while the DNP3 flow statistics were generated by a Custom DNP3 Python Parser built on Scapy.

    3. Dataset Structure

    The dataset consists of the following folders:

    • 20200514_DNP3_Disable_Unsolicited_Messages_Attack: It includes the pcap and CSV files related to the DNP3 Disable Unsolicited Message attack.
    • 20200515_DNP3_Cold_Restart_Attack: It includes the pcap and CSV files related to the DNP3 Cold Restart attack.
    • 20200515_DNP3_Warm_Restart_Attack: It includes the pcap and CSV files related to the DNP3 Warm Restart attack.
    • 20200516_DNP3_Enumerate: It includes the pcap and CSV files related to the DNP3 Enumerate attack.
    • 20200516_DNP3_Ιnfo: It includes the pcap and CSV files related to the DNP3 Info attack.
    • 20200518_DNP3_Initialize_Data_Attack: It includes the pcap and CSV files related to the DNP3 Data Initialisation attack.
    • 20200518_DNP3_MITM_DoS: It includes the pcap and CSV files related to the DNP3 MITM-DoS attack.
    • 20200518_DNP3_Replay_Attack: It includes the pcap and CSV files related to the DNP3 replay attack.
    • 20200519_DNP3_Stop_Application_Attack: It includes the pcap and CSV files related to the DNP3 Stop Application attack.
    • Training_Testing_Balanced_CSV_Files: It includes balanced CSV files from CICFlowMeter and the Custom DNP3 Python Parser that could be utilised for training ML and DL methods. Each folder includes a separate sub-folder for each flow timeout value used by the Custom DNP3 Python Parser. For CICFlowMeter, only the timeout value of 120 seconds was used.

    Each folder includes respective subfolders related to the entities/devices (described in the following section) participating in each attack. In particular, for each entity/device, there is a folder including (a) the DNP3 network traffic (pcap file) related to this entity/device during each attack, (b) the TCP/IP network flow statistics (CSV file) generated by CICFlowMeter for the timeout value of 120 seconds and finally (c) the DNP3 flow statistics (CSV file) from the Custom DNP3 Python Parser. Finally, it is noteworthy that the network flows from both CICFlowMeter and Custom DNP3 Python Parser in each CSV file are labelled based on the DNP3 cyberattacks executed for the generation of this dataset. The description of these attacks is provided in the following section, while the various features from CICFlowMeter and Custom DNP3 Python Parser are presented in Section 5.
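    The per-attack folder names listed above encode the capture date and the attack name, so they can be split programmatically when walking the dataset. The following is a minimal sketch under that assumption only; the helper parse_attack_folder and the AttackFolder type are hypothetical names, and nothing here relies on the internal layout of the sub-folders or on CSV column names.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class AttackFolder(NamedTuple):
    day: date
    attack: str


# Folder names in the listing start with an eight-digit date followed by "_DNP3_".
_FOLDER_RE = re.compile(r"^(\d{8})_DNP3_(.+)$")


def parse_attack_folder(name: str) -> Optional[AttackFolder]:
    """Split a folder name such as '20200515_DNP3_Cold_Restart_Attack'."""
    match = _FOLDER_RE.match(name)
    if match is None:
        return None  # e.g. 'Training_Testing_Balanced_CSV_Files'
    day = datetime.strptime(match.group(1), "%Y%m%d").date()
    attack = match.group(2).replace("_", " ")
    return AttackFolder(day, attack)


folders = [
    "20200514_DNP3_Disable_Unsolicited_Messages_Attack",
    "20200518_DNP3_MITM_DoS",
    "Training_Testing_Balanced_CSV_Files",
]
parsed_folders = [parse_attack_folder(f) for f in folders]
```

    Returning None for names that do not match keeps the balanced training/testing folder out of the per-attack index without special-casing it.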

    4. Testbed & DNP3 Attacks

    The following figure shows the testbed utilised for the generation of this dataset. It is composed of eight industrial entities that play the role of the DNP3 outstations/slaves, such as Remote Terminal Units (RTUs) and Intelligent Electronic Devices (IEDs). Moreover, there is another workstation which plays the role of the master station, like a Master Terminal Unit (MTU). For the communication between the DNP3 outstations/slaves and the master station, opendnp3 was used.

    Table 1: DNP3 Attacks Description

    • DNP3 Disable Unsolicited Message Attack
      • Description: This attack targets a DNP3 outstation/slave, establishing a connection with it while acting as a master station. The false master then transmits a packet with the DNP3 Function Code 21, which requests to disable all the unsolicited messages on the target.
      • Dataset folder: 20200514_DNP3_Disable_Unsolicited_Messages_Attack
    • DNP3 Cold Restart Attack
      • Description: The malicious entity acts as a master station and sends a DNP3 packet that includes the “Cold Restart” function code. When the target receives this message, it initiates a complete restart and sends back a reply with the time window before the restart process.
      • Dataset folder: 20200515_DNP3_Cold_Restart_Attack
    • DNP3 Warm Restart Attack
      • Description: This attack is quite similar to the “Cold Restart Message”, but aims to trigger a partial restart, re-initiating a DNP3 service on the target outstation.
      • Dataset folder: 20200515_DNP3_Warm_Restart_Attack
    • DNP3 Enumerate Attack
      • Description: This reconnaissance attack aims to discover which DNP3 services and function codes are used by the target system.
      • Dataset folder: 20200516_DNP3_Enumerate
    • DNP3 Info Attack
      • Description: This attack constitutes another reconnaissance attempt, aggregating various DNP3 diagnostic information related to DNP3 usage.
      • Dataset folder: 20200516_DNP3_Ιnfo
    • Data Initialisation Attack
      • Description: This cyberattack is related to Function Code 15 (Initialize Data). It is an unauthorised access attack which instructs the slave to re-initialise its configuration to the initial values, thus overwriting values defined by legitimate masters.
      • Dataset folder: 20200518_Initialize_Data_Attack
    • MITM-DoS Attack
      • Description: In

  14. Network traffic datasets created by Single Flow Time Series Analysis

    • zenodo.org
    • explore.openaire.eu
    csv, pdf
    Updated Jul 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Josef Koumar; Josef Koumar; Karel Hynek; Karel Hynek; Tomáš Čejka; Tomáš Čejka (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. http://doi.org/10.5281/zenodo.8035724
    Explore at:
    csv, pdfAvailable download formats
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Josef Koumar; Josef Koumar; Karel Hynek; Karel Hynek; Tomáš Čejka; Tomáš Čejka
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network traffic datasets created by Single Flow Time Series Analysis

    Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:

    J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

    This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf

    The following table describes each dataset file:

    File name | Detection problem | Citation of original raw dataset
    botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
    doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
    doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
    dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
    edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
    ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
    ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
    ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23
    ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
    vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022

  15. Cyberbullying Dataset

    • kaggle.com
    Updated Oct 22, 2022
    + more versions
    Cite
    Saurabh Shahane (2022). Cyberbullying Dataset [Dataset]. https://www.kaggle.com/datasets/saurabhshahane/cyberbullying-dataset
    Explore at:
    Dataset updated
    Oct 22, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Saurabh Shahane
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    This dataset is a collection of datasets from different sources related to the automatic detection of cyber-bullying. The data come from different social media platforms such as Kaggle, Twitter, Wikipedia Talk pages and YouTube. The data contain text labeled as bullying or not, and cover different types of cyber-bullying such as hate speech, aggression, insults and toxicity.

    Content

    The data come from different social media platforms such as Kaggle, Twitter, Wikipedia Talk pages and YouTube. The data contain text labeled as bullying or not, and cover different types of cyber-bullying such as hate speech, aggression, insults and toxicity.

    Acknowledgements

    Elsafoury, Fatma (2020), “Cyberbullying datasets”, Mendeley Data, V1, doi: 10.17632/jf4pzyvnpj.1

  16. RT Spoofing Attacks on MIL-STD-1553 Communication Traffic

    • data.mendeley.com
    Updated Dec 2, 2018
    + more versions
    Cite
    Ran Yahalom (2018). RT Spoofing Attacks on MIL-STD-1553 Communication Traffic [Dataset]. http://doi.org/10.17632/jvgdrmjvs3.2
    Explore at:
    Dataset updated
    Dec 2, 2018
    Authors
    Ran Yahalom
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MIL-STD-1553 is a military standard that defines the protocol characteristics of a data bus medium for the exchange of information between various subsystems. Although the threat of cyber-attacks on the MIL-STD-1553 protocol has become a growing concern in recent years, little work has been published on detecting such attacks. One of the primary reasons for this is the confidentiality of data recorded from buses on operational systems and as a result, lack of data availability. Moreover, existing research doesn’t sufficiently emphasize the complexity of detecting attacks that can be camouflaged by normal non-periodic messages that the MIL-STD-1553 supports.

    We present three datasets of synthesized MIL-STD-1553 traffic containing injected RT Spoofing Attack messages. The implemented attacks emulate normal non-periodical communication so detecting them with a low false positive rate is non-trivial. Each dataset is separated into a training set of normal messages and a test set of both normal and attack messages. The test sets differ by the occurrence rate of attack messages (0.01%, 0.1%, and 1%). Each dataset is also preprocessed into a dataset of message sequences so that it can be used for sequential anomaly detection analysis. The sequential test sets differ by the occurrence rate of attack sequences (0.14%, 1.26%, and 11.01%). A Java program for generating the sequence datasets from the message stream datasets is also included so users can generate new sequence datasets with different sequence lengths or a labeling according to whether or not the message was injected instead of whether or not it affected the aircraft's behavior.
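    To illustrate how a message stream can be turned into a sequence dataset, the sketch below slides a fixed-length window over per-message labels. It is written in Python rather than the Java of the included generator and is not a port of it: the window length, the function names and the rule "a sequence is an attack sequence if it contains at least one attack message" are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def to_sequences(labels: Sequence[int], length: int) -> List[Tuple[int, int]]:
    """Return (start_index, sequence_label) for every sliding window.

    labels[i] is 1 for an attack message and 0 for a normal one.
    A window is labelled 1 if it contains at least one attack message
    (an assumed rule, see the lead-in).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    windows = []
    for start in range(len(labels) - length + 1):
        window = labels[start:start + length]
        windows.append((start, 1 if any(window) else 0))
    return windows


def attack_rate(flags: Sequence[int]) -> float:
    """Fraction of items flagged as attacks."""
    return sum(flags) / len(flags) if flags else 0.0


# Toy stream: 1 attack message among 20 (5% at message level).
stream = [0] * 20
stream[10] = 1
seqs = to_sequences(stream, length=4)
message_rate = attack_rate(stream)
sequence_rate = attack_rate([label for _, label in seqs])
```

    In this toy stream a single injected message falls into four overlapping windows, so the sequence-level rate (4 of 17 windows) exceeds the message-level rate (1 of 20 messages). Under the assumed labelling rule, this overlap is one plausible reason why sequence-level occurrence rates can be higher than message-level ones.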

    These datasets are intended to serve three primary purposes: (1) evaluate the ability of MIL-STD-1553 intrusion detection systems (IDS) to detect attacks that emulate normal non-periodical traffic; (2) evaluate IDSs on differing occurrence rates of attacks; (3) evaluate and compare IDSs that operate on non-sequential data as well as IDSs that operate on sequential data.

    Please refer to the linked data description document for the full details of the data synthesis process, the motivation for our preprocessing into sequences, and the format of the CSV files. This document also provides relevant background on detecting Spoofing Attacks from MIL-STD-1553 Traffic.

  17. Exploits: All registered in exploit-db, from January 2019 to October 2020

    • zenodo.org
    csv
    Updated Nov 8, 2020
    Cite
    Erika Bracamonte; Anthony Alarcon; Erika Bracamonte; Anthony Alarcon (2020). Exploits: All registered in exploit-db, from January 2019 to October 2020 [Dataset]. http://doi.org/10.5281/zenodo.4259954
    Explore at:
    csvAvailable download formats
    Dataset updated
    Nov 8, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Erika Bracamonte; Anthony Alarcon; Erika Bracamonte; Anthony Alarcon
    License

    Attribution 1.0 (CC BY 1.0)https://creativecommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This dataset contains all exploits registered on the exploit-db website from 02 January 2019 to 06 November 2020. In this time range, 2,665 exploits were found and stored in a CSV file. The CSV fields are as follows:

    • Id: exploit-db identifier.
    • Link Exploit: Link to download the exploit source code.
    • App: Link to download the application when the exploit is an application; otherwise the field is empty.
    • Date: Publication date on the website.
    • Verification: Exploit verification status on the website.
    • Title: Exploit title.
    • Type: Exploit type. One of Local, Remote or Webapp.
    • Platform: Platform that the exploit targets.
    • Author: Name of exploit author.
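    As a quick way to explore these fields, the sketch below counts exploits per Type with Python's standard csv module. The header spelling is taken from the field list above and is an assumption about the actual file; the two sample rows, URLs and author names are hypothetical placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical two-row sample using the field names listed above;
# the real header spelling should be checked against the downloaded file.
SAMPLE = """Id,Link Exploit,App,Date,Verification,Title,Type,Platform,Author
1,https://example.org/e/1,,2019-01-02,Verified,Example A,Webapp,PHP,alice
2,https://example.org/e/2,,2019-01-03,Unverified,Example B,Local,Linux,bob
"""


def count_by_type(handle) -> Counter:
    """Count exploits per value of the 'Type' column."""
    reader = csv.DictReader(handle)
    return Counter(row["Type"] for row in reader)


type_counts = count_by_type(io.StringIO(SAMPLE))
```

    For the real file, the same function can be called on an open file handle, for example one returned by open(path, newline="", encoding="utf-8").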
  18. Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data...

    • datarade.ai
    .json, .csv, .xls
    Updated Sep 7, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Altosight (2024). Altosight | AI Custom Web Scraping Data | 100% Global | Free Unlimited Data Points | Bypassing All CAPTCHAs & Blocking Mechanisms | GDPR Compliant [Dataset]. https://datarade.ai/data-products/altosight-ai-custom-web-scraping-data-100-global-free-altosight
    Explore at:
    .json, .csv, .xlsAvailable download formats
    Dataset updated
    Sep 7, 2024
    Dataset authored and provided by
    Altosight
    Area covered
    Guatemala, Singapore, Côte d'Ivoire, Greenland, Czech Republic, Tajikistan, Paraguay, Chile, Wallis and Futuna, Svalbard and Jan Mayen
    Description

    Altosight | AI Custom Web Scraping Data

    ✦ Altosight provides global web scraping data services with AI-powered technology that bypasses CAPTCHAs, blocking mechanisms, and handles dynamic content.

    We extract data from marketplaces like Amazon, aggregators, e-commerce, and real estate websites, ensuring comprehensive and accurate results.

    ✦ Our solution offers free unlimited data points across any project, with no additional setup costs.

    We deliver data through flexible methods such as API, CSV, JSON, and FTP, all at no extra charge.

    ― Key Use Cases ―

    ➤ Price Monitoring & Repricing Solutions

    🔹 Automatic repricing, AI-driven repricing, and custom repricing rules 🔹 Receive price suggestions via API or CSV to stay competitive 🔹 Track competitors in real-time or at scheduled intervals

    ➤ E-commerce Optimization

    🔹 Extract product prices, reviews, ratings, images, and trends 🔹 Identify trending products and enhance your e-commerce strategy 🔹 Build dropshipping tools or marketplace optimization platforms with our data

    ➤ Product Assortment Analysis

    🔹 Extract the entire product catalog from competitor websites 🔹 Analyze product assortment to refine your own offerings and identify gaps 🔹 Understand competitor strategies and optimize your product lineup

    ➤ Marketplaces & Aggregators

    🔹 Crawl entire product categories and track best-sellers 🔹 Monitor position changes across categories 🔹 Identify which eRetailers sell specific brands and which SKUs for better market analysis

    ➤ Business Website Data

    🔹 Extract detailed company profiles, including financial statements, key personnel, industry reports, and market trends, enabling in-depth competitor and market analysis

    🔹 Collect customer reviews and ratings from business websites to analyze brand sentiment and product performance, helping businesses refine their strategies

    ➤ Domain Name Data

    🔹 Access comprehensive data, including domain registration details, ownership information, expiration dates, and contact information. Ideal for market research, brand monitoring, lead generation, and cybersecurity efforts

    ➤ Real Estate Data

    🔹 Access property listings, prices, and availability 🔹 Analyze trends and opportunities for investment or sales strategies

    ― Data Collection & Quality ―

    ► Publicly Sourced Data: Altosight collects web scraping data from publicly available websites, online platforms, and industry-specific aggregators

    ► AI-Powered Scraping: Our technology handles dynamic content, JavaScript-heavy sites, and pagination, ensuring complete data extraction

    ► High Data Quality: We clean and structure unstructured data, ensuring it is reliable, accurate, and delivered in formats such as API, CSV, JSON, and more

    ► Industry Coverage: We serve industries including e-commerce, real estate, travel, finance, and more. Our solution supports use cases like market research, competitive analysis, and business intelligence

    ► Bulk Data Extraction: We support large-scale data extraction from multiple websites, allowing you to gather millions of data points across industries in a single project

    ► Scalable Infrastructure: Our platform is built to scale with your needs, allowing seamless extraction for projects of any size, from small pilot projects to ongoing, large-scale data extraction

    ― Why Choose Altosight? ―

    ✔ Unlimited Data Points: Altosight offers unlimited free attributes, meaning you can extract as many data points from a page as you need without extra charges

    ✔ Proprietary Anti-Blocking Technology: Altosight utilizes proprietary techniques to bypass blocking mechanisms, including CAPTCHAs, Cloudflare, and other obstacles. This ensures uninterrupted access to data, no matter how complex the target websites are

    ✔ Flexible Across Industries: Our crawlers easily adapt across industries, including e-commerce, real estate, finance, and more. We offer customized data solutions tailored to specific needs

    ✔ GDPR & CCPA Compliance: Your data is handled securely and ethically, ensuring compliance with GDPR, CCPA and other regulations

    ✔ No Setup or Infrastructure Costs: Start scraping without worrying about additional costs. We provide a hassle-free experience with fast project deployment

    ✔ Free Data Delivery Methods: Receive your data via API, CSV, JSON, or FTP at no extra charge. We ensure seamless integration with your systems

    ✔ Fast Support: Our team is always available via phone and email, resolving over 90% of support tickets within the same day

    ― Custom Projects & Real-Time Data ―

    ✦ Tailored Solutions: Every business has unique needs, which is why Altosight offers custom data projects. Contact us for a feasibility analysis, and we’ll design a solution that fits your goals

    ✦ Real-Time Data: Whether you need real-time data delivery or scheduled updates, we provide the flexibility to receive data when you need it. Track price changes, monitor product trends, or gather...

  19. 45 Vulnerability Discoverability Timelines from the 2019 Collegiate...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Apr 22, 2022
    Cite
    Benjamin S. Meyers; Benjamin S. Meyers; Andrew Meneely; Andrew Meneely (2022). 45 Vulnerability Discoverability Timelines from the 2019 Collegiate Penetration Testing Competition [Dataset]. http://doi.org/10.5281/zenodo.5781239
    Explore at:
    csvAvailable download formats
    Dataset updated
    Apr 22, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Benjamin S. Meyers; Benjamin S. Meyers; Andrew Meneely; Andrew Meneely
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a collection of manually curated timelines from the 2019 Collegiate Penetration Testing Competition (CPTC). Collection and annotation are described in detail in this publication:

    • Benjamin S. Meyers, Sultan Fahad Almassari, Brandon N. Keller, and Andrew Meneely. Examining Penetration Tester Behavior in the Collegiate Penetration Testing Competition. Forthcoming at Transactions on Software Engineering and Methodology. https://dl.acm.org/doi/10.1145/3514040

    Included Files

    • 2019_cptc_timelines.csv: Completed timelines for ten teams from the 2019 CPTC nationals competition.
    • 2019_cptc_timeline_columns.csv: Descriptions of the columns in 2019_cptc_timelines.csv.
    • 2019_cptc_vulnerabilities.csv: Brief vulnerability descriptions and CWE mappings.

    Other Resources

    • Complete Splunk log data dumps are available here. These must be ingested and viewed with a Splunk instance.
    • To request access to the CPTC team reports, please contact Brock Wagehoft (email).

    Contact

    Please contact Benjamin S. Meyers (email) with questions about this data and its collection.

    Acknowledgments

    Collection of this data has been sponsored in part by the National Science Foundation grant 1922169, and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).

  20. MAD (MAlicious Traffic Dataset) in home and commercial environments -...

    • zenodo.org
    application/gzip
    Updated Jul 19, 2021
    + more versions
    Cite
    Carlos Alberto Martins de Sousa Teles; Carlos Alberto Martins de Sousa Teles; Felipe da R. Henriques; Felipe da R. Henriques (2021). MAD (MAlicious Traffic Dataset) in home and commercial environments - Internet environment [Dataset]. http://doi.org/10.5281/zenodo.5111959
    Explore at:
    application/gzipAvailable download formats
    Dataset updated
    Jul 19, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Carlos Alberto Martins de Sousa Teles; Carlos Alberto Martins de Sousa Teles; Felipe da R. Henriques; Felipe da R. Henriques
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Internet environment consists of one switch, one IP camera, one monitoring server, and one honeypot server, with no firewall. This environment is directly connected to the Internet. The monitoring server acts as the Monitoring Environment: network traffic was captured by port mirroring on the switch and sent to that server.

    The results come from Suricata and from Telegraf, which is part of the TICK stack. Evidence was gathered through queries in EveBox, which receives data from Suricata, and through Grafana graphs. The underlying information was extracted from the InfluxDB (Grafana) and PostgreSQL (EveBox) databases.

    events.csv.gz - Suricata / EveBox collections

    net.csv.gz - Telegraf collections from the TICK stack

    netstat.csv.gz - Telegraf collections from the TICK stack

    To correlate the files, use events.csv.gz as the basis: match its 'timestamp' column against the 'time' column in net.csv.gz and netstat.csv.gz.
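
The correlation described above could be sketched roughly as follows. This is only an illustration under assumptions: the dataset description does not state the timestamp format, so the sketch assumes both columns hold ISO 8601 strings, and the helper names (`read_rows`, `parse_time`, `correlate`) and the 5-second tolerance are hypothetical choices, not part of the dataset.

```python
import bisect
import csv
import gzip
from datetime import datetime, timedelta


def read_rows(path):
    """Yield the rows of a gzipped CSV file as dictionaries."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def parse_time(value):
    # Assumption: timestamps are ISO 8601 strings; adjust if the files differ.
    return datetime.fromisoformat(value)


def correlate(events, metrics, tolerance=timedelta(seconds=5)):
    """Pair each event with the nearest metric row within `tolerance`."""
    metrics = sorted(metrics, key=lambda row: parse_time(row["time"]))
    times = [parse_time(row["time"]) for row in metrics]
    pairs = []
    for event in events:
        stamp = parse_time(event["timestamp"])
        index = bisect.bisect_left(times, stamp)
        # Candidates: the metric rows just before and just after the event.
        candidates = [i for i in (index - 1, index) if 0 <= i < len(times)]
        if not candidates:
            continue
        best = min(candidates, key=lambda i: abs(times[i] - stamp))
        if abs(times[best] - stamp) <= tolerance:
            pairs.append((event, metrics[best]))
    return pairs
```

With the files downloaded, a call such as `correlate(read_rows("events.csv.gz"), read_rows("net.csv.gz"))` would return (event, metric) pairs. Nearest-match pairing is used here because Suricata events and Telegraf samples are unlikely to share identical timestamps; an exact join would be simpler if they do.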

    Data were collected in non-consecutive intervals between 2018-08-28 and 2019-11-14.

Cite
Daniel Tovarňák; Stanislav Špaček; Jan Vykopal (2020). Traffic and Log Data Captured During a Cyber Defense Exercise [Dataset]. http://doi.org/10.5281/zenodo.3746129

Data from: Traffic and Log Data Captured During a Cyber Defense Exercise

Explore at:
3 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: application/gzip
Dataset updated
Jun 12, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Daniel Tovarňák; Stanislav Špaček; Jan Vykopal
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset was acquired during Cyber Czech – a hands-on cyber defense exercise (Red Team/Blue Team) held in March 2019 at Masaryk University, Brno, Czech Republic. Network traffic flows and a wide variety of event logs were captured in an exercise network deployed in the KYPO Cyber Range Platform.

Contents

The dataset covers two distinct time intervals, which correspond to the official schedule of the exercise. The timestamps provided below are in the ISO 8601 date format.

  • Day 1, March 19, 2019
    • Start: 2019-03-19T11:00:00.000000+01:00
    • End: 2019-03-19T18:00:00.000000+01:00
  • Day 2, March 20, 2019
    • Start: 2019-03-20T08:00:00.000000+01:00
    • End: 2019-03-20T15:30:00.000000+01:00
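
The two intervals above can be used to label or filter entries by exercise day. The following is a minimal sketch; the window values are copied from the list, while the function name `exercise_day` and the choice to treat both boundaries as inclusive are assumptions made for illustration.

```python
from datetime import datetime

# Exercise windows copied from the schedule listed above.
WINDOWS = {
    "Day 1": ("2019-03-19T11:00:00.000000+01:00", "2019-03-19T18:00:00.000000+01:00"),
    "Day 2": ("2019-03-20T08:00:00.000000+01:00", "2019-03-20T15:30:00.000000+01:00"),
}


def exercise_day(timestamp):
    """Return the label of the window containing `timestamp`, or None."""
    moment = datetime.fromisoformat(timestamp)
    for label, (start, end) in WINDOWS.items():
        # Assumption: both boundaries are treated as inclusive.
        if datetime.fromisoformat(start) <= moment <= datetime.fromisoformat(end):
            return label
    return None
```

Because the timestamps carry a UTC offset, comparisons stay correct even if an entry is expressed in a different time zone; timestamps without an offset would need one attached before they can be compared.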

The captured and collected data were normalized into three distinct event types and they are stored as structured JSON. The data are sorted by a timestamp, which represents the time they were observed. Each event type includes a raw payload ready for further processing and analysis. The description of the respective event types and the corresponding data files follows.

  • cz.muni.csirt.IpfixEntry.tgz – an archive of IPFIX traffic flows enriched with an additional payload of parsed application protocols in raw JSON.
  • cz.muni.csirt.SyslogEntry.tgz – an archive of Linux Syslog entries with the payload of corresponding text-based log messages.
  • cz.muni.csirt.WinlogEntry.tgz – an archive of Windows Event Log entries with the payload of original events in raw XML.

Each archive listed above includes a directory of the same name with the following four files, ready to be processed.

  • data.json.gz – the actual data entries in a single gzipped JSON file.
  • dictionary.yml – data dictionary for the entries.
  • schema.ddl – data schema for Apache Spark analytics engine.
  • schema.jsch – JSON schema for the entries.
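
As a rough illustration of how data.json.gz might be read without unpacking it to disk, the sketch below streams entries one at a time. It assumes the file holds one JSON object per line, which the description does not confirm; if the file is a single JSON array, `json.load` on the opened handle would be needed instead. The function name `iter_entries` and the `limit` parameter are hypothetical.

```python
import gzip
import json


def iter_entries(path, limit=None):
    """Yield entries from a gzipped file, assuming one JSON object per line."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for count, line in enumerate(handle):
            # Stop after reading at most `limit` lines, if a limit is given.
            if limit is not None and count >= limit:
                break
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, `next(iter_entries("cz.muni.csirt.SyslogEntry/data.json.gz", limit=1))` would show the first entry, which can then be compared against dictionary.yml. Streaming keeps memory use flat regardless of archive size.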

Finally, the exercise network topology is described in the machine-readable NetJSON format. It is part of the auxiliary files archive – auxiliary-material.tgz – which includes the following.

  • global-gateway-config.json – the network configuration of the global gateway in the NetJSON format.
  • global-gateway-routing.json – the routing configuration of the global gateway in the NetJSON format.
  • redteam-attack-schedule.{csv,odt} – the schedule of the Red Team attacks in CSV and ODT format. Source for Table 2.
  • redteam-reserved-ip-ranges.{csv,odt} – the list of IP segments reserved for the Red Team in CSV and ODT format. Source for Table 1.
  • topology.{json,pdf,png} – the topology of the complete Cyber Czech exercise network in NetJSON, PDF and PNG formats.
  • topology-small.{pdf,png} – simplified topology in the PDF and PNG format. Source for Figure 1.
