Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was acquired during Cyber Czech – a hands-on cyber defense exercise (Red Team/Blue Team) held in March 2019 at Masaryk University, Brno, Czech Republic. Network traffic flows and a wide variety of event logs were captured in an exercise network deployed in the KYPO Cyber Range Platform.
Contents
The dataset covers two distinct time intervals, which correspond to the official schedule of the exercise. The timestamps provided below are in the ISO 8601 date format.
The captured and collected data were normalized into three distinct event types and are stored as structured JSON. The data are sorted by timestamp, which represents the time at which they were observed. Each event type includes a raw payload ready for further processing and analysis. The description of the respective event types and the corresponding data files follows.
Each archive listed above includes a directory of the same name with the following four files, ready to be processed.
Finally, the exercise network topology is described in the machine-readable NetJSON format and is part of the auxiliary files archive (auxiliary-material.tgz), which includes the following.
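As a purely illustrative sketch of working with the event files, the following Python snippet reads one structured JSON event file and checks that its records are ordered by their observation timestamp; the file name and the field name "timestamp" are assumptions, so adjust them to the actual archives and schema listed above.

import json

# Hypothetical file name; substitute one of the JSON event files from the archives.
with open("events.json", encoding="utf-8") as f:
    events = json.load(f)  # expected: a list of event objects

# "timestamp" is an assumed field name holding the ISO 8601 observation time.
timestamps = [e["timestamp"] for e in events]
print("events loaded:", len(events))
print("sorted by timestamp:", timestamps == sorted(timestamps))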
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The European Repository of Cyber Incidents (EuRepoC) is releasing the Global Dataset of Cyber Incidents in Version 1.3 as an extract of our backend database. This official release contains fully consolidated cyber incident data reviewed by our interdisciplinary experts in the fields of politics, law and technology across all 60 variables covered by the European Repository. Version 1.3 covers the years 2000 – 2024 entirely. The Global Dataset is meant for reliable, evidence-based analysis. If you require real-time data, please refer to the download option in our TableView or contact us for special requirements (including API access).
The dataset now contains data on 3416 cyber incidents which started between 01.01.2000 and 31.12.2024. The European Repository of Cyber Incidents (EuRepoC) gathers, codes, and analyses publicly available information from over 220 sources and 600 Twitter accounts daily to report on dynamic trends in the global, and particularly the European, cyber threat environment.
For more information on the scope and data collection methodology see: https://eurepoc.eu/methodology
Full Codebook available here
Information about each file
Please scroll down this page to see all available files; Zenodo only displays the attribution dataset by default.
Global Database (csv or xlsx):
This file includes all variables coded for each incident, organised such that one row corresponds to one incident, our main unit of investigation. Where multiple codes are present for a single variable for a single incident, these are separated with semicolons within the same cell.
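For analysis, such multi-coded cells can be "unpacked" into one row per code. A minimal pandas sketch, where the file name and the column name receiver_country are illustrative assumptions:

import pandas as pd

df = pd.read_csv("eurepoc_global_database.csv")  # hypothetical file name

# Split a semicolon-separated cell into one row per code.
col = "receiver_country"  # assumed column name; use any multi-coded variable
unpacked = df.assign(**{col: df[col].str.split(";")}).explode(col)
unpacked[col] = unpacked[col].str.strip()
print(len(df), "incidents ->", len(unpacked), "rows after unpacking")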
Receiver Dataset (csv or xlsx):
In this file, the data of affected entities and individuals (receivers) is restructured to facilitate analysis. Each cell contains only a single code, with the data "unpacked" across multiple rows. Thus, a single incident can span several rows, identifiable through the unique identifier assigned to each incident (incident_id).
Attribution Dataset (csv or xlsx):
This file follows a similar approach to the receiver dataset. The attribution data is "unpacked" over several rows, allowing each cell to contain only one code. Here too, a single incident may occupy several rows, with the unique identifier enabling easy tracking of each incident (incident_id). In addition, some attributions may have multiple possible codes for one variable; these are also "unpacked" over several rows, with the attribution_id enabling each attribution to be tracked.
Dyadic Dataset (csv or xlsx):
The dyadic dataset places state dyads at the centre of analysis. Each row represents one cyber incident within a specific dyad. Because incidents may affect multiple receivers, a single incident can appear in several rows of this format when it affected multiple countries.
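Because the receiver and attribution files share the incident_id identifier with the Global Database, they can be joined back to the incident-level data. A minimal sketch (file names are assumptions):

import pandas as pd

incidents = pd.read_csv("global_database.csv")   # one row per incident (hypothetical name)
receivers = pd.read_csv("receiver_dataset.csv")  # several rows per incident (hypothetical name)

# Attach the incident-level variables to every unpacked receiver row.
merged = receivers.merge(incidents, on="incident_id", how="left",
                         suffixes=("_receiver", "_incident"))
print(merged.groupby("incident_id").size().head())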
https://whoisdatacenter.com/terms-of-use/
Strengthen your cyber defense with our extensive, daily-updated WHOIS database. Accessible in CSV, JSON, and XML, it's a crucial asset for any security strategy.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set was acquired using a survey which intends to measure:
• Participants' previous experience of cybersecurity training
• Participants' perception of ideal cybersecurity training
• Participants' perception of a specific cybersecurity training type called Context-Based Micro-Training
• Which usability aspects the participants find most important for security features
Data was acquired from Sweden, the UK, and Italy to allow for comparative analysis. Demographic data was collected to allow for further analysis along those lines. The files included in this data set are:
• Completesurvey: the full survey presented to the participants.
• Dataset: the variables and data for the different questions (available as .sav (SPSS) and .csv).
• Var_info: information about the variables in the dataset.
• Overview: frequency tables for the survey questions (for the complete data set).
• Sweden, UK, and Italy: frequency tables for the survey questions divided by national sample groups.
See attached description.
Kitsune Network Attack Dataset
This is a collection of nine network attack datasets captured from either an IP-based commercial surveillance system or a network full of IoT devices. Each dataset contains millions of network packets and a different cyber attack within it.
For each attack, you are supplied with:
- A preprocessed dataset in CSV format (ready for machine learning)
- The corresponding label vector in CSV format
- The original network capture in PCAP format (in case you want to engineer your own features)
We will now describe in detail what's in these datasets and how they were collected.
The Network Attacks
We have collected a wide variety of attacks which you would find in a real network intrusion. The following is a list of the available cyber attack datasets:
[Image: table of the available cyber attack datasets]
For more details on the attacks themselves, please refer to our NDSS paper (citation below).
The Data Collection
The following figure presents the network topologies which we used to collect the data, and the corresponding attack vectors at which the attacks were performed. The network capture took place at point 1 and point X at the router (where a network intrusion detection system could feasibly be placed). For each dataset, clean network traffic was captured for the first 1 million packets, then the cyber attack was performed.
The Dataset Format
Each preprocessed dataset CSV has m rows (packets) and 115 columns (features) with no header. The 115 features were extracted using our AfterImage feature extractor, described in our NDSS paper (see below) and available in Python here. In summary, the 115 features provide a statistical snapshot of the network (hosts and behaviors) in the context of the current packet traversing the network. The AfterImage feature extractor is unique in that it can efficiently process millions of streams (network channels) in real-time, incrementally, making it suitable for handling network traffic.
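A minimal loading sketch consistent with this format; the file names are placeholders for whichever attack you download, and the label file's exact layout should be checked before use.

import pandas as pd

# Feature matrix: m packets x 115 AfterImage features, no header row.
X = pd.read_csv("mirai_dataset.csv", header=None)  # hypothetical file name
assert X.shape[1] == 115

# Label vector shipped as a separate CSV (layout not specified above, so inspect it first).
y = pd.read_csv("mirai_labels.csv", header=None).squeeze("columns")  # hypothetical file name
print(X.shape, y.shape)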
Citation
If you use these datasets, please cite:
@inproceedings{mirsky2018kitsune,
  title={Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection},
  author={Mirsky, Yisroel and Doitshman, Tomer and Elovici, Yuval and Shabtai, Asaf},
  booktitle={The Network and Distributed System Security Symposium (NDSS) 2018},
  year={2018}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains supplementary materials for the following conference paper:
Valdemar Švábenský, Jan Vykopal, Pavel Čeleda.
What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences.
In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE 2020).
https://doi.org/10.1145/3328778.3366816
Preprint available at: https://arxiv.org/abs/1911.11675
How to cite
If you use or build upon the materials, please use the BibTeX entry below to cite the original paper (not only this web link).
@inproceedings{Svabensky2020what,
author = {\v{S}v\'{a}bensk\'{y}, Valdemar and Vykopal, Jan and \v{C}eleda, Pavel},
title = {{What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences}},
booktitle = {Proceedings of the 51st ACM Technical Symposium on Computer Science Education},
series = {SIGCSE '20},
location = {Portland, OR, USA},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
month = {03},
year = {2020},
pages = {2--8},
numpages = {7},
isbn = {978-1-4503-6793-6},
url = {https://doi.org/10.1145/3328778.3366816},
doi = {10.1145/3328778.3366816},
}
Attached content
The file "SIGCSE 2020 Literature Review.xlsx" is an Excel spreadsheet with three sheets corresponding to 1) all papers found by automated search, 2) manually excluded papers, and 3) papers included in the literature review. There are also three CSV files that correspond to the three individual sheets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains data on 2889 cyber incidents between 01.01.2000 and 02.05.2024 using 60 variables, including the start date, names and categories of receivers along with names and categories of initiators. The database was compiled as part of the European Repository of Cyber Incidents (EuRepoC) project.
EuRepoC gathers, codes, and analyses publicly available information from over 200 sources and 600 Twitter accounts daily to report on dynamic trends in the global, and particularly the European, cyber threat environment.
For more information on the scope and data collection methodology see: https://eurepoc.eu/methodology
Codebook available here
Information about each file:
Global Database (csv or xlsx):
This file includes all variables coded for each incident, organised such that one row corresponds to one incident, our main unit of investigation. Where multiple codes are present for a single variable for a single incident, these are separated with semicolons within the same cell.
Receiver Dataset (csv):
In this file, the data of affected entities and individuals (receivers) is restructured to facilitate analysis. Each cell contains only a single code, with the data "unpacked" across multiple rows. Thus, a single incident can span several rows, identifiable through the unique identifier assigned to each incident (incident_id).
Attribution Dataset (csv):
This file follows a similar approach to the receiver dataset. The attribution data is "unpacked" over several rows, allowing each cell to contain only one code. Here too, a single incident may occupy several rows, with the unique identifier enabling easy tracking of each incident (incident_id). In addition, some attributions may have multiple possible codes for one variable; these are also "unpacked" over several rows, with the attribution_id enabling each attribution to be tracked.
eurepoc_global_database_1.2 (json):
This file contains the whole database in JSON format.
DomainIQ is a comprehensive global Domain Name dataset for organizations that want to build cyber security, data cleaning and email marketing applications. The dataset consists of the DNS records for over 267 million domains, updated daily, representing more than 90% of all public domains in the world.
The data is enriched by over thirty unique data points, including identifying the mailbox provider for each domain and using AI based predictive analytics to identify elevated risk domains from both a cyber security and email sending reputation perspective.
DomainIQ from Datazag offers layered intelligence through a highly flexible API and as a dataset, available for both cloud and on-premises applications. Standard formats include CSV, JSON, Parquet, and DuckDB.
Custom options are available for any other file or database format. With daily updates and constant research from Datazag, organizations can develop their own market-leading cyber security, data cleaning and email marketing applications supported by comprehensive and accurate data from Datazag. Data updates are available on a daily, weekly or monthly basis; API data is updated daily.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hornet 40 is a dataset of 40 days of network traffic attacks captured in cloud servers used as honeypots to help understand how geography may impact the inflow of network attacks. The honeypots are located in eight different cities: Amsterdam, London, Frankfurt, San Francisco, New York, Singapore, Toronto, Bangalore. The data was captured in April, May, and June 2021.
The eight cloud servers were created and configured simultaneously following identical instructions. The network capture was performed using the Argus network monitoring tool in each cloud server. The cloud servers had only one service running (SSH on a non-standard port) and were fully dedicated as a honeypot. No honeypot software was used in this dataset.
The dataset consists of eight scenarios, one for each geographically located cloud server. Each scenario contains bidirectional NetFlow files, distributed in the following archives:
- hornet40-biargus.tar.gz: all scenarios with bidirectional NetFlow files in Argus binary format
- hornet40-netflow-v5.tar.gz: all scenarios with bidirectional NetFlow v5 files in CSV format
- hornet40-netflow-extended.tar.gz: all scenarios with bidirectional NetFlow files in CSV format containing all features provided by Argus
- hornet40-full.tar.gz: all of the above (biargus, NetFlow v5, and extended NetFlows) in a single download
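As an illustration of how the CSV archives might be consumed, the sketch below concatenates every CSV member of the NetFlow v5 archive into one table; the member names inside the tarball are not documented here, so treat them as assumptions and list them first if needed.

import tarfile
import pandas as pd

frames = []
with tarfile.open("hornet40-netflow-v5.tar.gz", "r:gz") as tar:
    for member in tar.getmembers():
        if member.name.endswith(".csv"):
            df = pd.read_csv(tar.extractfile(member))
            df["scenario"] = member.name  # track which cloud server the flows came from
            frames.append(df)

netflows = pd.concat(frames, ignore_index=True)
print(netflows["scenario"].nunique(), "scenario files loaded")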
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This traffic dataset contains a balanced mix of encrypted malicious and legitimate traffic for encrypted malicious traffic detection and analysis. The dataset is secondary CSV feature data composed from six public traffic datasets.
Our dataset is curated based on two criteria. The first criterion is to combine public datasets that are widely used in existing works and contain enough encrypted malicious or encrypted legitimate traffic, such as the Malware Capture Facility Project datasets. The second criterion is to ensure that the final dataset is balanced between encrypted malicious and legitimate network traffic.
Based on these criteria, six public datasets were selected. After data pre-processing, details of each selected public dataset and the size of the different encrypted traffic are shown in the “Dataset Statistic Analysis Document”. The document summarizes the malicious and legitimate traffic size selected from each public dataset, the traffic size of each malicious traffic type, and the total traffic size of the composed dataset. From the table, we can observe that encrypted malicious and legitimate traffic each contribute approximately 50% of the final composed dataset.
The datasets made available here were prepared for encrypted malicious traffic detection. Since the dataset is intended for machine learning or deep learning model training, sample train and test sets are also provided, split in a 1:4 ratio. These sets can be used for model training and testing based on selected features or after further data pre-processing.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For the home environment we have: 01 Wifi Modem Router, 03 Smartphones, 01 server, 01 desktop, 01 Multifunction Printer, 01 network extender, 01 SmartTV, 01 Cable TV decoder and 01 firewall. This environment is a local network. The server hosts the Monitoring Environment and has a network card that provides connectivity and receives all network traffic for analysis.
The results were obtained from Suricata and from Telegraf collections of the TICK stack. All evidence was gathered through queries in EveBox, which received data from Suricata, and through Grafana graphics built from information extracted from the InfluxDB (Grafana) and PostgreSQL (EveBox) databases.
events.csv.gz - Suricata / Evebox collections
net.csv.gz - Telegraf collections from the TICK stack
netstat.csv.gz - Telegraf collections from the TICK stack
For correlation purposes, use the events.csv.gz file as a basis. The correlation key is the 'timestamp' column in events.csv.gz, matched against the 'time' column in the net.csv.gz and netstat.csv.gz files.
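A hedged pandas sketch of that correlation: merge_asof pairs each Telegraf sample with the nearest Suricata event in time, since the two clocks will rarely match exactly; the 30-second tolerance is an illustrative choice, not part of the dataset.

import pandas as pd

events = pd.read_csv("events.csv.gz")
net = pd.read_csv("net.csv.gz")

events["timestamp"] = pd.to_datetime(events["timestamp"], utc=True)
net["time"] = pd.to_datetime(net["time"], utc=True)

# merge_asof requires both sides to be sorted on the join keys.
events = events.sort_values("timestamp")
net = net.sort_values("time")

correlated = pd.merge_asof(net, events,
                           left_on="time", right_on="timestamp",
                           direction="nearest",
                           tolerance=pd.Timedelta("30s"))  # illustrative tolerance
print(len(correlated), "correlated rows")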
The collections were taken at non-consecutive intervals from 2018-09-15 to 2019-02-04.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is rooted in a study aimed at unveiling the origins and motivations behind the creation of malware repositories on GitHub. Our research embarks on an innovative journey to dissect the profiles and intentions of GitHub users who have been involved in this dubious activity.
Employing a robust methodology, we meticulously identified 14,000 GitHub users linked to malware repositories. By leveraging advanced large language model (LLM) analytics, we classified these individuals into distinct categories based on their perceived intent: 3,339 were deemed Malicious, 3,354 Likely Malicious, and 7,574 Benign, offering a nuanced perspective on the community behind these repositories.
Our analysis penetrates the veil of anonymity and obscurity often associated with these GitHub profiles, revealing stark contrasts in their characteristics. Malicious authors were found to typically possess sparse profiles focused on nefarious activities, while Benign authors presented well-rounded profiles, actively contributing to cybersecurity education and research. Those labeled as Likely Malicious exhibited a spectrum of engagement levels, underlining the complexity and diversity within this digital ecosystem.
We are offering two datasets in this paper. First, a list of malware repositories: we collected and extended the malware repositories on GitHub in 2022, following the original papers. Second, a CSV file with the GitHub users' information and their maliciousness classification label.
malware_repos.txt
Each line lists one repository in the format username/reponame. This format allows for easy identification of, and access to, each repository on GitHub for further analysis or review.
obfuscated_github_user_dataset.csv
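A small, purely illustrative sketch for turning the repository list into GitHub URLs:

# Read the repository list (one username/reponame per line) and build GitHub URLs.
with open("malware_repos.txt", encoding="utf-8") as f:
    repos = [line.strip() for line in f if line.strip()]

urls = [f"https://github.com/{repo}" for repo in repos]
print(len(urls), "repositories")
print(urls[:3])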
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
In the digital era of the Industrial Internet of Things (IIoT), conventional Critical Infrastructures (CIs) are transformed into smart environments with multiple benefits, such as pervasive control, self-monitoring and self-healing. However, this evolution is accompanied by several cyberthreats due to the necessary presence of insecure technologies. DNP3 is an industrial communication protocol which is widely adopted in the CIs of the US. In particular, DNP3 allows the remote communication between Industrial Control Systems (ICS) and Supervisory Control and Data Acquisition (SCADA) systems. It can support various topologies, such as Master-Slave, Multi-Drop, Hierarchical and Multiple-Server. Initially, the architectural model of DNP3 consisted of three layers: (a) Application Layer, (b) Transport Layer and (c) Data Link Layer. However, DNP3 can now be incorporated into the Transmission Control Protocol/Internet Protocol (TCP/IP) stack as an application-layer protocol. Similarly to other industrial protocols (e.g., Modbus and IEC 60870-5-104), DNP3 is characterised by severe security issues since it does not include any authentication or authorisation mechanisms. More information about the DNP3 security issues is provided in [1-3].
This dataset contains labelled Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics (Comma-Separated Values, CSV format) and DNP3 flow statistics (CSV format) related to 9 DNP3 cyberattacks. These cyberattacks focus on DNP3 unauthorised commands and Denial of Service (DoS). The network traffic data are provided as Packet Capture (PCAP) files. Consequently, this dataset can be used to implement Artificial Intelligence (AI)-powered Intrusion Detection and Prevention Systems (IDPS) that rely on Machine Learning (ML) and Deep Learning (DL) techniques.
2. Instructions
This DNP3 Intrusion Detection Dataset was implemented following the methodological frameworks of A. Gharib et al. in [4] and S. Dadkhah et al. in [5], including eleven features: (a) Complete Network Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d) Complete Interaction, (e) Complete Capture, (f) Available Protocols, (g) Attack Diversity, (h) Heterogeneity, (i) Feature Set and (j) Metadata.
A network topology consisting of (a) eight industrial entities, (b) one Human Machine Interface (HMI) and (c) three cyberattackers was used to implement this DNP3 Intrusion Detection Dataset. In particular, the following cyberattacks were implemented.
The aforementioned DNP3 cyberattacks were executed utilising penetration testing tools such as Nmap and Scapy. For each attack, a relevant folder is provided, including the network traffic and the network flow statistics for each entity. In particular, for each cyberattack, a folder is given, providing (a) the PCAP files for each entity, (b) the Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics for 120 seconds in CSV format and (c) the DNP3 flow statistics for each entity (using different timeout values in seconds: 45, 60, 75, 90, 120 and 240). The TCP/IP network flow statistics were produced using CICFlowMeter, while the DNP3 flow statistics were generated by a Custom DNP3 Python Parser, taking full advantage of Scapy.
3. Dataset Structure
The dataset consists of the following folders:
Each folder includes respective subfolders related to the entities/devices (described in the following section) participating in each attack. In particular, for each entity/device, there is a folder including (a) the DNP3 network traffic (pcap file) related to this entity/device during each attack, (b) the TCP/IP network flow statistics (CSV file) generated by CICFlowMeter for the timeout value of 120 seconds and finally (c) the DNP3 flow statistics (CSV file) from the Custom DNP3 Python Parser. Finally, it is noteworthy that the network flows from both CICFlowMeter and Custom DNP3 Python Parser in each CSV file are labelled based on the DNP3 cyberattacks executed for the generation of this dataset. The description of these attacks is provided in the following section, while the various features from CICFlowMeter and Custom DNP3 Python Parser are presented in Section 5.
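A minimal pandas sketch for inspecting one of the labelled flow files; the path below is hypothetical, and the label column name is an assumption (CICFlowMeter output commonly carries a "Label" column, but verify against the actual CSV header).

import pandas as pd

# Hypothetical path: one entity's CICFlowMeter output inside one attack folder.
flows = pd.read_csv("20200515_DNP3_Cold_Restart_Attack/entity1/flows_120s.csv")

label_col = "Label"  # assumed column name; check the header of the real file
print(flows[label_col].value_counts())  # distribution of benign vs. attack flows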
4. Testbed & DNP3 Attacks
The following figure shows the testbed utilised for the generation of this dataset. It is composed of eight industrial entities that play the role of the DNP3 outstations/slaves, such as Remote Terminal Units (RTUs) and Intelligent Electronic Devices (IEDs). Moreover, there is another workstation which plays the role of the master station, i.e., a Master Terminal Unit (MTU). For the communication between the DNP3 outstations/slaves and the master station, opendnp3 was used.
Table 1: DNP3 Attacks Description
DNP3 Attack | Description | Dataset Folder
DNP3 Disable Unsolicited Message Attack | This attack targets a DNP3 outstation/slave, establishing a connection with it while acting as a master station. The false master then transmits a packet with the DNP3 Function Code 21, which requests to disable all the unsolicited messages on the target. | 20200514_DNP3_Disable_Unsolicited_Messages_Attack
DNP3 Cold Restart Attack | The malicious entity acts as a master station and sends a DNP3 packet that includes the “Cold Restart” function code. When the target receives this message, it initiates a complete restart and sends back a reply with the time window before the restart process. | 20200515_DNP3_Cold_Restart_Attack
DNP3 Warm Restart Attack | This attack is quite similar to the “Cold Restart” attack, but aims to trigger a partial restart, re-initiating a DNP3 service on the target outstation. | 20200515_DNP3_Warm_Restart_Attack
DNP3 Enumerate Attack | This reconnaissance attack aims to discover which DNP3 services and function codes are used by the target system. | 20200516_DNP3_Enumerate
DNP3 Info Attack | This attack constitutes another reconnaissance attempt, aggregating various DNP3 diagnostic information related to the DNP3 usage. | 20200516_DNP3_Info
Data Initialisation Attack | This cyberattack is related to Function Code 15 (Initialize Data). It is an unauthorised access attack which demands that the slave re-initialise possible configurations to their initial values, thus changing potential values defined by legitimate masters. | 20200518_Initialize_Data_Attack
MITM-DoS Attack | In
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network traffic datasets created by Single Flow Time Series Analysis
Datasets were created for the paper "Network Traffic Classification Based on Single Flow Time Series Analysis" by Josef Koumar, Karel Hynek, and Tomáš Čejka, published at the 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:
J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.
This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf
The following table describes each dataset file (a loading sketch follows the table):
File name | Detection problem | Citation of original raw dataset |
botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020 |
doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022 |
dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020 |
ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details at https://www.stratosphereips.org/datasets-iot23 |
ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related Features. In ICISSP, pages 407–414, 2016. |
vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related Features. In ICISSP, pages 407–414, 2016. |
vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022 |
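As a hedged sketch of how one of these per-flow feature files could be prepared for model training: the target column name used below is an assumption, so consult feature_description.pdf for the actual schema.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("botnet_binary.csv")

target_col = "label"  # assumed target column name; see feature_description.pdf
X = df.drop(columns=[target_col])
y = df[target_col]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)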
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a collection of datasets from different sources related to the automatic detection of cyber-bullying. The data come from different platforms, including Kaggle, Twitter, Wikipedia Talk pages and YouTube, and contain text labeled as bullying or not. The data cover different types of cyber-bullying, such as hate speech, aggression, insults and toxicity.
Elsafoury, Fatma (2020), “Cyberbullying datasets”, Mendeley Data, V1, doi: 10.17632/jf4pzyvnpj.1
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MIL-STD-1553 is a military standard that defines the protocol characteristics of a data bus medium for the exchange of information between various subsystems. Although the threat of cyber-attacks on the MIL-STD-1553 protocol has become a growing concern in recent years, little work has been published on detecting such attacks. One of the primary reasons for this is the confidentiality of data recorded from buses on operational systems and as a result, lack of data availability. Moreover, existing research doesn’t sufficiently emphasize the complexity of detecting attacks that can be camouflaged by normal non-periodic messages that the MIL-STD-1553 supports.
We present three datasets of synthesized MIL-STD-1553 traffic containing injected RT Spoofing Attack messages. The implemented attacks emulate normal non-periodic communication, so detecting them with a low false positive rate is non-trivial. Each dataset is separated into a training set of normal messages and a test set of both normal and attack messages. The test sets differ by the occurrence rate of attack messages (0.01%, 0.1%, and 1%). Each dataset is also preprocessed into a dataset of message sequences so that it can be used for sequential anomaly detection analysis. The sequential test sets differ by the occurrence rate of attack sequences (0.14%, 1.26%, and 11.01%). A Java program for generating the sequence datasets from the message stream datasets is also included, so users can generate new sequence datasets with a different sequence length, or with a labeling based on whether the message was injected rather than on whether it affected the aircraft's behavior.
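The preprocessed sequence files and the Java generator ship with the data; purely as an illustration of the sliding-window idea, a Python sketch over a labelled message stream could look like the following, where the column name "label", the window length, and the any-attack-in-window labeling rule are all assumptions rather than the dataset's actual convention.

import pandas as pd

def to_sequences(messages: pd.DataFrame, length: int = 8, label_col: str = "label"):
    # Slide a fixed-length window over the message stream; mark a sequence as an
    # attack sequence if any message inside the window is an attack message.
    values = messages.drop(columns=[label_col]).to_numpy()
    flags = messages[label_col].to_numpy()
    sequences, labels = [], []
    for start in range(len(messages) - length + 1):
        sequences.append(values[start:start + length])
        labels.append(int(flags[start:start + length].any()))
    return sequences, labels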
These datasets are intended to serve three primary purposes: (1) evaluate the ability of MIL-STD-1553 intrusion detection systems (IDS) to detect attacks that emulate normal non-periodic traffic; (2) evaluate IDSs on differing occurrence rates of attacks; (3) evaluate and compare IDSs that operate on non-sequential data as well as IDSs that operate on sequential data.
Please refer to the linked data description document for the full details of the data synthesis process, the motivation for our preprocessing into sequences, and the format of the CSV files. This document also provides relevant background on detecting Spoofing Attacks from MIL-STD-1553 Traffic.
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
This dataset contains all exploits registered on the exploit-db website from 02 January 2019 to 06 November 2020. 2,665 exploits were found in this time range and stored in a CSV file. The CSV fields are as follows:
Altosight | AI Custom Web Scraping Data
✦ Altosight provides global web scraping data services with AI-powered technology that bypasses CAPTCHAs, blocking mechanisms, and handles dynamic content.
We extract data from marketplaces like Amazon, aggregators, e-commerce, and real estate websites, ensuring comprehensive and accurate results.
✦ Our solution offers free unlimited data points across any project, with no additional setup costs.
We deliver data through flexible methods such as API, CSV, JSON, and FTP, all at no extra charge.
― Key Use Cases ―
➤ Price Monitoring & Repricing Solutions
🔹 Automatic repricing, AI-driven repricing, and custom repricing rules
🔹 Receive price suggestions via API or CSV to stay competitive
🔹 Track competitors in real-time or at scheduled intervals
➤ E-commerce Optimization
🔹 Extract product prices, reviews, ratings, images, and trends
🔹 Identify trending products and enhance your e-commerce strategy
🔹 Build dropshipping tools or marketplace optimization platforms with our data
➤ Product Assortment Analysis
🔹 Extract the entire product catalog from competitor websites
🔹 Analyze product assortment to refine your own offerings and identify gaps
🔹 Understand competitor strategies and optimize your product lineup
➤ Marketplaces & Aggregators
🔹 Crawl entire product categories and track best-sellers
🔹 Monitor position changes across categories
🔹 Identify which eRetailers sell specific brands and which SKUs for better market analysis
➤ Business Website Data
🔹 Extract detailed company profiles, including financial statements, key personnel, industry reports, and market trends, enabling in-depth competitor and market analysis
🔹 Collect customer reviews and ratings from business websites to analyze brand sentiment and product performance, helping businesses refine their strategies
➤ Domain Name Data
🔹 Access comprehensive data, including domain registration details, ownership information, expiration dates, and contact information. Ideal for market research, brand monitoring, lead generation, and cybersecurity efforts
➤ Real Estate Data
🔹 Access property listings, prices, and availability
🔹 Analyze trends and opportunities for investment or sales strategies
― Data Collection & Quality ―
► Publicly Sourced Data: Altosight collects web scraping data from publicly available websites, online platforms, and industry-specific aggregators
► AI-Powered Scraping: Our technology handles dynamic content, JavaScript-heavy sites, and pagination, ensuring complete data extraction
► High Data Quality: We clean and structure unstructured data, ensuring it is reliable, accurate, and delivered in formats such as API, CSV, JSON, and more
► Industry Coverage: We serve industries including e-commerce, real estate, travel, finance, and more. Our solution supports use cases like market research, competitive analysis, and business intelligence
► Bulk Data Extraction: We support large-scale data extraction from multiple websites, allowing you to gather millions of data points across industries in a single project
► Scalable Infrastructure: Our platform is built to scale with your needs, allowing seamless extraction for projects of any size, from small pilot projects to ongoing, large-scale data extraction
― Why Choose Altosight? ―
✔ Unlimited Data Points: Altosight offers unlimited free attributes, meaning you can extract as many data points from a page as you need without extra charges
✔ Proprietary Anti-Blocking Technology: Altosight utilizes proprietary techniques to bypass blocking mechanisms, including CAPTCHAs, Cloudflare, and other obstacles. This ensures uninterrupted access to data, no matter how complex the target websites are
✔ Flexible Across Industries: Our crawlers easily adapt across industries, including e-commerce, real estate, finance, and more. We offer customized data solutions tailored to specific needs
✔ GDPR & CCPA Compliance: Your data is handled securely and ethically, ensuring compliance with GDPR, CCPA and other regulations
✔ No Setup or Infrastructure Costs: Start scraping without worrying about additional costs. We provide a hassle-free experience with fast project deployment
✔ Free Data Delivery Methods: Receive your data via API, CSV, JSON, or FTP at no extra charge. We ensure seamless integration with your systems
✔ Fast Support: Our team is always available via phone and email, resolving over 90% of support tickets within the same day
― Custom Projects & Real-Time Data ―
✦ Tailored Solutions: Every business has unique needs, which is why Altosight offers custom data projects. Contact us for a feasibility analysis, and we’ll design a solution that fits your goals
✦ Real-Time Data: Whether you need real-time data delivery or scheduled updates, we provide the flexibility to receive data when you need it. Track price changes, monitor product trends, or gather...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a collection of manually curated timelines from the 2019 Collegiate Penetration Testing Competition (CPTC). Collection and annotation are described in detail in this publication:
Included Files
Other Resources
Contact
Please contact Benjamin S. Meyers (email) with questions about this data and its collection.
Acknowledgments
Collection of this data has been sponsored in part by the National Science Foundation grant 1922169, and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For the Internet environment we have: 01 Switch, 01 IP camera, 01 server for monitoring, 01 server for honeypot and no firewall. This environment is directly connected to the Internet. We installed a server functioning as the Monitoring Environment; the network traffic was obtained via Port Mirroring on the switch to the Monitoring Environment server.
The results were obtained from Suricata and from Telegraf collections of the TICK stack. All evidence was gathered through queries in EveBox, which received data from Suricata, and through Grafana graphics built from information extracted from the InfluxDB (Grafana) and PostgreSQL (EveBox) databases.
events.csv.gz - Suricata / Evebox collections
net.csv.gz - Telegraf collections from the TICK stack
netstat.csv.gz - Telegraf collections from the TICK stack
For correlation purposes, use the events.csv.gz file as a basis. The correlation key is the 'timestamp' column in events.csv.gz, matched against the 'time' column in the net.csv.gz and netstat.csv.gz files.
The collections were taken at non-consecutive intervals from 2018-08-28 to 2019-11-14.