Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset is useful to practice skills in Data Analysis or Data Science, contains information about indicators of crompromise found in MalwareBazaar's database.
The dataset was retrieved from MalwareBazaar's database, full dump CSV. Curated, formatted and cleaned by myself.
Facebook
Twitternaorm/malware-text-db-securebert-ner-512 dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract Image representation of Malware-benign dataset. The Dataset were compiled from various sources malware repositories: The Malware-Repo, TheZoo,Malware Bazar, Malware Database, TekDefense. Meanwhile benign samples were sourced from system application of Microsoft 10 and 11, as well as open source software repository such as Sourceforge, PortableFreeware, CNET, FileForum. The samples were validated by scanning them using Virustotal Malware scanning services. The Samples were pre-processed by transforming the malware binary into grayscale images following rules from Nataraj (2011). Nataraj Paper: https://vision.ece.ucsb.edu/research/signal-processing-malware-analysis. Malware and benign sample were collected by Debi Amalia Septiyani and Halimul Hakim Khairul D. A. Septiyani, “Generating Grayscale and RGB Images dataset for windows PE malware using Gist Features extaction method,” Institut Teknologi Bandung, 2022, and Dani Agung Prastiyo, "Design and implementation of a machine learning-based malware classification system with an audio signal feature Analysis Approach," Institut Teknologi Bandung, 2023.
Facebook
TwitterAttribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset utilized in this study comprises 22,056 samples, encompassing various categories of malware as well as benign software. This dataset originates from the research paper by Md Shahnawaz, Bishwajit Prasad Gond, and Durga Prasad Mohapatra titled “**Dynamic Malware Classification of Windows PE Files using CNNs and Greyscale Images Derived from Runtime API Call Argument Conversion**,” Accepted at the 16th International Conference on Computing, Communication and Networking Technologies (ICCCNT), held at the Indian Institute of Technology Indore on June 2, 2025. https://doi.org/10.48550/arXiv.2505.24231
The dataset, encompassing a total of 22,056 samples, includes various malware types and benign samples, each identified with a naming convention of Name_Hash. Here, Name denotes the malware family type, and Hash represents the hash of the Portable Executable (PE) file sourced from VirusShare and VirusTotal. For example, a sample might be named adware_0013996b0815f1b53ec52a46d0279b0d.png. The dataset consists of the following:
The initial phase involves obtaining the malware hash from the VirusShare database1. This hash serves as a unique identifier for subsequent analysis.
Using the acquired hash, a query is submitted to the VirusTotal platform2 to retrieve a JSON file encapsulating results from over 70 distinct antivirus scans. These results are analyzed to classify the malware into its respective category.
Post-classification, malware samples from various categories are systematically downloaded for further examination.
Each malware category undergoes dynamic analysis within a controlled environment, specifically the Cuckoo Sandbox3. This process monitors the malware's behavioral patterns during execution.
From the Portable Executable (PE) files’ behavioral data, provided as JSON output by the Cuckoo Sandbox, an API call sequence report is extracted in JSON format. This report is subsequently segmented into four distinct text files: - API Name - API Argument - API Return Value - API Category
The segmented data is subjected to $n$-gram analysis, integrating API names with their corresponding arguments. Unique n-grams are derived across all malware categories, followed by the computation of Term Frequency (TF) metrics to quantify their significance. Finally, we convert this feature vector i.e https://www.kaggle.com/datasets/bishwajitprasadgond/malware-benign-api-call-argument-feature-vector into images.
This phase focuses on converting the structured CSV data into images suitable for training with CNN.
All numeric API values in each row were normalized into a range of 0–255.
The resulting vectors were reshaped into square matrices (e.g., 128×128), generating grayscale image representations of API usage patterns for each malware.
To highlight important features and patterns in the image, the following techniques were applied:
A magma colormap was applied to the grayscale images, adding color richness for enhanced feature representation.
Finally, contrast and sharpness improvements were applied before saving the images to disk for training.
The outlined methodology facilitates a comprehensive analysis of malware behavior, enabling the derivation of actionable insights for further investigation and mitigation strategies.
Facebook
Twitternaorm/malware-text-db-cyner dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
Twitterhttps://www.comodo.com/home/internet-security/updates/vdp/database.phphttps://www.comodo.com/home/internet-security/updates/vdp/database.php
The complete Comodo Internet Security database is available for download...
Facebook
Twitterhttp://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by malwareTBugs
Released under Database: Open Database, Contents: Database Contents
Facebook
TwitterComprehensive database of website malware threats, vulnerabilities, and security risks detected by Quttera's malware scanner.
Facebook
TwitterThe dataset comprises medical imaging data that demonstrate the presence or absence of illnesses. used to simulate AI-based malware modulation, this database is paired with malware-modulated counterparts. By creating tampered images on the fly from the benign dataset using three mechanisms:Adversarial perturbations to input data that can cause data misclassification.Patch-level content edits by Copying-pasting or inpainting of small square regions (8–32 px) to simulate lesion insertion or removal.Metadata-consistent rescaling for random resize and crop variance. Each training batch is a duplicate of the original images.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
MalwareInfectionSet We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
VictimAccessSet We demonstrate how Infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
AccountAccessSet The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits Authors
Billy Bob Brumley (Tampere University, Tampere, Finland)
Juha Nurmi (Tampere University, Tampere, Finland)
Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was compiled over a year and a half from various websites and sources, and it contains 7449 benign and malicious IoMT packets presented by real-world components of the e-healthcare system that monitor network transmission. Data quality is improved at several preprocessing stages, including dealing with noises and unwanted values as strings, cleaning, encoding string features, and rescaling all disordered data values using data transformation functions. To standardize the analysis of network features, we only consider features related to networking characteristics and reject all other features that provide insights into the patient's vital signs. This data set is for analyzing the IoMT traffic behavior within the smart hospital's networks.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 18,850 normal android application packages and 10,000 malware android packages which are used to identify the behaviour of malware application on permission they need at run-time.
Facebook
TwitterThis dataset was created by Ikram Ben abdel ouahab
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are the data set and source code related to the paper: "DEVIL: A Framework for Discovering and Evaluating Insidious Advanced Persistent Threats Leveraging Graph-Based Algorithms"
1- aptnotes-downloader.zip : contains source code that downloads all APT reports listed in https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections
2- apt-groups.zip : contains all APT group names gathered from https://docs.google.com/spreadsheets/d/1H9_xaxQHpWaa4O_Son4Gx0YOIzlcBWMsdvePFX68EKU/edit?gid=1864660085#gid=1864660085 and https://malpedia.caad.fkie.fraunhofer.de/actors
3- apt-reports.zip : contains all deduplicated APT reports gathered from https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections
4- countries.zip : contains country name list.
5- ttps.zip : contains all MITRE techniques gathered from https://attack.mitre.org/resources/attack-data-and-tools/
6- malware-families.zip : contains all malware family names gathered from https://malpedia.caad.fkie.fraunhofer.de/families
7- ioc-searcher-app.zip : contains source code that extracts IoCs from APT reports. Extracted IoC files are provided in report-analyser.zip. Original code repo can be found at https://github.com/malicialab/iocsearcher
8- extracted-iocs.zip : contains extracted IoCs by ioc-searcher-app.zip
9- report-analyser.zip : contains source code that searchs APT reports, malware families, countries and TTPs. I case of a match, it updates files in extracted-iocs.zip.
10- cti-transformation-app.zip : contains source code that transforms files in extracted-iocs.zip to CTI triples and saves into Neo4j graph database.
11- graph-db-backup.zip : contains volume folder of Neo4j Docker container. When it is mounted to a Docker container, all CTI database becomes reachable from Neo4j web interface. Here is how to run a Neo4j Docker container that mounts folder in the zip:
docker run -d --publish=7474:7474 --publish=7687:7687 --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/data:/data --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/plugins:/plugins --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/logs:/logs --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/conf:/conf --env 'NEO4J_PLUGINS=["apoc","graph-data-science"]' --env NEO4J_apoc_export_file_enabled=true --env NEO4J_apoc_import_file_enabled=true --env NEO4J_apoc_import_file_use_neo4j_config=true --env=NEO4J_AUTH=none neo4j:5.13.0
web interface: http://localhost:7474 username: neo4j password: neo4j
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data in this dataset represent recorded network traffic of specimens of IoT malware samples that were collected from the links found on URLHaus database website (malware.zip), in the period from 2019 to 2021, at the University of Belgrade, School of Electric Engineering. These malware samples were run on RaspberryPi devices, with restricted local network access, and the network traffic was recorded using tcpdump tool. The benign network traffic (benign.zip) represents all the network traffic recorded on a personal computer for the duration of several hours, split into two files. All local network addresses were anonymized in the process of making these pcap files. The csv file in the dataset contains description for each of the malware pcap files, consisting of: file name, UrlHaus URL, bot address, malware address, attack presence, attacked address, URLHaus tags, collection date (in the DD/MM/YYYY format), and comment.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are the data set and source code related to the paper: "A Framework for Developing Strategic Cyber Threat Intelligence from Advanced Persistent Threat Analysis Reports Using Graph-Based Algorithms"
1- aptnotes-downloader.zip : contains source code that downloads all APT reports listed in https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections
2- apt-groups.zip : contains all APT group names gathered from https://docs.google.com/spreadsheets/d/1H9_xaxQHpWaa4O_Son4Gx0YOIzlcBWMsdvePFX68EKU/edit?gid=1864660085#gid=1864660085 and https://malpedia.caad.fkie.fraunhofer.de/actors and https://malpedia.caad.fkie.fraunhofer.de/actors
3- apt-reports.zip : contains all deduplicated APT reports gathered from https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections
4- countries.zip : contains country name list.
5- ttps.zip : contains all MITRE techniques gathered from https://attack.mitre.org/resources/attack-data-and-tools/
6- malware-families.zip : contains all malware family names gathered from https://malpedia.caad.fkie.fraunhofer.de/families
7- ioc-searcher-app.zip : contains source code that extracts IoCs from APT reports. Extracted IoC files are provided in report-analyser.zip. Original code repo can be found at https://github.com/malicialab/iocsearcher
8- extracted-iocs.zip : contains extracted IoCs by ioc-searcher-app.zip
9- report-analyser.zip : contains source code that searchs APT reports, malware families, countries and TTPs. I case of a match, it updates files in extracted-iocs.zip.
10- cti-transformation-app.zip : contains source code that transforms files in extracted-iocs.zip to CTI triples and saves into Neo4j graph database.
11- graph-db-backup.zip : contains volume folder of Neo4j Docker container. When it is mounted to a Docker container, all CTI database becomes reachable from Neo4j web interface. Here is how to run a Neo4j Docker container that mounts folder in the zip:
docker run -d --publish=7474:7474 --publish=7687:7687 --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/data:/data --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/plugins:/plugins --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/logs:/logs --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/conf:/conf --env 'NEO4J_PLUGINS=["apoc","graph-data-science"]' --env NEO4J_apoc_export_file_enabled=true --env NEO4J_apoc_import_file_enabled=true --env NEO4J_apoc_import_file_use_neo4j_config=true --env=NEO4J_AUTH=none neo4j:5.13.0
web interface: http://localhost:7474
username: neo4j
password: neo4j
Facebook
TwitterThe Database: Kraken2 [1] database built from a classification tree containing over 700k metagenomic viruses from JGI IMG/VR [2]. (1) Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biol., 20(1), 1–13. doi: 10.1186/s13059-019-1891-0 (2) Paez-Espino D, Chen I-MA, Palaniappan K, Ratner A, Chu K, Szeto E, et al. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res. 2017;45:D457–65. For Paper: Title: A k-mer based approach for virus classification in metatranscriptomic and metagenomic samples identifies viral associations in the Populus phytobiome and autism brains Abstract Background Viruses are an underrepresented taxa in the study and identification of microbiome constituents; however, they play an important role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which further limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses represents a challenge for classification, not only in constructing a viral taxonomy, but also in identifying similarities between a virus' genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify their sequences in metagenomic and metatranscriptomic samples. Methods To identify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses. We then integrated the viral classification tree with the NCBI taxonomy for use with ParaKraken, a metagenomic/transcriptomic classifier. Results To illustrate the breadth of our utility for classifying viruses with ParaKraken, we analyzed data from a plant metagenome study identifying the differences between two Populus genotypes in three different compartments and on a human metatranscriptome study identifying the differences between Autism Spectrum Disorder patients and controls in post mortem brain biopsies. In the Populus study, we identified genotype and compartment specific viral signatures, while in the Autism study we identified a significant increased abundance of eight viral sequences in Autism brain biopsies. Conclusion Viruses represent an important aspect of the microbiome. The ability to classify viruses represents the first step in being able to better understand their role in the microbiome. The viral classification method presented here allows for more complete identification of viral sequences for use in identifying associations between viruses and the host and viruses and other microbiome members. Acknowledgements and Funding This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was also supported by the Plant-Microbe Interfaces Scientific Focus Area in the Genomic Science Program, the Office of Biological and Environmental Research (BER) in the U.S. Department of Energy Office of Science, and by the Department of Energy, Laboratory Directed Research and Development funding (ProjectID 8321), at the Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the US DOE under contract DE-AC05-00OR22725. This research used resources of the Compute and Data Environment for Science (CADES).
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cyber attacks are a growing concern for small businesses during COVID-19 . Be Protected While You Work. Upgrade Your Small Business's Virus Protection Today! Before going for a Cyber security solutions for small to mid-sized businesses deliver enterprise-level protection.
Download this (Checklist for a Small Firm's Cybersecurity Program 2020-2021) data set to deploy secure functioning of various aspects of your small business including, employee data, website and more.This checklist is provided to
assist small member firms with limited resources to establish a cybersecurity program to identify and assess cybersecurity threats,
protect assets from cyber intrusions,
detect when their systems and assets have been compromised,
plan for the response when a compromise occurs and implement a plan to recover lost, stolen or unavailable assets.
Train employees in security principles.
Protect information, computers, and networks from malware attacks.
Provide firewall security for your Internet connection.
Create a mobile device action plan.
Make backup copies of important business data and information.
Learn about the threats and how to protect your website.
Protect Your Small Business site.
Learn the basics for protecting your business web sites from cyber attacks at WP Hacked Help Blog
Created With Inputs From Security Experts at WP Hacked Help - Pioneer In WordPress Malware Removal & Security
Facebook
TwitterTHIS RESOURCE IS NO LONGER IN SERVICE, documented August 19, 2016. It is a database and web application describing the genome organization and providing analytical tools for the 938 known species of RNA virus. It can identify submitted nucleotide sequences, can place them into multiple whole-genome alignments (in species where more than one isolate has been fully sequenced) and contains translated genome sequences for all species. It has been created for two main purposes: to facilitate the comparative analysis of RNA viruses and to become a hub for other, more specialised virus Web sites.
Facebook
TwitterVirus-Host DB organizes data about the relationships between viruses and their hosts, represented in the form of pairs of NCBI taxonomy IDs for viruses and their hosts. Virus-Host DB covers viruses with complete genomes stored in 1) NCBI/RefSeq and 2) GenBank whose accession numbers are listed in EBI Genomes. The host information is collected from RefSeq, GenBank (in free text format), UniProt, ViralZone, and manually curated with additional information obtained by literature surveys.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset is useful to practice skills in Data Analysis or Data Science, contains information about indicators of crompromise found in MalwareBazaar's database.
The dataset was retrieved from MalwareBazaar's database, full dump CSV. Curated, formatted and cleaned by myself.