89 datasets found

MalwareBazaar Malware Dataset (Sep - Oct 2025)
kaggle.com
zip
Updated Oct 9, 2025
José Reyes (2025). MalwareBazaar Malware Dataset (Sep - Oct 2025) [Dataset]. https://www.kaggle.com/datasets/arkreyes/malwarebazaar-malware-dataset-sep-oct-2025
Available download formats: zip (9,415,213 bytes)
Dataset updated
Oct 9, 2025
Authors
José Reyes
License
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
Description
MalwareBazaar Malware Dataset.

Introduction.

This dataset is useful for practicing data analysis or data science skills. It contains information about indicators of compromise (IoCs) found in MalwareBazaar's database.

Description.

The dataset was retrieved from MalwareBazaar's full database dump (CSV) and curated, formatted, and cleaned by me.

Metadata removed (footer with unreadable information).

'date' converted to datetime for easier reading.

Data filtered to the last 90 days.

Unnecessary columns containing "NaN" data removed.
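The three cleaning steps above (datetime conversion, a 90-day window, dropping "NaN" columns) can be approximated with Python's standard library. This is a minimal sketch, not the author's code: it assumes the dump has already been loaded as a list of row dictionaries, that the timestamp column is called `date` and uses the `YYYY-MM-DD HH:MM:SS` format, and that a column is "unnecessary" when every remaining row is empty or "NaN" in it.

```python
from datetime import datetime, timedelta

def clean_rows(rows, reference, days=90, date_col="date"):
    """Parse the date column, keep rows from the last `days` days before
    `reference`, and drop columns that are empty or "NaN" in every kept row.
    The column name and timestamp format are assumptions."""
    cutoff = reference - timedelta(days=days)
    kept = []
    for row in rows:
        row = dict(row)  # avoid mutating the caller's data
        row[date_col] = datetime.strptime(row[date_col], "%Y-%m-%d %H:%M:%S")
        if row[date_col] >= cutoff:
            kept.append(row)
    if not kept:
        return kept
    # Columns with no real value in any kept row are treated as unnecessary.
    empty = {
        col for col in kept[0]
        if all(str(r.get(col, "")).strip() in ("", "NaN", "nan") for r in kept)
    }
    return [{k: v for k, v in r.items() if k not in empty} for r in kept]
```

Passing an explicit `reference` date (rather than calling `datetime.now()` inside) keeps the filter reproducible for a given dump.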
malware-text-db-securebert-ner-512
huggingface.co
Naor Matania, malware-text-db-securebert-ner-512 [Dataset]. https://huggingface.co/datasets/naorm/malware-text-db-securebert-ner-512
Authors
Naor Matania
Description
naorm/malware-text-db-securebert-ner-512 dataset hosted on Hugging Face and contributed by the HF Datasets community
Maldeb Dataset
kaggle.com
dataverse.telkomuniversity.ac.id
zip
Updated May 24, 2024
Saquib Hussain (2024). Maldeb Dataset [Dataset]. https://www.kaggle.com/datasets/saquib7hussain/maldeb-dataset
Available download formats: zip (1,577,073,922 bytes)
Dataset updated
May 24, 2024
Authors
Saquib Hussain
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: An image representation of a malware/benign dataset. The dataset was compiled from various malware repositories: The Malware-Repo, TheZoo, MalwareBazaar, Malware Database, and TekDefense. Benign samples were sourced from system applications of Microsoft Windows 10 and 11, as well as open-source software repositories such as SourceForge, PortableFreeware, CNET, and FileForum. The samples were validated by scanning them with the VirusTotal malware scanning service. They were pre-processed by transforming each malware binary into a grayscale image following the rules from Nataraj (2011). Nataraj paper: https://vision.ece.ucsb.edu/research/signal-processing-malware-analysis. The malware and benign samples were collected by Debi Amalia Septiyani and Halimul Hakim Khairul (D. A. Septiyani, “Generating Grayscale and RGB Images dataset for windows PE malware using Gist Features extaction method,” Institut Teknologi Bandung, 2022) and by Dani Agung Prastiyo ("Design and implementation of a machine learning-based malware classification system with an audio signal feature Analysis Approach," Institut Teknologi Bandung, 2023).
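The Nataraj-style conversion mentioned above reads a binary as a stream of 8-bit values and lays them out row by row as grayscale pixels. The sketch below is an illustration, not the dataset's preprocessing code: it assumes the file-size-to-width table commonly quoted from Nataraj et al. (2011), with sizes in KB of 1024 bytes, and it drops a trailing partial row. `WIDTH_TABLE`, `image_width`, and `bytes_to_grayscale` are hypothetical names.

```python
# (upper bound in KB, image width) pairs, as commonly quoted from Nataraj et al. (2011);
# files of 1000 KB or more use width 1024.
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256),
               (200, 384), (500, 512), (1000, 768)]

def image_width(size_bytes):
    """Pick the image width from the file size."""
    kb = size_bytes / 1024
    for upper_kb, width in WIDTH_TABLE:
        if kb < upper_kb:
            return width
    return 1024

def bytes_to_grayscale(data: bytes):
    """Return the image as a list of rows; each byte becomes one pixel (0-255).
    A trailing partial row is dropped, which is one common convention."""
    width = image_width(len(data))
    n_rows = len(data) // width
    return [list(data[i * width:(i + 1) * width]) for i in range(n_rows)]
```

With real files, `bytes_to_grayscale(Path("sample.exe").read_bytes())` would produce the pixel rows, which could then be saved with any image library.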
Malware Benign Image Classification Dataset
kaggle.com
zip
Updated Apr 15, 2025
BISHWAJIT PRASAD GOND (2025). Malware Benign Image Classification Dataset [Dataset]. https://www.kaggle.com/datasets/bishwajitprasadgond/malware-benign-image-sample/code
Available download formats: zip (353,016,286 bytes)
Dataset updated
Apr 15, 2025
Authors
BISHWAJIT PRASAD GOND
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The dataset used in this study comprises 22,056 samples spanning several malware categories as well as benign software. It originates from the research paper by Md Shahnawaz, Bishwajit Prasad Gond, and Durga Prasad Mohapatra, “Dynamic Malware Classification of Windows PE Files using CNNs and Greyscale Images Derived from Runtime API Call Argument Conversion,” accepted at the 16th International Conference on Computing, Communication and Networking Technologies (ICCCNT), held at the Indian Institute of Technology Indore on June 2, 2025. https://doi.org/10.48550/arXiv.2505.24231

Malware Dataset Description

The dataset, encompassing a total of 22,056 samples, includes various malware types and benign samples, each named according to the convention Name_Hash, where Name denotes the malware family type and Hash is the hash of the Portable Executable (PE) file sourced from VirusShare and VirusTotal. For example, a sample might be named adware_0013996b0815f1b53ec52a46d0279b0d.png. The dataset consists of the following:

Adware: 1,986 samples

Backdoor: 674 samples

Downloader: 2,499 samples

Spyware: 946 samples

Trojan: 3,568 samples

Virus: 2,392 samples

Worms: 1,357 samples

Benign: 8,634 samples
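The Name_Hash convention makes it easy to recover labels and check a download against the class counts above (which sum to 22,056). A minimal sketch, assuming the class names in file names are lowercase (as in the adware example) and contain no underscore; the exact spelling used for the worm class ("worms" here) is an assumption.

```python
from collections import Counter

# Per-class counts as listed in the dataset description (total 22,056).
EXPECTED_COUNTS = {"adware": 1986, "backdoor": 674, "downloader": 2499,
                   "spyware": 946, "trojan": 3568, "virus": 2392,
                   "worms": 1357, "benign": 8634}

def parse_sample_name(filename):
    """Split 'Name_Hash.png' into (name, hash); assumes Name has no underscore."""
    stem = filename.rsplit(".", 1)[0]
    name, _, digest = stem.partition("_")
    return name.lower(), digest

def count_classes(filenames):
    """Count samples per class from a list of file names."""
    return Counter(parse_sample_name(f)[0] for f in filenames)

def mismatches(filenames):
    """Classes whose observed count differs from EXPECTED_COUNTS,
    as {class: (observed, expected)}."""
    observed = count_classes(filenames)
    return {c: (observed.get(c, 0), n) for c, n in EXPECTED_COUNTS.items()
            if observed.get(c, 0) != n}
```

An empty result from `mismatches(os.listdir(folder))` would indicate the extracted archive matches the published counts.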

Malware Analysis Workflow

1. Acquisition of Malware Hash

The initial phase involves obtaining the malware hash from the VirusShare database¹. This hash serves as a unique identifier for subsequent analysis.

2. Querying VirusTotal for Antivirus Scan Results

Using the acquired hash, a query is submitted to the VirusTotal platform² to retrieve a JSON file encapsulating results from over 70 distinct antivirus scans. These results are analyzed to classify the malware into its respective category.
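The description does not say how the 70+ antivirus verdicts are reduced to a single category. One simple possibility, shown purely as an assumption, is a keyword vote over the per-engine labels. The input mirrors the shape of the `last_analysis_results` mapping in a VirusTotal v3 file report (engine name mapped to a dict with a `result` label); `CATEGORY_KEYWORDS` and `vote_category` are hypothetical.

```python
from collections import Counter

# Hypothetical keyword -> category mapping, checked in order so that
# e.g. "Trojan.Downloader" counts as a downloader rather than a trojan.
CATEGORY_KEYWORDS = [
    ("adware", "adware"), ("backdoor", "backdoor"), ("downloader", "downloader"),
    ("spyware", "spyware"), ("worm", "worms"), ("virus", "virus"), ("trojan", "trojan"),
]

def vote_category(analysis_results):
    """Return the most frequently matched category across engines, or None."""
    votes = Counter()
    for verdict in analysis_results.values():
        label = (verdict.get("result") or "").lower()  # undetected engines have no label
        for keyword, category in CATEGORY_KEYWORDS:
            if keyword in label:
                votes[category] += 1
                break  # at most one vote per engine
    return votes.most_common(1)[0][0] if votes else None
```

Real pipelines often use dedicated label-normalization tools instead, since vendor naming schemes vary widely.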

3. Malware Category Download

Post-classification, malware samples from various categories are systematically downloaded for further examination.

4. Dynamic Analysis in Cuckoo Sandbox

Each malware category undergoes dynamic analysis within a controlled environment, specifically the Cuckoo Sandbox³. This process monitors the malware's behavioral patterns during execution.

5. Extraction of API Call Sequence

From the Portable Executable (PE) files’ behavioral data, provided as JSON output by the Cuckoo Sandbox, an API call sequence report is extracted in JSON format. This report is subsequently segmented into four distinct text files:
- API Name
- API Argument
- API Return Value
- API Category
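The segmentation step can be sketched as follows, assuming the Cuckoo 2.x `report.json` layout in which `behavior.processes[*].calls[*]` entries carry `api`, `arguments`, `return_value`, and `category` fields. The output file names are hypothetical; check the field names against your Cuckoo version.

```python
import json
from pathlib import Path

def split_api_calls(report: dict):
    """Collect API name, argument, return value, and category sequences."""
    names, args, returns, categories = [], [], [], []
    for process in report.get("behavior", {}).get("processes", []):
        for call in process.get("calls", []):
            names.append(call.get("api", ""))
            # Serialize arguments deterministically so identical calls match.
            args.append(json.dumps(call.get("arguments", {}), sort_keys=True))
            returns.append(str(call.get("return_value", "")))
            categories.append(call.get("category", ""))
    return {"api_name": names, "api_argument": args,
            "api_return_value": returns, "api_category": categories}

def write_segments(report: dict, out_dir: Path):
    """Write one text file per segment, one call per line."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for key, values in split_api_calls(report).items():
        (out_dir / f"{key}.txt").write_text("\n".join(values), encoding="utf-8")
```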

6. Application of n-Gram Analysis

The segmented data is subjected to n-gram analysis, integrating API names with their corresponding arguments. Unique n-grams are derived across all malware categories, followed by computation of Term Frequency (TF) metrics to quantify their significance. Finally, the resulting feature vectors (available at https://www.kaggle.com/datasets/bishwajitprasadgond/malware-benign-api-call-argument-feature-vector) are converted into images.
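A minimal sketch of the n-gram and TF step, assuming each token is an API name fused with its argument string and TF is defined as an n-gram's count divided by the total number of n-grams in one sample. The token format, the default n = 2, and the function names are assumptions; the paper's exact settings are not given here.

```python
from collections import Counter

def api_tokens(names, args):
    """Fuse each API name with its argument string, e.g. 'NtOpenKey(HKLM)'."""
    return [f"{n}({a})" for n, a in zip(names, args)]

def ngram_tf(tokens, n=2):
    """Term frequency of each n-gram: count / total number of n-grams."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()} if total else {}

def feature_vector(tf, vocabulary):
    """Order TF values by a shared vocabulary of unique n-grams (0.0 if absent)."""
    return [tf.get(g, 0.0) for g in vocabulary]
```

Building `vocabulary` as the sorted union of n-grams over all samples gives every sample a vector of the same length, which is what the later reshaping into square images requires.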

Feature Transformation Phase

This phase converts the structured CSV data into images suitable for training a CNN.

a) Normalization and Reshaping

All numeric API values in each row were normalized to the range 0–255.
The resulting vectors were reshaped into square matrices (e.g., 128×128), generating grayscale image representations of API usage patterns for each malware.
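Normalization and reshaping can be illustrated with min-max scaling followed by a row-major reshape. This is a sketch under assumptions: the description does not say whether scaling is per row or global, or how vectors whose length is not a perfect square are handled, so per-row scaling and zero-padding/truncation are assumed here. `to_square_image` is a hypothetical name.

```python
import math

def to_square_image(values, side=None):
    """Min-max scale one row of values to integers in [0, 255] and reshape it
    row-major into a side x side grid, zero-padding or truncating as needed.
    `values` must be non-empty."""
    lo, hi = min(values), max(values)
    span = hi - lo
    scaled = [0 if span == 0 else round(255 * (v - lo) / span) for v in values]
    if side is None:
        side = math.ceil(math.sqrt(len(scaled)))
    pixels = (scaled + [0] * (side * side))[: side * side]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]
```

Calling `to_square_image(row, side=128)` yields the 128×128 layout mentioned above.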

b) Image Enhancement Techniques

To highlight important features and patterns in the image, the following techniques were applied:

Gaussian Blur – To reduce noise and smooth variations.

CLAHE (Contrast Limited Adaptive Histogram Equalization) – For improving local contrast.

Sobel Edge Detection – To emphasize edge transitions and structural boundaries.
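In practice these three steps are usually done with an image library such as OpenCV (`cv2.GaussianBlur`, `cv2.createCLAHE`, `cv2.Sobel`); the authors' kernel sizes and CLAHE parameters are not stated. To show what the Sobel step computes, here is a dependency-free sketch of the gradient magnitude on a grayscale image stored as a list of rows. Leaving border pixels at zero is a simplification assumed here.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel

def sobel_magnitude(img):
    """Gradient magnitude sqrt(gx^2 + gy^2) for interior pixels;
    border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out
```

A flat region produces zero response, while a sharp step between intensities produces a large one, which is why the filter emphasizes structural boundaries.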

c) Color Mapping and Saving

A magma colormap was applied to the grayscale images, adding color richness for enhanced feature representation.
Finally, contrast and sharpness improvements were applied before saving the images to disk for training.

7. Conclusion

The outlined methodology facilitates a comprehensive analysis of malware behavior, enabling the derivation of actionable insights for further investigation and mitigation strategies.

¹ VirusShare

² VirusTotal

³ Cuckoo Sandbox
malware-text-db-cyner
huggingface.co
Updated Dec 15, 2011
Naor Matania (2011). malware-text-db-cyner [Dataset]. https://huggingface.co/datasets/naorm/malware-text-db-cyner
Dataset updated
Dec 15, 2011
Authors
Naor Matania
Description
naorm/malware-text-db-cyner dataset hosted on Hugging Face and contributed by the HF Datasets community
Complete Antivirus Database
comodo.com
cav
Updated Dec 8, 2015
Comodo (2015). Complete Antivirus Database [Dataset]. https://www.comodo.com/home/internet-security/updates/vdp/database.php
Available download formats: cav
Dataset updated
Dec 8, 2015
Dataset authored and provided by
Comodo
License
https://www.comodo.com/home/internet-security/updates/vdp/database.php
Description
The complete Comodo Internet Security database is available for download...
Portable Executable Malware Data
kaggle.com
zip
Updated Mar 10, 2025
malwareTBugs (2025). Portable Executable Malware Data [Dataset]. https://www.kaggle.com/datasets/malwaretbugs/maldata
Available download formats: zip (23,094,201 bytes)
Dataset updated
Mar 10, 2025
Authors
malwareTBugs
License
Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset

This dataset was created by malwareTBugs

Released under Database: Open Database, Contents: Database Contents

Quttera Website Malware Threat Encyclopedia
threats.quttera.com
json
Updated Nov 21, 2025
Quttera (2025). Quttera Website Malware Threat Encyclopedia [Dataset]. https://threats.quttera.com/
Available download formats: json
Dataset updated
Nov 21, 2025
Dataset authored and provided by
Quttera
Time period covered
2024 - Present
Description
Comprehensive database of website malware threats, vulnerabilities, and security risks detected by Quttera's malware scanner.
AI-powered malware simulation of a medical imaging database
scidb.cn
Updated Sep 2, 2025
Somaya_haiba (2025). AI-powered malware simulation of a medical imaging database [Dataset]. http://doi.org/10.57760/sciencedb.27227
Unique identifier
https://doi.org/10.57760/sciencedb.27227
Dataset updated
Sep 2, 2025
Dataset provided by
Science Data Bank
Authors
Somaya_haiba
Description
The dataset comprises medical images that show the presence or absence of illness. Used to simulate AI-based malware modulation, this database is paired with malware-modulated counterparts, which are created as tampered images on the fly from the benign dataset using three mechanisms: (1) adversarial perturbations to the input data that can cause misclassification; (2) patch-level content edits, i.e., copying and pasting or inpainting small square regions (8–32 px) to simulate lesion insertion or removal; and (3) metadata-consistent rescaling, i.e., random resize and crop variance. Each training batch is a duplicate of the original images.
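Of the three tampering mechanisms, the patch-level copy-paste edit is the simplest to illustrate without dependencies. A minimal sketch, assuming a grayscale image stored as a list of rows at least 8 px on each side; `copy_paste_patch` and the seeded `random.Random` are illustrative choices, not the dataset authors' code. Inpainting, adversarial perturbations, and rescaling are not shown.

```python
import random

def copy_paste_patch(img, rng, min_size=8, max_size=32):
    """Return a tampered copy of `img` in which one randomly chosen square
    region (min_size..max_size px) is pasted over another location, plus
    the (src_y, src_x, dst_y, dst_x, size) used. `img` is left unchanged."""
    h, w = len(img), len(img[0])
    size = rng.randint(min_size, min(max_size, h, w))
    sy, sx = rng.randint(0, h - size), rng.randint(0, w - size)
    dy, dx = rng.randint(0, h - size), rng.randint(0, w - size)
    out = [row[:] for row in img]
    for i in range(size):
        # Copy from the untouched original so overlapping regions stay correct.
        out[dy + i][dx:dx + size] = img[sy + i][sx:sx + size]
    return out, (sy, sx, dy, dx, size)
```

Passing a seeded generator makes each tampered image reproducible, which helps when pairing benign and modified versions.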
Data from: Malware Finances and Operations: a Data-Driven Study of the Value...
data.niaid.nih.gov
zenodo.org
Updated Jun 20, 2023
Nurmi, Juha; Niemelä, Mikko; Brumley, Billy (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8047204
Dataset updated
Jun 20, 2023
Dataset provided by
Cyber Intelligence House (https://cyberintelligencehouse.com/)
Tampere University
Authors
Nurmi, Juha; Niemelä, Mikko; Brumley, Billy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

The datasets demonstrate the malware economy and value chain described in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceedings Series (ICPS).

Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

We chose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files, from which we extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

MalwareInfectionSet: We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs, which are available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

VictimAccessSet: We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information that provides a clone of the victim's browser session. Indeed, Genesis Market simplifies importing compromised victim authentication data into a web browser session. We measure prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

AccountAccessSet: The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.

Credits

Authors

Billy Bob Brumley (Tampere University, Tampere, Finland)

Juha Nurmi (Tampere University, Tampere, Finland)

Mikko Niemelä (Cyber Intelligence House, Singapore)

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
benign and injected IoMT packet database
scidb.cn
Updated Apr 14, 2025
Somaya_haiba (2025). benign and injected IoMT packet database [Dataset]. http://doi.org/10.57760/sciencedb.23587
Unique identifier
https://doi.org/10.57760/sciencedb.23587
Dataset updated
Apr 14, 2025
Dataset provided by
Science Data Bank
Authors
Somaya_haiba
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset was compiled over a year and a half from various websites and sources, and it contains 7,449 benign and malicious IoMT packets produced by real-world components of an e-healthcare system that monitor network transmission. Data quality is improved in several preprocessing stages, including handling noise and unwanted string values, cleaning, encoding string features, and rescaling all disordered data values using data transformation functions. To standardize the analysis of network features, we consider only features related to networking characteristics and discard all other features that provide insight into the patient's vital signs. This dataset is intended for analyzing IoMT traffic behavior within smart-hospital networks.
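The encoding and rescaling stages can be sketched with two common choices: label encoding for string features and min-max scaling for numeric ones. The authors do not name their transformation functions, so both choices, and the example feature values, are assumptions.

```python
def label_encode(column):
    """Map each distinct string to an integer code (codes follow sorted order)."""
    codes = {v: i for i, v in enumerate(sorted(set(column)))}
    return [codes[v] for v in column], codes

def min_max(column):
    """Rescale numeric values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]
```

Keeping the returned `codes` mapping lets the same encoding be applied to new traffic captures later.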
Android Malware and Normal permissions dataset
data.mendeley.com
impactcybertrust.org
Updated Mar 13, 2018
Arvind Mahindru (2018). Android Malware and Normal permissions dataset [Dataset]. http://doi.org/10.17632/958wvr38gy.1
Unique identifier
https://doi.org/10.17632/958wvr38gy.1
Dataset updated
Mar 13, 2018
Authors
Arvind Mahindru
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains 18,850 normal Android application packages and 10,000 malware Android packages, used to identify the behaviour of malware applications based on the permissions they request at run time.
Malimg (Original)
kaggle.com
zip
Updated Mar 16, 2022
Ikram Ben abdel ouahab (2022). Malimg (Original) [Dataset]. https://www.kaggle.com/datasets/ikrambenabd/malimg-original
Available download formats: zip (1,175,647,546 bytes)
Dataset updated
Mar 16, 2022
Authors
Ikram Ben abdel ouahab
Description
Dataset

This dataset was created by Ikram Ben abdel ouahab

CTI and APT Related Dataset and Source Code for the Paper in Short: DEVIL
data.mendeley.com
Updated Jul 15, 2024
Burak Gulbay (2024). CTI and APT Related Dataset and Source Code for the Paper in Short: DEVIL [Dataset]. http://doi.org/10.17632/rxr4rr9bw3.2
Unique identifier
https://doi.org/10.17632/rxr4rr9bw3.2
Dataset updated
Jul 15, 2024
Authors
Burak Gulbay
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Here are the data set and source code related to the paper: "DEVIL: A Framework for Discovering and Evaluating Insidious Advanced Persistent Threats Leveraging Graph-Based Algorithms"

1- aptnotes-downloader.zip : contains source code that downloads all APT reports listed in https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections

2- apt-groups.zip : contains all APT group names gathered from https://docs.google.com/spreadsheets/d/1H9_xaxQHpWaa4O_Son4Gx0YOIzlcBWMsdvePFX68EKU/edit?gid=1864660085#gid=1864660085 and https://malpedia.caad.fkie.fraunhofer.de/actors

3- apt-reports.zip : contains all deduplicated APT reports gathered from https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections

4- countries.zip : contains country name list.

5- ttps.zip : contains all MITRE techniques gathered from https://attack.mitre.org/resources/attack-data-and-tools/

6- malware-families.zip : contains all malware family names gathered from https://malpedia.caad.fkie.fraunhofer.de/families

7- ioc-searcher-app.zip : contains source code that extracts IoCs from APT reports. Extracted IoC files are provided in report-analyser.zip. Original code repo can be found at https://github.com/malicialab/iocsearcher

8- extracted-iocs.zip : contains extracted IoCs by ioc-searcher-app.zip

9- report-analyser.zip : contains source code that searches APT reports for malware families, countries, and TTPs. In case of a match, it updates files in extracted-iocs.zip.

10- cti-transformation-app.zip : contains source code that transforms files in extracted-iocs.zip to CTI triples and saves into Neo4j graph database.

11- graph-db-backup.zip : contains the volume folder of the Neo4j Docker container. When it is mounted into a Docker container, the entire CTI database becomes reachable from the Neo4j web interface. To run a Neo4j Docker container that mounts the folder in the zip:

docker run -d --publish=7474:7474 --publish=7687:7687 --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/data:/data --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/plugins:/plugins --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/logs:/logs --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/conf:/conf --env 'NEO4J_PLUGINS=["apoc","graph-data-science"]' --env NEO4J_apoc_export_file_enabled=true --env NEO4J_apoc_import_file_enabled=true --env NEO4J_apoc_import_file_use_neo4j_config=true --env=NEO4J_AUTH=none neo4j:5.13.0

web interface: http://localhost:7474

username: neo4j

password: neo4j
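The description does not specify the triple schema used by cti-transformation-app, so the following is an illustration only. It turns hypothetical (subject, relation, object) triples into parameterized Cypher `MERGE` statements; the `Entity` label and `name` property are assumptions. Relationship types cannot be passed as Cypher parameters, which is why the relation name is validated before being interpolated.

```python
def triples_to_cypher(triples):
    """Turn (subject, relation, object) triples into (query, params) pairs.
    Node label 'Entity' and property 'name' are hypothetical."""
    statements = []
    for subject, relation, obj in triples:
        rel = relation.upper().replace(" ", "_")
        # Relationship types are interpolated, so reject anything unsafe.
        if not rel.replace("_", "").isalnum():
            raise ValueError(f"unsafe relationship type: {relation!r}")
        query = (
            "MERGE (s:Entity {name: $subject}) "
            "MERGE (o:Entity {name: $object}) "
            f"MERGE (s)-[:{rel}]->(o)"
        )
        statements.append((query, {"subject": subject, "object": obj}))
    return statements
```

Each pair could then be passed to `session.run(query, params)` of the official `neo4j` Python driver connected to the container started above (Bolt port 7687).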
ETF IoT Botnet Dataset
data.mendeley.com
Updated Jan 26, 2021
Đorđe Jovanović (2021). ETF IoT Botnet Dataset [Dataset]. http://doi.org/10.17632/nbs66kvx6n.1
Unique identifier
https://doi.org/10.17632/nbs66kvx6n.1
Dataset updated
Jan 26, 2021
Authors
Đorđe Jovanović
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains recorded network traffic from IoT malware samples collected via links found on the URLhaus database website (malware.zip) between 2019 and 2021 at the University of Belgrade, School of Electrical Engineering. The malware samples were run on Raspberry Pi devices with restricted local network access, and the network traffic was recorded with the tcpdump tool. The benign network traffic (benign.zip) consists of all traffic recorded on a personal computer over several hours, split into two files. All local network addresses were anonymized when producing these pcap files. The CSV file in the dataset describes each malware pcap file with the following fields: file name, URLhaus URL, bot address, malware address, attack presence, attacked address, URLhaus tags, collection date (in DD/MM/YYYY format), and comment.
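Because the collection date is stored day-first, parsing it with a month-first default would silently swap day and month for many rows. A minimal sketch using the standard `csv` module; the exact header names in the CSV are an assumption and should be checked against the file.

```python
import csv
import io
from datetime import datetime

def load_descriptions(text, date_field="collection date"):
    """Parse the description CSV and convert DD/MM/YYYY dates to date objects.
    The header name passed as `date_field` is an assumption."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row[date_field] = datetime.strptime(row[date_field].strip(), "%d/%m/%Y").date()
    return rows
```

For a file on disk, `load_descriptions(Path(path).read_text(encoding="utf-8"))` would work the same way.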
Dataset and Source Code for the Paper: A Framework for Developing Strategic...
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jul 14, 2024
Gulbay, BURAK (2024). Dataset and Source Code for the Paper: A Framework for Developing Strategic Cyber Threat Intelligence from Advanced Persistent Threat Analysis Reports Using Graph-Based Algorithms [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12741054
Dataset updated
Jul 14, 2024
Dataset provided by
Gazi University
Authors
Gulbay, BURAK
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Here are the data set and source code related to the paper: "A Framework for Developing Strategic Cyber Threat Intelligence from Advanced Persistent Threat Analysis Reports Using Graph-Based Algorithms"

1- aptnotes-downloader.zip : contains source code that downloads all APT reports listed in https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections

2- apt-groups.zip : contains all APT group names gathered from https://docs.google.com/spreadsheets/d/1H9_xaxQHpWaa4O_Son4Gx0YOIzlcBWMsdvePFX68EKU/edit?gid=1864660085#gid=1864660085 and https://malpedia.caad.fkie.fraunhofer.de/actors

3- apt-reports.zip : contains all deduplicated APT reports gathered from https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections

4- countries.zip : contains country name list.

5- ttps.zip : contains all MITRE techniques gathered from https://attack.mitre.org/resources/attack-data-and-tools/

6- malware-families.zip : contains all malware family names gathered from https://malpedia.caad.fkie.fraunhofer.de/families

7- ioc-searcher-app.zip : contains source code that extracts IoCs from APT reports. Extracted IoC files are provided in report-analyser.zip. Original code repo can be found at https://github.com/malicialab/iocsearcher

8- extracted-iocs.zip : contains extracted IoCs by ioc-searcher-app.zip

9- report-analyser.zip : contains source code that searches APT reports for malware families, countries, and TTPs. In case of a match, it updates files in extracted-iocs.zip.

10- cti-transformation-app.zip : contains source code that transforms files in extracted-iocs.zip to CTI triples and saves into Neo4j graph database.

11- graph-db-backup.zip : contains the volume folder of the Neo4j Docker container. When it is mounted into a Docker container, the entire CTI database becomes reachable from the Neo4j web interface. To run a Neo4j Docker container that mounts the folder in the zip:

docker run -d --publish=7474:7474 --publish=7687:7687 --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/data:/data --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/plugins:/plugins --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/logs:/logs --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/conf:/conf --env 'NEO4J_PLUGINS=["apoc","graph-data-science"]' --env NEO4J_apoc_export_file_enabled=true --env NEO4J_apoc_import_file_enabled=true --env NEO4J_apoc_import_file_use_neo4j_config=true --env=NEO4J_AUTH=none neo4j:5.13.0

web interface: http://localhost:7474

username: neo4j

password: neo4j
Kraken2 Metagenomic Virus Database
osti.gov
Updated Apr 23, 2020
Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21) (2020). Kraken2 Metagenomic Virus Database [Dataset]. http://doi.org/10.13139/OLCF/1615774
Unique identifier
https://doi.org/10.13139/OLCF/1615774
Dataset updated
Apr 23, 2020
Dataset provided by
Office of Science (http://www.er.doe.gov/)
Department of Energy Biological and Environmental Research Program
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Description
The database: a Kraken2 [1] database built from a classification tree containing over 700k metagenomic viruses from JGI IMG/VR [2].

[1] Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biol., 20(1), 1–13. doi: 10.1186/s13059-019-1891-0
[2] Paez-Espino D, Chen I-MA, Palaniappan K, Ratner A, Chu K, Szeto E, et al. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res. 2017;45:D457–65.

Associated paper: "A k-mer based approach for virus classification in metatranscriptomic and metagenomic samples identifies viral associations in the Populus phytobiome and autism brains"

Abstract

Background: Viruses are an underrepresented taxon in the study and identification of microbiome constituents; however, they play an important role in health, microbiome regulation, and the transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which further limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses presents a challenge for classification, not only in constructing a viral taxonomy but also in identifying similarities between a virus' genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify them in metagenomic and metatranscriptomic samples.

Methods: To identify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses. We then integrated the viral classification tree with the NCBI taxonomy for use with ParaKraken, a metagenomic/transcriptomic classifier.

Results: To illustrate the breadth of our utility for classifying viruses with ParaKraken, we analyzed data from a plant metagenome study identifying the differences between two Populus genotypes in three different compartments, and from a human metatranscriptome study identifying the differences between Autism Spectrum Disorder patients and controls in post-mortem brain biopsies. In the Populus study, we identified genotype- and compartment-specific viral signatures, while in the Autism study we identified a significantly increased abundance of eight viral sequences in Autism brain biopsies.

Conclusion: Viruses represent an important aspect of the microbiome. The ability to classify viruses is the first step toward better understanding their role in the microbiome. The viral classification method presented here allows for more complete identification of viral sequences for use in identifying associations between viruses and the host, and between viruses and other microbiome members.

Acknowledgements and Funding: This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was also supported by the Plant-Microbe Interfaces Scientific Focus Area in the Genomic Science Program, the Office of Biological and Environmental Research (BER) in the U.S. Department of Energy Office of Science, and by the Department of Energy, Laboratory Directed Research and Development funding (ProjectID 8321), at the Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the US DOE under contract DE-AC05-00OR22725. This research used resources of the Compute and Data Environment for Science (CADES).
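The abstract says the classification tree was built by clustering proportional similarity scores computed from k-mer profiles, but it does not give the formula or the value of k. As an assumption, the sketch below uses the Czekanowski proportional similarity index between relative-frequency k-mer profiles, PS(p, q) = Σ min(p_i, q_i), which equals 1 minus half the L1 distance when both profiles sum to 1. The default k = 4 is arbitrary.

```python
from collections import Counter

def kmer_profile(seq, k=4):
    """Relative frequency of each k-mer in a nucleotide sequence."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    return {km: c / total for km, c in Counter(kmers).items()} if total else {}

def proportional_similarity(p, q):
    """Czekanowski proportional similarity: sum of element-wise minima.
    1.0 for identical profiles, 0.0 when no k-mers are shared."""
    return sum(min(p.get(km, 0.0), q.get(km, 0.0)) for km in set(p) | set(q))
```

A pairwise matrix of these scores could then be fed to any hierarchical clustering routine to build a tree.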

Small Business Cybersecurity 2020-2021 Checklist

data.mendeley.com

Updated Sep 12, 2020

lissa coffey (2020). Small Business Cybersecurity 2020-2021 Checklist [Dataset]. http://doi.org/10.17632/gk9t7zs5hz.1

Unique identifier

https://doi.org/10.17632/gk9t7zs5hz.1

Dataset updated

Sep 12, 2020

Authors

lissa coffey

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Cyberattacks are a growing concern for small businesses during COVID-19. Be protected while you work: upgrade your small business's virus protection today. Cybersecurity solutions for small to mid-sized businesses can deliver enterprise-level protection.

Download this checklist (Checklist for a Small Firm's Cybersecurity Program 2020-2021) to help secure various aspects of your small business, including employee data, your website, and more. The checklist is provided to assist small member firms with limited resources in establishing a cybersecurity program to:
identify and assess cybersecurity threats,
protect assets from cyber intrusions,
detect when their systems and assets have been compromised,
plan the response when a compromise occurs, and
implement a plan to recover lost, stolen, or unavailable assets.
Train employees in security principles.
Protect information, computers, and networks from malware attacks.
Provide firewall security for your Internet connection.
Create a mobile device action plan.
Make backup copies of important business data and information.
Learn about the threats and how to protect your website.
Protect your small business site.
Learn the basics of protecting your business websites from cyberattacks at WP Hacked Help Blog.

Created with input from security experts at WP Hacked Help, a pioneer in WordPress malware removal and security.

RNA Virus Database
rrid.site
dknet.org
Cite
RNA Virus Database [Dataset]. http://identifiers.org/RRID:SCR_007899
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007899
Description
This resource is no longer in service (documented August 19, 2016). It is a database and web application describing the genome organization of, and providing analytical tools for, the 938 known species of RNA virus. It can identify submitted nucleotide sequences, place them into multiple whole-genome alignments (for species where more than one isolate has been fully sequenced), and contains translated genome sequences for all species. It was created for two main purposes: to facilitate the comparative analysis of RNA viruses and to become a hub for other, more specialised virus Web sites.
Virus-HostDB
bioregistry.io
Updated Jan 19, 2023
Cite
(2023). Virus-HostDB [Dataset]. https://bioregistry.io/virushostdb
Explore at:
Dataset updated
Jan 19, 2023
Description
Virus-Host DB organizes data about the relationships between viruses and their hosts, represented as pairs of NCBI taxonomy IDs for viruses and their hosts. Virus-Host DB covers viruses with complete genomes stored in 1) NCBI/RefSeq and 2) GenBank, where the GenBank accession numbers are listed in EBI Genomes. Host information is collected from RefSeq, GenBank (free-text format), UniProt and ViralZone, and is manually curated with additional information obtained from literature surveys.
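Because each record is described as a (virus taxonomy ID, host taxonomy ID) pair, the data can be indexed in both directions. The sketch below is a minimal illustration assuming the pairs have already been loaded as integer tuples; the function names are hypothetical and the IDs in the usage example are placeholders, not real Virus-Host DB records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# One record = (virus NCBI taxonomy ID, host NCBI taxonomy ID).
Pair = Tuple[int, int]


def hosts_by_virus(pairs: Iterable[Pair]) -> Dict[int, Set[int]]:
    """Group host taxonomy IDs under each virus taxonomy ID (hypothetical helper)."""
    index: Dict[int, Set[int]] = defaultdict(set)
    for virus_taxid, host_taxid in pairs:
        index[virus_taxid].add(host_taxid)
    return dict(index)


def viruses_by_host(pairs: Iterable[Pair]) -> Dict[int, Set[int]]:
    """Reverse index: virus taxonomy IDs recorded for each host (hypothetical helper)."""
    index: Dict[int, Set[int]] = defaultdict(set)
    for virus_taxid, host_taxid in pairs:
        index[host_taxid].add(virus_taxid)
    return dict(index)


if __name__ == "__main__":
    # Placeholder IDs for illustration only; duplicate pairs collapse into the sets.
    pairs = [(1001, 2001), (1001, 2002), (1002, 2001), (1001, 2001)]
    print(hosts_by_virus(pairs))
    print(viruses_by_host(pairs))
```

Sets are used so that the same virus-host pair reported by several sources (for example RefSeq and UniProt) is counted only once.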

Cite

José Reyes (2025). MalwareBazaar Malware Dataset (Sep - Oct 2025) [Dataset]. https://www.kaggle.com/datasets/arkreyes/malwarebazaar-malware-dataset-sep-oct-2025

MalwareBazaar Malware Dataset (Sep - Oct 2025)

A dataset of malware uploaded to MalwareBazaar's database.

Explore at:

Available download formats: zip (9,415,213 bytes)

Dataset updated

Oct 9, 2025

Authors

José Reyes

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

MalwareBazaar Malware Dataset.

Introduction.

This dataset is useful for practicing Data Analysis or Data Science skills. It contains information about indicators of compromise found in MalwareBazaar's database.

Description.

The dataset was retrieved from MalwareBazaar's database (full dump CSV), then curated, formatted and cleaned by myself.

Metadata removed (footer with unreadable information).
'date' converted to datetime (easier to read).
Data filtered to the last 90 days.
Unnecessary columns with "NaN" data removed.
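The cleaning steps above could be reproduced roughly as follows with pandas. This is a minimal sketch, not the author's actual script. It assumes that the footer/metadata lines start with `#` (so they can be skipped as comments), that the timestamp column is named `date`, and that "unnecessary columns with NaN data" means columns where every value is NaN. The function name `clean_dump`, the reference time `now`, and the sample column names are hypothetical.

```python
import io

import pandas as pd


def clean_dump(csv_text: str, now: pd.Timestamp, days: int = 90) -> pd.DataFrame:
    """Sketch of the described cleaning steps (hypothetical helper)."""
    # Assumption: metadata/footer lines start with '#', so they are skipped as comments.
    df = pd.read_csv(io.StringIO(csv_text), comment="#", skipinitialspace=True)
    # Convert the 'date' column to datetime; unparseable values become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    # Keep only rows from the `days` days before the reference time `now`.
    cutoff = now - pd.Timedelta(days=days)
    df = df[df["date"] >= cutoff]
    # Assumption: "unnecessary" columns are those in which every value is NaN.
    df = df.dropna(axis=1, how="all")
    return df.reset_index(drop=True)


if __name__ == "__main__":
    # Hypothetical sample rows; 'sha256' and 'signature' are illustrative column names.
    sample = (
        "date,sha256,signature,empty_col\n"
        "2025-10-01 12:00:00,aaa,FamilyA,\n"
        "2025-05-01 08:00:00,bbb,FamilyB,\n"
        "2025-09-20 00:00:00,ccc,,\n"
        "# footer line with unreadable metadata\n"
    )
    print(clean_dump(sample, now=pd.Timestamp("2025-10-09")))
```

Passing `now` explicitly, rather than reading the system clock, keeps the 90-day window reproducible for a dump retrieved on a specific date.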
