89 datasets found
  1. MalwareBazaar Malware Dataset (Sep - Oct 2025)

    • kaggle.com
    zip
    Updated Oct 9, 2025
    Cite
    José Reyes (2025). MalwareBazaar Malware Dataset (Sep - Oct 2025) [Dataset]. https://www.kaggle.com/datasets/arkreyes/malwarebazaar-malware-dataset-sep-oct-2025
    Explore at:
    Available download formats: zip (9415213 bytes)
    Dataset updated
    Oct 9, 2025
    Authors
    José Reyes
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MalwareBazaar Malware Dataset.

    Introduction.

    This dataset is useful for practicing data analysis or data science skills. It contains information about indicators of compromise found in MalwareBazaar's database.

    Description.

    The dataset was retrieved from MalwareBazaar's full CSV database dump, then curated, formatted, and cleaned by me.

    • Metadata removed (footer with unreadable information).
    • 'date' formatted to datetime (better reading format).
    • Data filtered from the last 90 days.
    • Unnecessary columns with "NaN" data removed.
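
    The cleaning steps above can be sketched in a few lines of standard-library Python. This is only an illustration, not the author's actual script: the sample rows, the `sha256`, `signature`, and `extra` column names, and the timestamp format are assumptions; only the `date` column name comes from the description.

```python
import csv
import io
from datetime import datetime, timedelta


def clean_rows(rows, now, days=90, date_key="date"):
    """Parse dates, keep rows from the last `days` days, drop all-empty columns."""
    cutoff = now - timedelta(days=days)
    kept = []
    for row in rows:
        parsed = dict(row)
        # Assumed timestamp layout; adjust the format string to the real dump.
        parsed[date_key] = datetime.strptime(row[date_key], "%Y-%m-%d %H:%M:%S")
        if parsed[date_key] >= cutoff:
            kept.append(parsed)
    # A column is dropped only when every remaining row has it empty or "NaN".
    empty = {"", "NaN", None}
    first = kept[0] if kept else {}
    columns = [c for c in first if any(r[c] not in empty for r in kept)]
    return [{c: r[c] for c in columns} for r in kept]


# Hypothetical sample data standing in for the real CSV dump.
SAMPLE = """date,sha256,signature,extra
2025-10-01 12:00:00,aaa,Mirai,NaN
2025-03-01 08:30:00,bbb,AgentTesla,NaN
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = clean_rows(rows, now=datetime(2025, 10, 9))
```

    With the two sample rows, only the October row survives the 90-day filter, and the all-NaN `extra` column is dropped.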
  2. malware-text-db-securebert-ner-512

    • huggingface.co
    + more versions
    Cite
    Naor Matania, malware-text-db-securebert-ner-512 [Dataset]. https://huggingface.co/datasets/naorm/malware-text-db-securebert-ner-512
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Naor Matania
    Description

    naorm/malware-text-db-securebert-ner-512 dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. Maldeb Dataset

    • kaggle.com
    • dataverse.telkomuniversity.ac.id
    • +1 more
    zip
    Updated May 24, 2024
    Cite
    Saquib Hussain (2024). Maldeb Dataset [Dataset]. https://www.kaggle.com/datasets/saquib7hussain/maldeb-dataset
    Explore at:
    Available download formats: zip (1577073922 bytes)
    Dataset updated
    May 24, 2024
    Authors
    Saquib Hussain
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: image representation of a malware-benign dataset. The malware samples were compiled from several malware repositories: The Malware-Repo, TheZoo, MalwareBazaar, Malware Database, and TekDefense. The benign samples were sourced from system applications of Microsoft Windows 10 and 11, as well as open-source software repositories such as Sourceforge, PortableFreeware, CNET, and FileForum. The samples were validated by scanning them with the VirusTotal malware scanning service, then pre-processed by transforming each binary into a grayscale image following the rules from Nataraj (2011). Nataraj paper: https://vision.ece.ucsb.edu/research/signal-processing-malware-analysis. Malware and benign samples were collected by Debi Amalia Septiyani and Halimul Hakim Khairul. References: D. A. Septiyani, “Generating Grayscale and RGB Images dataset for windows PE malware using Gist Features extraction method,” Institut Teknologi Bandung, 2022; and Dani Agung Prastiyo, "Design and implementation of a machine learning-based malware classification system with an audio signal feature Analysis Approach," Institut Teknologi Bandung, 2023.
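
    The binary-to-grayscale transformation mentioned above can be sketched as follows. This is a minimal illustration rather than the dataset authors' code: each byte becomes one pixel, and the row width is chosen from the file size. The width table is the one commonly attributed to Nataraj et al. (2011), recalled here as an assumption that should be checked against the paper; zero-padding the last row is also an assumption.

```python
# Width table commonly attributed to Nataraj et al. (2011):
# (upper file-size bound in KiB, image width in pixels). Treat as an assumption.
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256),
               (200, 384), (500, 512), (1000, 768)]


def image_width(size_bytes):
    """Pick the image width from the file size; files above 1000 KiB use 1024."""
    size_kib = size_bytes / 1024
    for limit_kib, width in WIDTH_TABLE:
        if size_kib < limit_kib:
            return width
    return 1024


def bytes_to_grayscale(data):
    """Return a list of pixel rows; every byte is one 0-255 grayscale value."""
    width = image_width(len(data))
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] += [0] * (width - len(rows[-1]))  # zero-pad the last row
    return rows


# 512 synthetic bytes stand in for a real binary.
image = bytes_to_grayscale(bytes(range(256)) * 2)
```

    The resulting rows can be handed to any imaging library to write a PNG; the sketch stops at the pixel matrix to stay dependency-free.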

  4. Malware Benign Image Classification Dataset

    • kaggle.com
    zip
    Updated Apr 15, 2025
    Cite
    BISHWAJIT PRASAD GOND (2025). Malware Benign Image Classification Dataset [Dataset]. https://www.kaggle.com/datasets/bishwajitprasadgond/malware-benign-image-sample/code
    Explore at:
    Available download formats: zip (353016286 bytes)
    Dataset updated
    Apr 15, 2025
    Authors
    BISHWAJIT PRASAD GOND
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The dataset utilized in this study comprises 22,056 samples, encompassing various categories of malware as well as benign software. This dataset originates from the research paper by Md Shahnawaz, Bishwajit Prasad Gond, and Durga Prasad Mohapatra titled “Dynamic Malware Classification of Windows PE Files using CNNs and Greyscale Images Derived from Runtime API Call Argument Conversion,” accepted at the 16th International Conference on Computing, Communication and Networking Technologies (ICCCNT), held at the Indian Institute of Technology Indore on June 2, 2025. https://doi.org/10.48550/arXiv.2505.24231

    Malware Dataset Description

    The dataset, encompassing a total of 22,056 samples, includes various malware types and benign samples, each identified with a naming convention of Name_Hash. Here, Name denotes the malware family type, and Hash represents the hash of the Portable Executable (PE) file sourced from VirusShare and VirusTotal. For example, a sample might be named adware_0013996b0815f1b53ec52a46d0279b0d.png. The dataset consists of the following:

    • Adware: 1,986 samples
    • Backdoor: 674 samples
    • Downloader: 2,499 samples
    • Spyware: 946 samples
    • Trojan: 3,568 samples
    • Virus: 2,392 samples
    • Worms: 1,357 samples
    • Benign: 8,634 samples
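
    As a small sketch of the Name_Hash naming convention and the class counts listed above, the following splits a file name into its family label and hash, and checks that the per-class counts add up to the stated total. The function name is hypothetical; the example file name and the counts are taken from the description.

```python
from pathlib import PurePosixPath

# Per-class sample counts as listed in the dataset description.
CLASS_COUNTS = {
    "Adware": 1986, "Backdoor": 674, "Downloader": 2499, "Spyware": 946,
    "Trojan": 3568, "Virus": 2392, "Worms": 1357, "Benign": 8634,
}


def parse_sample_name(filename):
    """Split 'Name_Hash.png' into (family, hash); raise if the pattern is missing."""
    stem = PurePosixPath(filename).stem
    family, _, file_hash = stem.partition("_")
    if not family or not file_hash:
        raise ValueError("expected Name_Hash pattern, got: " + filename)
    return family, file_hash


family, file_hash = parse_sample_name("adware_0013996b0815f1b53ec52a46d0279b0d.png")
total = sum(CLASS_COUNTS.values())
```

    The counts sum to 22,056, which matches the total given in the description.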

    Malware Analysis Workflow

    1. Acquisition of Malware Hash

    The initial phase involves obtaining the malware hash from the VirusShare database. This hash serves as a unique identifier for subsequent analysis.

    2. Querying VirusTotal for Antivirus Scan Results

    Using the acquired hash, a query is submitted to the VirusTotal platform to retrieve a JSON file encapsulating results from over 70 distinct antivirus scans. These results are analyzed to classify the malware into its respective category.

    3. Malware Category Download

    Post-classification, malware samples from various categories are systematically downloaded for further examination.

    4. Dynamic Analysis in Cuckoo Sandbox

    Each malware category undergoes dynamic analysis within a controlled environment, specifically the Cuckoo Sandbox. This process monitors the malware's behavioral patterns during execution.

    5. Extraction of API Call Sequence

    From the Portable Executable (PE) files’ behavioral data, provided as JSON output by the Cuckoo Sandbox, an API call sequence report is extracted in JSON format. This report is subsequently segmented into four distinct text files:

    • API Name
    • API Argument
    • API Return Value
    • API Category

    6. Application of n-Gram Analysis

    The segmented data is subjected to n-gram analysis, integrating API names with their corresponding arguments. Unique n-grams are derived across all malware categories, followed by the computation of Term Frequency (TF) metrics to quantify their significance. Finally, this feature vector (https://www.kaggle.com/datasets/bishwajitprasadgond/malware-benign-api-call-argument-feature-vector) is converted into images.
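
    A minimal sketch of the n-gram and term-frequency step, under stated assumptions: each API call is represented as a `name(argument)` token, n-grams are built over consecutive tokens, and TF is taken as the n-gram count divided by the total number of n-grams in the sample. The token format, the choice of n = 2, and the sample calls are illustrative and are not confirmed by the description.

```python
from collections import Counter


def api_ngrams(calls, n=2):
    """calls: list of (api_name, api_argument) pairs in execution order."""
    tokens = [f"{name}({arg})" for name, arg in calls]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(ngrams):
    """Relative frequency of each n-gram within one sample."""
    counts = Counter(ngrams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical call trace; real traces come from the sandbox report.
calls = [("NtOpenFile", "a.txt"), ("NtReadFile", "4096"),
         ("NtOpenFile", "a.txt"), ("NtReadFile", "4096")]
tf = term_frequency(api_ngrams(calls, n=2))
```

    Stacking the TF values of every sample over a shared vocabulary of unique n-grams would give one feature row per sample.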

    Feature Transformation Phase

    This phase focuses on converting the structured CSV data into images suitable for training with CNN.

    a) Normalization and Reshaping

    All numeric API values in each row were normalized into a range of 0–255.
    The resulting vectors were reshaped into square matrices (e.g., 128×128), generating grayscale image representations of API usage patterns for each malware.
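
    The normalization and reshaping step can be sketched without any imaging library. The assumptions here are min-max scaling per row, zero-padding when the vector does not fill the square, and a side length derived from the vector length unless one (such as 128) is passed in; the description does not state which variants the authors used.

```python
import math


def vector_to_image(values, side=None):
    """Scale a non-empty feature vector to 0-255 and reshape it into a square matrix."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Min-max scaling; a constant vector maps to all zeros.
    pixels = [round(255 * (v - lo) / span) if span else 0 for v in values]
    if side is None:
        side = math.ceil(math.sqrt(len(pixels)))
    # Truncate or zero-pad so the vector fills exactly side * side pixels.
    pixels = pixels[:side * side] + [0] * max(0, side * side - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


image = vector_to_image([0.0, 0.5, 1.0, 0.25, 0.75])
```

    Five values yield a 3×3 matrix with four padded zeros; passing `side=128` would produce the 128×128 layout mentioned above.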

    b) Image Enhancement Techniques

    To highlight important features and patterns in the image, the following techniques were applied:

    • Gaussian Blur – To reduce noise and smooth variations.
    • CLAHE (Contrast Limited Adaptive Histogram Equalization) – For improving local contrast.
    • Sobel Edge Detection – To emphasize edge transitions and structural boundaries.

    c) Color Mapping and Saving

    A magma colormap was applied to the grayscale images, adding color richness for enhanced feature representation.
    Finally, contrast and sharpness improvements were applied before saving the images to disk for training.

    7. Conclusion

    The outlined methodology facilitates a comprehensive analysis of malware behavior, enabling the derivation of actionable insights for further investigation and mitigation strategies.

  5. malware-text-db-cyner

    • huggingface.co
    Updated Dec 15, 2011
    + more versions
    Cite
    Naor Matania (2011). malware-text-db-cyner [Dataset]. https://huggingface.co/datasets/naorm/malware-text-db-cyner
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2011
    Authors
    Naor Matania
    Description

    naorm/malware-text-db-cyner dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Complete Antivirus Database

    • comodo.com
    cav
    Updated Dec 8, 2015
    Cite
    Comodo (2015). Complete Antivirus Database [Dataset]. https://www.comodo.com/home/internet-security/updates/vdp/database.php
    Explore at:
    Available download formats: cav
    Dataset updated
    Dec 8, 2015
    Dataset authored and provided by
    Comodo
    License

    https://www.comodo.com/home/internet-security/updates/vdp/database.php

    Description

    The complete Comodo Internet Security database is available for download...

  7. Portable Executable Malware Data

    • kaggle.com
    zip
    Updated Mar 10, 2025
    Cite
    malwareTBugs (2025). Portable Executable Malware Data [Dataset]. https://www.kaggle.com/datasets/malwaretbugs/maldata
    Explore at:
    Available download formats: zip (23094201 bytes)
    Dataset updated
    Mar 10, 2025
    Authors
    malwareTBugs
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by malwareTBugs

    Released under Database: Open Database, Contents: Database Contents

    Contents

  8. Quttera Website Malware Threat Encyclopedia

    • threats.quttera.com
    json
    Updated Nov 21, 2025
    Cite
    Quttera (2025). Quttera Website Malware Threat Encyclopedia [Dataset]. https://threats.quttera.com/
    Explore at:
    Available download formats: json
    Dataset updated
    Nov 21, 2025
    Dataset authored and provided by
    Quttera
    Time period covered
    2024 - Present
    Description

    Comprehensive database of website malware threats, vulnerabilities, and security risks detected by Quttera's malware scanner.

  9. AI-powered malware simulation of a medical imaging database

    • scidb.cn
    Updated Sep 2, 2025
    Cite
    Somaya_haiba (2025). AI-powered malware simulation of a medical imaging database [Dataset]. http://doi.org/10.57760/sciencedb.27227
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 2, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Somaya_haiba
    Description

    The dataset comprises medical imaging data that demonstrate the presence or absence of illnesses. It is used to simulate AI-based malware modulation: the database is paired with malware-modulated counterparts, created by generating tampered images on the fly from the benign dataset using three mechanisms:

    • Adversarial perturbations to input data that can cause data misclassification.
    • Patch-level content edits by copy-pasting or inpainting small square regions (8–32 px) to simulate lesion insertion or removal.
    • Metadata-consistent rescaling for random resize and crop variance.

    Each training batch is a duplicate of the original images.

  10. Data from: Malware Finances and Operations: a Data-Driven Study of the Value...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    Updated Jun 20, 2023
    Cite
    Nurmi, Juha; Niemelä, Mikko; Brumley, Billy (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8047204
    Explore at:
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    Cyber Intelligence House (https://cyberintelligencehouse.com/)
    Tampere University
    Authors
    Nurmi, Juha; Niemelä, Mikko; Brumley, Billy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.

    Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

    We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

    1. MalwareInfectionSet: We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

    2. VictimAccessSet: We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

    3. AccountAccessSet: The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.

    Credits

    Authors

    Billy Bob Brumley (Tampere University, Tampere, Finland)

    Juha Nurmi (Tampere University, Tampere, Finland)

    Mikko Niemelä (Cyber Intelligence House, Singapore)

    Funding

    This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

    Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.

  11. benign and injected IoMT packet database

    • scidb.cn
    Updated Apr 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Somaya_haiba (2025). benign and injected IoMT packet database [Dataset]. http://doi.org/10.57760/sciencedb.23587
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 14, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Somaya_haiba
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was compiled over a year and a half from various websites and sources, and it contains 7449 benign and malicious IoMT packets produced by real-world components of the e-healthcare system that monitor network transmission. Data quality is improved in several preprocessing stages, including handling noise and unwanted string values, cleaning, encoding string features, and rescaling all disordered data values using data transformation functions. To standardize the analysis of network features, we only consider features related to networking characteristics and reject all other features that provide insights into the patient's vital signs. This dataset is intended for analyzing IoMT traffic behavior within smart hospital networks.
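
    Two of the preprocessing stages named above, encoding string features and rescaling values, can be sketched as follows. This is an illustration only: the description does not say which encoder or scaler was used, so integer label encoding and min-max scaling to [0, 1] are assumptions, as are the `protocol` and `length` field names and the sample packets.

```python
def encode_and_rescale(records):
    """Label-encode string columns, then min-max scale every column to [0, 1]."""
    columns = list(records[0])
    scaled_columns = {}
    for col in columns:
        raw = [r[col] for r in records]
        if all(isinstance(v, (int, float)) for v in raw):
            nums = [float(v) for v in raw]
        else:
            # Assumed encoding: one integer code per distinct string, in sorted order.
            codes = {label: i for i, label in enumerate(sorted(set(map(str, raw))))}
            nums = [float(codes[str(v)]) for v in raw]
        lo, hi = min(nums), max(nums)
        scaled_columns[col] = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in nums]
    return [{c: scaled_columns[c][i] for c in columns} for i in range(len(records))]


# Hypothetical packets with one string feature and one numeric feature.
packets = [
    {"protocol": "TCP", "length": 60},
    {"protocol": "UDP", "length": 1500},
    {"protocol": "TCP", "length": 780},
]
scaled = encode_and_rescale(packets)
```

    After this step every feature is numeric and on a comparable scale, which is what most traffic classifiers expect as input.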

  12. Android Malware and Normal permissions dataset

    • data.mendeley.com
    • impactcybertrust.org
    Updated Mar 13, 2018
    + more versions
    Cite
    Arvind Mahindru (2018). Android Malware and Normal permissions dataset [Dataset]. http://doi.org/10.17632/958wvr38gy.1
    Explore at:
    Dataset updated
    Mar 13, 2018
    Authors
    Arvind Mahindru
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 18,850 normal Android application packages and 10,000 malware Android packages, which are used to identify the behaviour of malware applications based on the permissions they need at run-time.

  13. Malimg (Original)

    • kaggle.com
    zip
    Updated Mar 16, 2022
    Cite
    Ikram Ben abdel ouahab (2022). Malimg (Original) [Dataset]. https://www.kaggle.com/datasets/ikrambenabd/malimg-original
    Explore at:
    Available download formats: zip (1175647546 bytes)
    Dataset updated
    Mar 16, 2022
    Authors
    Ikram Ben abdel ouahab
    Description

    Dataset

    This dataset was created by Ikram Ben abdel ouahab

    Contents

  14. CTI and APT Related Dataset and Source Code for the Paper in Short: DEVIL

    • data.mendeley.com
    Updated Jul 15, 2024
    Cite
    Burak Gulbay (2024). CTI and APT Related Dataset and Source Code for the Paper in Short: DEVIL [Dataset]. http://doi.org/10.17632/rxr4rr9bw3.2
    Explore at:
    Dataset updated
    Jul 15, 2024
    Authors
    Burak Gulbay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here are the data set and source code related to the paper: "DEVIL: A Framework for Discovering and Evaluating Insidious Advanced Persistent Threats Leveraging Graph-Based Algorithms"

    1- aptnotes-downloader.zip : contains source code that downloads all APT reports listed in https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections

    2- apt-groups.zip : contains all APT group names gathered from https://docs.google.com/spreadsheets/d/1H9_xaxQHpWaa4O_Son4Gx0YOIzlcBWMsdvePFX68EKU/edit?gid=1864660085#gid=1864660085 and https://malpedia.caad.fkie.fraunhofer.de/actors

    3- apt-reports.zip : contains all deduplicated APT reports gathered from https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections

    4- countries.zip : contains country name list.

    5- ttps.zip : contains all MITRE techniques gathered from https://attack.mitre.org/resources/attack-data-and-tools/

    6- malware-families.zip : contains all malware family names gathered from https://malpedia.caad.fkie.fraunhofer.de/families

    7- ioc-searcher-app.zip : contains source code that extracts IoCs from APT reports. Extracted IoC files are provided in report-analyser.zip. Original code repo can be found at https://github.com/malicialab/iocsearcher

    8- extracted-iocs.zip : contains extracted IoCs by ioc-searcher-app.zip

    9- report-analyser.zip : contains source code that searches APT reports for malware families, countries and TTPs. In case of a match, it updates files in extracted-iocs.zip.

    10- cti-transformation-app.zip : contains source code that transforms files in extracted-iocs.zip to CTI triples and saves into Neo4j graph database.

    11- graph-db-backup.zip : contains volume folder of Neo4j Docker container. When it is mounted to a Docker container, all CTI database becomes reachable from Neo4j web interface. Here is how to run a Neo4j Docker container that mounts folder in the zip:

    docker run -d --publish=7474:7474 --publish=7687:7687 --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/data:/data --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/plugins:/plugins --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/logs:/logs --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/conf:/conf --env 'NEO4J_PLUGINS=["apoc","graph-data-science"]' --env NEO4J_apoc_export_file_enabled=true --env NEO4J_apoc_import_file_enabled=true --env NEO4J_apoc_import_file_use_neo4j_config=true --env=NEO4J_AUTH=none neo4j:5.13.0

    web interface: http://localhost:7474

    username: neo4j

    password: neo4j
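
    The idea behind item 10 above, turning extracted IoCs into CTI triples for a graph database, can be sketched as follows. This is not the code shipped in cti-transformation-app.zip: the triple schema, the `MENTIONS_*` relationship names, the `Entity` label, and the sample values are all hypothetical, and the sketch only builds parameterized Cypher strings without connecting to Neo4j.

```python
def to_triples(report_id, extracted):
    """extracted: mapping of IoC type -> list of values found in one report."""
    triples = []
    for ioc_type, values in sorted(extracted.items()):
        for value in values:
            triples.append((report_id, "MENTIONS_" + ioc_type.upper(), value))
    return triples


def to_cypher(triple):
    """Build a parameterized MERGE statement for one (subject, predicate, object)."""
    subject, predicate, obj = triple
    # Relationship types cannot be passed as Cypher parameters, so validate
    # the name before interpolating it into the query text.
    if not predicate.replace("_", "").isalnum():
        raise ValueError("unexpected relationship type: " + predicate)
    query = ("MERGE (s:Entity {name: $subject}) "
             "MERGE (o:Entity {name: $object}) "
             "MERGE (s)-[:" + predicate + "]->(o)")
    return query, {"subject": subject, "object": obj}


# Hypothetical extraction result for a single report.
extracted = {"ip4": ["203.0.113.7"], "domain": ["example.test"]}
triples = to_triples("report-001", extracted)
query, params = to_cypher(triples[0])
```

    Using MERGE rather than CREATE keeps repeated runs idempotent, so the same indicator seen in several reports maps to a single node.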

  15. ETF IoT Botnet Dataset

    • data.mendeley.com
    Updated Jan 26, 2021
    Cite
    Đorđe Jovanović (2021). ETF IoT Botnet Dataset [Dataset]. http://doi.org/10.17632/nbs66kvx6n.1
    Explore at:
    Dataset updated
    Jan 26, 2021
    Authors
    Đorđe Jovanović
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data in this dataset represent recorded network traffic of IoT malware samples collected from links found on the URLhaus database website (malware.zip) between 2019 and 2021 at the University of Belgrade, School of Electrical Engineering. The malware samples were run on Raspberry Pi devices with restricted local network access, and the network traffic was recorded with the tcpdump tool. The benign network traffic (benign.zip) comprises all network traffic recorded on a personal computer over several hours, split into two files. All local network addresses were anonymized when producing these pcap files. The CSV file in the dataset describes each malware pcap file with the following fields: file name, URLhaus URL, bot address, malware address, attack presence, attacked address, URLhaus tags, collection date (in DD/MM/YYYY format), and comment.
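
    Because the collection date is stored as DD/MM/YYYY, it is easy to misread as month-first when loading the CSV. A minimal loading sketch follows; the exact header spellings and the sample rows are assumptions, since the description lists the fields but not the literal column names.

```python
import csv
import io
from datetime import datetime

# Hypothetical excerpt; the real CSV has more columns and its own header names.
SAMPLE = (
    "file name,attack presence,collection date\n"
    "sample1.pcap,yes,26/01/2021\n"
    "sample2.pcap,no,03/12/2019\n"
)


def load_metadata(text, date_column="collection date"):
    """Read the description CSV and parse day-first dates explicitly."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row[date_column] = datetime.strptime(row[date_column], "%d/%m/%Y").date()
        rows.append(row)
    return rows


metadata = load_metadata(SAMPLE)
```

    Spelling out the `%d/%m/%Y` format avoids silently swapping day and month for dates such as 03/12/2019.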

  16. Dataset and Source Code for the Paper: A Framework for Developing Strategic...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +2 more
    Updated Jul 14, 2024
    Cite
    Gulbay, BURAK (2024). Dataset and Source Code for the Paper: A Framework for Developing Strategic Cyber Threat Intelligence from Advanced Persistent Threat Analysis Reports Using Graph-Based Algorithms [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12741054
    Explore at:
    Dataset updated
    Jul 14, 2024
    Dataset provided by
    Gazi University
    Authors
    Gulbay, BURAK
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here are the data set and source code related to the paper: "A Framework for Developing Strategic Cyber Threat Intelligence from Advanced Persistent Threat Analysis Reports Using Graph-Based Algorithms"

    1- aptnotes-downloader.zip : contains source code that downloads all APT reports listed in https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections

    2- apt-groups.zip : contains all APT group names gathered from https://docs.google.com/spreadsheets/d/1H9_xaxQHpWaa4O_Son4Gx0YOIzlcBWMsdvePFX68EKU/edit?gid=1864660085#gid=1864660085 and https://malpedia.caad.fkie.fraunhofer.de/actors

    3- apt-reports.zip : contains all deduplicated APT reports gathered from https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections

    4- countries.zip : contains country name list.

    5- ttps.zip : contains all MITRE techniques gathered from https://attack.mitre.org/resources/attack-data-and-tools/

    6- malware-families.zip : contains all malware family names gathered from https://malpedia.caad.fkie.fraunhofer.de/families

    7- ioc-searcher-app.zip : contains source code that extracts IoCs from APT reports. Extracted IoC files are provided in report-analyser.zip. Original code repo can be found at https://github.com/malicialab/iocsearcher

    8- extracted-iocs.zip : contains extracted IoCs by ioc-searcher-app.zip

    9- report-analyser.zip : contains source code that searches APT reports for malware families, countries and TTPs. In case of a match, it updates files in extracted-iocs.zip.

    10- cti-transformation-app.zip : contains source code that transforms files in extracted-iocs.zip to CTI triples and saves into Neo4j graph database.

    11- graph-db-backup.zip : contains volume folder of Neo4j Docker container. When it is mounted to a Docker container, all CTI database becomes reachable from Neo4j web interface. Here is how to run a Neo4j Docker container that mounts folder in the zip:

    docker run -d --publish=7474:7474 --publish=7687:7687 --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/data:/data --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/plugins:/plugins --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/logs:/logs --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/conf:/conf --env 'NEO4J_PLUGINS=["apoc","graph-data-science"]' --env NEO4J_apoc_export_file_enabled=true --env NEO4J_apoc_import_file_enabled=true --env NEO4J_apoc_import_file_use_neo4j_config=true --env=NEO4J_AUTH=none neo4j:5.13.0

    web interface: http://localhost:7474

    username: neo4j

    password: neo4j

  17. Kraken2 Metagenomic Virus Database

    • osti.gov
    Updated Apr 23, 2020
    Cite
    Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21) (2020). Kraken2 Metagenomic Virus Database [Dataset]. http://doi.org/10.13139/OLCF/1615774
    Explore at:
    Dataset updated
    Apr 23, 2020
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    Department of Energy Biological and Environmental Research Program
    Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
    Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
    Description

    The Database: Kraken2 [1] database built from a classification tree containing over 700k metagenomic viruses from JGI IMG/VR [2].

    [1] Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biol., 20(1), 1–13. doi: 10.1186/s13059-019-1891-0
    [2] Paez-Espino D, Chen I-MA, Palaniappan K, Ratner A, Chu K, Szeto E, et al. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res. 2017;45:D457–65.

    For Paper:

    Title: A k-mer based approach for virus classification in metatranscriptomic and metagenomic samples identifies viral associations in the Populus phytobiome and autism brains

    Abstract

    Background: Viruses are an underrepresented taxa in the study and identification of microbiome constituents; however, they play an important role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which further limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses represents a challenge for classification, not only in constructing a viral taxonomy, but also in identifying similarities between a virus' genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify their sequences in metagenomic and metatranscriptomic samples.

    Methods: To identify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses. We then integrated the viral classification tree with the NCBI taxonomy for use with ParaKraken, a metagenomic/transcriptomic classifier.

    Results: To illustrate the breadth of our utility for classifying viruses with ParaKraken, we analyzed data from a plant metagenome study identifying the differences between two Populus genotypes in three different compartments and on a human metatranscriptome study identifying the differences between Autism Spectrum Disorder patients and controls in post mortem brain biopsies. In the Populus study, we identified genotype and compartment specific viral signatures, while in the Autism study we identified a significant increased abundance of eight viral sequences in Autism brain biopsies.

    Conclusion: Viruses represent an important aspect of the microbiome. The ability to classify viruses represents the first step in being able to better understand their role in the microbiome. The viral classification method presented here allows for more complete identification of viral sequences for use in identifying associations between viruses and the host and viruses and other microbiome members.

    Acknowledgements and Funding: This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was also supported by the Plant-Microbe Interfaces Scientific Focus Area in the Genomic Science Program, the Office of Biological and Environmental Research (BER) in the U.S. Department of Energy Office of Science, and by the Department of Energy, Laboratory Directed Research and Development funding (ProjectID 8321), at the Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the US DOE under contract DE-AC05-00OR22725. This research used resources of the Compute and Data Environment for Science (CADES).
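
    To make the Methods paragraph more concrete, here is a small sketch of a k-mer profile and a similarity score between two profiles. The abstract does not define its "proportional similarity" score, so this uses one common definition (the sum of the smaller of the two proportions for each k-mer) purely as an assumption; k = 3 and the toy sequences are illustrative, and the clustering and tree-building steps are not shown.

```python
from collections import Counter


def kmer_profile(sequence, k=3):
    """Relative frequency of every overlapping k-mer in a sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


def proportional_similarity(p, q):
    """Sum of per-k-mer minimum proportions: 1.0 for identical profiles, 0.0 for disjoint ones."""
    return sum(min(p.get(kmer, 0.0), q.get(kmer, 0.0)) for kmer in set(p) | set(q))


# Toy sequences standing in for viral genomes.
a = kmer_profile("ACGTACGTAC")
b = kmer_profile("ACGTACGTTT")
score = proportional_similarity(a, b)
```

    A matrix of such pairwise scores is the kind of input a clustering step could use to group similar viruses before building a tree.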

  18. Small Business Cybersecurity 2020-2021 Checklist

    • data.mendeley.com
    Updated Sep 12, 2020
    Cite
    lissa coffey (2020). Small Business Cybersecurity 2020-2021 Checklist [Dataset]. http://doi.org/10.17632/gk9t7zs5hz.1
    Dataset updated
    Sep 12, 2020
    Authors
    lissa coffey
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cyber attacks are a growing concern for small businesses during COVID-19. Be protected while you work. Upgrade your small business's virus protection today! Cybersecurity solutions for small to mid-sized businesses deliver enterprise-level protection.

    Download this data set (Checklist for a Small Firm's Cybersecurity Program 2020-2021) to help secure various aspects of your small business, including employee data, your website, and more. This checklist is provided to assist small member firms with limited resources in establishing a cybersecurity program to:

      • identify and assess cybersecurity threats,
      • protect assets from cyber intrusions,
      • detect when their systems and assets have been compromised,
      • plan for the response when a compromise occurs, and
      • implement a plan to recover lost, stolen, or unavailable assets.

      • Train employees in security principles.
      • Protect information, computers, and networks from malware attacks.
      • Provide firewall security for your Internet connection.
      • Create a mobile device action plan.
      • Make backup copies of important business data and information.
      • Learn about the threats and how to protect your website.
      • Protect your small business site.
      • Learn the basics of protecting your business websites from cyber attacks at the WP Hacked Help Blog.
    

    Created With Inputs From Security Experts at WP Hacked Help - Pioneer In WordPress Malware Removal & Security

  19. RNA Virus Database

    • rrid.site
    • dknet.org
    • +2more
    Cite
    RNA Virus Database [Dataset]. http://identifiers.org/RRID:SCR_007899
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 19, 2016. It is a database and web application describing the genome organization and providing analytical tools for the 938 known species of RNA virus. It can identify submitted nucleotide sequences, can place them into multiple whole-genome alignments (in species where more than one isolate has been fully sequenced) and contains translated genome sequences for all species. It has been created for two main purposes: to facilitate the comparative analysis of RNA viruses and to become a hub for other, more specialised virus Web sites.

  20. Virus-HostDB

    • bioregistry.io
    Updated Jan 19, 2023
    Cite
    (2023). Virus-HostDB [Dataset]. https://bioregistry.io/virushostdb
    Dataset updated
    Jan 19, 2023
    Description

    Virus-Host DB organizes data about the relationships between viruses and their hosts, represented in the form of pairs of NCBI taxonomy IDs for viruses and their hosts. Virus-Host DB covers viruses with complete genomes stored in 1) NCBI/RefSeq and 2) GenBank whose accession numbers are listed in EBI Genomes. The host information is collected from RefSeq, GenBank (in free text format), UniProt, ViralZone, and manually curated with additional information obtained by literature surveys.

Cite
José Reyes (2025). MalwareBazaar Malware Dataset (Sep - Oct 2025) [Dataset]. https://www.kaggle.com/datasets/arkreyes/malwarebazaar-malware-dataset-sep-oct-2025

MalwareBazaar Malware Dataset (Sep - Oct 2025)

a dataset of uploaded malware in MalwareBazaar's database.

Explore at:
zip (9415213 bytes)
Dataset updated
Oct 9, 2025
Authors
José Reyes
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

MalwareBazaar Malware Dataset.

Introduction.

This dataset is useful for practicing data analysis or data science skills. It contains information about indicators of compromise found in MalwareBazaar's database.

Description.

The dataset was retrieved from MalwareBazaar's full-dump CSV export, then curated, formatted, and cleaned by the author.

  • Metadata removed (footer with unreadable information).
  • 'date' formatted to datetime (better reading format).
  • Data filtered to the last 90 days.
  • Unnecessary columns with "NaN" data removed.
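
The cleaning steps listed above could be reproduced roughly as follows. This is a sketch under assumptions, not the author's actual script: only the 'date' column name comes from the description, while the timestamp format, the `clean_rows` helper, and the reading of "unnecessary columns" as columns that are empty or "NaN" in every remaining row are all hypothetical. Removal of the metadata footer is not shown, because its layout is not described here.

```python
from datetime import datetime, timedelta


def clean_rows(rows, now, days=90, date_format="%Y-%m-%d %H:%M:%S"):
    """Parse 'date', keep recent rows, and drop all-empty columns.

    rows: list of dicts, e.g. as produced by csv.DictReader.
    now: reference datetime for the 90-day window.
    date_format: assumed timestamp layout (hypothetical).
    """
    cutoff = now - timedelta(days=days)
    kept = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        row["date"] = datetime.strptime(row["date"], date_format)
        if row["date"] >= cutoff:
            kept.append(row)

    # Assumption: a column is "unnecessary" when every kept row is empty/NaN.
    empty_values = {"", "NaN", None}
    columns = list(kept[0].keys()) if kept else []
    to_drop = {
        col for col in columns
        if all(r.get(col) in empty_values for r in kept)
    }
    return [
        {col: val for col, val in r.items() if col not in to_drop}
        for r in kept
    ]
```

For example, `clean_rows(list(csv.DictReader(f)), now=datetime.now())` would apply the same steps to an opened CSV file, provided the file's timestamps match the assumed format.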