100+ datasets found
  1. MalwareBazaar Malware Dataset (Sep - Oct 2025)

    • kaggle.com
    zip
    Updated Oct 9, 2025
    Cite
    José Reyes (2025). MalwareBazaar Malware Dataset (Sep - Oct 2025) [Dataset]. https://www.kaggle.com/datasets/arkreyes/malwarebazaar-malware-dataset-sep-oct-2025
    Available download formats: zip (9415213 bytes)
    Authors
    José Reyes
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MalwareBazaar Malware Dataset.

    Introduction.

    This dataset is useful for practicing Data Analysis or Data Science skills. It contains information about indicators of compromise found in MalwareBazaar's database.

    Description.

    The dataset was retrieved from MalwareBazaar's full database dump (CSV). I curated, formatted, and cleaned it myself.

    • Metadata removed (a footer with unreadable information).
    • 'date' converted to datetime (a more readable format).
    • Data filtered to the last 90 days.
    • Unnecessary columns containing "NaN" data removed.
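    The cleaning steps listed above could be reproduced roughly as follows. This is a minimal standard-library sketch, not the author's script: it assumes the CSV has a `date` column formatted as `YYYY-MM-DD HH:MM:SS`, and the helper names, sample rows, and the `unused` column are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed layout of the 'date' column after reformatting.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
EMPTY_VALUES = ("", "NaN")


def filter_last_days(rows, now, days=90):
    """Keep rows whose 'date' lies within `days` days before `now` (inclusive)."""
    cutoff = now - timedelta(days=days)
    return [
        row for row in rows
        if cutoff <= datetime.strptime(row["date"], DATE_FORMAT) <= now
    ]


def drop_empty_columns(rows):
    """Drop columns whose value is empty or 'NaN' in every row."""
    if not rows:
        return rows
    keep = [col for col in rows[0] if any(r[col] not in EMPTY_VALUES for r in rows)]
    return [{col: r[col] for col in keep} for r in rows]


# Hypothetical two-row extract; real column names come from the dump's header.
sample = io.StringIO(
    "date,sha256_hash,signature,unused\n"
    "2025-10-01 12:00:00,aaa,AgentTesla,NaN\n"
    "2025-05-01 08:30:00,bbb,Formbook,NaN\n"
)
rows = list(csv.DictReader(sample))
recent = drop_empty_columns(filter_last_days(rows, now=datetime(2025, 10, 9)))
print(recent)
```

    Only the October row survives the 90-day window, and the all-"NaN" column is dropped afterwards.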
  2. Complete Antivirus Database

    • comodo.com
    cav
    Updated Dec 8, 2015
    Cite
    Comodo (2015). Complete Antivirus Database [Dataset]. https://www.comodo.com/home/internet-security/updates/vdp/database.php
    Available download formats: cav
    Dataset authored and provided by
    Comodo
    License

    https://www.comodo.com/home/internet-security/updates/vdp/database.php

    Description

    The complete Comodo Internet Security database is available for download...

  3. Android Malware Detection Dataset

    • kaggle.com
    zip
    Updated Feb 24, 2024
    Cite
    Danny Revaldo (2024). Android Malware Detection Dataset [Dataset]. https://www.kaggle.com/datasets/dannyrevaldo/android-malware-detection-dataset
    Available download formats: zip (123470 bytes)
    Authors
    Danny Revaldo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The "Android Malware Detection Dataset" is a comprehensive collection of data designed to facilitate research in the detection and analysis of malware targeting the Android platform. This dataset encompasses a wide range of features extracted from Android applications, providing valuable insights into their behaviors and functionalities.

    Key features of the dataset include:

    • Permission Features: Various permissions requested by Android applications, such as access to location (coarse and fine), camera, microphone, contacts, SMS, calendar, storage, and more.
    • System Features: Features related to system functions and controls, including access to device hardware (e.g., sensors, Bluetooth, NFC), system settings (e.g., changing network state, WiFi settings), and system services (e.g., managing accounts, managing documents).
    • Security-related Features: Features related to security functionalities and behaviors, encompassing permission management, authentication, encryption (e.g., cryptographic operations), and security policy enforcement.
    • Communication Features: Features related to communication functionalities, including sending and receiving SMS messages, making phone calls, accessing network state, and managing network connections.
    • Data Access Features: Features related to accessing and manipulating data, such as reading and writing to various data sources (e.g., external storage, databases), accessing user information (e.g., contacts, call logs), and accessing app-specific data.
    • App Lifecycle Features: Features related to managing the application lifecycle, including app installation and uninstallation, app startup and shutdown, app updates, and app permissions.
    • Device Control Features: Features related to controlling device behavior and settings, such as changing system settings, modifying audio settings, controlling device display, and managing device power.
    • Miscellaneous Features: Other miscellaneous features including accessing system logs, system services and components (e.g., camera, location manager), handling system events (e.g., incoming calls, boot completed), and interacting with system UI components.
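    Features such as the requested permissions listed above are often represented as binary indicators. The sketch below assumes that encoding (the description does not say how columns are stored) and uses a small hypothetical vocabulary of real Android permission names. A real dataset defines its own column set.

```python
# Hypothetical fixed vocabulary of feature columns.
PERMISSION_VOCAB = [
    "android.permission.ACCESS_COARSE_LOCATION",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.CAMERA",
    "android.permission.RECORD_AUDIO",
    "android.permission.READ_CONTACTS",
    "android.permission.SEND_SMS",
]


def permission_vector(requested, vocab=PERMISSION_VOCAB):
    """Return a 0/1 vector: 1 if the app requests the permission in that column."""
    requested = set(requested)
    return [1 if perm in requested else 0 for perm in vocab]


app_permissions = [
    "android.permission.CAMERA",
    "android.permission.SEND_SMS",
    "android.permission.INTERNET",  # not in the vocabulary, so it is ignored
]
print(permission_vector(app_permissions))  # [0, 0, 1, 0, 0, 1]
```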

    This dataset provides researchers with a rich source of information to develop and evaluate effective malware detection and analysis techniques, ultimately contributing to the enhancement of mobile security on the Android platform.

  4. Data from: Malware Finances and Operations: a Data-Driven Study of the Value...

    • data.europa.eu
    • data.niaid.nih.gov
    • +1 more
    unknown
    Updated Oct 18, 2023
    Cite
    Zenodo (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8047205?locale=bg
    Available download formats: unknown (8866943)
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets demonstrate the malware economy and the value chain described in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceedings Series (ICPS). Using the well-documented scripts, it is straightforward to reproduce our findings. Duplicating our key findings from MalwareInfectionSet takes an estimated 1 hour of human time and 3 hours of computing time; VictimAccessSet takes around one hour; and replicating the price calculations using AccountAccessSet takes minutes. See the included README.md files and Python scripts.

    We chose to represent each victim by a single JavaScript Object Notation (JSON) data file. The data sources provide sets of victim JSON data files, from which we extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

    1. MalwareInfectionSet: We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs, which are available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

    2. VictimAccessSet: We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including usernames and passwords, as well as a detailed collection of information that provides a clone of the victim's browser session. Indeed, Genesis Market simplifies importing compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

    3. AccountAccessSet: The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate it similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.

    Credits. Authors: Billy Bob Brumley (Tampere University, Tampere, Finland), Juha Nurmi (Tampere University, Tampere, Finland), and Mikko Niemelä (Cyber Intelligence House, Singapore).

    Funding: This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

    Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.

  5. Maldeb Dataset

    • dataverse.telkomuniversity.ac.id
    • ieee-dataport.org
    • +1 more
    png
    Updated Mar 28, 2024
    Cite
    Telkom University Dataverse (2024). Maldeb Dataset [Dataset]. http://doi.org/10.34820/FK2/HQYV4X
    Available download formats: png (multiple image files)
    Dataset provided by
    Telkom University Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    Directorate General of Higher Education, Ministry of Education and Culture Republic of Indonesia
    Japanese Student Service Association (JASSO)
    Description

    Malware-benign image representation. The dataset was collected from several malware repositories, including TekDefense, TheZoo, The Malware-Repo, Malware Database, and MalwareBazaar. The benign samples were collected from Microsoft Windows 10 and 11 system apps and several open-source software repositories, including CNET, SourceForge, FileForum, and PortableFreeware. The samples were validated by scanning them with the VirusTotal malware scanning service. The samples were preprocessed by converting the malware binaries into grayscale images following the rules of Nataraj (2011). Nataraj paper: https://vision.ece.ucsb.edu/research/signal-processing-malware-analysis.

    The Maldeb Dataset was collected by Debi Amalia Septiyani and Halimul Hakim Khairul. References: D. A. Septiyani, “Generating Grayscale and RGB Images dataset for windows PE malware using Gist Features extraction method,” Institut Teknologi Bandung, 2022; and Dani Agung Prastiyo, “Design and implementation of a machine learning-based malware classification system with an audio signal feature Analysis Approach,” Institut Teknologi Bandung, 2023.

    The complete dataset can be accessed at https://ieee-dataport.org/documents/maldeb-dataset and https://github.com/julismail/Self-Supervised
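    The binary-to-grayscale conversion mentioned above treats each byte as one 8-bit pixel and wraps the bytes into rows of a fixed width. The sketch below uses the width-by-file-size table as commonly reproduced from Nataraj et al. (2011). Treat those values as an assumption to verify against the paper. The zero-padding of the last row is also an assumption, since the description does not say how partial rows are handled. Saving the rows as a PNG would need an imaging library, which is not shown.

```python
# (upper file-size bound in KB, image width in pixels), as commonly cited from Nataraj (2011).
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256), (200, 384), (500, 512), (1000, 768)]
MAX_WIDTH = 1024  # files of 1000 KB or more


def image_width(size_bytes):
    """Pick the image width from the file size."""
    size_kb = size_bytes / 1024
    for limit_kb, width in WIDTH_TABLE:
        if size_kb < limit_kb:
            return width
    return MAX_WIDTH


def bytes_to_grayscale(data, width=None):
    """Lay raw bytes out row by row as 8-bit gray pixels (0 = black, 255 = white)."""
    if width is None:
        width = image_width(len(data))
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        # Zero-pad the final partial row (assumption; the source does not specify this).
        rows[-1].extend([0] * (width - len(rows[-1])))
    return rows


binary = bytes(range(100))  # stand-in for a PE file's raw bytes
image = bytes_to_grayscale(binary)
print(len(image), len(image[0]))  # 100 bytes -> width 32, so 4 rows of 32 pixels
```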

  6. Portable Executable Malware Data

    • kaggle.com
    zip
    Updated Mar 10, 2025
    Cite
    malwareTBugs (2025). Portable Executable Malware Data [Dataset]. https://www.kaggle.com/datasets/malwaretbugs/maldata
    Available download formats: zip (23094201 bytes)
    Authors
    malwareTBugs
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by malwareTBugs

    Released under Database: Open Database, Contents: Database Contents

    Contents

  7. Malware Analysis Datasets: Top-1000 PE Imports

    • ieee-dataport.org
    Updated Nov 8, 2019
    + more versions
    Cite
    Angelo Oliveira (2019). Malware Analysis Datasets: Top-1000 PE Imports [Dataset]. https://ieee-dataport.org/open-access/malware-analysis-datasets-top-1000-pe-imports
    Authors
    Angelo Oliveira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: Top-1000 imported functions extracted from the 'pe_imports' elements of Cuckoo Sandbox reports. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
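    A top-N import list like this one could be derived by counting how many samples import each function and keeping the most frequent names. The sketch below assumes that criterion (document frequency). It also assumes each sample's imports have already been pulled out of its Cuckoo report as `dll!function` strings. Both the naming and the sample data are hypothetical. It then encodes each sample as a binary vector over the selected vocabulary.

```python
from collections import Counter


def top_k_imports(samples, k=1000):
    """Rank imported function names by the number of samples that import them."""
    counts = Counter()
    for imports in samples:
        counts.update(set(imports))  # count each function at most once per sample
    return [name for name, _ in counts.most_common(k)]


def to_binary_features(imports, vocabulary):
    """1 if the sample imports the function in that column, else 0."""
    present = set(imports)
    return [1 if name in present else 0 for name in vocabulary]


samples = [
    ["kernel32.dll!CreateFileW", "kernel32.dll!WriteFile", "user32.dll!MessageBoxW"],
    ["kernel32.dll!CreateFileW", "advapi32.dll!RegSetValueExW"],
    ["kernel32.dll!CreateFileW", "kernel32.dll!WriteFile"],
]
vocab = top_k_imports(samples, k=2)
features = [to_binary_features(s, vocab) for s in samples]
print(vocab)     # ['kernel32.dll!CreateFileW', 'kernel32.dll!WriteFile']
print(features)  # [[1, 1], [1, 0], [1, 1]]
```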

  8. Quttera Website Malware Threat Encyclopedia

    • threats.quttera.com
    json
    Updated Nov 21, 2025
    Cite
    Quttera (2025). Quttera Website Malware Threat Encyclopedia [Dataset]. https://threats.quttera.com/
    Available download formats: json
    Dataset authored and provided by
    Quttera
    Time period covered
    2024 - Present
    Description

    Comprehensive database of website malware threats, vulnerabilities, and security risks detected by Quttera's malware scanner.

  9. Kraken2 Metagenomic Virus Database

    • osti.gov
    Updated Apr 23, 2020
    Cite
    Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21) (2020). Kraken2 Metagenomic Virus Database [Dataset]. http://doi.org/10.13139/OLCF/1615774
    Dataset provided by
    Department of Energy Biological and Environmental Research Program
    Office of Science (http://www.er.doe.gov/)
    Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
    Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
    Description

    The database: a Kraken2 [1] database built from a classification tree containing over 700k metagenomic viruses from JGI IMG/VR [2].

    [1] Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biol., 20(1), 1–13. doi: 10.1186/s13059-019-1891-0
    [2] Paez-Espino, D., Chen, I.-M. A., Palaniappan, K., Ratner, A., Chu, K., Szeto, E., et al. (2017). IMG/VR: a database of cultured and uncultured DNA viruses and retroviruses. Nucleic Acids Res., 45, D457–D465.

    For paper: A k-mer based approach for virus classification in metatranscriptomic and metagenomic samples identifies viral associations in the Populus phytobiome and autism brains

    Abstract

    Background: Viruses are underrepresented taxa in the study and identification of microbiome constituents; however, they play an important role in health, microbiome regulation, and the transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which further limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses is a challenge for classification, not only in constructing a viral taxonomy, but also in identifying similarities between a virus' genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify their sequences in metagenomic and metatranscriptomic samples.

    Methods: To identify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses. We then integrated the viral classification tree with the NCBI taxonomy for use with ParaKraken, a metagenomic/transcriptomic classifier.

    Results: To illustrate the breadth of our utility for classifying viruses with ParaKraken, we analyzed data from a plant metagenome study identifying the differences between two Populus genotypes in three different compartments, and from a human metatranscriptome study identifying the differences between Autism Spectrum Disorder patients and controls in post mortem brain biopsies. In the Populus study, we identified genotype- and compartment-specific viral signatures, while in the Autism study we identified a significantly increased abundance of eight viral sequences in Autism brain biopsies.

    Conclusion: Viruses represent an important aspect of the microbiome. The ability to classify viruses is the first step toward better understanding their role in the microbiome. The viral classification method presented here allows for more complete identification of viral sequences for use in identifying associations between viruses and the host, and between viruses and other microbiome members.

    Acknowledgements and Funding: This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was also supported by the Plant-Microbe Interfaces Scientific Focus Area in the Genomic Science Program, the Office of Biological and Environmental Research (BER) in the U.S. Department of Energy Office of Science, and by the Department of Energy, Laboratory Directed Research and Development funding (ProjectID 8321), at the Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the US DOE under contract DE-AC05-00OR22725. This research used resources of the Compute and Data Environment for Science (CADES).
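    The Methods step compares viruses through similarity scores computed from k-mer profiles. The abstract gives neither the value of k nor the exact formula. The sketch below therefore assumes Czekanowski's proportional similarity, the sum over shared k-mers of min(p_i, q_i), applied to k-mer frequency proportions. The helper names and toy sequences are hypothetical, and the clustering and tree-building steps are not shown.

```python
from collections import Counter


def kmer_profile(seq, k=4):
    """Return each k-mer's share of all overlapping k-mers in the sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def proportional_similarity(p, q):
    """Czekanowski proportional similarity: sum of min(p_i, q_i); 1.0 means identical profiles."""
    return sum(min(p[kmer], q[kmer]) for kmer in p.keys() & q.keys())


a = kmer_profile("ACGTACGT", k=2)  # AC, CG, GT twice each; TA once (7 k-mers)
b = kmer_profile("ACGTTTTT", k=2)  # AC, CG, GT once each; TT four times
print(round(proportional_similarity(a, a), 6))  # 1.0
print(round(proportional_similarity(a, b), 6))  # 0.428571 (= 3/7)
```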

  10. Malware Repositories and Their Authors on GitHub

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 11, 2024
    Cite
    Tania, Nishat Ara; Masud, Md Rayhanul; Rokon, Md Omar Faruk; Zhang, Qian; Faloutsos, Michalis (2024). Malware Repositories and Their Authors on GitHub [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10806592
    Dataset provided by
    University of California, Riverside
    Walmart Global Tech
    Authors
    Tania, Nishat Ara; Masud, Md Rayhanul; Rokon, Md Omar Faruk; Zhang, Qian; Faloutsos, Michalis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is rooted in a study aimed at uncovering the origins and motivations behind the creation of malware repositories on GitHub. Our research examines the profiles and intentions of GitHub users who have been involved in this dubious activity.

    We identified roughly 14,000 GitHub users linked to malware repositories. Using large language model (LLM) analytics, we classified these individuals into three categories based on their perceived intent: 3,339 Malicious, 3,354 Likely Malicious, and 7,574 Benign (14,267 in total). This offers a nuanced perspective on the community behind these repositories.

    Our analysis penetrates the veil of anonymity and obscurity often associated with these GitHub profiles, revealing stark contrasts in their characteristics. Malicious authors were found to typically possess sparse profiles focused on nefarious activities, while Benign authors presented well-rounded profiles, actively contributing to cybersecurity education and research. Those labeled as Likely Malicious exhibited a spectrum of engagement levels, underlining the complexity and diversity within this digital ecosystem.

    We offer two datasets with this paper. The first is a list of malware repositories: we collected and extended the set of malware repositories on GitHub in 2022, following the original papers. The second is a CSV file with the GitHub users' information and their maliciousness classification labels.

    malware_repos.txt

    Purpose: This file contains a curated list of GitHub repositories identified as containing malware. These repositories were identified following the methodology outlined in the research paper "SourceFinder: Finding Malware Source-Code from Publicly Available Repositories in GitHub."

    Contents: The file is structured as a simple text file, with each line representing a unique repository in the format username/reponame. This format allows for easy identification and access to each repository on GitHub for further analysis or review.

    Usage: The list serves as a critical resource for researchers and cybersecurity professionals interested in studying malware, understanding its distribution on platforms like GitHub, or developing defense mechanisms against such malicious content.
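    Because each line of malware_repos.txt has the form username/reponame, the file can be turned into owner/repository pairs and GitHub URLs with a few lines of code. The function name and the two sample entries below are hypothetical. Rejecting malformed lines is a design choice of this sketch.

```python
def parse_repo_list(text):
    """Parse 'username/reponame' lines into (owner, repo, url) tuples."""
    repos = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        owner, sep, name = line.partition("/")
        if not sep or not owner or not name or "/" in name:
            raise ValueError(f"unexpected line format: {line!r}")
        repos.append((owner, name, f"https://github.com/{owner}/{name}"))
    return repos


sample = "alice/keylogger-demo\nbob/ransomware-sim\n\n"
repos = parse_repo_list(sample)
for owner, name, url in repos:
    print(owner, name, url)
```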

    obfuscated_github_user_dataset.csv

    Purpose: Accompanying the list of malware repositories, this CSV file contains detailed, albeit obfuscated, profile information of the GitHub users who authored these repositories. The obfuscation process has been applied to protect user privacy and comply with ethical standards, especially given the sensitive nature of associating individuals with potentially malicious activities.

    Contents: The dataset includes several columns representing different aspects of user profiles, such as obfuscated identifiers (e.g., ID, login, name), contact information (e.g., email, blog), and GitHub-specific metrics (e.g., followers count, number of public repositories). Notably, sensitive information has been masked or replaced with generic placeholders to prevent user identification.

    Usage: This dataset can be instrumental for researchers analyzing behaviors, patterns, or characteristics of users involved in creating malware repositories on GitHub. It provides a basis for statistical analysis, trend identification, or the development of predictive models, all while upholding the necessary ethical considerations.

  11. RNA Virus Database

    • rrid.site
    • dknet.org
    • +2 more
    + more versions
    Cite
    RNA Virus Database [Dataset]. http://identifiers.org/RRID:SCR_007899
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 19, 2016. It is a database and web application describing the genome organization and providing analytical tools for the 938 known species of RNA virus. It can identify submitted nucleotide sequences, can place them into multiple whole-genome alignments (in species where more than one isolate has been fully sequenced) and contains translated genome sequences for all species. It has been created for two main purposes: to facilitate the comparative analysis of RNA viruses and to become a hub for other, more specialised virus Web sites.

  12. AI-powered malware simulation of a medical imaging database

    • scidb.cn
    Updated Sep 2, 2025
    Cite
    Somaya_haiba (2025). AI-powered malware simulation of a medical imaging database [Dataset]. http://doi.org/10.57760/sciencedb.27227
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Science Data Bank
    Authors
    Somaya_haiba
    Description

    The dataset comprises medical imaging data showing the presence or absence of illnesses. Used to simulate AI-based malware modulation, this database is paired with malware-modulated counterparts, which are created on the fly as tampered images from the benign dataset using three mechanisms:

    • Adversarial perturbations to input data that can cause misclassification.
    • Patch-level content edits: copying and pasting, or inpainting, small square regions (8–32 px) to simulate lesion insertion or removal.
    • Metadata-consistent rescaling: random resize and crop variance.

    Each training batch is a duplicate of the original images.
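    The patch-level copy-paste edit described above can be sketched on a grayscale image stored as a list of pixel rows. This is only an illustration under stated assumptions. The function name and the use of a seeded `random.Random` are hypothetical. Only the 8–32 px square size comes from the description. Inpainting, adversarial perturbations, and rescaling are not shown.

```python
import random


def copy_paste_patch(image, rng, min_size=8, max_size=32):
    """Copy a random square patch to another random location; return (copy, patch info)."""
    height, width = len(image), len(image[0])
    if min(height, width) < min_size:
        raise ValueError("image is smaller than the minimum patch size")
    size = rng.randint(min_size, min(max_size, height, width))
    sy, sx = rng.randint(0, height - size), rng.randint(0, width - size)  # source corner
    dy, dx = rng.randint(0, height - size), rng.randint(0, width - size)  # destination corner
    out = [row[:] for row in image]  # leave the benign original untouched
    for y in range(size):
        out[dy + y][dx:dx + size] = image[sy + y][sx:sx + size]
    return out, (size, sy, sx, dy, dx)


img = [[(y * 64 + x) % 256 for x in range(64)] for y in range(64)]  # toy 64x64 image
tampered, (size, sy, sx, dy, dx) = copy_paste_patch(img, random.Random(0))
print("patch size:", size, "from", (sy, sx), "to", (dy, dx))
```

    Copying from the untouched original (rather than from `out`) keeps the pasted patch intact even when the source and destination regions overlap.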

  13. IVDB - Influenza Virus Database

    • neuinfo.org
    • dknet.org
    • +1 more
    Updated Jan 29, 2022
    Cite
    (2022). IVDB - Influenza Virus Database [Dataset]. http://identifiers.org/RRID:SCR_013404
    Description

    IVDB hosts complete genome sequences of influenza A virus generated by BGI and curates all other published influenza virus sequences after expert annotations. For the convenience of efficient data utilization, our Q-Filter system classifies and ranks all nucleotide sequences into 7 categories according to sequence content and integrity. IVDB provides a series of tools and viewers for analyzing the viral genomes, genes, genetic polymorphisms and phylogenetic relationships comparatively. A searching system is developed for users to retrieve a combination of different data types by setting various search options. To facilitate analysis of the global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) is developed to display worldwide geographic distribution of the viral genotypes and to couple genomic data with epidemiological data. The BLAST, multiple sequence alignment tools and phylogenetic analysis tools were integrated for online data analysis. Furthermore, IVDB offers instant access to the pre-computed alignments and polymorphism analysis of influenza virus genes and proteins and presents the results by SNP distribution plots and minor allele distributions. IVDB aims to be a powerful information resource and an analysis workbench for scientists working on IV genetics, evolution, diagnostics, vaccine development, and drug design.

  14. Malware Detection in Network Traffic Data

    • kaggle.com
    zip
    Updated Dec 26, 2023
    Cite
    Agung Pambudi (2023). Malware Detection in Network Traffic Data [Dataset]. https://www.kaggle.com/datasets/agungpambudi/network-malware-detection-connection-analysis
    Available download formats: zip (755409206 bytes)
    Authors
    Agung Pambudi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To cite the dataset, please reference it as: “Stratosphere Laboratory. A labeled dataset with malicious and benign IoT network traffic. January 22nd. Agustin Parmisano, Sebastian Garcia, Maria Jose Erquiaga. https://www.stratosphereips.org/datasets-iot23”

    This dataset includes labels that explain the linkages between flows connected with harmful or possibly malicious activity to provide network malware researchers and analysts with more thorough information. These labels were painstakingly created at the Stratosphere labs using malware capture analysis.

    We present a concise explanation of the labels used for the identification of malicious flows, based on manual network analysis, below:

    Attack: This label signifies the occurrence of an attack originating from an infected device directed towards another host. Any flow that endeavors to exploit a vulnerable service, discerned through payload and behavioral analysis, falls under this classification. Examples include brute force attempts on telnet logins or header-based command injections in GET requests.

    Benign: The "Benign" label denotes connections where no suspicious or malicious activities have been detected.

    C&C (Command and Control): This label indicates that the infected device has established a connection with a Command and Control server. This observation is rooted in the periodic nature of connections or activities such as binary downloads or the exchange of IRC-like or decoded commands.

    DDoS (Distributed Denial of Service): "DDoS" is assigned when the infected device is actively involved in a Distributed Denial of Service attack, identifiable by the volume of flows directed towards a single IP address.

    FileDownload: This label signifies that a file is being downloaded to the infected device. It is determined by examining connections with response bytes exceeding a specified threshold (typically 3KB or 5KB), often in conjunction with known suspicious destination ports or IPs associated with Command and Control servers.

    HeartBeat: "HeartBeat" designates connections where packets serve the purpose of tracking the infected host by the Command and Control server. Such connections are identified through response bytes below a certain threshold (typically 1B) and exhibit periodic similarities. This is often associated with known suspicious destination ports or IPs linked to Command and Control servers.

    Mirai: This label is applied when connections exhibit characteristics resembling those of the Mirai botnet, based on patterns consistent with common Mirai attack profiles.

    Okiru: Similar to "Mirai," the "Okiru" label is assigned to connections displaying characteristics of the Okiru botnet. The parameters for this label are the same as for Mirai, but Okiru is a less prevalent botnet family.

    PartOfAHorizontalPortScan: This label is employed when connections are involved in a horizontal port scan aimed at gathering information for potential subsequent attacks. The labeling decision hinges on patterns such as shared ports, similar transmitted byte counts, and multiple distinct destination IPs among the connections.

    Torii: The "Torii" label is used when connections exhibit traits indicative of the Torii botnet, with labeling criteria similar to those used for Mirai, albeit in the context of a less common botnet family.
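    The threshold rules described for FileDownload, HeartBeat and PartOfAHorizontalPortScan can be sketched as simple checks over Zeek-style connection records. This is a hypothetical illustration, not the dataset authors' labeling code: 3KB is read as 3 × 1024 bytes, "periodic" is approximated by a low coefficient of variation of inter-arrival times (cutoff 0.1), the scan cutoffs (10 destination hosts, 16-byte spread) are arbitrary assumptions, and the "known suspicious port or C&C IP" condition is omitted.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Assumed thresholds: the description only says "typically 3KB or 5KB" and "typically 1B".
FILE_DOWNLOAD_MIN_BYTES = 3 * 1024  # 3KB read as 3 x 1024 bytes
HEARTBEAT_MAX_BYTES = 1             # "below 1B": the response is empty
PERIODIC_MAX_CV = 0.1               # hypothetical cutoff for "periodic"
SCAN_MIN_DISTINCT_DSTS = 10         # hypothetical minimum number of scanned hosts
SCAN_MAX_BYTE_SPREAD = 16           # hypothetical tolerance for "similar byte counts"


@dataclass
class Conn:
    ts: float        # connection start, epoch seconds
    orig_h: str      # source IP
    resp_h: str      # destination IP
    resp_p: int      # destination port
    orig_bytes: int  # bytes sent by the source
    resp_bytes: int  # bytes sent back by the destination


def is_periodic(timestamps: list[float]) -> bool:
    """True when inter-arrival gaps are nearly constant (low coefficient of variation)."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return False
    return pstdev(gaps) / mean(gaps) <= PERIODIC_MAX_CV


def looks_like_file_download(c: Conn) -> bool:
    """Response larger than the assumed download threshold."""
    return c.resp_bytes > FILE_DOWNLOAD_MIN_BYTES


def looks_like_heartbeat(group: list[Conn]) -> bool:
    """group: connections between one infected host and one suspected C&C endpoint."""
    empty_responses = all(c.resp_bytes < HEARTBEAT_MAX_BYTES for c in group)
    return empty_responses and is_periodic([c.ts for c in group])


def looks_like_horizontal_scan(group: list[Conn]) -> bool:
    """group: connections from one source host; same port, similar bytes, many hosts."""
    if not group:
        return False
    same_port = len({c.resp_p for c in group}) == 1
    sizes = [c.orig_bytes for c in group]
    similar_bytes = max(sizes) - min(sizes) <= SCAN_MAX_BYTE_SPREAD
    many_hosts = len({c.resp_h for c in group}) >= SCAN_MIN_DISTINCT_DSTS
    return same_port and similar_bytes and many_hosts
```

    How connections are grouped (per host pair, per source, per time window) is left to the caller; the labels in the dataset were assigned by its authors, so these checks are only a starting point for reproducing similar heuristics.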

    | Field name | Description | Type |
    |---|---|---|
    | ts | The timestamp of the connection event. | time |
    | uid | A unique identifier for the connection. | string |
    | id.orig_h | The source IP address. | addr |
    | id.orig_p | The source port. | port |
    | id.resp_h | The destination IP address. | addr |
    | id.resp_p | The destination port. | port |
    | proto | The network protocol used (e.g., 'tcp'). | enum |
    | service | The service associated with the connection. | string |
    | duration | The duration of the connection. | interval |
    | orig_bytes | The number of bytes sent from the source to the destination. | count |
    | resp_bytes | The number of bytes sent from the destination to the source. | count |
    | conn_state | The state of the connection. | string |
    | local_orig | Indicates whether the connection is considered local or not. | bool |
    | local_resp | Indicates whether the connection is considered... | |
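    These fields follow Zeek's conn.log layout. A minimal parsing sketch, assuming tab-separated records, `#`-prefixed header lines, `-` for unset fields and T/F for booleans (Zeek's defaults); only the 14 columns listed above are read, and anything after them (truncated here) is ignored. `FIELDS` and `parse_conn_line` are hypothetical names.

```python
from typing import Any, Callable, Optional

# Column order as in the field table; later columns are not parsed.
FIELDS: list[tuple[str, Callable[[str], Any]]] = [
    ("ts", float),                       # time: epoch seconds
    ("uid", str),
    ("id.orig_h", str),                  # addr
    ("id.orig_p", int),                  # port
    ("id.resp_h", str),
    ("id.resp_p", int),
    ("proto", str),                      # enum, e.g. 'tcp'
    ("service", str),
    ("duration", float),                 # interval, seconds
    ("orig_bytes", int),                 # count
    ("resp_bytes", int),
    ("conn_state", str),
    ("local_orig", lambda v: v == "T"),  # bool written as T/F
    ("local_resp", lambda v: v == "T"),
]

UNSET = "-"  # Zeek's default marker for an unset field


def parse_conn_line(line: str) -> Optional[dict[str, Any]]:
    """Parse one tab-separated record; return None for blank or header/comment lines."""
    if not line.strip() or line.startswith("#"):
        return None
    values = line.rstrip("\n").split("\t")
    record: dict[str, Any] = {}
    # zip stops at the shorter sequence, so extra trailing columns are skipped.
    for (name, convert), raw in zip(FIELDS, values):
        record[name] = None if raw == UNSET else convert(raw)
    return record
```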
  15. benign and injected IoMT packet database

    • scidb.cn
    Updated Apr 14, 2025
    Somaya_haiba (2025). benign and injected IoMT packet database [Dataset]. http://doi.org/10.57760/sciencedb.23587
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Apr 14, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Somaya_haiba
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was compiled over a year and a half from various websites and sources. It contains 7,449 benign and malicious IoMT packets from real-world components of an e-healthcare system that monitor network transmission. Data quality was improved in several preprocessing stages: handling noise and unwanted string values, cleaning, encoding string features, and rescaling disordered data values with data transformation functions. To standardize the analysis of network features, only features related to networking characteristics are kept; all features that reveal the patient's vital signs are discarded. The dataset is intended for analyzing IoMT traffic behavior within smart-hospital networks.
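    The preprocessing stages above (removing noisy string values, encoding string features, rescaling) can be sketched with the standard library. This assumes integer label encoding for string features and min-max scaling to [0, 1]; the description does not name the transformation functions actually used, so both choices are illustrative, as are the helper names.

```python
from typing import Optional, Sequence


def to_float_or_none(raw: str) -> Optional[float]:
    """Parse a numeric field; unparseable strings (noise) become None for later removal."""
    try:
        return float(raw)
    except ValueError:
        return None


def label_encode(values: Sequence[str]) -> tuple[list[int], dict[str, int]]:
    """Map each distinct string to an integer code, in order of first appearance."""
    mapping: dict[str, int] = {}
    codes = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
        codes.append(mapping[v])
    return codes, mapping


def min_max_scale(values: Sequence[float]) -> list[float]:
    """Rescale a numeric column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

    For example, a protocol column would go through `label_encode`, while a packet-length column would go through `to_float_or_none`, have its `None` rows dropped, and then be passed to `min_max_scale`.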

  16. Static and dynamic analysis of both generic and APT-related malware

    • data.mendeley.com
    Updated Mar 12, 2020
    Luis F. Martin-Liras (2020). Static and dynamic analysis of both generic and APT-related malware [Dataset]. http://doi.org/10.17632/w2w8gjsgnt.1
    Dataset updated
    Mar 12, 2020
    Authors
    Luis F. Martin-Liras
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides a total of 1,944 features obtained from the static and dynamic analysis reports of more than 19,400 malware samples, more than 1,000 of which were obtained from APT attacks. Its objective is to provide researchers with a tool for discerning the differences between generic and APT-related malware samples.

  17. HCVDB - Hepatitis C Virus Database

    • neuinfo.org
    • rrid.site
    • +2 more
    Updated Jan 29, 2022
    (2022). HCVDB - Hepatitis C Virus Database [Dataset]. http://identifiers.org/RRID:SCR_007703
    Dataset updated
    Jan 29, 2022
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 23, 2016. The euHCVdb is a Hepatitis C Virus database oriented towards protein sequence, structure and function analyses and the structural biology of HCV. To make the existing HCV databases as complementary as possible, its development is coordinated with the other HCV databases (Japan and Los Alamos) as part of an international collaborative effort. It is updated monthly from the EMBL Nucleotide Sequence Database and maintained in a relational database management system. The programs for parsing the EMBL flat files, annotating HCV entries, and filling and querying the database were written in SQL and Java. A fully automatic annotation procedure was developed using a reference set of completely annotated, well-characterized HCV genomes of various genotypes. This procedure standardizes the nomenclature of all entries and provides, for each entry, the genomic regions/proteins present, bibliographic references, genotype, notable sites or domains, the source of the sequence, and the structural data available as 3D protein models. Keywords: Hepatitis C, Hepatitis C Virus, Hepatitis C Virus protein.

  18. Global data breaches caused by malware 2023-2024, by industry

    • statista.com
    Updated May 15, 2025
    Statista (2025). Global data breaches caused by malware 2023-2024, by industry [Dataset]. https://www.statista.com/statistics/1419328/data-breaches-malware-by-industry/
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 1, 2023 - Oct 31, 2024
    Area covered
    Worldwide
    Description

    Between November 2023 and October 2024, organizations in the manufacturing sector worldwide saw around 1,036 instances of data breaches caused by malware attacks. Professional services ranked second, with 824 data breach cases in the measured period. Furthermore, malware caused 468 data breach incidents in the information sector.

  19. Ransomware Printable Character N-gram Feature Dataset

    • data.mendeley.com
    Updated Sep 15, 2025
    Keven Gonçalves (2025). Ransomware Printable Character N-gram Feature Dataset [Dataset]. http://doi.org/10.17632/ghpy6kdhx5.1
    Dataset updated
    Sep 15, 2025
    Authors
    Keven Gonçalves
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was generated for the academic research paper titled "Zero-Day Ransomware Family Detection Based on Printable Character Analysis and Machine Learning", published in the Electronic Journal of Scientific Initiation in Computing (Revista Eletrônica de Iniciação Científica em Computação – REIC), vol. 23 (2025), doi: http://doi.org/10.5753/reic.2025.6021.

    It contains structural features in the form of 3-, 4-, and 5-gram printable characters extracted from 2,675 binary executable samples. The training and validation set consists of 2,157 samples (80%): 1,023 ransomware samples from 25 relevant families and 1,134 goodware samples. The testing set consists of 518 samples (20%): 385 ransomware samples from 15 recent families and 133 goodware samples.
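    A sketch of how printable-character 3-, 4- and 5-gram binary features could be extracted from an executable. It assumes "printable" means ASCII 0x20-0x7E, that n-grams do not span non-printable bytes, and that a vocabulary of selected n-grams is already available; how the dataset's 2000 feature columns were selected is not described here, and the function names are hypothetical.

```python
def printable_runs(data: bytes) -> list[str]:
    """Split raw bytes into maximal runs of printable ASCII (0x20-0x7E)."""
    runs: list[str] = []
    current: list[str] = []
    for b in data:  # iterating over bytes yields ints
        if 0x20 <= b <= 0x7E:
            current.append(chr(b))
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


def char_ngrams(runs: list[str], sizes: tuple[int, ...] = (3, 4, 5)) -> set[str]:
    """Collect every character n-gram of the given sizes, never spanning two runs."""
    grams: set[str] = set()
    for run in runs:
        for n in sizes:
            for i in range(len(run) - n + 1):
                grams.add(run[i : i + n])
    return grams


def binary_features(data: bytes, vocabulary: list[str]) -> list[int]:
    """1 if the vocabulary n-gram occurs in the sample, else 0, in vocabulary order."""
    grams = char_ngrams(printable_runs(data))
    return [1 if g in grams else 0 for g in vocabulary]
```

    In practice `data` would come from reading the executable with `open(path, "rb").read()`, and the vocabulary would be the fixed list of feature n-grams shared by all samples.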

    The CSV file columns are the sample ID, filename, target class (RG), family ID, and numerical columns (binary features), as follows:

    | Subset | ID | filename | RG | family | 2000 features |
    |---|---|---|---|---|---|
    | Training goodware | 10000 to 11133 | Original name (.exe) | 0 | 0 | Binary features |
    | Testing goodware | 12000 to 12132 | Original name (.exe) | 0 | 0 | Binary features |
    | Training ransomware | 20000 to 21022 | SHA-256 hash | 1 | 1-25 (family IDs) | Binary features |
    | Testing ransomware | 22000 to 22384 | SHA-256 hash | 1 | 26-40 (family IDs) | Binary features |

    Family IDs (training set): Avaddon 1, Babuk 2, Blackmatter 3, Conti 4, Darkside 5, Dharma 6, Doppelpaymer 7, Exorcist 8, Gandcrab 9, Lockbit 10, Makop 11, Maze 12, Mountlocker 13, Nefilim 14, Netwalker 15, Phobos 16, Pysa 17, Ragnarok 18, RansomeXX 19, Revil 20, Ryuk 21, Stop 22, Thanos 23, Wastedlocker 24, Zeppelin 25.

    Family IDs (testing set): AvosLocker 26, BianLian 27, BlackBasta 28, BlackByte 29, BlackCat 30, BlueSky 31, Clop 32, Hive 33, HolyGhost 34, Karma 35, Lorenz 36, Maui 37, Night Sky 38, PlayCrypt 39, Quantum 40.
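    The ID ranges and family IDs above are enough to decode and sanity-check a CSV row. A minimal sketch that maps a sample ID and family ID to (subset, RG class, family name); the constants are transcribed from the column table and family lists, while `describe_sample` and the consistency checks are hypothetical additions.

```python
TRAIN_FAMILIES = [
    "Avaddon", "Babuk", "Blackmatter", "Conti", "Darkside", "Dharma", "Doppelpaymer",
    "Exorcist", "Gandcrab", "Lockbit", "Makop", "Maze", "Mountlocker", "Nefilim",
    "Netwalker", "Phobos", "Pysa", "Ragnarok", "RansomeXX", "Revil", "Ryuk", "Stop",
    "Thanos", "Wastedlocker", "Zeppelin",
]
TEST_FAMILIES = [
    "AvosLocker", "BianLian", "BlackBasta", "BlackByte", "BlackCat", "BlueSky", "Clop",
    "Hive", "HolyGhost", "Karma", "Lorenz", "Maui", "Night Sky", "PlayCrypt", "Quantum",
]
# Family IDs 1-25 (training) followed by 26-40 (testing); 0 marks goodware.
FAMILY_NAMES = {i: name for i, name in enumerate(TRAIN_FAMILIES + TEST_FAMILIES, start=1)}

# (first ID, last ID, subset, RG target class), transcribed from the column table.
ID_RANGES = [
    (10000, 11133, "train", 0),
    (12000, 12132, "test", 0),
    (20000, 21022, "train", 1),
    (22000, 22384, "test", 1),
]


def describe_sample(sample_id: int, family_id: int) -> tuple[str, int, str]:
    """Return (subset, RG class, family name), raising if the row is inconsistent."""
    for first, last, subset, rg in ID_RANGES:
        if first <= sample_id <= last:
            break
    else:
        raise ValueError(f"ID {sample_id} is outside every documented range")
    if rg == 0:
        if family_id != 0:
            raise ValueError("goodware rows should have family 0")
        return subset, rg, "goodware"
    expected = range(1, 26) if subset == "train" else range(26, 41)
    if family_id not in expected:
        raise ValueError(f"family {family_id} does not belong to the {subset} set")
    return subset, rg, FAMILY_NAMES[family_id]
```

    Summing the four ranges gives 1,134 + 133 + 1,023 + 385 = 2,675 IDs, matching the sample count stated in the description.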

  20. Android Malware and Normal permissions dataset

    • data.mendeley.com
    • impactcybertrust.org
    Updated Mar 13, 2018
    + more versions
    Arvind Mahindru (2018). Android Malware and Normal permissions dataset [Dataset]. http://doi.org/10.17632/958wvr38gy.1
    Dataset updated
    Mar 13, 2018
    Authors
    Arvind Mahindru
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 18,850 normal Android application packages and 10,000 malware Android packages, which are used to identify the behaviour of malware applications based on the permissions they need at run time.
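    A permission-based analysis of this kind usually starts by turning each app's requested permissions into a binary vector and comparing request rates between groups. A minimal sketch, assuming each app is given as a set of permission strings; the helper names are hypothetical and the permission constants in the example (such as `android.permission.SEND_SMS`) are illustrative inputs, not values taken from the dataset.

```python
def build_vocabulary(apps: list[set[str]]) -> list[str]:
    """Sorted list of every permission requested by at least one app."""
    return sorted(set().union(*apps))


def to_vector(permissions: set[str], vocabulary: list[str]) -> list[int]:
    """Binary vector: 1 if the app requests the permission, else 0."""
    return [1 if p in permissions else 0 for p in vocabulary]


def request_rate(apps: list[set[str]], vocabulary: list[str]) -> dict[str, float]:
    """Fraction of apps in a (non-empty) group that request each permission."""
    n = len(apps)
    return {p: sum(p in a for a in apps) / n for p in vocabulary}


def most_malware_skewed(
    malware: list[set[str]], normal: list[set[str]], vocabulary: list[str], top: int = 3
) -> list[str]:
    """Permissions whose request rate is most higher among malware than normal apps."""
    mal = request_rate(malware, vocabulary)
    ben = request_rate(normal, vocabulary)
    return sorted(vocabulary, key=lambda p: mal[p] - ben[p], reverse=True)[:top]
```

    The binary vectors can feed any classifier, while `most_malware_skewed` gives a quick, interpretable view of which permissions distinguish the two groups.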

MalwareBazaar Malware Dataset (Sep - Oct 2025)

A dataset of malware uploaded to MalwareBazaar's database.
