73 datasets found

Android malware dataset for machine learning 2
figshare.com
txt
Updated May 30, 2023
Suleiman Yerima (2023). Android malware dataset for machine learning 2 [Dataset]. http://doi.org/10.6084/m9.figshare.5854653.v1
Available download formats
txt
Unique identifier
https://doi.org/10.6084/m9.figshare.5854653.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Suleiman Yerima
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset consisting of feature vectors of 215 attributes extracted from 15,036 applications (5,560 malware apps from the Drebin project and 9,476 benign apps). The dataset has been used to develop and evaluate a multilevel classifier fusion approach for Android malware detection, published in the IEEE Transactions on Cybernetics paper 'DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection'. The supporting file contains further description of the feature vectors/attributes obtained via static code analysis of the Android apps.
Malware Dataset IDN
ieee-dataport.org
Updated Jan 10, 2024
Firdaus Nugroho (2024). Malware Dataset IDN [Dataset]. https://ieee-dataport.org/documents/malware-dataset-idn
Dataset updated
Jan 10, 2024
Authors
Firdaus Nugroho
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This malware dataset was collected in Indonesia. The malicious Windows Portable Executable data has been extracted using the LIEF library. The main objective of this dataset is to support research in the field of malware detection by employing machine learning methodologies. The gathered data will aid in the creation of more effective and precise machine-learning algorithms for detecting and reducing malware risks in Windows-operated systems.
Malware Analysis Datasets: Top-1000 PE Imports
ieee-dataport.org
Updated Nov 8, 2019
Angelo Oliveira (2019). Malware Analysis Datasets: Top-1000 PE Imports [Dataset]. https://ieee-dataport.org/open-access/malware-analysis-datasets-top-1000-pe-imports
Dataset updated
Nov 8, 2019
Authors
Angelo Oliveira
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: Top-1000 imported functions extracted from the 'pe_imports' elements of Cuckoo Sandbox reports. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
Ransomware and user samples for training and validating ML models
data.mendeley.com
Updated Sep 17, 2021
Eduardo Berrueta (2021). Ransomware and user samples for training and validating ML models [Dataset]. http://doi.org/10.17632/yhg5wk39kf.2
Unique identifier
https://doi.org/10.17632/yhg5wk39kf.2
Dataset updated
Sep 17, 2021
Authors
Eduardo Berrueta
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ransomware has been considered a significant threat to most enterprises for the past few years. In scenarios wherein users can access all files on a shared server, one infected host is capable of locking the access to all shared files. In the article related to this repository, we detect ransomware infection based on file-sharing traffic analysis, even in the case of encrypted traffic. We compare three machine learning models and choose the best for validation. We train and test the detection model using more than 70 ransomware binaries from 26 different families and more than 2500 h of ‘not infected’ traffic from real users. The results reveal that the proposed tool can detect all ransomware binaries, including those not used in the training phase (zero-days). This paper provides a validation of the algorithm by studying the false positive rate and the amount of information from user files that the ransomware could encrypt before being detected.

This dataset directory contains the 'infected' and 'not infected' samples and the models used for each T configuration, each one in a separate folder.

The folders are named NxSy, where x is the number of 1-second intervals per sample and y is the sliding step in seconds.
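As an illustration of the naming scheme, the following Python sketch splits a folder name into its two numbers. The helper name `parse_folder_name` and the keys of the returned dictionary are hypothetical and are not part of the dataset.

```python
import re

# Hypothetical helper: the function name and the dictionary keys are
# illustrative only; the dataset itself ships no such code.
FOLDER_PATTERN = re.compile(r"^N(\d+)S(\d+)$")


def parse_folder_name(name):
    """Split a folder name such as 'N10S10' into its two integers."""
    # A trailing slash (as in 'N10S10/') is tolerated and ignored.
    match = FOLDER_PATTERN.match(name.rstrip("/"))
    if match is None:
        raise ValueError(f"not an NxSy folder name: {name!r}")
    intervals, step = (int(group) for group in match.groups())
    return {"intervals_per_sample": intervals, "sliding_step_seconds": step}


config = parse_folder_name("N10S10/")
```

Here `config` holds 10 one-second intervals per sample and a sliding step of 10 seconds.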

Each folder (for example N10S10/) contains:
- tree.py -> Python script with the Tree model.
- ensemble.json -> JSON file with the information about the Ensemble model.
- NN_XhiddenLayer.json -> JSON file with the information about the NN model with X hidden layers (1, 2 or 3).
- N10S10.csv -> All samples used for training each model in this folder. It is in CSV format for use in the BigML application.
- zeroDays.csv -> All zero-day samples used for testing each model in this folder. It is in CSV format for use in the BigML application.
- userSamples_test -> All samples used for validating each model in this folder. It is in CSV format for use in the BigML application.
- userSamples_train -> User samples used for training the models.
- ransomware_train -> Ransomware samples used for training the models.
- scaler.scaler -> Standard Scaler from a Python library, used to scale the samples.
- zeroDays_notFiltered -> Folder with the zero-day samples.

In the case of the N30S30 folder, there is an additional folder (SMBv2SMBv3NFS) with the samples extracted from the SMBv2, SMBv3 and NFS traffic traces. There are more binaries than the ones presented in the article because some of them are not "unseen" binaries (their families are present in the training set).

The files containing samples (NxSy.csv, zeroDays.csv and userSamples_test.csv) are structured as follows:
- Each line is one sample.
- Each sample has 3*T features and the label (1 if it is an 'infected' sample and 0 if it is not).
- The features are separated by ',' because it is a CSV file.
- The last column is the label of the sample.
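The layout above can be read with a few lines of Python. This is a minimal sketch that assumes the files have no header row and that every value is numeric; the function name `read_samples` and the two example rows are made up for illustration and do not come from the dataset.

```python
import csv
import io


def read_samples(handle):
    """Return (features, labels) from CSV text whose last column is the label."""
    features, labels = [], []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        *values, label = row
        features.append([float(value) for value in values])
        # Labels are 1 ('infected') or 0 ('not infected').
        labels.append(int(float(label)))
    return features, labels


# Made-up two-sample input with T = 2, so each row has 3 * 2 = 6 features.
example = io.StringIO("0.1,0.2,0.3,0.4,0.5,0.6,1\n1,2,3,4,5,6,0\n")
features, labels = read_samples(example)
```

To read a real file, pass an open file object instead of the `io.StringIO` example, for instance `open("N10S10/N10S10.csv", newline="")`.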

Additionally, we have placed two pcap files in the root directory. These are the traces used to compare both versions of SMB.
RDE-Dataset.zip Ransomware Defense Empowered: Deep Learning for Real-Time...
figshare.com
zip
Updated Mar 24, 2024
Hassan Jalil Hadi (2024). RDE-Dataset.zip Ransomware Defense Empowered: Deep Learning for Real-Time Family Identification with a Proprietary Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25467826.v1
Available download formats
zip
Unique identifier
https://doi.org/10.6084/m9.figshare.25467826.v1
Dataset updated
Mar 24, 2024
Dataset provided by
figshare
Authors
Hassan Jalil Hadi
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ransomware, leveraging sophisticated encryption techniques, poses a significant threat by encrypting crucial data, thereby rendering it inaccessible. The proliferation of diverse ransomware variants has caused considerable harm to governments, corporations, and individual users alike. Despite the increasing prevalence of cyber threats, existing solutions often struggle with real-time detection and early identification of ransomware families. To address this challenge, we introduce FCG-RFD, a novel benchmark dataset featuring extensive Function Call Graphs (FCG) tailored for ransomware family detection. Given the constantly evolving nature of malware, antivirus scanners face ongoing challenges, necessitating access to recent and updated datasets. Our dataset comprises 8,095 samples sourced from reputable repositories including VirusSamples, Virusshare, VirusSign, the Zoo, and MalwareBazaar. Additionally, we include 8,020 normal files obtained from trusted sources such as the Microsoft Store and Softonic. Through FCG-RFD, we aim to facilitate more robust and timely detection of ransomware families, ultimately enhancing cybersecurity measures against this pervasive threat.
MH-100K-Dataset
figshare.com
zip
Updated Oct 19, 2023
Vanderson Rocha; Hendrio Bragança; Eduardo Feitosa; Eduardo Souto; Diego Kreutz; Lucas Vilanova (2023). MH-100K-Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24328885.v2
Available download formats
zip
Unique identifier
https://doi.org/10.6084/m9.figshare.24328885.v2
Dataset updated
Oct 19, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Vanderson Rocha; Hendrio Bragança; Eduardo Feitosa; Eduardo Souto; Diego Kreutz; Lucas Vilanova
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MH-100K is an extensive collection of Android malware information comprising 101,975 samples. It encompasses a main CSV file with valuable metadata, including the SHA256 hash (APK’s signature), file name, package name, Android’s official compilation API, 166 permissions, 24,417 API calls, and 250 intents.
Malware Analysis Datasets: Chimera Multimodal Deep Learning Android Malware...
ieee-dataport.org
Updated Oct 3, 2021
Angelo Oliveira (2021). Malware Analysis Datasets: Chimera Multimodal Deep Learning Android Malware Detection Method [Dataset]. https://ieee-dataport.org/open-access/malware-analysis-datasets-chimera-multimodal-deep-learning-android-malware-detection
Dataset updated
Oct 3, 2021
Authors
Angelo Oliveira
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
and dynamic analysis data (system call sequences).
Network Traffic Android Malware
kaggle.com
zip
Updated Sep 12, 2019
Christian Urcuqui (2019). Network Traffic Android Malware [Dataset]. https://www.kaggle.com/datasets/xwolf12/network-traffic-android-malware
Available download formats
zip (116603 bytes)
Dataset updated
Sep 12, 2019
Authors
Christian Urcuqui
Description
Introduction

Android is one of the most used mobile operating systems worldwide. Due to its technological impact, its open-source code and the possibility of installing applications from third parties without any central control, Android has recently become a malware target. Even though it includes security mechanisms, the latest news about malicious activities and Android's vulnerabilities points to the importance of continuing the development of methods and frameworks to improve its security.

To prevent malware attacks, researchers and developers have proposed different security solutions, applying static analysis, dynamic analysis, and artificial intelligence. Indeed, data science has become a promising area in cybersecurity, since analytical models based on data allow for the discovery of insights that can help to predict malicious activities.

In this work, we propose to consider some network layer features as the basis for machine learning models that can successfully detect malware applications, using open datasets from the research community.

Content

This dataset is based on another dataset (DroidCollector), where you can get all the network traffic in pcap files. In our research we preprocessed the files in order to get the network features described in the following article:

López, C. C. U., Villarreal, J. S. D., Belalcazar, A. F. P., Cadavid, A. N., & Cely, J. G. D. (2018, May). Features to Detect Android Malware. In 2018 IEEE Colombian Conference on Communications and Computing (COLCOM) (pp. 1-6). IEEE.

Acknowledgements

Cao, D., Wang, S., Li, Q., Cheny, Z., Yan, Q., Peng, L., & Yang, B. (2016, August). DroidCollector: A High Performance Framework for High Quality Android Traffic Collection. In Trustcom/BigDataSE/ISPA, 2016 IEEE (pp. 1753-1758). IEEE.
Drone-Based Malware Detection (DBMD)
kaggle.com
Updated Jul 27, 2024
DatasetEngineer (2024). Drone-Based Malware Detection (DBMD) [Dataset]. http://doi.org/10.34740/kaggle/dsv/9045375
Unique identifier
https://doi.org/10.34740/kaggle/dsv/9045375
Dataset updated
Jul 27, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
DatasetEngineer
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Welcome to the Drone-Based Malware Detection dataset! This dataset is designed to aid researchers and practitioners in exploring innovative cybersecurity solutions using drone-collected data. The dataset contains detailed information on network traffic, drone sensor readings, malware detection indicators, and environmental conditions. It offers a unique perspective by integrating data from drones with traditional network security metrics to enhance malware detection capabilities.

Dataset Overview

The dataset comprises four main categories:

- Network Traffic Data: Captures network traffic attributes including IP addresses, ports, protocols, packet sizes, and various derived metrics.
- Drone Sensor Data: Includes GPS coordinates, altitude, speed, heading, battery level, and other sensor readings from drones.
- Malware Detection Data: Contains indicators and scores relevant to detecting malware, such as anomaly scores, suspicious IP counts, reputation scores, and attack types.
- Environmental Data: Provides context through environmental conditions like location type, noise level, weather conditions, and more.

Files and Features

The dataset is divided into four separate CSV files:

network_traffic_data.csv

- timestamp: Date and time of the traffic event.
- source_ip: Source IP address.
- destination_ip: Destination IP address.
- source_port: Source port number.
- destination_port: Destination port number.
- protocol: Network protocol (TCP, UDP, ICMP).
- packet_length: Length of the network packet.
- payload_data: Content of the packet payload.
- flag: Network flag (SYN, ACK, FIN, RST).
- traffic_volume: Volume of traffic in bytes.
- flow_duration: Duration of the network flow.
- flow_bytes_per_s: Bytes per second for the flow.
- flow_packets_per_s: Packets per second for the flow.
- packet_count: Number of packets in the flow.
- average_packet_size: Average size of packets.
- min_packet_size: Minimum packet size.
- max_packet_size: Maximum packet size.
- packet_size_variance: Variance in packet sizes.
- header_length: Length of the packet header.
- payload_length: Length of the packet payload.
- ip_ttl: Time to live for the IP packet.
- tcp_window_size: TCP window size.
- icmp_type: ICMP type (echo_request, echo_reply, destination_unreachable).
- dns_query_count: Number of DNS queries.
- dns_response_count: Number of DNS responses.
- http_method: HTTP method (GET, POST, PUT, DELETE).
- http_status_code: HTTP status code (200, 404, 500, 301).
- content_type: Content type (text/html, application/json, image/png).
- ssl_tls_version: SSL/TLS version.
- ssl_tls_cipher_suite: SSL/TLS cipher suite.

drone_data.csv

- latitude: Latitude of the drone.
- longitude: Longitude of the drone.
- altitude: Altitude of the drone.
- speed: Speed of the drone.
- heading: Heading of the drone.
- battery_level: Battery level of the drone.
- drone_id: Unique identifier for the drone.
- flight_time: Total flight time.
- signal_strength: Strength of the drone's signal.
- temperature: Temperature at the drone's location.
- humidity: Humidity at the drone's location.
- pressure: Atmospheric pressure at the drone's location.
- wind_speed: Wind speed at the drone's location.
- wind_direction: Wind direction at the drone's location.
- gps_accuracy: Accuracy of the GPS signal.

malware_detection_data.csv

- anomaly_score: Score indicating the level of anomaly detected.
- suspicious_ip_count: Number of suspicious IP addresses detected.
- malicious_payload_indicator: Indicator for malicious payload (0 or 1).
- reputation_score: Reputation score for the network entity.
- behavioral_score: Behavioral score indicating potential malicious activity.
- attack_type: Type of attack (DDoS, phishing, malware).
- signature_match: Indicator for signature match (0 or 1).
- sandbox_result: Result from sandbox analysis (clean, infected).
- heuristic_score: Heuristic score for potential threats.
- traffic_pattern: Pattern of the traffic (burst, steady).

environmental_data.csv

- location_type: Type of location (urban, rural).
- nearby_devices: Number of nearby devices.
- signal_interference: Level of signal interference.
- noise_level: Noise level in the environment.
- time_of_day: Time of day (morning, afternoon, evening, night).
- day_of_week: Day of the week.
- weather_conditions: Weather conditions (sunny, rainy, cloudy, stormy).

Usage and Applications

This dataset can be used for:

- Cybersecurity Research: Developing and testing algorithms for malware detection using drone data.
- Machine Learning: Training models to identify malicious activity based on network traffic and drone sensor readings.
- Data Analysis: Exploring the relationships between environmental conditions, drone sensor data, and network traffic anomalies.
- Educational Purposes: Teaching data science, machine learning, and cybersecurity concepts using a comprehensive and multi-faceted dataset.

Acknowledgements

This dataset is based on real-world data collected from drone sensors and network traffic monitoring s...
Malware Analysis Datasets: Raw PE as Image
ieee-dataport.org
Updated Nov 7, 2019
Angelo Oliveira (2019). Malware Analysis Datasets: Raw PE as Image [Dataset]. https://ieee-dataport.org/open-access/malware-analysis-datasets-raw-pe-image
Dataset updated
Nov 7, 2019
Authors
Angelo Oliveira
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: Raw PE byte stream rescaled to a 32 x 32 greyscale image using the Nearest Neighbor Interpolation algorithm and then flattened to a 1024-byte vector. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
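The description does not say how the byte stream is laid out before rescaling, so the following Python sketch is only one possible reading: it treats the file as a one-dimensional stream and picks 1024 bytes by nearest-neighbour index mapping (using the floor of the scaled position). The function names `bytes_to_vector` and `vector_to_image` are hypothetical and this is not the author's code.

```python
def bytes_to_vector(data, side=32):
    """Resample a byte string to side*side bytes by nearest-neighbour lookup.

    Assumption: the input is treated as a 1-D stream; the dataset author
    may have used a different layout before rescaling.
    """
    if not data:
        raise ValueError("empty input")
    target = side * side
    # Output position i takes the source byte at floor(i * len(data) / target).
    return bytes(data[i * len(data) // target] for i in range(target))


def vector_to_image(vector, side=32):
    """Reshape a flat vector into `side` rows of `side` greyscale values."""
    return [list(vector[row * side:(row + 1) * side]) for row in range(side)]


# Made-up 2048-byte input: every second byte is kept.
sample = bytes(range(256)) * 8
vector = bytes_to_vector(sample)
image = vector_to_image(vector)
```

The same mapping also stretches inputs shorter than 1024 bytes, since each source byte is then repeated across several output positions.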
MC-dataset-multiclass
figshare.com
zip
Updated Mar 17, 2018
Eduardo de O. Andrade (2018). MC-dataset-multiclass [Dataset]. http://doi.org/10.6084/m9.figshare.5995468.v1
Available download formats
zip
Unique identifier
https://doi.org/10.6084/m9.figshare.5995468.v1
Dataset updated
Mar 17, 2018
Dataset provided by
figshare
Authors
Eduardo de O. Andrade
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Multiclass labeled dataset containing 3290 cleanware samples and 16450 malware samples. Five classes of malware (backdoor, rootkit, trojan, virus and worm), with 3290 samples of each class, were labeled using antivirus scans from VirusTotal. Each malware sample was assigned the class most frequently returned by the antivirus scans. The files have a maximum size of approximately 3.2 MB. All files were named with their md5 hashes, hiding their original names. The cleanware was extracted from several different versions of Windows systems and the malware was obtained from the VirusShare repository.
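The labeling rule described above (take the class that most antivirus scans agree on) can be sketched in Python as a simple majority vote. The function name `majority_class` and the input format, a list of already normalized class names per sample, are assumptions for illustration. The description does not say how ties were resolved; this sketch keeps the class seen first.

```python
from collections import Counter

# The five malware classes named in the dataset description.
CLASSES = ("backdoor", "rootkit", "trojan", "virus", "worm")


def majority_class(scan_labels):
    """Return the class reported most often, or None if no known class appears."""
    counts = Counter(label for label in scan_labels if label in CLASSES)
    if not counts:
        return None
    # most_common keeps first-seen order among equal counts (assumed tie rule).
    return counts.most_common(1)[0][0]


# Made-up scan results for a single sample.
label = majority_class(["trojan", "worm", "trojan", "unknown"])
```

For the made-up scan results, `label` is "trojan", since two of the three recognised verdicts agree on it.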
Android Malware Dataset for Machine Learning
kaggle.com
Updated Mar 13, 2021
Shashwat Tiwari (2021). Android Malware Dataset for Machine Learning [Dataset]. https://www.kaggle.com/datasets/shashwatwork/android-malware-dataset-for-machine-learning/discussion
Dataset updated
Mar 13, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
Shashwat Tiwari
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Context

"Mobile malware is malicious software that targets mobile phones or wireless-enabled Personal digital assistants (PDA), by causing the collapse of the system and loss or leakage of confidential information. As wireless phones and PDA networks have become more and more common and have grown in complexity, it has become increasingly difficult to ensure their safety and security against electronic attacks in the form of viruses or other malware."

Content

Dataset consisting of feature vectors of 215 attributes extracted from 15,036 applications (5,560 malware apps from the Drebin project and 9,476 benign apps). The dataset has been used to develop and evaluate a multilevel classifier fusion approach for Android malware detection, published in the IEEE Transactions on Cybernetics paper 'DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection'. The supporting file contains the description of the feature vectors/attributes obtained via static code analysis of the Android apps.

Acknowledgements

Yerima, Suleiman (2018): Android malware dataset for machine learning 2. figshare. Dataset. https://doi.org/10.6084/m9.figshare.5854653.v1
Data Source - https://figshare.com/articles/dataset/Android_malware_dataset_for_machine_learning_2/5854653
Literature URL - https://ieeexplore.ieee.org/document/8245867
MC-dataset-binary
figshare.com
zip
Updated Mar 16, 2018
Eduardo de O. Andrade (2018). MC-dataset-binary [Dataset]. http://doi.org/10.6084/m9.figshare.5995408.v1
Available download formats
zip
Unique identifier
https://doi.org/10.6084/m9.figshare.5995408.v1
Dataset updated
Mar 16, 2018
Dataset provided by
figshare
Authors
Eduardo de O. Andrade
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Binary labeled dataset containing 11480 samples of cleanware and malware, with 5740 samples of each class. The files have a maximum size of approximately 3.2 MB. All files were named with their md5 hashes, hiding their original names. The cleanware was extracted from several different versions of Windows systems and the malware was obtained from the VirusShare repository.
Encrypted Traffic Feature Dataset for Machine Learning and Deep Learning...
data.mendeley.com
Updated Dec 6, 2022
Zihao Wang (2022). Encrypted Traffic Feature Dataset for Machine Learning and Deep Learning based Encrypted Traffic Analysis [Dataset]. http://doi.org/10.17632/xw7r4tt54g.1
Unique identifier
https://doi.org/10.17632/xw7r4tt54g.1
Dataset updated
Dec 6, 2022
Authors
Zihao Wang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This traffic dataset contains a balanced amount of encrypted malicious and legitimate traffic for encrypted malicious traffic detection and analysis. The dataset is secondary CSV feature data composed from six public traffic datasets.

Our dataset is curated based on two criteria. The first criterion is to combine widely considered public datasets from existing works that contain enough encrypted malicious or encrypted legitimate traffic, such as the Malware Capture Facility Project datasets. The second criterion is to ensure that the final dataset is balanced between encrypted malicious and legitimate network traffic.

Based on these criteria, 6 public datasets were selected. After data pre-processing, details of each selected public dataset and the size of the different encrypted traffic types are shown in the “Dataset Statistic Analysis Document”. The document summarizes the malicious and legitimate traffic size we selected from each public dataset, the traffic size of each malicious traffic type, and the total traffic size of the composed dataset. From the table, we can observe that encrypted malicious and legitimate traffic each contribute approximately 50% of the final composed dataset.

The datasets now made available were prepared for encrypted malicious traffic detection. Since the dataset is used for machine learning or deep learning model training, sample train and test sets are also provided. The train and test datasets are split in a 1:4 ratio. Such datasets can be used for machine learning or deep learning model training and testing based on selected features or after further data pre-processing.
Malware API Call Dataset
ieee-dataport.org
Updated May 18, 2022
Ferhat Ozgur Catak (2022). Malware API Call Dataset [Dataset]. https://ieee-dataport.org/open-access/malware-api-call-dataset
Dataset updated
May 18, 2022
Authors
Ferhat Ozgur Catak
Description
This study seeks to obtain data which will help to address machine learning based malware research gaps. The specific objective of this study is to build a benchmark dataset for Windows operating system API calls of various malware. This is the first study to undertake metamorphic malware to build sequential API calls. It is hoped that this research will contribute to a deeper understanding of how metamorphic malware changes its behavior (i.e. API calls) by adding meaningless opcodes with its own disassembler/assembler parts.
Sophos/ReversingLabs 20 Million malware detection dataset
registry.opendata.aws
Updated Dec 18, 2020
Sophos AI (2020). Sophos/ReversingLabs 20 Million malware detection dataset [Dataset]. https://registry.opendata.aws/sorel-20m/
Dataset updated
Dec 18, 2020
Dataset provided by
Sophos http://sophos.com/
Description
A dataset intended to support research on machine learning techniques for detecting malware. It includes metadata and EMBER-v2 features for approximately 10 million benign and 10 million malicious Portable Executable files, with disarmed but otherwise complete files for all malware samples. All samples are labeled using Sophos in-house labeling methods and have features extracted using the EMBER-v2 feature set, as well as metadata obtained via the pefile Python library, detection counts obtained via ReversingLabs telemetry, and additional behavioral tags that indicate the rough behavior of the samples.
EMBER Dataset
paperswithcode.com
Updated Feb 2, 2021
Hyrum S. Anderson; Phil Roth (2021). EMBER Dataset [Dataset]. https://paperswithcode.com/dataset/ember
Dataset updated
Feb 2, 2021
Authors
Hyrum S. Anderson; Phil Roth
Description
A labeled benchmark dataset for training machine learning models to statically detect malicious Windows portable executable files. The dataset includes features extracted from 1.1M binary files: 900K training samples (300K malicious, 300K benign, 300K unlabeled) and 200K test samples (100K malicious, 100K benign).
Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset
figshare.com
application/x-rar
Updated May 30, 2023
Anh Pham Tuan; An Tran Hung Phuong; Nguyen Vu Thanh; Toan Nguyen Van (2023). Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.6635642.v1
Available download formats
application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.6635642.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Anh Pham Tuan; An Tran Hung Phuong; Nguyen Vu Thanh; Toan Nguyen Van
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains 8970 malware and 1000 benign binary files. The malware files are divided into 5 types: Locker (300), Mediyes (1450), Winwebsec (4400), Zbot (2100), Zeroaccess (690). All malware files were collected from https://virusshare.com/ and malicia-project.com. Benign executable files were taken from the installation folders of legitimate software from different categories, which can be downloaded from https://download.cnet.com/windows/. All files were verified by VirusTotal (https://www.virustotal.com) to make sure each file belongs to its type. Note: This dataset includes malware, so it can harm your computer.
IoT-23: A labeled dataset with malicious and benign IoT network traffic
zenodo.org
explore.openaire.eu
application/gzip
Updated Sep 3, 2021
Sebastian Garcia; Agustin Parmisano; Maria Jose Erquiaga (2021). IoT-23: A labeled dataset with malicious and benign IoT network traffic [Dataset]. http://doi.org/10.5281/zenodo.4743746
Available download formats
application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.4743746
Dataset updated
Sep 3, 2021
Dataset provided by
Zenodo http://zenodo.org/
Authors
Sebastian Garcia; Agustin Parmisano; Maria Jose Erquiaga
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
IoT-23 is a dataset of network traffic from Internet of Things (IoT) devices. It has 20 malware captures executed in IoT devices, and 3 captures of benign IoT device traffic. It was first published in January 2020, with captures ranging from 2018 to 2019. This IoT network traffic was captured in the Stratosphere Laboratory, AIC group, FEL, CTU University, Czech Republic. Its goal is to offer a large dataset of real and labeled IoT malware infections and IoT benign traffic for researchers to develop machine learning algorithms. This dataset and its research were funded by Avast Software. The malware was allowed to connect to the Internet.
CTU-SME-11: a labeled dataset with real benign and malicious network traffic...
zenodo.org
data.niaid.nih.gov
bin, bz2, csv, html
Updated May 26, 2023
Štěpán Bendl; Veronica Valeros; Sebastian Garcia (2023). CTU-SME-11: a labeled dataset with real benign and malicious network traffic mimicking a small medium-size enterprise environment [Dataset]. http://doi.org/10.5281/zenodo.7958259
Available download formats
csv, html, bz2, bin
Unique identifier
https://doi.org/10.5281/zenodo.7958259
Dataset updated
May 26, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Štěpán Bendl; Veronica Valeros; Sebastian Garcia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As technology advances, the number and complexity of cyber-attacks increase, forcing defense techniques to be updated and improved. To help develop effective tools for detecting security threats it is essential to have reliable and representative security datasets. Many existing security datasets have limitations that make them unsuitable for research, including lack of labels, unbalanced traffic, and outdated threats.

CTU-SME-11 is a labeled network dataset designed to address the limitations of previous datasets. The dataset was captured in a real network that mimics a small-medium enterprise setting. Raw network traffic (packets) was captured from 11 devices using tcpdump for a duration of 7 days, from 20th to 26th of February, 2023 in Prague, Czech Republic. The devices were chosen based on the enterprise setting and consist of IoT, desktop and mobile devices, both bare metal and virtualized. The devices were infected with malware or exposed to Internet attacks, and factory reset to restore benign behavior.

The raw data was processed to generate network flows (Zeek logs), which were analyzed and labeled. The dataset contains two types of labels, a high-level label and a descriptive label, which were assigned by experts. The former can take three values: benign, malicious or background. The latter contains detailed information about the specific behavior observed in the network flows. The dataset contains 99 million labeled network flows. The overall compressed size of the dataset is 80GB and the uncompressed size is 170GB.

Android malware dataset for machine learning 2

16 scholarly articles cite this dataset (View in Google Scholar)
