Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset consisting of feature vectors of 215 attributes extracted from 15,036 applications (5,560 malware apps from the Drebin project and 9,476 benign apps). The dataset has been used to develop and evaluate a multilevel classifier fusion approach for Android malware detection, published in the IEEE Transactions on Cybernetics paper 'DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection'. The supporting file contains further description of the feature vectors/attributes obtained via static code analysis of the Android apps.
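As a quick sanity check on the counts quoted above, the following sketch computes the class split and the accuracy a trivial majority-class predictor would reach. It is only arithmetic on the published figures; the function name is illustrative and not part of the dataset.

```python
def class_balance(n_malware: int, n_benign: int) -> dict:
    """Summarise a two-class dataset from its per-class sample counts."""
    total = n_malware + n_benign
    return {
        "total": total,
        "malware_fraction": n_malware / total,
        "benign_fraction": n_benign / total,
        # Accuracy obtained by always predicting the larger class.
        "majority_baseline_accuracy": max(n_malware, n_benign) / total,
    }


# Counts taken from the dataset description.
stats = class_balance(n_malware=5560, n_benign=9476)
print(stats)
```

With these counts roughly 37% of the samples are malware, so any reported accuracy is best read against a majority-class baseline of about 63%.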
State-of-the-art comparison with the existing techniques.
Mobile malware detection has attracted massive research effort in our community. A reliable and up-to-date malware dataset is critical for evaluating the effectiveness of malware detection approaches. Essentially, the malware ground truth should be manually verified by security experts, and the malicious behaviors should be carefully labelled. Although there are several widely used malware benchmarks in our community (e.g., MalGenome, Drebin, Piggybacking, and AMD), these benchmarks face several limitations: they are out of date and have issues of size, coverage, and reliability.
We have created MalRadar, a growing and up-to-date Android malware dataset, using the most reliable approach, i.e., collecting malware based on the analysis reports of security experts. We crawled all the mobile-security-related reports released by ten leading security companies and used an automated approach to extract and label the useful ones, namely those describing new Android malware and containing Indicators of Compromise (IoC) information. MalRadar contains 4,534 unique Android malware samples (including both APKs and metadata) released from 2014 to April 2021, as of the time of the paper, all of which were manually verified by security experts with detailed behavior analysis. For more details, please visit https://malradar.github.io/
The dataset includes the following files:
(1) sample-info.csv
In this file, we list all the detailed information about each sample, including apk file hash, app name, package name, report family, etc.
(2) malradar.zip
We have packaged the malware samples in chunks of 1000 applications: malradar-0, malradar-1, malradar-2, malradar-3. Each APK file is named after its SHA256 hash.
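Because the APK files are named after their SHA256 hashes, an extracted sample can be checked against its own file name. The sketch below assumes a file is named either `<sha256>` or `<sha256>.apk`; that naming detail and both function names are assumptions for illustration, not part of the dataset documentation.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_hash(path: Path) -> bool:
    """True if the file name (without extension) equals the file's SHA-256."""
    # Assumption: the name is "<sha256>" or "<sha256>.apk".
    return path.stem.lower() == sha256_of_file(path)
```

A mismatch would suggest a corrupted download or a renamed file. Malware samples should only be handled in an isolated analysis environment.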
If you use our dataset in a paper or article, please cite our paper:
@article{wang2022malradar,
title={MalRadar: Demystifying Android Malware in the New Era},
author={Wang, Liu and Wang, Haoyu and He, Ren and Tao, Ran and Meng, Guozhu and Luo, Xiapu and Liu, Xuanzhe},
journal={Proceedings of the ACM on Measurement and Analysis of Computing Systems},
volume={6},
number={2},
pages={1--27},
year={2022},
publisher={ACM New York, NY, USA}
}
Protection against malware is particularly relevant on systems running the Android operating system, because of its huge user base and the resulting potential for monetization by attackers.
Dynamic malware detection has been widely adopted by the scientific community but not yet in practical applications.
We release DYNAMISM (Dynamic Analysis of Malware), a dataset containing execution traces of both benign and malicious applications running on Android OS, to facilitate further research and the adoption of dynamic detection in practice. The dataset contains execution traces from 2,386 benign applications and 2,495 malicious applications taken from the Malware Genome Project repository [http://www.malgenomeproject.org] and from the Drebin dataset [https://www.sec.cs.tu-bs.de/~danarp/drebin/]. Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 2,000 stimuli were applied with a maximum execution time of 10 minutes. For most of the applications, all the stimuli could be applied within this time frame. In some of the traces neither of the two limits is reached because of emulator hiccups. The collected features relate to memory and CPU usage, network interaction, and system calls, and they are sampled every two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was in a clean state whenever a new sample was started, and thus to avoid interference from previously run samples (e.g., changed settings, running processes, and modifications of operating system files), the Android operating system was re-initialized before running each application. The execution process was automated by a shell script that used the Android Debug Bridge (adb) and ran on a Linux PC. The script used the Monkey application exerciser to generate the aforementioned stimuli. Monkey is a command-line tool that can be run on any emulator instance or on a device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.
In this dataset, we provide both per-app CSV files and unified files, in which the CSV files of the single applications have been concatenated. The CSV files contain the features extracted from the raw execution records. The provided files are listed below:
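The stimulus step described above can be sketched as follows. This is not the authors' shell script: it is a minimal Python illustration, assuming `adb` is on the PATH and an emulator is already running. The function names and the package name in the usage note are hypothetical; `-p`, `-s`, and `-v` are standard Monkey options, and the event count is the final argument.

```python
import subprocess
from typing import List, Optional

MAX_EVENTS = 2000   # maximum number of stimuli per application
MAX_SECONDS = 600   # maximum execution time (10 minutes)


def monkey_command(package: str, events: int = MAX_EVENTS,
                   seed: Optional[int] = None) -> List[str]:
    """Build the adb command that sends pseudo-random events to one package."""
    cmd = ["adb", "shell", "monkey", "-p", package]
    if seed is not None:
        # A fixed seed makes Monkey replay the same event stream.
        cmd += ["-s", str(seed)]
    cmd += ["-v", str(events)]
    return cmd


def run_monkey(package: str) -> bool:
    """Run Monkey under the time limit; return False if the limit was hit."""
    try:
        subprocess.run(monkey_command(package), timeout=MAX_SECONDS, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False
```

For example, `monkey_command("com.example.app")` yields the argument list for `adb shell monkey -p com.example.app -v 2000`. Re-initializing the emulator between samples, which the description requires, is outside the scope of this sketch.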
benign-per_app-csv.zip - features obtained by executing benign applications, one CSV per application
benign-unified-csv.zip - features obtained by executing benign applications, only one CSV file
malicious-per_app-csv.zip - features obtained by executing malicious applications, one CSV per application
malicious-unified-csv.zip - features obtained by executing malicious applications, only one CSV file
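The relationship between the per-app and unified files can be illustrated with a short concatenation routine. This is a sketch under stated assumptions: every per-app CSV shares the same header, and the added `source_file` column is hypothetical (the published unified files may not contain it).

```python
import csv
from pathlib import Path


def unify_csvs(per_app_dir: Path, out_path: Path,
               source_column: str = "source_file") -> int:
    """Concatenate all CSVs in a directory into one file; return rows written."""
    rows_written = 0
    writer = None
    with out_path.open("w", newline="") as out_fh:
        for csv_path in sorted(per_app_dir.glob("*.csv")):
            if csv_path.resolve() == out_path.resolve():
                continue  # never read the output file back in
            with csv_path.open(newline="") as in_fh:
                reader = csv.DictReader(in_fh)
                if reader.fieldnames is None:
                    continue  # empty file, nothing to copy
                if writer is None:
                    # The header is taken from the first non-empty file.
                    fields = [source_column] + list(reader.fieldnames)
                    writer = csv.DictWriter(out_fh, fieldnames=fields)
                    writer.writeheader()
                for row in reader:
                    row[source_column] = csv_path.name
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Recording the originating file keeps each row traceable to its application, which matters when splitting data for training and testing so that samples from one application do not end up in both sets.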
The rapid growth of Android applications has led to an increase in security threats, while traditional detection methods struggle to combat advanced malware, such as polymorphic and metamorphic variants. To address these challenges, this study introduces a hybrid deep learning model (DBN-GRU) that integrates Deep Belief Networks (DBN) for static analysis and Gated Recurrent Units (GRU) for dynamic behavior modeling to enhance malware detection accuracy and efficiency. The model extracts static features (permissions, API calls, intent filters) and dynamic features (system calls, network activity, inter-process communication) from Android APKs, enabling a comprehensive analysis of application behavior. The proposed model was trained and tested on the Drebin dataset, which includes 129,013 applications (5,560 malware and 123,453 benign). Performance evaluation against NMLA-AMDCEF, MalVulDroid, and LinRegDroid demonstrated that DBN-GRU achieved 98.7% accuracy, 98.5% precision, 98.9% recall, and an AUC of 0.99, outperforming conventional models. In addition, it exhibits faster preprocessing, feature extraction, and malware classification times, making it suitable for real-time deployment. By bridging static and dynamic detection methodologies, the DBN-GRU enhances malware detection capabilities while reducing false positives and computational overhead. These findings confirm the applicability of the proposed model in real-world Android security applications, offering a scalable and high-performance malware detection solution.
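The metrics quoted above follow from the confusion matrix in the usual way. The sketch below shows the standard definitions and, as an illustration only, what a degenerate classifier that labels every app benign would score on the class counts given for Drebin. It is not the study's evaluation code, and the function name is illustrative.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1 with malware as the positive class."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Degenerate "always benign" classifier on 5,560 malware and 123,453 benign apps.
always_benign = binary_metrics(tp=0, fp=0, tn=123453, fn=5560)
print(always_benign)
```

Because malware makes up only about 4.3% of this dataset, the degenerate classifier already reaches roughly 95.7% accuracy with zero recall, which is why precision and recall are reported alongside accuracy.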
Android is the most popular operating system for current mobile smart devices. Many Android applications have been developed for it and have become an essential part of our daily lives. Unfortunately, this endless stream of applications has also brought different kinds of Android malware, which is installed through API calls, granted permissions, and extra package installations and which circumvents the system's security rules to harm the system. It is therefore essential to detect and classify Android malware in order to protect user privacy and limit damage. Much research has already been carried out on different techniques for Android malware detection and classification. In this work, we present AMDDLmodel, a deep learning technique based on a convolutional neural network. The model works with different parameters, filter sizes, numbers of epochs, learning rates, and layers to detect and classify Android malware. The Drebin dataset, consisting of 215 features, was used to evaluate the model. The model achieves an accuracy of 99.92%; the other reported metrics are precision, recall, and F1-score. AMDDLmodel introduces innovative deep learning for Android malware detection, enhancing accuracy and practical user security through inventive feature engineering and comprehensive performance evaluation. AMDDLmodel shows the highest accuracy compared with the existing techniques.
"Mobile malware is malicious software that targets mobile phones or wireless-enabled Personal digital assistants (PDA), by causing the collapse of the system and loss or leakage of confidential information. As wireless phones and PDA networks have become more and more common and have grown in complexity, it has become increasingly difficult to ensure their safety and security against electronic attacks in the form of viruses or other malware."
Dataset consisting of feature vectors of 215 attributes extracted from 15,036 applications (5,560 malware apps from the Drebin project and 9,476 benign apps). The dataset has been used to develop and evaluate a multilevel classifier fusion approach for Android malware detection, published in the IEEE Transactions on Cybernetics paper 'DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection'. The supporting file contains the description of the feature vectors/attributes obtained via static code analysis of the Android apps.
Yerima, Suleiman (2018): Android malware dataset for machine learning 2. figshare. Dataset. https://doi.org/10.6084/m9.figshare.5854653.v1 Data Source - https://figshare.com/articles/dataset/Android_malware_dataset_for_machine_learning_2/5854653 Literature URL - https://ieeexplore.ieee.org/document/8245867
The credit report of Jacob Drebin contains unique and detailed export-import market intelligence, including phone, email, and LinkedIn contact details and the details of each import and export shipment, such as product, quantity, price, buyer and supplier names, country, and date of shipment.
These are the two datasets, EMBER Class and AZ Class, needed to reproduce the results of the paper "MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification", accepted for publication at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI) 2025.
Discover the worldwide distribution of the surname Drebin. Present in 15 countries with 284 registered people.
Security performance of models on DREBIN dataset.
Overall Accuracy and Precision for Drebin dataset (rounded).
Comparative analysis of Android malware detection techniques.
Performance comparison of proposed and baseline approaches based on metrics.
FEAMDA: Fusion-based Explainable Android Malware Detection Agent with LLM Support is a unified malware detection framework designed to enhance detection accuracy and interpretability by leveraging multi-modal static features and large language model (LLM)-driven reasoning. Unlike traditional detection systems that treat static code representations independently or rely on opaque deep learning models, FEAMDA introduces a novel cross-modal fusion strategy. It combines low-level grayscale images derived from DEX bytecode, which capture structural patterns (e.g., entropy, packing, code density), with high-level behavioral features such as API call sequences and permissions, which encode the app's semantic intent. To bridge the semantic gap between these heterogeneous features, FEAMDA employs a feature textualization approach, transforming both modalities into structured natural language prompts. These prompts are processed by an LLM (e.g., DeepSeek or GPT-4o), which performs both classification and explainable reasoning. Empirical results on benchmark datasets (Drebin, AMD) demonstrate that FEAMDA achieves state-of-the-art detection accuracy (up to 95.4%); high interpretability through natural language explanations (AOR > 4.3); and strong robustness under various obfuscation techniques, including symbol renaming, DEX packing, and code encryption. FEAMDA represents a shift from traditional black-box malware detection toward LLM-augmented, semantically transparent analysis agents, offering practical implications for next-generation mobile threat defense.
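To make the feature textualization idea concrete, the sketch below turns raw bytecode and a few behavioral features into a plain-text prompt. It is an illustrative assumption, not FEAMDA's implementation: the image width, the use of byte entropy as the only image statistic, the prompt wording, and all function names are hypothetical, and no LLM is called.

```python
import math
from collections import Counter
from typing import List, Sequence


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay bytes out as rows of 0-255 pixel values, zero-padding the last row."""
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def textualize(permissions: Sequence[str], api_calls: Sequence[str],
               dex_bytes: bytes, width: int = 256) -> str:
    """Combine both feature modalities into one structured text prompt."""
    rows = bytes_to_grayscale_rows(dex_bytes, width)
    return "\n".join([
        "Task: decide whether this Android app is malicious or benign.",
        f"Bytecode image: {len(rows)} rows x {width} columns, "
        f"byte entropy {shannon_entropy(dex_bytes):.2f} bits.",
        "Requested permissions: " + ", ".join(permissions),
        "API call sequence: " + " -> ".join(api_calls),
        "Answer with a label and a short justification.",
    ])
```

High byte entropy is a common hint of packing or encryption, which is why it is a natural candidate for a statistic to surface in the text. A real system would likely summarize the image in more detail than a single number.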