Mobile malware detection has attracted massive research effort in our community. A reliable and up-to-date malware dataset is critical for evaluating the effectiveness of malware detection approaches. Essentially, the malware ground truth should be manually verified by security experts, and the samples' malicious behaviors should be carefully labelled. Although there are several widely used malware benchmarks in our community (e.g., MalGenome, Drebin, Piggybacking, and AMD), these benchmarks suffer from several limitations, including staleness, limited size, narrow coverage, and reliability issues.
We set out to create MalRadar, a growing and up-to-date Android malware dataset built in the most reliable way, i.e., by collecting malware based on the analysis reports of security experts. We crawled all the mobile-security-related reports released by ten leading security companies and used an automated approach to extract and label the reports that describe new Android malware and contain Indicators of Compromise (IoC) information. The resulting dataset, MalRadar, contains 4,534 unique Android malware samples (both APKs and metadata) released from 2014 to April 2021 (as of the time of the paper), all of which were manually verified by security experts with detailed behavior analysis. For more details, please visit https://malradar.github.io/
The dataset includes the following files:
(1) sample-info.csv
In this file, we list the detailed information about each sample, including the APK file hash, app name, package name, report family, etc.
(2) malradar.zip
We have packaged the malware samples in chunks of 1,000 applications: malradar-0, malradar-1, malradar-2, and malradar-3. All APK files are named after their SHA256 hash.
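As a rough illustration of how these pieces fit together, the sketch below loads sample-info.csv and resolves a sample's APK inside the unzipped chunk directories. The chunk directory names come from the listing above; the CSV column names and the presence of an .apk extension are assumptions of this sketch.

```python
import csv
from pathlib import Path

# Assumed layout after unzipping malradar.zip: four chunk directories,
# each holding APK files named after their SHA256 hash.
CHUNK_DIRS = [Path(f"malradar-{i}") for i in range(4)]

def load_sample_info(csv_path="sample-info.csv"):
    # Column names (hash, app name, package name, family, ...) are whatever
    # the real header contains; this simply reads every row as a dict.
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def locate_apk(sha256):
    # Files are named after their SHA256; whether they carry an .apk
    # extension is an assumption, so both variants are tried.
    for chunk in CHUNK_DIRS:
        for name in (sha256, f"{sha256}.apk"):
            candidate = chunk / name
            if candidate.exists():
                return candidate
    return None

samples = load_sample_info()
print(f"{len(samples)} samples listed in sample-info.csv")
```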
If your papers or articles use our dataset, please cite our paper:
@article{wang2022malradar, title={MalRadar: Demystifying Android Malware in the New Era}, author={Wang, Liu and Wang, Haoyu and He, Ren and Tao, Ran and Meng, Guozhu and Luo, Xiapu and Liu, Xuanzhe}, journal={Proceedings of the ACM on Measurement and Analysis of Computing Systems}, volume={6}, number={2}, pages={1--27}, year={2022}, publisher={ACM New York, NY, USA} }
As of March 2020, roughly 482,579 new Android malware samples were being recorded per month. According to AV-Test, trojans were the most common type of malware affecting Android devices.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically).
MH-100K is an extensive collection of Android malware information comprising 101,975 samples. It includes a main CSV file with valuable metadata: the SHA256 hash (the APK's signature), file name, package name, the official Android compilation API, 166 permissions, 24,417 API calls, and 250 intents.
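A minimal pandas sketch of how the main CSV might be split into metadata and binary feature columns is shown below. The file name and the metadata column names are hypothetical; only the feature group sizes (166 permissions, 24,417 API calls, 250 intents) come from the description above.

```python
import pandas as pd

# Hypothetical file name for the MH-100K main CSV.
df = pd.read_csv("mh100k_main.csv")

# Assumed metadata column names; replace them with the real header values.
meta_cols = ["SHA256", "FILE_NAME", "PACKAGE_NAME", "API"]
feature_cols = [c for c in df.columns if c not in meta_cols]

# The remaining columns should be the 166 permissions, 24,417 API calls and
# 250 intents, i.e. roughly 24,833 indicator features per sample.
X = df[feature_cols]
print(X.shape)
```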
Protection against ransomware is particularly relevant on systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers. In "Extinguishing Ransomware - A Hybrid Approach to Android Ransomware Detection" (see references for details), we describe a hybrid (static + dynamic) malware detection method with extremely good accuracy (a 100% detection rate and a false positive rate below 4%).
We release a dataset related to the dynamic detection part of the aforementioned method, containing execution traces of Android ransomware applications, in order to facilitate further research as well as the adoption of dynamic detection in practice. The dataset contains execution traces from 666 ransomware applications taken from the Heldroid project [https://github.com/necst/heldroid] (the app repository is unavailable at the moment). Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 20,000 stimuli were applied with a maximum execution time of 15 minutes. For most of the applications, all the stimuli could be applied within this timeframe; in some traces, neither limit is reached due to emulator hiccups. The collected features relate to memory and CPU usage, network interaction, and system calls, and are sampled with a period of two seconds.

The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in a pristine condition when a new sample was started, thus avoiding possible interference from previously run samples (e.g., changed settings, running processes, and modifications of operating system files), the Android operating system was re-initialized before running each application. The application execution process was automated by a shell script that used the Android Debug Bridge (adb) and ran on a Linux PC. The Monkey application exerciser was used in the script as the generator of the aforementioned stimuli; the Monkey is a command-line tool that can be run on any emulator instance or device, and it sends a pseudo-random stream of user events (stimuli) into the system, acting as a stress test on the application software.
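The original automation was a Linux shell script built around adb and the Monkey; the Python sketch below only illustrates that loop (install, stimulate with Monkey, poll coarse resource figures every two seconds). It is not the authors' script: the dumpsys-based sampling, the throttle value, and the package handling are stand-ins for the real memory/CPU/network/system-call collection.

```python
import subprocess
import time

def adb(*args):
    """Run an adb command and return its textual output."""
    return subprocess.run(["adb", *args], capture_output=True, text=True).stdout

def run_sample(apk_path, package, events=20000, max_minutes=15, period_s=2):
    """Install one APK, drive it with the Monkey exerciser and poll coarse
    memory/CPU figures until the event budget or the time limit is hit."""
    adb("install", "-r", apk_path)
    monkey = subprocess.Popen(
        ["adb", "shell", "monkey", "-p", package, "--throttle", "100", str(events)])
    deadline = time.time() + max_minutes * 60
    records = []
    while monkey.poll() is None and time.time() < deadline:
        records.append({
            "t": time.time(),
            "meminfo": adb("shell", "dumpsys", "meminfo", package),
            "cpuinfo": adb("shell", "dumpsys", "cpuinfo"),
        })
        time.sleep(period_s)
    if monkey.poll() is None:
        monkey.terminate()
    adb("uninstall", package)
    return records
```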
In this dataset, we provide both per-app CSV files and unified files, in which the CSV files of individual applications have been concatenated. The CSV files contain the features extracted from the raw execution records. The provided files are listed below:
ransom-per_app-csv.zip - features obtained by executing ransomware applications, one CSV per application
ransom-unified-csv.zip - features obtained by executing ransomware applications, only one CSV file
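A small sketch of how the per-app archive can be turned into a single labelled frame is given below; only the archive name comes from the listing above, while the feature columns are whatever headers the individual CSV files carry.

```python
import io
import zipfile
import pandas as pd

frames = []
with zipfile.ZipFile("ransom-per_app-csv.zip") as zf:
    for name in zf.namelist():
        if not name.endswith(".csv"):
            continue
        df = pd.read_csv(io.BytesIO(zf.read(name)))
        df["app"] = name            # remember which trace the rows came from
        df["label"] = "ransomware"  # every app in this archive is ransomware
        frames.append(df)

traces = pd.concat(frames, ignore_index=True)
print(traces.shape)
```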
According to AV-Test, trojans were found to be the preferred means for cybercriminals to infiltrate Android systems. In 2019, trojans accounted for 93.93 percent of all malware attacks on Android systems. Ransomware ranked second, with 2.47 percent of Android malware samples involving this variant.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically).
A dataset of meta-information on benign and malicious Android samples used in the paper: Martín, A., Calleja, A., Menéndez, H. D., Tapiador, J., & Camacho, D. (2016, December). ADROIT: Android malware detection using meta-information. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-8). IEEE.
Martín, Alejandro; Calleja, Alejandro; Menéndez, Héctor D.; Tapiador, Juan; Camacho, David (2017), “ADROIT”, Mendeley Data, V2, doi: 10.17632/yr92xbrvgx.2
This is a dataset of coronavirus-themed malware for Android devices. It is a daily growing COVID-19 themed mobile app dataset, containing 4,322 COVID-19 themed APK samples (2,500 unique apps) and 611 potential malware samples (370 unique malicious apps) as of mid-November 2020.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically).
Contains a permission dataset extracted from .apk files downloaded from thirty repositories. It contains more than 1,500 permissions. The apps belong to thirty different categories.
Protection against malware is particularly relevant on systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers.
Dynamic malware detection has been widely adopted by the scientific community but not yet in practical applications.
We release DYNAMISM (Dynamic Analysis of Malware), a dataset containing execution traces of both benign and malicious applications running on Android OS, in order to facilitate further research as well as the adoption of dynamic detection in practice. The dataset contains execution traces from 2,386 benign applications and 2,495 malicious applications taken from the Malware Genome Project repository [http://www.malgenomeproject.org] and from the Drebin dataset [https://www.sec.cs.tu-bs.de/~danarp/drebin/]. Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 2,000 stimuli were applied with a maximum execution time of 10 minutes. For most of the applications, all the stimuli could be applied within this timeframe; in some traces, neither limit is reached due to emulator hiccups. The collected features relate to memory and CPU usage, network interaction, and system calls, and are sampled with a period of two seconds.

The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in a pristine condition when a new sample was started, thus avoiding possible interference from previously run samples (e.g., changed settings, running processes, and modifications of operating system files), the Android operating system was re-initialized before running each application. The application execution process was automated by a shell script that used the Android Debug Bridge (adb) and ran on a Linux PC. The Monkey application exerciser was used in the script as the generator of the aforementioned stimuli; the Monkey is a command-line tool that can be run on any emulator instance or device, and it sends a pseudo-random stream of user events (stimuli) into the system, acting as a stress test on the application software.
In this dataset, we provide both per-app CSV files and unified files, in which the CSV files of individual applications have been concatenated. The CSV files contain the features extracted from the raw execution records. The provided files are listed below:
benign-per_app-csv.zip - features obtained by executing benign applications, one CSV per application
benign-unified-csv.zip - features obtained by executing benign applications, only one CSV file
malicious-per_app-csv.zip - features obtained by executing malicious applications, one CSV per application
malicious-unified-csv.zip - features obtained by executing malicious applications, only one CSV file
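As an example of how the unified files could feed a detector, the sketch below concatenates the benign and malicious traces and trains a simple classifier. The CSV paths inside the archives are hypothetical, and any non-numeric identifier or timestamp columns are simply dropped, since their names are not specified above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical paths after unzipping the two unified archives.
benign = pd.read_csv("benign-unified.csv")
malicious = pd.read_csv("malicious-unified.csv")

benign["label"], malicious["label"] = 0, 1
data = pd.concat([benign, malicious], ignore_index=True)

# Keep only numeric feature columns; identifier/timestamp columns are dropped.
X = data.drop(columns=["label"]).select_dtypes("number").fillna(0)
y = data["label"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"hold-out accuracy: {clf.score(X_te, y_te):.3f}")
```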
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically).
This repository includes VirusTotal scan reports of samples of two datasets used in the paper.
(1) GPset-VT-Reports.zip: the first and second VT scans for the GPset samples.
(2) 3rdset-VT-Reports.zip: the first and second VT scans for the 3rdset samples.
(3) FusedApp-VT-Reports.zip: the VT scan for the fused (multi-family) samples.
Each scan report is a JSON file generated by VirusTotal, including file metadata (e.g., hashes, size), certificate metadata (e.g., thumbprint), VirusTotal-specific data (e.g., submission time), and a list of detection labels assigned by various antivirus engines used to scan the file.
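As a rough illustration, the sketch below summarizes one such report: how many engines flagged the sample and which labels they assigned. It assumes the v2-style VirusTotal report layout with a top-level "scans" dictionary of {engine: {"detected": bool, "result": str}}; v3-style reports nest this information differently, so the keys may need adjusting.

```python
import json
from collections import Counter

def summarize_vt_report(path):
    """Summarize one VirusTotal JSON report (v2-style layout assumed)."""
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    scans = report.get("scans", {})
    labels = Counter(
        v["result"] for v in scans.values() if v.get("detected") and v.get("result"))
    return {
        "sha256": report.get("sha256"),
        "positives": report.get("positives",
                                sum(bool(v.get("detected")) for v in scans.values())),
        "total": report.get("total", len(scans)),
        "top_labels": labels.most_common(5),
    }

print(summarize_vt_report("example-report.json"))  # hypothetical file name
```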
License: Public Domain Dedication (CC0 1.0), https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 200K Android malware apps that are labeled and characterized by their corresponding family. Benign Android apps (200K) are collected from the AndroZoo dataset to balance the overall dataset. We collected 14 malware categories, including adware, backdoor, file infector, no category, Potentially Unwanted Apps (PUA), ransomware, riskware, scareware, trojan, trojan-banker, trojan-dropper, trojan-sms, trojan-spy, and zero-day.
A complete taxonomy of the malware families of the captured malware apps is created by dividing them into 8 categories: sensitive data collection, media, hardware, actions/activities, internet connection, C&C, antivirus, and storage & settings. The per-category family and sample counts are listed in the table below, followed by a class-weighting sketch based on those counts.
Category | Number of families | Number of samples |
---|---|---|
Adware | 48 | 47,210 |
Backdoor | 11 | 1,538 |
File Infector | 5 | 669 |
No Category | - | 2,296 |
PUA | 8 | 2,051 |
Ransomware | 8 | 6,202 |
Riskware | 21 | 97,349 |
Scareware | 3 | 1,556 |
Trojan | 45 | 13,559 |
Trojan-Banker | 11 | 887 |
Trojan-Dropper | 9 | 2,302 |
Trojan-SMS | 11 | 3,125 |
Trojan-Spy | 11 | 3,540 |
Zero-day | - | 13,340 |
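Because the category sizes above are highly imbalanced (Riskware has 97,349 samples versus 669 for File Infector), a common first step is to weight classes by inverse frequency when training a classifier. The sketch below derives such weights from the table; the weighting scheme itself is a standard choice, not something prescribed by the dataset.

```python
# Per-category sample counts taken from the table above.
category_counts = {
    "Adware": 47210, "Backdoor": 1538, "File Infector": 669, "No Category": 2296,
    "PUA": 2051, "Ransomware": 6202, "Riskware": 97349, "Scareware": 1556,
    "Trojan": 13559, "Trojan-Banker": 887, "Trojan-Dropper": 2302,
    "Trojan-SMS": 3125, "Trojan-Spy": 3540, "Zero-day": 13340,
}

total = sum(category_counts.values())
n_classes = len(category_counts)

# Inverse-frequency weights: weight_c = total / (n_classes * count_c),
# so rare classes receive proportionally larger weights.
class_weights = {c: total / (n_classes * n) for c, n in category_counts.items()}
for category, weight in sorted(class_weights.items(), key=lambda kv: -kv[1]):
    print(f"{category:15s} {weight:7.2f}")
```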
AndroidManifest.xml contains many features that can be used for static analysis. The main extracted features include the following (a parsing sketch follows the table):
Feature | Values |
---|---|
Package Name | "com.fb.iwidget" |
Activities | "com.fb.iwidget.OverlayActivity" "org.acra.CrashReportDialog" "com.batch.android.BatchActionActivity" "com.fb.iwidget.MainActivity" "com.fb.iwidget.PreferencesActivity" "com.fb.iwidget.PickerActivity" "com.fb.iwidget.IntroActivity" |
Services | "com.batch.android.BatchActionService" "com.fb.iwidget.MainService" "com.fb.iwidget.SnapAccessService" |
Receivers/Providers | "com.fb.iwidget.ExpandWidgetProvider" "com.fb.iwidget.ActionReceiver" |
Intents Actions | "android.accessibilityservice.AccessibilityService" "android.appwidget.action.APPWIDGET_UPDATE" "android.intent.action.BOOT_COMPLETED" "android.intent.action.CREATE_SHORTCUT" "android.intent.action.MAIN" "android.intent.action.MY_PACKAGE_REPLACED" "android.intent.action.USER_PRESENT" "android.intent.action.VIEW" "com.fb.iwidget.action.SHOULD_REVIVE" |
Intents Categories | "android.intent.category.BROWSABLE" "android.intent.category.DEFAULT" "android.intent.category.LAUNCHER" |
Permissions | "android.permission.ACCESS_NETWORK_STATE" "android.permission.CALL_PHONE" "android.permission.INTERNET" "android.permission.RECEIVE_BOOT_COMPLETED" "android.permission.SYSTEM_ALERT_WINDOW" "com.android.vending.BILLING" "android.permission.BIND_ACCESSIBILITY_SERVICE" |
Meta-Data | "android.accessibilityservice" "android.appwidget.provider" |
#Icons | 331 |
#Pictures | 0 |
#Videos | 0 |
#Audio files | 0 |
Size of the App | 4.2M |
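A minimal sketch of extracting such manifest features is shown below. It assumes the manifest has already been decoded to plain XML (e.g., with apktool), since the binary AXML stored inside an APK needs a dedicated decoder such as androguard before it can be parsed with the standard library.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def android_name(elem):
    """Return the android:name attribute of a manifest element."""
    return elem.get(f"{ANDROID_NS}name")

def manifest_features(path="AndroidManifest.xml"):
    """Extract static features similar to the table above from a manifest
    that has already been decoded to plain XML."""
    root = ET.parse(path).getroot()
    app = root.find("application")

    def components(tag):
        return [android_name(e) for e in app.findall(tag)] if app is not None else []

    return {
        "package": root.get("package"),
        "permissions": [android_name(e) for e in root.findall("uses-permission")],
        "activities": components("activity"),
        "services": components("service"),
        "receivers": components("receiver"),
        "providers": components("provider"),
        "intent_actions": sorted({android_name(e) for e in root.iter("action")}),
        "intent_categories": sorted({android_name(e) for e in root.iter("category")}),
    }

print(manifest_features())
```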
To understand the behavioral changes of these malware categories and families, six categories of features are extracted after executing the malware in an emulated environment.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically).
This dataset contains 18,850 benign Android application packages and 10,000 malware Android packages, which are used to characterize the behaviour of malware applications based on the permissions they request at run time.
AMFD (Android Malware Family Dataset)
12,570 malware samples across 32 malware families.
The accumulated dataset combines botnet samples from the Android Malware Genome Project, malware security blogs, VirusTotal, and samples provided by a well-known anti-malware vendor. Overall, the dataset includes 1,929 samples spanning the period from 2010 (the first appearance of Android botnets) to 2014. The Android Botnet dataset consists of 14 families:

Family | Year of discovery | No. of samples |
---|---|---|
AnserverBot | 2011 | 244 |
Bmaster | 2012 | 6 |
DroidDream | 2011 | 363 |
Geinimi | 2010 | 264 |
MisoSMS | 2013 | 100 |
NickySpy | 2011 | 199 |
Not Compatible | 2014 | 76 |
PJapps | 2011 | 244 |
Pletor | 2014 | 85 |
RootSmart | 2012 | 28 |
Sandroid | 2014 | 44 |
TigerBot | 2012 | 96 |
Wroba | 2014 | 100 |
Zitmo | 2010 | 80 |

Contact: cic@unb.ca
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically).
A dataset containing 2,375 samples of Android process memory string dumps. The dataset is broadly composed of two classes, "Benign App" memory dumps and "Malicious App" memory dumps, split into two ZIP archives. The ZIP archives are approximately 17GB in total, while the unzipped contents are approximately 67GB.

This dataset is derived from a subset of the APK files originally made freely available for research through the AndroZoo project [1]. The AndroZoo project collected millions of Android applications and scanned them with the VirusTotal online malware scanning service, thereby classifying most of the apps as either malicious or benign at the time of scanning. The process memory dumps in this dataset were generated by running the subset of APK files from the AndroZoo dataset in an Android emulator, capturing the process memory of the individual process, and subsequently extracting only the strings from the process memory dump. This was facilitated by building two applications, Coriander and AndroMemDumpBeta, which handle the running of apps on Android emulators and the capturing of process memory, respectively. The source code for these software applications is available on GitHub.

The individual samples are labelled with the SHA256 hash filename from the original AndroZoo labeling and the application package names extracted from within the specific APK manifest file. They also contain a timestamp for when the memory dumping process took place for the specific file. The file extension used is ".dmp" to indicate that the files are memory dumps; however, they only contain strings and can thus be viewed in any simple text editor.

A subset of the first 10,000 APK files from the original AndroZoo dataset is also included within this dataset. The metadata of these APK files is present in the file "AndroZoo-First-10000", and the 2,375 Android apps that are the main subjects of our dataset are extracted from there.

Our dataset is intended to be used in furthering our research related to machine learning-based triage for Android memory forensics. It has been made openly available in order to foster opportunities for collaboration with other researchers, to enable validation of research results, and to enhance the body of knowledge in related areas of research.

References:
[1] K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon. AndroZoo: Collecting Millions of Android Apps for the Research Community. Mining Software Repositories (MSR) 2016.
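As an illustration of how such string dumps could feed machine learning-based triage, the sketch below reads the ".dmp" text files from two folders, vectorizes the strings, and trains a simple classifier. The folder names are hypothetical stand-ins for the two unzipped archives; only the file format (one extracted string per line, plain text) comes from the description above.

```python
from pathlib import Path
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def load_dumps(folder, label):
    """Read every .dmp string dump under `folder` and attach a class label."""
    docs, labels = [], []
    for p in Path(folder).rglob("*.dmp"):
        docs.append(p.read_text(errors="ignore"))
        labels.append(label)
    return docs, labels

# Hypothetical folder names for the two unzipped archives.
benign_docs, benign_y = load_dumps("benign_dumps", 0)
malicious_docs, malicious_y = load_dumps("malicious_dumps", 1)

X = HashingVectorizer(n_features=2**18).fit_transform(benign_docs + malicious_docs)
y = benign_y + malicious_y

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"hold-out accuracy: {clf.score(X_te, y_te):.3f}")
```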
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically).
This dataset is published as part of the paper "LAMD: Context-driven Android Malware Detection and Classification with LLMs". It includes the training and testing data, along with the results of testing malware with LAMD.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically).
Short Description: This is the dataset for the paper "Exploring the Use of Static and Dynamic Analysis to Improve the Performance of the Mining Sandbox Approach for Android Malware Identification", accepted for publication in the Journal of Systems and Software.
Link to this repository: https://github.com/droidxp/paper-replication-package
Authors of the Paper
Abstract
The popularization of the Android platform and the growing number of Android applications (apps) that manage sensitive data have turned the Android ecosystem into an attractive target for malicious software. For this reason, researchers and practitioners have investigated new approaches to address Android's security issues, including techniques that leverage dynamic analysis to mine Android sandboxes. The mining sandbox approach consists of running dynamic analysis tools on a benign version of an Android app. This exploratory phase records all calls to sensitive APIs. Later, we can use this information to (a) prevent calls to other sensitive APIs (those not recorded in the exploratory phase) or (b) run the dynamic analysis tools again on a different version of the app. During this second execution of the fuzzing tools, a warning of possible malicious behavior is raised whenever the new version of the app calls a sensitive API not recorded in the exploratory phase.
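The check described above boils down to a set comparison. The sketch below illustrates it with sensitive APIs represented as plain method signatures, which is a simplification of this sketch; the actual tooling works on instrumented call traces, and the example traces are made up.

```python
def mine_sandbox(exploratory_calls):
    """Build the sandbox: the set of sensitive APIs observed while
    exercising the benign version of the app."""
    return set(exploratory_calls)

def check_new_version(sandbox, observed_calls):
    """Return the sensitive APIs called by the new version that were never
    recorded in the exploratory phase; each one triggers a warning."""
    violations = sorted(set(observed_calls) - sandbox)
    for api in violations:
        print(f"WARNING: unexpected sensitive API call: {api}")
    return violations

# Illustrative usage with made-up traces:
baseline = ["android.telephony.TelephonyManager.getDeviceId",
            "java.net.URL.openConnection"]
new_run = baseline + ["android.telephony.SmsManager.sendTextMessage"]
check_new_version(mine_sandbox(baseline), new_run)
```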
The use of the mining sandbox approach is an effective technique for Android malware analysis, as previous research has revealed. In particular, existing reports present an accuracy of almost 70% in the identification of malicious behavior using dynamic analysis tools to mine Android sandboxes. However, although the use of dynamic analysis for mining Android sandboxes has been investigated before, little is known about the potential benefits of combining static analysis with the mining sandbox approach for identifying malicious behavior. Accordingly, in this paper we present the results of two studies that investigate the impact of using static analysis to complement the performance of existing dynamic analysis tools tailored for mining Android sandboxes in the task of identifying malicious behavior.
In the first study, we conduct a non-exact replication of a previous study (hereafter BLL-Study) that compares the performance of test case generation tools for mining Android sandboxes. Differently from the original work, here we isolate the effect of an independent static analysis component (DroidFax) that they used to instrument the Android apps in their experiments. This decision was motivated by the fact that DroidFax could have positively influenced the efficacy of the dynamic analysis tools, through the execution of specific static analysis algorithms that DroidFax also implements. In our second study, we carried out a new experiment to investigate the efficacy of taint analysis algorithms in complementing the mining sandbox approach previously used to identify malicious behavior. To this end, we executed the FlowDroid tool to mine the source-sink flows from the benign/malign pairs of Android apps used in previous research work.
Our study brings several findings. For instance, the first study reveals that DroidFax alone (static analysis) can detect 43.75% of the malware in the BLL-Study dataset, contributing substantially to the performance of the dynamic analysis tools in the BLL-Study. The results of the second study show that taint analysis is also practical as a complement to the mining sandbox approach, with a performance similar to that reached by dynamic analysis tools.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically).
These are the two datasets, EMBER Class and AZ Class, needed to reproduce the results of the paper "MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification", accepted for publication at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025).