19 datasets found
  1. The MalRadar Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 5, 2022
    Cite
    MalRadar (2022). The MalRadar Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6451768
    Dataset updated
    Jul 5, 2022
    Dataset authored and provided by
    MalRadar
    Description

    Mobile malware detection has attracted massive research effort in our community. A reliable and up-to-date malware dataset is critical for evaluating the effectiveness of malware detection approaches. Essentially, the malware ground truth should be manually verified by security experts, and the malicious behaviors should be carefully labelled. Although there are several widely used malware benchmarks in our community (e.g., MalGenome, Drebin, Piggybacking, and AMD), these benchmarks have several limitations: they are out of date and have size, coverage, and reliability issues.

    We created MalRadar, a growing and up-to-date Android malware dataset, using the most reliable approach: collecting malware based on the analysis reports of security experts. We crawled all the mobile-security-related reports released by ten leading security companies, and used an automated approach to extract and label the useful ones, namely those describing new Android malware and containing Indicators of Compromise (IoC) information. The resulting dataset contains 4,534 unique Android malware samples (including both apks and metadata) released from 2014 to April 2021 (the time of writing of the paper), all of which were manually verified by security experts with detailed behavior analysis. For more details, please visit https://malradar.github.io/

    The dataset includes the following files:

    (1) sample-info.csv

    In this file, we list all the detailed information about each sample, including apk file hash, app name, package name, report family, etc.

    (2) malradar.zip

    We have packaged the malware samples in chunks of 1000 applications: malradar-0, malradar-1, malradar-2, malradar-3. Each apk file is named after its SHA256 hash.
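
    Because each sample is named after its SHA256 hash, an unpacked chunk can be sanity-checked by re-hashing the files. The following is a minimal sketch, not part of the dataset's own tooling: the directory layout and the optional `.apk` extension are assumptions, and `find_mismatches` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(samples_dir: Path) -> list:
    """List files whose name (without extension) differs from their SHA256.

    Assumption: file names are the hex digest, optionally followed by ".apk".
    """
    mismatches = []
    for path in sorted(samples_dir.iterdir()):
        if path.is_file() and path.stem.lower() != sha256_of(path):
            mismatches.append(path)
    return mismatches
```

    Calling `find_mismatches(Path("malradar-0"))` on an unpacked chunk (the path is a placeholder) would return an empty list if every file matches its name.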

    If you use our dataset in a paper or article, please cite our paper:

    @article{wang2022malradar,
      title={MalRadar: Demystifying Android Malware in the New Era},
      author={Wang, Liu and Wang, Haoyu and He, Ren and Tao, Ran and Meng, Guozhu and Luo, Xiapu and Liu, Xuanzhe},
      journal={Proceedings of the ACM on Measurement and Analysis of Computing Systems},
      volume={6},
      number={2},
      pages={1--27},
      year={2022},
      publisher={ACM New York, NY, USA}
    }

  2. Development of Android malware worldwide 2016-2020

    • statista.com
    Updated Jul 7, 2022
    Cite
    Statista (2022). Development of Android malware worldwide 2016-2020 [Dataset]. https://www.statista.com/statistics/680705/global-android-malware-volume/
    Dataset updated
    Jul 7, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2013 - Mar 2020
    Area covered
    Worldwide
    Description

    As of March 2020, the total number of new Android malware samples amounted to 482,579 per month. According to AV-Test, trojans were the most common type of malware affecting Android devices.

  3. MH-100K-Dataset

    • figshare.com
    zip
    Updated Oct 19, 2023
    Cite
    Vanderson Rocha; Hendrio Bragança; Eduardo Feitosa; Eduardo Souto; Diego Kreutz; Lucas Vilanova (2023). MH-100K-Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24328885.v2
    Available download formats: zip
    Dataset updated
    Oct 19, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vanderson Rocha; Hendrio Bragança; Eduardo Feitosa; Eduardo Souto; Diego Kreutz; Lucas Vilanova
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MH-100K is an extensive collection of Android malware information comprising 101,975 samples. It includes a main CSV file with metadata for each sample: the SHA256 hash (APK's signature), file name, package name, Android's official compilation API, 166 permissions, 24,417 API calls, and 250 intents.

  4. Dataset of "Extinguishing Ransomware - A Hybrid Approach to Android...

    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Alberto Ferrante; Francesco Mercaldo; Miroslaw Malek; Jelena Milosevic (2020). Dataset of "Extinguishing Ransomware - A Hybrid Approach to Android Ransomware Detection" [Dataset]. http://doi.org/10.5281/zenodo.1420449
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alberto Ferrante; Francesco Mercaldo; Miroslaw Malek; Jelena Milosevic
    Description

    Protection against ransomware is particularly relevant in systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers. In "Extinguishing Ransomware - A Hybrid Approach to Android Ransomware Detection" (see references for details), we describe a hybrid (static + dynamic) malware detection method that has extremely good accuracy (100% detection rate, with a false positive rate below 4%).

    We release a dataset related to the dynamic detection part of the aforementioned method, containing execution traces of Android ransomware applications, to facilitate further research and the adoption of dynamic detection in practice. The dataset contains execution traces from 666 ransomware applications taken from the Heldroid project [https://github.com/necst/heldroid] (the app repository is unavailable at the moment). Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 20,000 stimuli were applied with a maximum execution time of 15 minutes. For most of the applications, all the stimuli could be applied in this timeframe. In some of the traces neither of the two limits is reached, due to emulator hiccups. The collected features relate to memory and CPU usage, network interaction, and system calls, and they are sampled every two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in a mint condition when a new sample was started, thus avoiding possible interference (e.g., changed settings, running processes, and modifications of the operating system files) from previously run samples, the Android operating system was re-initialized before running each application. The application execution process was automated by means of a shell script that used the Android Debug Bridge (adb) and was run on a Linux PC. The Monkey application exerciser was used in the script as a generator of the aforementioned stimuli. Monkey is a command-line tool that can be run on any emulator instance or on a device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.

    In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:

    • ransom-per_app-csv.zip - features obtained by executing ransomware applications, one CSV per application

    • ransom-unified-csv.zip - features obtained by executing ransomware applications, only one CSV file
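
    The relationship between the per-app and unified files can be illustrated with a short sketch that concatenates per-app CSVs into one file. This is only an assumption about how such a file could be built: it presumes that all per-app CSVs share the same header row, and the added `app` column (taken from the file name) is hypothetical and may not exist in the published unified file.

```python
import csv
from pathlib import Path


def unify_csvs(per_app_dir: Path, unified_path: Path) -> int:
    """Concatenate per-app CSVs into one file; return the number of data rows.

    Write the unified file outside per_app_dir so it is not picked up as input.
    """
    rows_written = 0
    header = None
    with unified_path.open("w", newline="") as out_file:
        writer = csv.writer(out_file)
        for csv_path in sorted(per_app_dir.glob("*.csv")):
            with csv_path.open(newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    # Hypothetical extra column identifying the source app.
                    writer.writerow(["app"] + header)
                elif file_header != header:
                    raise ValueError(f"unexpected header in {csv_path.name}")
                for row in reader:
                    writer.writerow([csv_path.stem] + row)
                    rows_written += 1
    return rows_written
```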

  5. Distribution of Android malware 2019

    • statista.com
    Updated Jul 7, 2022
    Cite
    Statista (2022). Distribution of Android malware 2019 [Dataset]. https://www.statista.com/statistics/681006/share-of-android-types-of-malware/
    Dataset updated
    Jul 7, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2019
    Area covered
    Worldwide
    Description

    According to AV-Test, trojans were found to be the preferred means for cybercriminals to infiltrate Android systems. In 2019, trojans accounted for 93.93 percent of all malware attacks on Android systems. Ransomware ranked second, with 2.47 percent of Android malware samples involving this variant.

  6. Android Malware Dataset

    • kaggle.com
    zip
    Updated Jun 4, 2021
    Cite
    Saurabh Shahane (2021). Android Malware Dataset [Dataset]. https://www.kaggle.com/saurabhshahane/android-malware-dataset
    Available download formats: zip (4152702 bytes)
    Dataset updated
    Jun 4, 2021
    Authors
    Saurabh Shahane
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    A dataset of metainformation of benign and malware Android samples used in the paper Martín, A., Calleja, A., Menéndez, H. D., Tapiador, J., & Camacho, D. (2016, December). ADROIT: Android malware detection using meta-information. In Computational Intelligence (SSCI), 2016 IEEE Symposium Series on (pp. 1-8). IEEE.

    Acknowledgements

    Martín, Alejandro; Calleja, Alejandro; Menéndez, Héctor D.; Tapiador, Juan; Camacho, David (2017), “ADROIT”, Mendeley Data, V2, doi: 10.17632/yr92xbrvgx.2

  7. Coronavirus-themed Mobile Malware

    • opendatalab.com
    • paperswithcode.com
    zip
    Cite
    Microsoft Research Asia, Coronavirus-themed Mobile Malware [Dataset]. https://opendatalab.com/OpenDataLab/Coronavirus-themed_Mobile_etc
    Available download formats: zip
    Dataset provided by
    Zhejiang University
    Microsoft Research Asia
    Beijing University of Posts and Telecommunications
    Description

    This is a dataset of coronavirus-themed malware for Android devices. It is a daily growing COVID-19 themed mobile app dataset, which contained 4,322 COVID-19 themed apk samples (2,500 unique apps) and 611 potential malware samples (370 unique malicious apps) as of mid-November 2020.

  8. Android Benign and Malware Dataset

    • data.mendeley.com
    Updated Mar 6, 2024
    Cite
    Arvind Mahindru (2024). Android Benign and Malware Dataset [Dataset]. http://doi.org/10.17632/rvjptkrc34.1
    Dataset updated
    Mar 6, 2024
    Authors
    Arvind Mahindru
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains a permission dataset extracted from .apk files downloaded from thirty repositories. It contains more than 1500 permissions. The apps belong to thirty different categories.

  9. DYNAMISM - Postprocessed Execution Traces Of Android Malware and Benign Apps...

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Mercaldo, Francesco (2020). DYNAMISM - Postprocessed Execution Traces Of Android Malware and Benign Apps [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1296277
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Malek, Miroslaw
    Milosevic, Jelena
    Ferrante, Alberto
    Mercaldo, Francesco
    Description

    Protection against malware is particularly relevant in systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers.

    Dynamic malware detection has been widely adopted by the scientific community but not yet in practical applications.

    We release DYNAMISM (Dynamic Analysis of Malware), a dataset containing execution traces of both benign and malicious applications running on Android OS, to facilitate further research and the adoption of dynamic detection in practice. The dataset contains execution traces from 2,386 benign applications and 2,495 malicious applications taken from the Malware Genome Project repository [http://www.malgenomeproject.org] and from the Drebin Dataset [https://www.sec.cs.tu-bs.de/~danarp/drebin/]. Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 2,000 stimuli were applied with a maximum execution time of 10 minutes. For most of the applications, all the stimuli could be applied in this timeframe. In some of the traces neither of the two limits is reached, due to emulator hiccups. The collected features relate to memory and CPU usage, network interaction, and system calls, and they are sampled every two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in a mint condition when a new sample was started, thus avoiding possible interference (e.g., changed settings, running processes, and modifications of the operating system files) from previously run samples, the Android operating system was re-initialized before running each application. The application execution process was automated by means of a shell script that used the Android Debug Bridge (adb) and was run on a Linux PC. The Monkey application exerciser was used in the script as a generator of the aforementioned stimuli. Monkey is a command-line tool that can be run on any emulator instance or on a device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.
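
    The stimulus-generation step described above can be sketched as follows. The authors used a shell script whose exact options are not given, so this Python version is only an illustration: the function names, the package name passed in, and the choice to enforce the time limit with a client-side timeout are all assumptions. The `monkey` options used (`-p` for the package, `-s` for the seed, and the trailing event count) are standard Monkey options.

```python
import subprocess
from typing import List, Optional


def monkey_command(package: str, events: int = 2000,
                   seed: Optional[int] = None) -> List[str]:
    """Build an adb command that sends `events` pseudo-random events to one app."""
    cmd = ["adb", "shell", "monkey", "-p", package]
    if seed is not None:
        cmd += ["-s", str(seed)]  # a fixed seed makes the event stream repeatable
    cmd.append(str(events))
    return cmd


def run_monkey(package: str, events: int = 2000, max_seconds: int = 600) -> bool:
    """Run Monkey; return False if the time limit was hit before it finished.

    Only the local adb client is stopped on timeout; resetting the emulator
    between samples is outside the scope of this sketch.
    """
    try:
        subprocess.run(monkey_command(package, events), check=False,
                       timeout=max_seconds)
        return True
    except subprocess.TimeoutExpired:
        return False
```

    The defaults mirror the limits stated for this dataset (2,000 stimuli, 10 minutes).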

    In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:

    • benign-per_app-csv.zip - features obtained by executing benign applications, one CSV per application

    • benign-unified-csv.zip - features obtained by executing benign applications, only one CSV file

    • malicious-per_app-csv.zip - features obtained by executing malicious applications, one CSV per application

    • malicious-unified-csv.zip - features obtained by executing malicious applications, only one CSV file

  10. Android Malware Family Labeling

    • zenodo.org
    zip
    Updated Sep 19, 2024
    Cite
    Wang; Wang (2024). Android Malware Family Labeling [Dataset]. http://doi.org/10.5281/zenodo.13790832
    Available download formats: zip
    Dataset updated
    Sep 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wang; Wang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository includes VirusTotal scan reports of samples of two datasets used in the paper.

    (1) GPset-VT-Reports.zip: the first and second VT scans for the GPset samples.

    (2) 3rdset-VT-Reports.zip: the first and second VT scans for the 3rdset samples.

    (3) FusedApp-VT-Reports.zip: the VT scan for the fused (multi-family) samples.

    Each scan report is a JSON file generated by VirusTotal, including file metadata (e.g., hashes, size), certificate metadata (e.g., thumbprint), VirusTotal-specific data (e.g., submission time), and a list of detection labels assigned by various antivirus engines used to scan the file.
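
    As a rough illustration of how the detection labels in such a report could be tallied, the sketch below counts the labels assigned by engines that flagged a file. The JSON layout is an assumption (a top-level `scans` object mapping engine names to `detected` and `result` fields); the actual reports in this repository may use different keys, in which case the lookup needs adjusting. The engine names and labels in `example` are made up.

```python
import json
from collections import Counter


def detection_labels(report: dict) -> Counter:
    """Count the labels assigned by engines that flagged the file."""
    scans = report.get("scans", {})  # assumed key; adjust to the real schema
    return Counter(
        entry["result"]
        for entry in scans.values()
        if entry.get("detected") and entry.get("result")
    )


# Made-up report fragment used only to exercise the function.
example = json.loads("""
{"sha256": "placeholder",
 "scans": {"EngineA": {"detected": true, "result": "Trojan.Agent"},
           "EngineB": {"detected": true, "result": "Trojan.Agent"},
           "EngineC": {"detected": false, "result": null}}}
""")
label_counts = detection_labels(example)
```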

  11. [CIC-AndMal-2020] Static-Dynamic Malware analysis

    • kaggle.com
    Updated Dec 27, 2021
    Cite
    Alberto Zorzetto (2021). [CIC-AndMal-2020] Static-Dynamic Malware analysis [Dataset]. https://www.kaggle.com/datasets/albertozorzetto/cic-andmal-2020-dynamic-static-analysis
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alberto Zorzetto
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    This dataset contains 200K Android malware apps, each labeled and assigned to its corresponding family. Benign Android apps (200K) were collected from the AndroZoo dataset to balance the dataset. We collected 14 malware categories: adware, backdoor, file infector, no category, Potentially Unwanted Apps (PUA), ransomware, riskware, scareware, trojan, trojan-banker, trojan-dropper, trojan-sms, trojan-spy, and zero-day.

    A complete taxonomy of all the malware families of the captured malware apps is created by dividing them into eight categories: sensitive data collection, media, hardware, actions/activities, internet connection, C&C, antivirus, and storage & settings.

    Dataset details

    Category          Number of families   Number of samples
    Adware            48                   47,210
    Backdoor          11                   1,538
    File Infector     5                    669
    No Category       -                    2,296
    PUA               8                    2,051
    Ransomware        8                    6,202
    Riskware          21                   97,349
    Scareware         3                    1,556
    Trojan            45                   13,559
    Trojan-Banker     11                   887
    Trojan-Dropper    9                    2,302
    Trojan-SMS        11                   3,125
    Trojan-Spy        11                   3,540
    Zero-day          -                    13,340

    Static analysis

    AndroidManifest.xml contains a lot of features that can be used for static analysis. The main extracted features include:

    • Activities: An android activity is one screen of the android app's user interface
    • Broadcast receivers and providers
    • Metadata: It is basically an additional option to store information that can be accessed through the entire project
    • The permissions requested by application: It protects the privacy of the user and is needed to access sensitive user data (such as contacts and SMS)
    • System features (such as camera and internet)

    Static Features

    Feature               Values
    Package Name          "com.fb.iwidget"
    Activities            "com.fb.iwidget.OverlayActivity"
                          "org.acra.CrashReportDialog"
                          "com.batch.android.BatchActionActivity"
                          "com.fb.iwidget.MainActivity"
                          "com.fb.iwidget.PreferencesActivity"
                          "com.fb.iwidget.PickerActivity"
                          "com.fb.iwidget.IntroActivity"
    Services              "com.batch.android.BatchActionService"
                          "com.fb.iwidget.MainService"
                          "com.fb.iwidget.SnapAccessService"
    Receivers/Providers   "com.fb.iwidget.ExpandWidgetProvider"
                          "com.fb.iwidget.ActionReceiver"
    Intents Actions       "android.accessibilityservice.AccessibilityService"
                          "android.appwidget.action.APPWIDGET_UPDATE"
                          "android.intent.action.BOOT_COMPLETED"
                          "android.intent.action.CREATE_SHORTCUT"
                          "android.intent.action.MAIN"
                          "android.intent.action.MY_PACKAGE_REPLACED"
                          "android.intent.action.USER_PRESENT"
                          "android.intent.action.VIEW"
                          "com.fb.iwidget.action.SHOULD_REVIVE"
    Intents Categories    "android.intent.category.BROWSABLE"
                          "android.intent.category.DEFAULT"
                          "android.intent.category.LAUNCHER"
    Permissions           "android.permission.ACCESS_NETWORK_STATE"
                          "android.permission.CALL_PHONE"
                          "android.permission.INTERNET"
                          "android.permission.RECEIVE_BOOT_COMPLETED"
                          "android.permission.SYSTEM_ALERT_WINDOW"
                          "com.android.vending.BILLING"
                          "android.permission.BIND_ACCESSIBILITY_SERVICE"
    Meta-Data             "android.accessibilityservice"
                          "android.appwidget.provider"
    #Icons                331
    #Pictures             0
    #Videos               0
    Audio files           0
    Videos                0
    Size of the App       4.2M
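
    Several of the static features listed above can be read directly from a decoded AndroidManifest.xml. The sketch below is one possible way to do so with Python's standard XML parser, not the extraction pipeline used for this dataset. It assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary format), and the `sample` manifest is a made-up fragment that reuses names from the table.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree uses for android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml: str) -> dict:
    """Extract component names, permissions and intent data from a manifest."""
    root = ET.fromstring(manifest_xml)

    def names(tag: str) -> list:
        # Collect the android:name attribute of every element with this tag.
        return [el.get(ANDROID_NS + "name") for el in root.iter(tag)]

    return {
        "package": root.get("package"),
        "activities": names("activity"),
        "services": names("service"),
        "receivers": names("receiver"),
        "providers": names("provider"),
        "permissions": names("uses-permission"),
        "intent_actions": names("action"),
        "intent_categories": names("category"),
        "meta_data": names("meta-data"),
    }


# Made-up manifest fragment for illustration only.
sample = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.fb.iwidget">
  <uses-permission android:name="android.permission.INTERNET"/>
  <application>
    <activity android:name="com.fb.iwidget.MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
      </intent-filter>
    </activity>
    <service android:name="com.fb.iwidget.MainService"/>
  </application>
</manifest>"""
features = manifest_features(sample)
```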

    Dynamic analysis

    For understanding the behavioral changes of these malware categories and families, six categories of features are extracted after executing the malware in an emulated environment. The main extracted features include:

    • Memory: Memory features define activities performed by malware by utilizing memory.
    • API: Application Programming Interface (API) features delineate the communication between two applications.
    • Network: Network features describe the data transmitted and received between other devices in the network. They indicate foreground and background network usage.
    • Battery: Batt...

  12. Android Malware and Normal permissions dataset

    • data.mendeley.com
    • impactcybertrust.org
    Updated Mar 13, 2018
    Cite
    Arvind Mahindru (2018). Android Malware and Normal permissions dataset [Dataset]. http://doi.org/10.17632/958wvr38gy.1
    Dataset updated
    Mar 13, 2018
    Authors
    Arvind Mahindru
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 18,850 normal Android application packages and 10,000 malware Android packages, which are used to characterize the behaviour of malware applications based on the permissions they need at run-time.

  13. CIC-AndMal2017 Dataset

    • paperswithcode.com
    Cite
    (2024). CIC-AndMal2017 Dataset [Dataset]. https://paperswithcode.com/dataset/cic-andmal2017
    Description

    Collected more than 10,854 samples (4,354 malware and 6,500 benign) from several sources.

  14. Android Malware Family Dataset (AMFD)

    • zenodo.org
    Updated Mar 31, 2022
    Cite
    LI; LI (2022). Android Malware Family Dataset (AMFD) [Dataset]. http://doi.org/10.5281/zenodo.6395481
    Dataset updated
    Mar 31, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    LI; LI
    Description

    AMFD (Android Malware Family Dataset)

    12,570 malware samples across 32 malware families

  15. Android Botnet dataset (01/01/2014 to 01/01/2014)

    • search.datacite.org
    • impactcybertrust.org
    Updated 2018
    Cite
    External Data Source (2018). Android Botnet dataset (01/01/2014 to 01/01/2014) [Dataset]. http://doi.org/10.23721/100/1478796
    Dataset updated
    2018
    Dataset provided by
    DataCite (https://www.datacite.org/)
    IMPACT
    Authors
    External Data Source
    Description

    The accumulated dataset combines botnet samples from the Android Genome Malware project, malware security blog, VirusTotal, and samples provided by a well-known anti-malware vendor. Overall, the dataset includes 1929 samples spanning the period from 2010 (the first appearance of an Android botnet) to 2014. The Android Botnet dataset consists of 14 families:

    Family            Year of discovery   No. of samples
    AnserverBot       2011                244
    Bmaster           2012                6
    DroidDream        2011                363
    Geinimi           2010                264
    MisoSMS           2013                100
    NickySpy          2011                199
    Not Compatible    2014                76
    PJapps            2011                244
    Pletor            2014                85
    RootSmart         2012                28
    Sandroid          2014                44
    TigerBot          2012                96
    Wroba             2014                100
    Zitmo             2010                80

    Contact: cic@unb.ca

  16. Android Process Memory String Dumps Dataset

    • su.figshare.com
    • researchdata.se
    zip
    Updated May 11, 2017
    Cite
    Irvin Homem; Panagiotis Papapetrou (2017). Android Process Memory String Dumps Dataset [Dataset]. http://doi.org/10.17045/sthlmuni.4989773.v1
    Available download formats: zip
    Dataset updated
    May 11, 2017
    Dataset provided by
    Stockholm University
    Authors
    Irvin Homem; Panagiotis Papapetrou
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A dataset containing 2375 samples of Android Process Memory String Dumps. The dataset is broadly composed of 2 classes, "Benign App" Memory Dumps and "Malicious App" Memory Dumps, split into 2 ZIP archives. The ZIP archives total approximately 17GB; the unzipped contents are approximately 67GB.

    This dataset is derived from a subset of the APK files originally made freely available for research through the AndroZoo project [1]. The AndroZoo project collected millions of Android applications and scanned them with the VirusTotal online malware scanning service, thereby classifying most of the apps as either malicious or benign at the time of scanning. The process memory dumps in this dataset were generated by running the subset of APK files from the AndroZoo dataset in an Android Emulator, capturing the process memory of the individual process, and subsequently extracting only the strings from the process memory dump. This was facilitated by building 2 applications, Coriander and AndroMemDumpBeta, which facilitate the running of apps on Android Emulators and the capturing of process memory, respectively. The source code for these software applications is available on Github.

    The individual samples are labelled with the SHA256 hash filename from the original AndroZoo labeling and the application package names extracted from within the specific APK manifest file. They also contain a time-stamp for when the memory dumping process took place for the specific file. The file extension used is ".dmp" to indicate that the files are memory dumps; however, they only contain strings, and thus can be viewed in any simple text editor.

    A subset of the first 10000 APK files from the original AndroZoo dataset is also included within this dataset. The metadata of these APK files is present in the file "AndroZoo-First-10000", and the 2375 Android apps that are the main subjects of our dataset are extracted from here.

    Our dataset is intended to be used in furthering our research related to Machine Learning-based Triage for Android Memory Forensics. It has been made openly available in order to foster opportunities for collaboration with other researchers, to enable validation of research results, and to enhance the body of knowledge in related areas of research.

    References:

    [1] K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon. AndroZoo: Collecting Millions of Android Apps for the Research Community. Mining Software Repositories (MSR) 2016

  17. Data from: LAMD: Context-driven Android Malware Detection and Classification...

    • zenodo.org
    zip
    Updated Feb 17, 2025
    Cite
    Xingzhi Qian; Xinran Zheng; Yiling He; Shuo Yang; Lorenzo Cavallaro (2025). LAMD: Context-driven Android Malware Detection and Classification with LLMs [Dataset]. http://doi.org/10.5281/zenodo.14884736
    Available download formats: zip
    Dataset updated
    Feb 17, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xingzhi Qian; Xinran Zheng; Yiling He; Shuo Yang; Lorenzo Cavallaro
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LAMD

    This dataset is published as part of the paper "LAMD: Context-driven Android Malware Detection and Classification with LLMs". It includes the training and testing data, along with the results of testing malware with LAMD.

    • The dataset folder contains .csv files with training and testing data information. The testing data is further split into 3 files representing different time intervals.
    • The malware_logs folder contains log files for malware samples from testing dataset that were classified correctly by LAMD.

  18. Dataset for the paper Exploring the Use of Static and Dynamic Analysis to...

    • zenodo.org
    application/gzip
    Updated Sep 14, 2021
    Cite
    Francisco Handrick; Rodrigo Bonifácio (2021). Dataset for the paper Exploring the Use of Static and Dynamic Analysis to Improve the Performance of the Mining Sandbox Approach for Android Malware Identification [Dataset]. http://doi.org/10.5281/zenodo.5503887
    Available download formats: application/gzip
    Dataset updated
    Sep 14, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Francisco Handrick; Rodrigo Bonifácio
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Short Description: This is the dataset for the paper "Exploring the Use of Static and Dynamic Analysis to Improve the Performance of the Mining Sandbox Approach for Android Malware Identification", accepted for publication in the Journal of Systems and Software.

    Link to this repository: https://github.com/droidxp/paper-replication-package

    Authors of the Paper

    • Francisco Handrick da Costa
    • Ismael Medeiros
    • Thales Menezes
    • João Victor da Silva
    • Ingrid Lorraine da Silva
    • Rodrigo Bonifácio
    • Krishna Narasimhan
    • Márcio Ribeiro

    Abstract

    The popularization of the Android platform and the growing number of Android applications (apps) that manage sensitive data turned the Android ecosystem into an attractive target for malicious software. For this reason, researchers and practitioners have investigated new approaches to address Android's security issues, including techniques that leverage dynamic analysis to mine Android sandboxes. The mining sandbox approach consists in running dynamic analysis tools on a benign version of an Android app. This exploratory phase records all calls to sensitive APIs. Later, we can use this information to (a) prevent calls to other sensitive APIs (those not recorded in the exploratory phase) or (b) run the dynamic analysis tools again in a different version of the app. During this second execution of the fuzzing tools, a warning of possible malicious behavior is raised whenever the new version of the app calls a sensitive API not recorded in the exploratory phase.
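
    The two phases of the mining sandbox approach described above reduce to a set comparison. The sketch below is a simplified illustration rather than the tooling used in the paper: the function names are hypothetical, and the API names are ordinary Android framework methods chosen only as examples of sensitive calls.

```python
def mine_sandbox(exploration_calls):
    """Exploratory phase: the sandbox is the set of sensitive APIs observed."""
    return frozenset(exploration_calls)


def unexpected_calls(sandbox, observed_calls):
    """Second phase: sensitive API calls that were not recorded during exploration."""
    return sorted(set(observed_calls) - sandbox)


# Calls recorded while exercising the benign version of an app (illustrative).
benign_trace = [
    "android.location.LocationManager.getLastKnownLocation",
    "android.telephony.TelephonyManager.getDeviceId",
    "android.location.LocationManager.getLastKnownLocation",
]
sandbox = mine_sandbox(benign_trace)

# Calls recorded for a different version of the same app (illustrative).
new_trace = [
    "android.location.LocationManager.getLastKnownLocation",
    "android.telephony.SmsManager.sendTextMessage",
]
warnings = unexpected_calls(sandbox, new_trace)
```

    Here `warnings` holds the single call to `SmsManager.sendTextMessage`, which would trigger a warning of possible malicious behavior.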

    The mining sandbox approach is an effective technique for Android malware analysis, as previous research has revealed. In particular, existing reports present an accuracy of almost 70% in the identification of malicious behavior using dynamic analysis tools to mine Android sandboxes. However, although the use of dynamic analysis for mining Android sandboxes has been investigated before, little is known about the potential benefits of combining static analysis with a mining sandbox approach for identifying malicious behavior. Accordingly, in this paper we present the results of two studies that investigate the impact of using static analysis to complement the performance of existing dynamic analysis tools tailored for mining Android sandboxes, in the task of identifying malicious behavior.

    In the first study we conduct a non-exact replication of a previous study (hereafter BLL-Study) that compares the performance of test case generation tools for mining Android sandboxes. Unlike the original work, here we isolate the effect of an independent static analysis component (DroidFax) that the original authors used to instrument the Android apps in their experiments. This decision was motivated by the fact that DroidFax could have positively influenced the efficacy of the dynamic analysis tools, through the execution of specific static analysis algorithms that DroidFax also implements. In our second study, we carried out a new experiment to investigate the efficacy of taint analysis algorithms to complement the mining sandbox approach previously used to identify malicious behavior. To this end, we executed the FlowDroid tool to mine the source-sink flows from benign/malicious pairs of Android apps used in previous research work.

    Our study brings several findings. For instance, the first study reveals that DroidFax alone (static analysis) can detect 43.75% of the malware samples in the BLL-Study dataset, contributing substantially to the performance of the dynamic analysis tools in the BLL-Study. The results of the second study show that taint analysis is also practical to complement the mining sandbox approach, with a performance similar to that reached by dynamic analysis tools.

  19. Data from: MalCL: Leveraging GAN-Based Generative Replay to Combat...

    • zenodo.org
    bin
    Updated Dec 20, 2024
    Cite
    Jimin Park; AHyun Ji; Minji Park; Mohammad Saidur Rahman; Se Eun Oh (2024). MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification [Dataset]. http://doi.org/10.5281/zenodo.14537891
    Available download formats: bin
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jimin Park; AHyun Ji; Minji Park; Mohammad Saidur Rahman; Se Eun Oh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 20, 2024
    Description

    These are the two datasets, EMBER Class and AZ Class, needed to reproduce the results of the paper "MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification", accepted for publication at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI) 2025.

    • EMBER 2018 dataset
      We use the 2018 EMBER dataset, known for its challenging classification tasks, focusing on a subset of 337,035 malicious Windows PE files labeled by the top 100 malware families, each with over 400 samples. Features include file size, PE and COFF header details, DLL characteristics, imported and exported functions, and properties like size and entropy, all computed using the feature hashing trick.

    • AZ-Class
      The AZ-Class dataset contains 285,582 samples from 100 Android malware families, each with at least 200 samples. We extracted Drebin features (Arp et al., 2014) from the apps, covering eight categories such as hardware access, permissions, API calls, and network addresses.
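The EMBER description above mentions the feature hashing trick. A minimal sketch of the idea follows, assuming a hypothetical bucket count and made-up import tokens; it is not EMBER's actual extraction code, and it uses the simple unsigned-count variant rather than signed hashing.

```python
import hashlib


def hash_features(tokens, n_buckets=8):
    """Map an arbitrary list of string tokens to a fixed-length count vector."""
    vec = [0] * n_buckets
    for tok in tokens:
        # A stable hash (unlike Python's built-in hash()) keeps the
        # mapping reproducible across runs.
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % n_buckets
        vec[idx] += 1
    return vec


# Hypothetical imported-function tokens for one PE file.
imports = [
    "kernel32.dll:CreateFileW",
    "kernel32.dll:WriteFile",
    "ws2_32.dll:connect",
]
vector = hash_features(imports)
```

The vector length stays fixed no matter how many distinct tokens appear, at the cost of occasional collisions between tokens that land in the same bucket.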

Cite
MalRadar (2022). The MalRadar Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6451768

The MalRadar Dataset

Dataset updated
Jul 5, 2022
Dataset authored and provided by
MalRadar
Description

Mobile malware detection has attracted massive research effort in our community. A reliable and up-to-date malware dataset is critical for evaluating the effectiveness of malware detection approaches. Essentially, the malware ground truth should be manually verified by security experts, and the malicious behaviors of each sample should be carefully labelled. Although there are several widely used malware benchmarks in our community (e.g., MalGenome, Drebin, Piggybacking, and AMD), these benchmarks face several limitations, including outdated samples and issues of size, coverage, and reliability.

We have made an effort to create MalRadar, a growing and up-to-date Android malware dataset, in the most reliable way available: by collecting malware based on the analysis reports of security experts. We crawled all the mobile security reports released by ten leading security companies, and used an automated approach to extract and label the useful ones, namely those describing new Android malware and containing Indicators of Compromise (IoC) information. The resulting dataset, MalRadar, contains 4,534 unique Android malware samples (including both apks and metadata) released from 2014 to April 2021, as of the time of the paper, all of which were manually verified by security experts with detailed behavior analysis. For more details, please visit https://malradar.github.io/

The dataset includes the following files:

(1) sample-info.csv

In this file, we list all the detailed information about each sample, including apk file hash, app name, package name, report family, etc.

(2) malradar.zip

We have packaged the malware samples in chunks of 1000 applications: malradar-0, malradar-1, malradar-2, malradar-3. Each apk file is named after its SHA256 hash.
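Because the files are named after their SHA256 hash, an extracted sample can be checked for corruption by recomputing the hash. The sketch below assumes each file is named `<sha256>` with at most one extension such as `.apk`; the helper names are hypothetical and not part of the dataset.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def name_matches_hash(path):
    """True if the file name (minus its last extension) equals the file's SHA256."""
    p = Path(path)
    return p.stem.lower() == sha256_of_file(p)
```

Running `name_matches_hash` over every extracted file and listing the failures is a cheap integrity check before any analysis. Handle the samples only in an isolated environment, since they are live malware.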

If your papers or articles used our dataset, please include a citation to our paper:

@article{wang2022malradar,
  title={MalRadar: Demystifying Android Malware in the New Era},
  author={Wang, Liu and Wang, Haoyu and He, Ren and Tao, Ran and Meng, Guozhu and Luo, Xiapu and Liu, Xuanzhe},
  journal={Proceedings of the ACM on Measurement and Analysis of Computing Systems},
  volume={6},
  number={2},
  pages={1--27},
  year={2022},
  publisher={ACM New York, NY, USA}
}
