Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is a collection of evasive PDF samples, each labeled as malicious (1) or benign (0). Because the samples were generated to be evasive, the dataset can be used to test the robustness of trained PDF malware classifiers against evasion attacks. It contains 500,000 generated evasive samples: 450,000 malicious and 50,000 benign PDFs.
More details about the data can be found in the publication: Leveraging Adversarial Samples for Enhanced Classification of Malicious and Evasive PDF Files.
This resource aims to support researchers and cybersecurity professionals in developing more advanced and robust detection mechanisms for PDF-based malware.
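Because the stated split is 450,000 malicious to 50,000 benign, overall accuracy alone can be misleading when measuring robustness on this data. The following is a minimal sketch, not the authors' evaluation code, of reporting per-class recall next to the majority-class baseline. The function name `evasion_report` and the toy label lists are assumptions made for illustration; only the 0/1 label convention and the 9:1 ratio come from the description above.

```python
from collections import Counter


def evasion_report(labels, predictions):
    """Return per-class recall, overall accuracy, and the majority-class baseline.

    labels and predictions are equal-length sequences of 0 (benign) / 1 (malicious).
    This helper is hypothetical and is not part of the dataset's implementation code.
    """
    counts = Counter(labels)
    # Count correct predictions per true class; missing classes default to 0.
    correct = Counter(y for y, p in zip(labels, predictions) if y == p)
    recall = {cls: correct[cls] / n for cls, n in counts.items()}
    accuracy = sum(correct.values()) / len(labels)
    majority_baseline = max(counts.values()) / len(labels)
    return {
        "recall": recall,
        "accuracy": accuracy,
        "majority_baseline": majority_baseline,
    }


# Toy data with the same 9:1 malicious-to-benign ratio as the dataset.
labels = [1] * 9 + [0] * 1
always_malicious = [1] * 10  # a trivial classifier that flags everything
report = evasion_report(labels, always_malicious)
print(report)
```

In this toy case the trivial classifier matches the majority baseline while recalling none of the benign samples, which is why per-class figures are worth reporting alongside accuracy.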
Any work that uses this dataset should cite the following paper:
Trad, F.; Hussein, A.; Chehab, A. Leveraging Adversarial Samples for Enhanced Classification of Malicious and Evasive PDF Files. Appl. Sci. 2023, 13, 3472. https://doi.org/10.3390/app13063472
Implementation code can be found here
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
it’s increasingly becoming a target for malware
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
EMBER2024 Dataset
EMBER2024 is an update to the EMBER2017 and EMBER2018 datasets. It includes raw features and labels for 3.2 million malicious and benign files from 6 different file types (Win32, Win64, .NET, APK, ELF, and PDF). EMBER2024 is meant to allow researchers to explore a variety of common malware analysis classification tasks. The dataset includes 7 types of labels and tags that support malicious/benign detection, malware family classification, malware behavior prediction… See the full description on the dataset page: https://huggingface.co/datasets/joyce8/EMBER2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We created a test .pdf file by combining 25 images of each malignancy and 20 images of each benign condition. Images at their original resolution were extracted from the Asan test dataset and the Edinburgh Dermofit library, and doPDF (http://www.dopdf.com) was used in high-resolution mode to create the test PDF. The filenames of the images selected from the Edinburgh dataset are listed in list_edin.txt. (The Edinburgh dataset is a commercial library: https://licensing.edinburgh-innovations.ed.ac.uk/item.php?item=dermofit-image-library)