4 datasets found

Evasive PDF Samples
kaggle.com
Updated Mar 9, 2024
Cite
Fouad Trad (2024). Evasive PDF Samples [Dataset]. https://www.kaggle.com/datasets/fouadtrad2/evasive-pdf-samples/data
Dataset updated
Mar 9, 2024
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Fouad Trad
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is a collection of evasive PDF samples, each labeled as malicious (1) or benign (0). Because the samples are evasive, the dataset can be used to test the robustness of trained PDF malware classifiers against evasion attacks. It contains 500,000 generated evasive samples: 450,000 malicious and 50,000 benign PDFs.
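The class counts above imply a 90/10 split, which matters when scoring a classifier on this data. The following is a minimal sketch, not part of the dataset or its accompanying code: the function names are hypothetical, and only the three counts come from the description. It shows that a trivial "always malicious" predictor would reach 90% plain accuracy while its balanced accuracy stays at 50%.

```python
# Counts taken from the dataset description; everything else is illustrative.
MALICIOUS = 450_000
BENIGN = 50_000


def class_shares(malicious: int, benign: int) -> dict:
    """Return the fraction of samples in each class."""
    total = malicious + benign
    return {"malicious": malicious / total, "benign": benign / total}


def majority_baseline_accuracy(malicious: int, benign: int) -> float:
    """Accuracy of a predictor that always outputs the larger class."""
    return max(malicious, benign) / (malicious + benign)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


shares = class_shares(MALICIOUS, BENIGN)
baseline = majority_baseline_accuracy(MALICIOUS, BENIGN)
# "Always malicious": every malicious file is caught, every benign file is a false alarm.
always_malicious = balanced_accuracy(tp=MALICIOUS, fn=0, tn=0, fp=BENIGN)

print(shares, baseline, always_malicious)
```

With an imbalance like this, reporting per-class rates or balanced accuracy alongside plain accuracy is one reasonable choice; the source does not prescribe a metric.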

More details about the data can be found in the publication: Leveraging Adversarial Samples for Enhanced Classification of Malicious and Evasive PDF Files.

This resource aims to support researchers and cybersecurity professionals in developing more advanced and robust detection mechanisms for PDF-based malware.

Any work that uses this dataset should cite the following paper:

Trad, F.; Hussein, A.; Chehab, A. Leveraging Adversarial Samples for Enhanced Classification of Malicious and Evasive PDF Files. Appl. Sci. 2023, 13, 3472. https://doi.org/10.3390/app13063472

Implementation code can be found here
PdfRep
ieee-dataport.org
Updated Mar 4, 2024
Cite
Charles Nicholas (2024). PdfRep [Dataset]. https://ieee-dataport.org/documents/pdfrep
Dataset updated
Mar 4, 2024
Authors
Charles Nicholas
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
it’s increasingly becoming a target for malware
EMBER2024
huggingface.co
Cite
Robert J. Joyce, EMBER2024 [Dataset]. https://huggingface.co/datasets/joyce8/EMBER2024
Authors
Robert J. Joyce
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
EMBER2024 Dataset

EMBER2024 is an update to the EMBER2017 and EMBER2018 datasets. It includes raw features and labels for 3.2 million malicious and benign files from 6 different file types (Win32, Win64, .NET, APK, ELF, and PDF). EMBER2024 is meant to allow researchers to explore a variety of common malware analysis classification tasks. The dataset includes 7 types of labels and tags that support malicious/benign detection, malware family classification, malware behavior prediction… See the full description on the dataset page: https://huggingface.co/datasets/joyce8/EMBER2024.
AI vs Dermatologist Test PDF Docoment
figshare.com
Updated Feb 2, 2022
Cite
Seung Seog Han (2022). AI vs Dermatologist Test PDF Docoment [Dataset]. http://doi.org/10.6084/m9.figshare.5592631.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.5592631.v2
Dataset updated
Feb 2, 2022
Dataset provided by
figshare
Authors
Seung Seog Han
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We created a test .pdf file by combining 25 images of each malignancy and 20 images of each benign condition. Images at their original resolution were extracted from the Asan test dataset and the Edinburgh Dermofit library, and doPDF (http://www.dopdf.com) was used in high-resolution mode to create the test PDF. The filenames of the selected images from the Edinburgh dataset are listed in list_edin.txt. (The Edinburgh dataset is a commercial library: https://licensing.edinburgh-innovations.ed.ac.uk/item.php?item=dermofit-image-library)
