4 datasets found
  1. Evasive PDF Samples

    • kaggle.com
    Updated Mar 9, 2024
    Cite
    Fouad Trad (2024). Evasive PDF Samples [Dataset]. https://www.kaggle.com/datasets/fouadtrad2/evasive-pdf-samples/data
    Dataset updated
    Mar 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Fouad Trad
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is a collection of evasive PDF samples, labeled as malicious (1) or benign (0). Since the dataset has an evasive nature, it can be used to test the robustness of trained PDF malware classifiers against evasion attacks. The dataset contains 500,000 generated evasive samples, including 450,000 malicious and 50,000 benign PDFs.

    More details about the data can be found in the publication: Leveraging Adversarial Samples for Enhanced Classification of Malicious and Evasive PDF Files.

    This resource aims to support researchers and cybersecurity professionals in developing more advanced and robust detection mechanisms for PDF-based malware.

    Any work that uses this dataset should cite the following paper:

    Trad, F.; Hussein, A.; Chehab, A. Leveraging Adversarial Samples for Enhanced Classification of Malicious and Evasive PDF Files. Appl. Sci. 2023, 13, 3472. https://doi.org/10.3390/app13063472

    Implementation code can be found here
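
As a quick check on the class counts quoted in the description above (450,000 malicious, 50,000 benign), the sketch below computes what a trivial classifier that always predicts the majority class would score. It is an illustrative calculation only, not part of the dataset or of the authors' implementation code; the function name `majority_baseline` is hypothetical.

```python
# Class counts as stated in the dataset description.
N_MALICIOUS = 450_000  # label 1
N_BENIGN = 50_000      # label 0


def majority_baseline(n_malicious: int, n_benign: int) -> dict:
    """Score a classifier that always predicts the larger class."""
    total = n_malicious + n_benign
    majority_is_malicious = n_malicious >= n_benign

    # Accuracy is simply the share of the majority class.
    accuracy = max(n_malicious, n_benign) / total

    # Recall is 1.0 for the predicted class and 0.0 for the other one.
    recall_malicious = 1.0 if majority_is_malicious else 0.0
    recall_benign = 1.0 - recall_malicious

    # Balanced accuracy is the mean of the per-class recalls.
    balanced_accuracy = (recall_malicious + recall_benign) / 2

    return {
        "total": total,
        "malicious_share": n_malicious / total,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


if __name__ == "__main__":
    for name, value in majority_baseline(N_MALICIOUS, N_BENIGN).items():
        print(f"{name}: {value}")
```

With the stated counts, plain accuracy comes out at 0.9 while balanced accuracy is 0.5, so per-class metrics are likely a more informative way to report robustness results on this dataset.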

  2. PdfRep

    • ieee-dataport.org
    Updated Mar 4, 2024
    Cite
    Charles Nicholas (2024). PdfRep [Dataset]. https://ieee-dataport.org/documents/pdfrep
    Dataset updated
    Mar 4, 2024
    Authors
    Charles Nicholas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    it’s increasingly becoming a target for malware

  3. EMBER2024

    • huggingface.co
    Cite
    Robert J. Joyce, EMBER2024 [Dataset]. https://huggingface.co/datasets/joyce8/EMBER2024
    Authors
    Robert J. Joyce
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    EMBER2024 Dataset

    EMBER2024 is an update to the EMBER2017 and EMBER2018 datasets. It includes raw features and labels for 3.2 million malicious and benign files from 6 different file types (Win32, Win64, .NET, APK, ELF, and PDF). EMBER2024 is meant to allow researchers to explore a variety of common malware analysis classification tasks. The dataset includes 7 types of labels and tags that support malicious/benign detection, malware family classification, malware behavior prediction… See the full description on the dataset page: https://huggingface.co/datasets/joyce8/EMBER2024.

  4. AI vs Dermatologist Test PDF Docoment

    • figshare.com
    Updated Feb 2, 2022
    Cite
    Seung Seog Han (2022). AI vs Dermatologist Test PDF Docoment [Dataset]. http://doi.org/10.6084/m9.figshare.5592631.v2
    Available download formats: txt
    Dataset updated
    Feb 2, 2022
    Dataset provided by
    figshare
    Authors
    Seung Seog Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We created a test .pdf file by combining 25 images of each malignancy and 20 images of each benign condition. Images at their original resolution were extracted from the Asan test dataset and the Edinburgh Dermofit library, and doPDF (http://www.dopdf.com) was used in high-resolution mode to create the test PDF. The filenames of the selected images from the Edinburgh dataset are listed in list_edin.txt. (The Edinburgh dataset is a commercial library: https://licensing.edinburgh-innovations.ed.ac.uk/item.php?item=dermofit-image-library)


