4 datasets found
  1. Dataset of Malicious and Benign Webpages

    • data.mendeley.com
    Updated May 1, 2020
    Cite
    AK Singh (2020). Dataset of Malicious and Benign Webpages [Dataset]. http://doi.org/10.17632/gdx3pkwp47.1
    Dataset updated
    May 1, 2020
    Authors
    AK Singh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains attributes extracted from websites that can be used to classify webpages as malicious or benign. It also includes raw page content, including JavaScript code, which can be used as unstructured data in deep learning or for extracting further attributes. The data was collected by crawling the Internet using MalCrawler [1]. The labels were verified using the Google Safe Browsing API [2]. Attributes were selected based on their relevance [3]. The dataset attributes are:

    • 'url' - The URL of the webpage.
    • 'ip_add' - The IP address of the webpage.
    • 'geo_loc' - The geographic location where the webpage is hosted.
    • 'url_len' - The length of the URL.
    • 'js_len' - The length of the JavaScript code on the webpage.
    • 'js_obf_len' - The length of the obfuscated JavaScript code.
    • 'tld' - The top-level domain of the webpage.
    • 'who_is' - Whether the WHOIS domain information is complete or not.
    • 'https' - Whether the site uses https or http.
    • 'content' - The raw webpage content, including JavaScript code.
    • 'label' - The class label for a benign or malicious webpage.

    Python code for extracting the attributes listed above is attached. A visualisation of this dataset and its Python code are also attached. The visualisation can be viewed online on Kaggle [5].
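    As an illustration of how the URL-derived attributes could be computed, here is a minimal sketch using only the Python standard library. It is not the author's attached extraction code: the function name `url_attributes` is hypothetical, the TLD is simply taken as the last dot-separated label of the host name, and 'https' is returned as a boolean, which may differ from how the dataset encodes that column.

    ```python
    from urllib.parse import urlparse


    def url_attributes(url: str) -> dict:
        """Derive a few URL-based attributes (hypothetical helper, not the dataset's own code)."""
        parsed = urlparse(url)
        host = parsed.hostname or ""
        # Assumption: the TLD is the last dot-separated label of the host name.
        # This is naive for bare IP addresses and multi-part suffixes such as "co.uk".
        tld = host.rsplit(".", 1)[-1] if "." in host else ""
        return {
            "url": url,
            "url_len": len(url),                # length of the full URL string
            "tld": tld,
            "https": parsed.scheme == "https",  # True for https, False otherwise
        }


    if __name__ == "__main__":
        print(url_attributes("https://example.com/index.html"))
    ```

    Attributes such as 'ip_add', 'geo_loc', 'who_is', 'js_len' and 'js_obf_len' need network lookups or page content and are outside the scope of this sketch.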

  2. Mapping of CSD model attribute values to JSON serialized values.

    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti (2023). Mapping of CSD model attribute values to JSON serialized values. [Dataset]. http://doi.org/10.1371/journal.pone.0225953.t006
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mapping of CSD model attribute values to JSON serialized values.

  3. The description of the attributes from the Dimension class in version 1.0 of...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti (2023). The description of the attributes from the Dimension class in version 1.0 of the CSD model. [Dataset]. http://doi.org/10.1371/journal.pone.0225953.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The description of the attributes from the Dimension class in version 1.0 of the CSD model.

  4. The description of the attributes from the DependentVariable class in...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti (2023). The description of the attributes from the DependentVariable class in version 1.0 of the CSD model. [Dataset]. http://doi.org/10.1371/journal.pone.0225953.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The description of the attributes from the DependentVariable class in version 1.0 of the CSD model.

