Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains extracted attributes from websites that can be used for Classification of webpages as malicious or benign. The dataset also includes raw page content including JavaScript code that can be used as unstructured data in Deep Learning or for extracting further attributes. The data has been collected by crawling the Internet using MalCrawler [1]. The labels have been verified using the Google Safe Browsing API [2]. Attributes have been selected based on their relevance [3]. The details of dataset attributes is as given below: 'url' - The URL of the webpage. 'ip_add' - IP Address of the webpage. 'geo_loc' - The geographic location where the webpage is hosted. 'url_len' - The length of URL. 'js_len' - Length of JavaScript code on the webpage. 'js_obf_len - Length of obfuscated JavaScript code. 'tld' - The Top Level Domain of the webpage. 'who_is' - Whether the WHO IS domain information is compete or not. 'https' - Whether the site uses https or http. 'content' - The raw webpage content including JavaScript code. 'label' - The class label for benign or malicious webpage.
Python code for extraction of the above listed dataset attributes is attached. The Visualisation of this dataset and it python code is also attached. This visualisation can be seen online on Kaggle [5].
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mapping of CSD model attribute values to JSON serialized values.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The description of the attributes from the Dimension class in version 1.0 of the CSD model.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The description of the attributes from the DependentVariable class in version 1.0 of the CSD model.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains extracted attributes from websites that can be used for Classification of webpages as malicious or benign. The dataset also includes raw page content including JavaScript code that can be used as unstructured data in Deep Learning or for extracting further attributes. The data has been collected by crawling the Internet using MalCrawler [1]. The labels have been verified using the Google Safe Browsing API [2]. Attributes have been selected based on their relevance [3]. The details of dataset attributes is as given below: 'url' - The URL of the webpage. 'ip_add' - IP Address of the webpage. 'geo_loc' - The geographic location where the webpage is hosted. 'url_len' - The length of URL. 'js_len' - Length of JavaScript code on the webpage. 'js_obf_len - Length of obfuscated JavaScript code. 'tld' - The Top Level Domain of the webpage. 'who_is' - Whether the WHO IS domain information is compete or not. 'https' - Whether the site uses https or http. 'content' - The raw webpage content including JavaScript code. 'label' - The class label for benign or malicious webpage.
Python code for extraction of the above listed dataset attributes is attached. The Visualisation of this dataset and it python code is also attached. This visualisation can be seen online on Kaggle [5].