7 datasets found

Dataset of Malicious and Benign Webpages
data.mendeley.com
Updated May 1, 2020
Cite
AK Singh (2020). Dataset of Malicious and Benign Webpages [Dataset]. http://doi.org/10.17632/gdx3pkwp47.1
Explore at:
Unique identifier
https://doi.org/10.17632/gdx3pkwp47.1
Dataset updated
May 1, 2020
Authors
AK Singh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains attributes extracted from websites that can be used to classify webpages as malicious or benign. It also includes raw page content, including JavaScript code, that can be used as unstructured data in deep learning or for extracting further attributes. The data was collected by crawling the Internet using MalCrawler [1]. The labels were verified using the Google Safe Browsing API [2]. Attributes were selected based on their relevance [3]. The dataset attributes are as follows:

* 'url' - The URL of the webpage.
* 'ip_add' - IP address of the webpage.
* 'geo_loc' - The geographic location where the webpage is hosted.
* 'url_len' - The length of the URL.
* 'js_len' - Length of JavaScript code on the webpage.
* 'js_obf_len' - Length of obfuscated JavaScript code.
* 'tld' - The top-level domain of the webpage.
* 'who_is' - Whether the WHOIS domain information is complete or not.
* 'https' - Whether the site uses https or http.
* 'content' - The raw webpage content, including JavaScript code.
* 'label' - The class label for a benign or malicious webpage.

Python code for extracting the attributes listed above is attached. The visualisation of this dataset and its Python code are also attached. The visualisation can be viewed online on Kaggle [5].
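As an illustration of the URL-level attributes described above, here is a minimal sketch using only the Python standard library. It is not the extraction code attached to the dataset: the function name `url_attributes`, the last-label approximation of the TLD, and the yes/no encoding of 'https' are all assumptions made for this example.

```python
from urllib.parse import urlparse


def url_attributes(url: str) -> dict:
    """Derive a few URL-level attributes named in the dataset description."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Assumption: the TLD is approximated by the last dot-separated label of the host.
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return {
        "url": url,
        "url_len": len(url),
        "tld": tld,
        # Assumption: 'https' is recorded as a yes/no flag based on the URL scheme.
        "https": "yes" if parsed.scheme == "https" else "no",
    }


attrs = url_attributes("https://www.example.com/index.html")
print(attrs)
```

Attributes such as 'js_len', 'js_obf_len', 'geo_loc', and 'who_is' need the page content or external lookups, so they are left out of this sketch.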
The description of the attributes from the Dimension class in version 1.0 of...
plos.figshare.com
figshare.com
xls
Updated Jun 1, 2023
Cite
Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti (2023). The description of the attributes from the Dimension class in version 1.0 of the CSD model. [Dataset]. http://doi.org/10.1371/journal.pone.0225953.t001
Explore at:
xls (available download formats)
Unique identifier
https://doi.org/10.1371/journal.pone.0225953.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The description of the attributes from the Dimension class in version 1.0 of the CSD model.
The description of the attributes from the DependentVariable class in...
figshare.com
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti (2023). The description of the attributes from the DependentVariable class in version 1.0 of the CSD model. [Dataset]. http://doi.org/10.1371/journal.pone.0225953.t002
Explore at:
xls (available download formats)
Unique identifier
https://doi.org/10.1371/journal.pone.0225953.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The description of the attributes from the DependentVariable class in version 1.0 of the CSD model.
Dataset of Malicious and Benign Webpages
kaggle.com
zip
Updated Apr 4, 2020
Cite
AK Singh (2020). Dataset of Malicious and Benign Webpages [Dataset]. https://www.kaggle.com/aksingh2411/dataset-of-malicious-and-benign-webpages
Explore at:
zip, 996253377 bytes (available download formats)
Dataset updated
Apr 4, 2020
Authors
AK Singh
Description
Context

This dataset has been prepared to carry out classification of webpages as malicious or benign.

Content

The dataset contains extracted attributes from websites that can be used for Classification of webpages as malicious or benign. The dataset also includes raw page content including JavaScript code that can be used as unstructured data in Deep Learning or for extracting further attributes. The data has been collected by crawling the Internet using MalCrawler [1]. The labels have been verified using the Google Safe Browsing API [2]. Attributes have been selected based on their relevance [3].

References

[1] Singh, A. K., and Navneet Goyal. "MalCrawler: A crawler for seeking and crawling malicious websites." In International Conference on Distributed Computing and Internet Technology, pp. 210-223. Springer, Cham, 2017.
[2] https://developers.google.com/safe-browsing
[3] Singh, A. K., and Navneet Goyal. "A Comparison of Machine Learning Attributes for Detecting Malicious Websites." In 2019 11th International Conference on Communication Systems & Networks (COMSNETS), pp. 352-358. IEEE, 2019.

Inspiration

The dataset seeks to address classification of webpages using machine learning techniques.
kartikmining
zenodo.org
data-staging.niaid.nih.gov
txt, zip
Updated Jan 21, 2020
Cite
Kartik Bajaj; Karthik Pattabiraman; Ali Mesbah (2020). kartikmining [Dataset]. http://doi.org/10.5281/zenodo.495499
Explore at:
zip, txt (available download formats)
Unique identifier
https://doi.org/10.5281/zenodo.495499
Dataset updated
Jan 21, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Kartik Bajaj; Karthik Pattabiraman; Ali Mesbah
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview of Data

This dataset is a data dump containing data from June 2008 to March 2013. Note that Stack Overflow originated only in June 2008. Therefore, this dump includes all the questions and answers on Stack Overflow until March 2013.

Stack Overflow provides data dumps of all user generated data, including questions asked with the list of answers, the accepted answer per question, up/down votes, favourite counts, post score, comments, and anonymized user reputation. Stack Overflow allows users to tag discussions and has a reputation-based mechanism to rank users based on their active participation and contributions.

Attribute Information

The datasets are in XML format and include questions and answers for the following topics:

* CSS
* CSS-mobile
* HTML5
* HTML5-mobile
* JavaScript
* Javascript-mobile
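
A minimal sketch of how XML question-and-answer data of this kind could be streamed and counted with the Python standard library. The sample below is hypothetical and follows the `<row ... />` layout of the public Stack Overflow Posts.xml dump, where `PostTypeId` 1 marks a question and 2 an answer; the files in this dataset may use different element or attribute names, and `count_posts` is a name invented for this example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of a Stack Overflow Posts.xml dump;
# the actual files in this dataset may use different attribute names.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="5" Tags="&lt;css&gt;&lt;html5&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" />
  <row Id="3" PostTypeId="1" Score="0" Tags="&lt;javascript&gt;" />
</posts>
"""


def count_posts(source) -> dict:
    """Stream <row> elements and count questions and answers."""
    counts = {"questions": 0, "answers": 0}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        # Assumption: PostTypeId 1 marks a question and 2 an answer.
        kind = elem.get("PostTypeId")
        if kind == "1":
            counts["questions"] += 1
        elif kind == "2":
            counts["answers"] += 1
        elem.clear()  # release memory held by rows already processed
    return counts


counts = count_posts(io.BytesIO(SAMPLE))
print(counts)
```

Streaming with `iterparse` avoids loading a multi-year dump into memory at once; passing a file path instead of the in-memory sample works the same way.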
A Personalized Activity-based Spatiotemporal Risk Mapping Approach to...
figshare.com
tiff
Updated Mar 18, 2021
Cite
Jing Li; Xuantong Wang; Hexuan Zheng; Tong Zhang (2021). A Personalized Activity-based Spatiotemporal Risk Mapping Approach to COVID-19 Pandemic [Dataset]. http://doi.org/10.6084/m9.figshare.13517105.v1
Explore at:
tiff (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.13517105.v1
Dataset updated
Mar 18, 2021
Dataset provided by
Figshare http://figshare.com/
Authors
Jing Li; Xuantong Wang; Hexuan Zheng; Tong Zhang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets used for this manuscript were derived from multiple sources: Denver Public Health, Esri, Google, and SafeGraph. Any reuse or redistribution of the datasets is subject to the restrictions of the data providers (Denver Public Health, Esri, Google, and SafeGraph); users should consult the relevant parties for permissions.

1. The COVID-19 case dataset was retrieved from Denver Public Health (Link: https://storymaps.arcgis.com/stories/50dbb5e7dfb6495292b71b7d8df56d0a )
2. Point of Interest (POI) data were retrieved from Esri and SafeGraph (Link: https://coronavirus-disasterresponse.hub.arcgis.com/datasets/6c8c635b1ea94001a52bf28179d1e32b/data?selectedAttribute=naics_code) and verified with Google Places Service (Link: https://developers.google.com/maps/documentation/javascript/reference/places-service)
3. The activity risk information is accessible from Texas Medical Association (TMA) (Link: https://www.texmed.org/TexasMedicineDetail.aspx?id=54216 )

The datasets for risk assessment and mapping are included in a geodatabase. Per SafeGraph data sharing guidelines, raw data cannot be shared publicly. To view the content of the geodatabase, users should have ArcGIS Pro 2.7 installed. The geodatabase includes the following:

1. POI. Major attributes are locations, name, and daily popularity.
2. Denver neighborhood with weekly COVID-19 cases and computed regional risk levels.
3. Four simulated travel logs with anchor points provided. Each is a separate point layer.
Mapping of CSD model attribute values to JSON serialized values.
figshare.com
xls
Updated May 31, 2023
Cite
Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti (2023). Mapping of CSD model attribute values to JSON serialized values. [Dataset]. http://doi.org/10.1371/journal.pone.0225953.t006
Explore at:
xls (available download formats)
Unique identifier
https://doi.org/10.1371/journal.pone.0225953.t006
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mapping of CSD model attribute values to JSON serialized values.