9 datasets found
  1. Dataset of Malicious and Benign Webpages

    • data.mendeley.com
    Updated May 1, 2020
    Cite
    AK Singh (2020). Dataset of Malicious and Benign Webpages [Dataset]. http://doi.org/10.17632/gdx3pkwp47.1
    Dataset updated
    May 1, 2020
    Authors
    AK Singh
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains attributes extracted from websites that can be used to classify webpages as malicious or benign. It also includes raw page content, including JavaScript code, that can be used as unstructured data in deep learning or for extracting further attributes. The data was collected by crawling the Internet using MalCrawler [1]. The labels were verified using the Google Safe Browsing API [2]. Attributes were selected based on their relevance [3]. The dataset attributes are:

    • 'url' - The URL of the webpage.
    • 'ip_add' - IP address of the webpage.
    • 'geo_loc' - The geographic location where the webpage is hosted.
    • 'url_len' - The length of the URL.
    • 'js_len' - Length of JavaScript code on the webpage.
    • 'js_obf_len' - Length of obfuscated JavaScript code.
    • 'tld' - The top-level domain of the webpage.
    • 'who_is' - Whether the WHOIS domain information is complete or not.
    • 'https' - Whether the site uses https or http.
    • 'content' - The raw webpage content, including JavaScript code.
    • 'label' - The class label for a benign or malicious webpage.

    Python code for extracting the attributes listed above is attached. A visualisation of this dataset and its Python code are also attached. The visualisation can be seen online on Kaggle [5].
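
    As a rough illustration of how a few of the simpler attributes could be derived from a URL alone, here is a minimal sketch. It is not the attached extraction code: the function name `basic_url_attributes` is hypothetical, the 'tld' value is approximated as the last label of the hostname, and 'https' is represented as a boolean, all of which are assumptions about how the dataset encodes these fields.

```python
from urllib.parse import urlparse


def basic_url_attributes(url: str) -> dict:
    """Derive URL-only attributes (hypothetical helper, not the dataset's own code)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url": url,
        # Length of the full URL string.
        "url_len": len(url),
        # Assumption: treat the last dot-separated label of the host as the TLD.
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
        # Assumption: encode the scheme check as a boolean.
        "https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    print(basic_url_attributes("https://example.com/index.html"))
```

    Attributes such as 'js_len', 'js_obf_len', 'geo_loc', and 'who_is' need the page content or external lookups and are outside the scope of this sketch.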

  2. kartikmining

    • zenodo.org
    • data-staging.niaid.nih.gov
    txt, zip
    Updated Jan 21, 2020
    Cite
    Kartik Bajaj; Karthik Pattabiraman; Ali Mesbah (2020). kartikmining [Dataset]. http://doi.org/10.5281/zenodo.495499
    Available download formats: zip, txt
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Kartik Bajaj; Karthik Pattabiraman; Ali Mesbah
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview of Data

    This dataset is a data dump containing data from June 2008 to March 2013. Note that Stack Overflow originated only in June 2008. Therefore, this dump includes all the questions and answers on Stack Overflow until March 2013.

    Stack Overflow provides data dumps of all user generated data, including questions asked with the list of answers, the accepted answer per question, up/down votes, favourite counts, post score, comments, and anonymized user reputation. Stack Overflow allows users to tag discussions and has a reputation-based mechanism to rank users based on their active participation and contributions.

    Attribute Information

    The datasets are in XML format and include questions and answers for the following topics:

    * CSS
    * CSS-mobile
    * HTML5
    * HTML5-mobile
    * JavaScript
    * Javascript-mobile
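
    Since the files are XML, they can be streamed rather than loaded whole. The sketch below assumes the layout used by public Stack Exchange data dumps, where each post is a `row` element and `PostTypeId` is "1" for questions and "2" for answers; the files in this particular dataset may be structured differently, and the in-memory sample is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample in the assumed Stack Exchange dump layout.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="5" Tags="&lt;css&gt;&lt;html5&gt;" AcceptedAnswerId="2" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" />
</posts>"""


def count_posts(stream) -> dict:
    """Count questions and answers while streaming through the XML."""
    counts = {"questions": 0, "answers": 0}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "row":
            kind = elem.get("PostTypeId")
            if kind == "1":
                counts["questions"] += 1
            elif kind == "2":
                counts["answers"] += 1
            # Release the element's attributes to keep memory use low.
            elem.clear()
    return counts


if __name__ == "__main__":
    print(count_posts(io.BytesIO(SAMPLE)))
```

    For a real file, pass an open binary file handle or a path to `count_posts` in place of the `BytesIO` sample.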

  3. A Personalized Activity-based Spatiotemporal Risk Mapping Approach to...

    • figshare.com
    tiff
    Updated Mar 18, 2021
    Cite
    Jing Li; Xuantong Wang; Hexuan Zheng; Tong Zhang (2021). A Personalized Activity-based Spatiotemporal Risk Mapping Approach to COVID-19 Pandemic [Dataset]. http://doi.org/10.6084/m9.figshare.13517105.v1
    Available download formats: tiff
    Dataset updated
    Mar 18, 2021
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Jing Li; Xuantong Wang; Hexuan Zheng; Tong Zhang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets used for this manuscript were derived from multiple sources: Denver Public Health, Esri, Google, and SafeGraph. Any reuse or redistribution of the datasets is subject to the restrictions of these data providers, and users should consult the relevant parties for permission.

    1. The COVID-19 case dataset was retrieved from Denver Public Health (Link: https://storymaps.arcgis.com/stories/50dbb5e7dfb6495292b71b7d8df56d0a ).
    2. Point of Interest (POI) data were retrieved from Esri and SafeGraph (Link: https://coronavirus-disasterresponse.hub.arcgis.com/datasets/6c8c635b1ea94001a52bf28179d1e32b/data?selectedAttribute=naics_code ) and verified with the Google Places Service (Link: https://developers.google.com/maps/documentation/javascript/reference/places-service ).
    3. The activity risk information is accessible from the Texas Medical Association (TMA) (Link: https://www.texmed.org/TexasMedicineDetail.aspx?id=54216 ).

    The datasets for risk assessment and mapping are included in a geodatabase. Per SafeGraph data sharing guidelines, raw data cannot be shared publicly. To view the content of the geodatabase, users need ArcGIS Pro 2.7 installed. The geodatabase includes the following:

    1. POI. Major attributes are location, name, and daily popularity.
    2. Denver neighborhoods with weekly COVID-19 cases and computed regional risk levels.
    3. Four simulated travel logs with anchor points provided. Each is a separate point layer.

  4. Tweets during Nintendo E3 2018 Conference

    • kaggle.com
    zip
    Updated Jun 14, 2018
    Cite
    Xavier (2018). Tweets during Nintendo E3 2018 Conference [Dataset]. https://www.kaggle.com/xvivancos/tweets-during-nintendo-e3-2018-conference
    Available download formats: zip (62890418 bytes)
    Dataset updated
    Jun 14, 2018
    Authors
    Xavier
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Data set containing Tweets captured during the Nintendo E3 2018 Conference.

    Content

    All Twitter APIs that return Tweets provide that data encoded using JavaScript Object Notation (JSON). JSON is based on key-value pairs, with named attributes and associated values. The JSON file includes the following objects and attributes:

    • Tweet - Tweets are the basic atomic building block of all things Twitter. The Tweet object has a long list of ‘root-level’ attributes, including fundamental attributes such as id, created_at, and text. Tweet child objects include user, entities, and extended_entities. Tweets that are geo-tagged will have a place child object.

      • User - Contains public Twitter account metadata and describes the author of the Tweet with attributes such as name, description, followers_count, friends_count, etc.

      • Entities - Provide metadata and additional contextual information about content posted on Twitter. The entities section provides arrays of common things included in Tweets: hashtags, user mentions, links, stock tickers (symbols), Twitter polls, and attached media.

      • Extended Entities - All Tweets with attached photos, videos and animated GIFs will include an extended_entities JSON object.

      • Places - Tweets can be associated with a location, generating a Tweet that has been ‘geo-tagged.’
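
    The object layout described above can be navigated with the standard `json` module. The following is a minimal sketch that assumes one Tweet object per JSON string; the sample Tweet is invented and heavily trimmed, and the `summarize` helper is hypothetical. How the Tweets are actually arranged inside the downloaded file is not stated here, so the loading step is left out.

```python
import json

# Invented, trimmed Tweet using the attribute names listed above.
SAMPLE_TWEET = json.dumps({
    "id": 1,
    "text": "Watching the show #NintendoE3",
    "user": {"name": "Example", "followers_count": 10, "friends_count": 5},
    "entities": {"hashtags": [{"text": "NintendoE3"}], "user_mentions": []},
    "place": None,
})


def summarize(raw: str) -> dict:
    """Pull a few root-level and child-object fields out of one Tweet."""
    tweet = json.loads(raw)
    user = tweet.get("user") or {}
    entities = tweet.get("entities") or {}
    return {
        "id": tweet.get("id"),
        "author": user.get("name"),
        "followers": user.get("followers_count"),
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
        # Geo-tagged Tweets carry a non-null place child object.
        "geo_tagged": tweet.get("place") is not None,
        # Tweets with attached media carry an extended_entities object.
        "has_media": "extended_entities" in tweet,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_TWEET))
```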

    More information here.

    Acknowledgements

    I used the filterStream() function to open a connection to Twitter's Streaming API, using the keywords #NintendoE3 and #NintendoDirect. The capture started on Tuesday, June 12th at 04:00 am UTC and finished on Tuesday, June 12th at 05:00 am UTC.
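
    To check that a Tweet falls inside the stated capture window, its created_at value can be parsed and compared against the start and end times. This sketch assumes created_at uses the format of Twitter's v1.1 API (for example "Tue Jun 12 04:30:00 +0000 2018"); the window constants simply restate the times given above, and the helper name is hypothetical.

```python
from datetime import datetime, timezone

# Assumed created_at layout, e.g. "Tue Jun 12 04:30:00 +0000 2018".
# %a and %b depend on the locale; this assumes English day and month names.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Capture window as stated in the description, in UTC.
WINDOW_START = datetime(2018, 6, 12, 4, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2018, 6, 12, 5, 0, tzinfo=timezone.utc)


def in_capture_window(created_at: str,
                      start: datetime = WINDOW_START,
                      end: datetime = WINDOW_END) -> bool:
    """Return True if the Tweet timestamp lies within [start, end]."""
    timestamp = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    return start <= timestamp <= end


if __name__ == "__main__":
    print(in_capture_window("Tue Jun 12 04:30:00 +0000 2018"))
```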

    Inspiration

    • Time analysis
    • Try text mining!
    • Cross-language differences in Twitter
    • Use this data to produce a sentiment analysis
    • Twitter geolocation
    • Network analysis: graph theory, metrics and properties of the network, community detection, network visualization, etc.
  5. Tweets during Real Madrid vs Liverpool

    • kaggle.com
    zip
    Updated May 26, 2018
    Cite
    Xavier (2018). Tweets during Real Madrid vs Liverpool [Dataset]. https://www.kaggle.com/xvivancos/tweets-during-r-madrid-vs-liverpool-ucl-2018
    Available download formats: zip (224380519 bytes)
    Dataset updated
    May 26, 2018
    Authors
    Xavier
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Data set containing Tweets captured during the 2018 UEFA Champions League Final between Real Madrid and Liverpool.

    Content

    All Twitter APIs that return Tweets provide that data encoded using JavaScript Object Notation (JSON). JSON is based on key-value pairs, with named attributes and associated values. The JSON file includes the following objects and attributes:

    • Tweet - Tweets are the basic atomic building block of all things Twitter. The Tweet object has a long list of ‘root-level’ attributes, including fundamental attributes such as id, created_at, and text. Tweet child objects include user, entities, and extended_entities. Tweets that are geo-tagged will have a place child object.

      • User - Contains public Twitter account metadata and describes the author of the Tweet with attributes such as name, description, followers_count, friends_count, etc.

      • Entities - Provide metadata and additional contextual information about content posted on Twitter. The entities section provides arrays of common things included in Tweets: hashtags, user mentions, links, stock tickers (symbols), Twitter polls, and attached media.

      • Extended Entities - All Tweets with attached photos, videos and animated GIFs will include an extended_entities JSON object.

      • Places - Tweets can be associated with a location, generating a Tweet that has been ‘geo-tagged.’

    More information here.

    Acknowledgements

    I used the filterStream() function to open a connection to Twitter's Streaming API, using the keyword #UCLFinal. The capture started on Saturday, May 26th at 6:45 pm UTC (beginning of the match) and finished on Saturday, May 26th at 8:45 pm UTC.

    Inspiration

    • Time analysis
    • Try text mining!
    • Cross-language differences in Twitter
    • Use this data to produce a sentiment analysis
    • Twitter geolocation
    • Network analysis: graph theory, metrics and properties of the network, community detection, network visualization, etc.
  6. Mapping of CSD model attribute values to JSON serialized values.

    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti (2023). Mapping of CSD model attribute values to JSON serialized values. [Dataset]. http://doi.org/10.1371/journal.pone.0225953.t006
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mapping of CSD model attribute values to JSON serialized values.

  7. Dataset of Malicious and Benign Webpages

    • kaggle.com
    zip
    Updated Apr 4, 2020
    Cite
    AK Singh (2020). Dataset of Malicious and Benign Webpages [Dataset]. https://www.kaggle.com/aksingh2411/dataset-of-malicious-and-benign-webpages
    Available download formats: zip (996253377 bytes)
    Dataset updated
    Apr 4, 2020
    Authors
    AK Singh
    Description

    Context

    This dataset has been prepared to carry out classification of webpages as malicious or benign.

    Content

    The dataset contains extracted attributes from websites that can be used for Classification of webpages as malicious or benign. The dataset also includes raw page content including JavaScript code that can be used as unstructured data in Deep Learning or for extracting further attributes. The data has been collected by crawling the Internet using MalCrawler [1]. The labels have been verified using the Google Safe Browsing API [2]. Attributes have been selected based on their relevance [3].

    References

    [1] Singh, A. K., and Navneet Goyal. "MalCrawler: A crawler for seeking and crawling malicious websites." In International Conference on Distributed Computing and Internet Technology, pp. 210-223. Springer, Cham, 2017.
    [2] https://developers.google.com/safe-browsing
    [3] Singh, A. K., and Navneet Goyal. "A Comparison of Machine Learning Attributes for Detecting Malicious Websites." In 2019 11th International Conference on Communication Systems & Networks (COMSNETS), pp. 352-358. IEEE, 2019.

    Inspiration

    The dataset seeks to address classification of webpages using machine learning techniques.

  8. The description of the attributes from the Dimension class in version 1.0 of...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti (2023). The description of the attributes from the Dimension class in version 1.0 of the CSD model. [Dataset]. http://doi.org/10.1371/journal.pone.0225953.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The description of the attributes from the Dimension class in version 1.0 of the CSD model.

  9. The description of the attributes from the DependentVariable class in...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti (2023). The description of the attributes from the DependentVariable class in version 1.0 of the CSD model. [Dataset]. http://doi.org/10.1371/journal.pone.0225953.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Deepansh J. Srivastava; Thomas Vosegaard; Dominique Massiot; Philip J. Grandinetti
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The description of the attributes from the DependentVariable class in version 1.0 of the CSD model.
