15 datasets found
  1. CVEfixes Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jul 18, 2021
    Cite
    Guru Prasad Bhandari; Amara Naseer; Leon Moonen (2021). CVEfixes Dataset [Dataset]. https://paperswithcode.com/dataset/cvefixes
    Authors
    Guru Prasad Bhandari; Amara Naseer; Leon Moonen
    Description

    CVEfixes is a comprehensive vulnerability dataset that is automatically collected and curated from Common Vulnerabilities and Exposures (CVE) records in the public U.S. National Vulnerability Database (NVD). The goal is to support data-driven security research based on source code and source code metrics related to fixes for CVEs in the NVD by providing detailed information at different interlinked levels of abstraction, such as the commit-, file-, and method level, as well as the repository- and CVE level.

    At the initial release, the dataset covers all published CVEs up to 9 June 2021. All open-source projects that were reported in CVE records in the NVD in this time frame and had publicly available git repositories were fetched and considered for the construction of this vulnerability dataset. The dataset is organized as a relational database and covers 5495 vulnerability fixing commits in 1754 open source projects for a total of 5365 CVEs in 180 different Common Weakness Enumeration (CWE) types. The dataset includes the source code before and after fixing of 18249 files, and 50322 functions.
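
The interlinked levels described above (CVE, commit, file, method) can be pictured as joins across a relational schema. The following is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and CVE identifiers are made up for illustration and are not the published CVEfixes schema.

```python
import sqlite3

# Toy in-memory database. Table and column names are assumptions for
# illustration only; they are not taken from the CVEfixes release.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cve (cve_id TEXT PRIMARY KEY, cwe_id TEXT);
CREATE TABLE fix_commit (commit_hash TEXT PRIMARY KEY, cve_id TEXT, repo_url TEXT);
CREATE TABLE file_change (file_change_id INTEGER PRIMARY KEY, commit_hash TEXT, filename TEXT);
CREATE TABLE method_change (method_change_id INTEGER PRIMARY KEY, file_change_id INTEGER, name TEXT);
""")

# Placeholder rows (hypothetical identifiers, not real CVEs).
conn.executemany("INSERT INTO cve VALUES (?, ?)",
                 [("CVE-0000-0001", "CWE-79"), ("CVE-0000-0002", "CWE-89")])
conn.executemany("INSERT INTO fix_commit VALUES (?, ?, ?)",
                 [("abc123", "CVE-0000-0001", "https://example.org/repo-a"),
                  ("def456", "CVE-0000-0002", "https://example.org/repo-b")])
conn.executemany("INSERT INTO file_change VALUES (?, ?, ?)",
                 [(1, "abc123", "a.py"), (2, "def456", "b.py")])
conn.executemany("INSERT INTO method_change VALUES (?, ?, ?)",
                 [(1, 1, "render"), (2, 1, "escape"), (3, 2, "run_query")])


def methods_fixed_for(connection, cve_id):
    """Walk CVE -> commit -> file -> method and return (filename, method) pairs."""
    return connection.execute(
        """
        SELECT f.filename, m.name
        FROM cve c
        JOIN fix_commit k ON k.cve_id = c.cve_id
        JOIN file_change f ON f.commit_hash = k.commit_hash
        JOIN method_change m ON m.file_change_id = f.file_change_id
        WHERE c.cve_id = ?
        ORDER BY f.filename, m.name
        """,
        (cve_id,),
    ).fetchall()


fixed = methods_fixed_for(conn, "CVE-0000-0001")
```

Against the real dataset, the same join pattern would need to be adapted to whatever table and key names the release actually uses.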

  2. MoreFixes: Largest CVE dataset with fixes

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 23, 2024
    Cite
    GADYATSKAYA, Olga (2024). MoreFixes: Largest CVE dataset with fixes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11199119
    Dataset provided by
    Rahim Nouri, Sajad
    Rietveld, Kristian F. D.
    Akhoundali, Jafar
    GADYATSKAYA, Olga
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    In our work, we have designed and implemented a novel workflow with several heuristic methods to combine state-of-the-art methods for gathering CVE fix commits. As a consequence of our improvements, we have been able to gather the largest programming language-independent real-world dataset of CVE vulnerabilities with the associated fix commits. Our dataset, containing 29,203 unique CVEs coming from 7,238 unique GitHub projects, is, to the best of our knowledge, by far the biggest CVE vulnerability dataset with fix commits available today. These CVEs are associated with 35,276 unique commits as SQL and 39,931 patch commit files that fixed those vulnerabilities (some patch files can't be saved as SQL for several technical reasons). Our larger dataset thus substantially improves over the current real-world vulnerability datasets and enables further progress in research on vulnerability detection and software security. We used the NVD (nvd.nist.gov) and the GitHub Security Advisory Database as the main sources of our pipeline.

    We release to the community a 16GB PostgreSQL database that contains information on CVEs up to 2024-09-26, CWEs of each CVE, files and methods changed by each commit, and repository metadata. Additionally, patch files related to the fix commits are available as a separate package. Furthermore, we make our dataset collection tool also available to the community.

    The cvedataset-patches.zip file contains fix patches, and postgrescvedumper.sql.zip contains a PostgreSQL dump of fixes, together with several other fields such as CVEs, CWEs, repository metadata, commit data, file changes, methods changed, etc.

    The MoreFixes data-storage strategy is based on CVEfixes to store CVE fix commits from open-source repositories, and uses a modified version of Prospector (part of ProjectKB from SAP) as a module to detect the fix commits of a CVE. Our full methodology is presented in the paper titled "MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery", which will be published in the PROMISE conference (2024).

    For more information about usage and sample queries, visit the GitHub repository: https://github.com/JafarAkhondali/Morefixes

    If you are using this dataset, please be aware that the repositories that we mined carry different licenses and you are responsible for handling any licensing issues. The same applies to CVEfixes.

    This product uses the NVD API but is not endorsed or certified by the NVD.

    This research was partially supported by the Dutch Research Council (NWO) under the project NWA.1215.18.008 Cyber Security by Integrated Design (C-SIDe).

    To restore the dataset, you can use the docker-compose file available at the GitHub repository. Default database credentials after restoring the dump:

    POSTGRES_USER=postgrescvedumper
    POSTGRES_DB=postgrescvedumper
    POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d
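
As a rough illustration of how the default credentials above might be turned into a connection string, the sketch below parses the KEY=VALUE pairs and builds a PostgreSQL connection URI. The host and port (localhost, 5432) are assumptions; the actual values depend on how the docker-compose file maps the container. The helper names are hypothetical.

```python
from urllib.parse import quote


def parse_env_pairs(text):
    """Split whitespace-separated KEY=VALUE pairs into a dict."""
    return dict(pair.split("=", 1) for pair in text.split())


def build_postgres_url(env, host="localhost", port=5432):
    """Build a postgresql:// URI. Host and port are assumed defaults."""
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    database = quote(env["POSTGRES_DB"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


creds = parse_env_pairs(
    "POSTGRES_USER=postgrescvedumper "
    "POSTGRES_DB=postgrescvedumper "
    "POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d"
)
url = build_postgres_url(creds)
```

The resulting URI can be handed to any PostgreSQL client that accepts connection URIs once the dump has been restored.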

    Please use this for citation:

     title={MoreFixes: A large-scale dataset of CVE fix commits mined through enhanced repository discovery},
     author={Akhoundali, Jafar and Nouri, Sajad Rahim and Rietveld, Kristian and Gadyatskaya, Olga},
     booktitle={Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering},
     pages={42--51},
     year={2024}
    }
    
  3. Ground Truth VCCs for USENIX 2022 paper

    • figshare.com
    txt
    Updated May 27, 2022
    Cite
    N A; Manuel Brack (2022). Ground Truth VCCs for USENIX 2022 paper [Dataset]. http://doi.org/10.6084/m9.figshare.19779727.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    N A; Manuel Brack
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preliminary ground truth dataset of mappings between fixing commits, vulnerability-contributing commits (VCCs), and CVEs. USENIX 2022 paper: https://www.usenix.org/conference/usenixsecurity22/presentation/alexopoulos

    Credits go to the Ubuntu Security team and the Vulnerability History Project.

  4. vulrepair

    • huggingface.co
    Cite
    YAM, vulrepair [Dataset]. https://huggingface.co/datasets/nus-yam/vulrepair
    Dataset authored and provided by
    YAM
    Description

    A clean union of BigVul and CVE-Fixes.

  5. Websites susceptible to CVE-2025-30797

    • webtechsurvey.com
    csv
    Updated Apr 1, 2025
    Cite
    WebTechSurvey (2025). Websites susceptible to CVE-2025-30797 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-30797
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2025-30797, compiled through global website indexing conducted by WebTechSurvey.

  6. FixMe: An Incremental Lightweight Method for Vulnerability Data Collection...

    • data.niaid.nih.gov
    Updated May 31, 2024
    Cite
    Bhandari, Guru Prasad (2024). FixMe: An Incremental Lightweight Method for Vulnerability Data Collection for Security Patch Prediction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10955341
    Dataset authored and provided by
    Bhandari, Guru Prasad
    License

    http://www.apache.org/licenses/LICENSE-2.0

    Description

    This repository has the FixMe dataset and the source code for extracting the new dataset. FixMe is a lightweight approach for collecting code patches based on analyzing the commits of various version control systems. The practical framework is designed to generate patches across a wide array of programming languages. This open-source tool streamlines the process of gathering vulnerability records from the Common Vulnerabilities and Exposures (CVE) database through an incremental approach. By embracing an incremental methodology, we expedite the acquisition of data, ensuring the inclusion of newly identified vulnerabilities and their corresponding patch pairs. Our methodology involves extracting security issues, obtaining vulnerability-fixing commits, and retrieving relevant source code from various projects. The dataset extracted by the FixMe tool supports automated patch prediction, automated program repair, commit classification, vulnerability prediction, and more.

  7. Websites susceptible to CVE-2025-21984

    • webtechsurvey.com
    csv
    Updated Apr 1, 2025
    Cite
    WebTechSurvey (2025). Websites susceptible to CVE-2025-21984 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-21984
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2025-21984, compiled through global website indexing conducted by WebTechSurvey.

  8. CVEV5Errors

    • huggingface.co
    Updated Jun 24, 2024
    Cite
    cvelist (2024). CVEV5Errors [Dataset]. https://huggingface.co/datasets/cvelist/CVEV5Errors
    Dataset authored and provided by
    cvelist
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    cvelint

    CVE records in the v5 JSON schema may include errors that are neither enforceable by a schema, nor validated on the backend in CVE Services when a CVE record is created/updated. This CLI tool aims to validate CVE records for such errors so they can be fixed, and changes to the CVE schema can be made based on these findings.

    Installation

    Binary Releases

    For Linux, macOS, or Windows, you can download a binary release here.

    Build from Source

    $… See the full description on the dataset page: https://huggingface.co/datasets/cvelist/CVEV5Errors.

  9. Cape Verde Gross Fixed Capital Formation

    • tradingeconomics.com
    • tr.tradingeconomics.com
    • +14 more
    csv, excel, json, xml
    Updated Dec 15, 2024
    Cite
    TRADING ECONOMICS (2024). Cape Verde Gross Fixed Capital Formation [Dataset]. https://tradingeconomics.com/cape-verde/gross-fixed-capital-formation
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 31, 2007 - Dec 31, 2024
    Area covered
    Cabo Verde
    Description

    Gross Fixed Capital Formation in Cape Verde decreased to 11541.40 CVE Million in the fourth quarter of 2024 from 14858.40 CVE Million in the third quarter of 2024. This dataset provides Cape Verde Gross Fixed Capital Formation actual values, historical data, forecasts, charts, statistics, an economic calendar, and news.

  10. Websites susceptible to CVE-2025-21948

    • webtechsurvey.com
    csv
    Updated Apr 1, 2025
    Cite
    WebTechSurvey (2025). Websites susceptible to CVE-2025-21948 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-21948
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2025-21948, compiled through global website indexing conducted by WebTechSurvey.

  11. CWE-Bench-Java

    • huggingface.co
    Cite
    IRIS-SAST, CWE-Bench-Java [Dataset]. https://huggingface.co/datasets/iris-sast/CWE-Bench-Java
    Dataset authored and provided by
    IRIS-SAST
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    CWE-Bench-Java

    This repository contains the dataset CWE-Bench-Java presented in the paper LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. At a high level, this dataset contains 120 CVEs spanning 4 CWEs, namely path-traversal, OS-command injection, cross-site scripting, and code-injection. Each CVE includes the buggy and fixed source code of the project, along with the information of the fixed files and functions. We provide the seed information for each CVE in… See the full description on the dataset page: https://huggingface.co/datasets/iris-sast/CWE-Bench-Java.

  12. CVE values determined by 10-fold CV and our approximation formula against...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Tomoyuki Obuchi; Shiro Ikeda; Kazunori Akiyama; Yoshiyuki Kabashima (2023). CVE values determined by 10-fold CV and our approximation formula against λT. [Dataset]. http://doi.org/10.1371/journal.pone.0188012.t001
    Dataset provided by
    PLOS ONE
    Authors
    Tomoyuki Obuchi; Shiro Ikeda; Kazunori Akiyama; Yoshiyuki Kabashima
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    λℓ1 is fixed to the optimal value (2λℓ1/M = 1, coincidentally common to all cases). The number in brackets denotes the error bar to the last digits. The optimal values are bolded. The tuning constants δ and θ are set to δ = 10^−4 and θ = 10^−12, respectively.

  13. Websites susceptible to CVE-2020-15839

    • webtechsurvey.com
    csv
    Cite
    WebTechSurvey, Websites susceptible to CVE-2020-15839 [Dataset]. https://webtechsurvey.com/cve/CVE-2020-15839
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2020-15839, compiled through global website indexing conducted by WebTechSurvey.

  14. Predicting Vulnerability Inducing Function Versions Using Node Embeddings...

    • data.mendeley.com
    Updated Jan 10, 2022
    Cite
    Sefa Eren Şahin (2022). Predicting Vulnerability Inducing Function Versions Using Node Embeddings and Graph Neural Networks [Dataset]. http://doi.org/10.17632/ymtf9znmfz.1
    Authors
    Sefa Eren Şahin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Wireshark Vulnerability Prediction Dataset

    This dataset was constructed by a team of researchers at the Istanbul Technical University Faculty of Computer and Informatics, and used in the paper entitled "Predicting Vulnerability Inducing Function Versions Using Node Embeddings and Graph Neural Networks". Please see the GitHub repository https://github.com/erensahin/gnn-vulnerability-prediction for more details on usage.

    This dataset consists of two main parts:

    • AST dumps, which can be used as inputs for any machine learning model (ast_input)
    • Wireshark file changes and bugs (file_changes_and_bugs)

    ast_input

    The ast_input folder contains three files:

    • ast_input.zip: This file is a compressed version of AST dumps in Python pickle format. You should use python pickle library to unpickle and use the data.
    • node_embeddings_by_kind.pkl: Embedding vectors corresponding to AST node kinds in python pickle format.
    • token_id_vocabulary.pkl: Map of token ids and their corresponding tokens in python pickle format.
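
Reading pickled objects out of a zip archive, as the file list above suggests, might look like the following minimal sketch. The archive here is a tiny in-memory stand-in, and the member name and object layout are hypothetical; the real contents of ast_input.zip are not described in this listing. Pickle files can execute arbitrary code when loaded, so only unpickle data from a source you trust.

```python
import io
import pickle
import zipfile


def load_pickles_from_zip(zip_source):
    """Return {member_name: unpickled_object} for every .pkl member of a zip."""
    loaded = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(".pkl"):
                with archive.open(name) as handle:
                    loaded[name] = pickle.load(handle)
    return loaded


# Build a small stand-in archive in memory (hypothetical member names and data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example_ast.pkl",
                     pickle.dumps({"kind": "FunctionDecl", "children": []}))
    archive.writestr("README.txt", "not a pickle")
buffer.seek(0)

asts = load_pickles_from_zip(buffer)
```

Passing a file path instead of the in-memory buffer works the same way, since zipfile.ZipFile accepts either.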

    file_changes_and_bugs

    file_changes_and_bugs folder consists of five files:

    • wireshark_file_changes.csv: list of file changes made in the Wireshark repository. File changes are basically commit-file pairs.
    • wireshark_cve_bug_matching.csv: maps CVE entries to bug ids in the Wireshark bug repository. This is scraped from https://www.wireshark.org/security/
    • additional_bugs.csv: additional security-related bugs that our team manually identified by investigating security advisories and bug reports.
    • wireshark_bug_commit_matching.csv: maps security bugs (vulnerabilities) to commits in the Wireshark source code repository.
    • wireshark_bug_inducing_file_changes.csv: maps vulnerabilities in Wireshark source files to the commits in which each vulnerability was induced and fixed.
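
The CVE-to-bug and bug-to-commit mappings listed above can be chained to get the commits associated with each CVE. The sketch below does this with Python's csv module on small in-memory samples. The column names (cve_id, bug_id, commit_hash) and all values are assumptions for illustration; the actual headers in the CSV files may differ.

```python
import csv
import io
from collections import defaultdict

# Stand-in CSV contents with hypothetical columns and placeholder values.
cve_bug_csv = """cve_id,bug_id
CVE-0000-0001,101
CVE-0000-0002,102
"""
bug_commit_csv = """bug_id,commit_hash
101,abc123
101,def456
102,0a1b2c
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def commits_per_cve(cve_bug_rows, bug_commit_rows):
    """Chain CVE -> bug -> commit and return {cve_id: [commit_hash, ...]}."""
    commits_by_bug = defaultdict(list)
    for row in bug_commit_rows:
        commits_by_bug[row["bug_id"]].append(row["commit_hash"])
    result = defaultdict(list)
    for row in cve_bug_rows:
        result[row["cve_id"]].extend(commits_by_bug.get(row["bug_id"], []))
    return dict(result)


mapping = commits_per_cve(read_rows(cve_bug_csv), read_rows(bug_commit_csv))
```
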
  15. Websites susceptible to CVE-2022-45320

    • webtechsurvey.com
    csv
    Cite
    WebTechSurvey, Websites susceptible to CVE-2022-45320 [Dataset]. https://webtechsurvey.com/cve/CVE-2022-45320
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2022-45320, compiled through global website indexing conducted by WebTechSurvey.

CVEfixes Dataset

206 scholarly articles cite this dataset (View in Google Scholar)