25 datasets found
  1. CVEfixes Dataset: Automatically Collected Vulnerabilities and Their Fixes...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Jul 28, 2024
    + more versions
    Cite
    Moonen, Leon; Vidziunas, Linas (2024). CVEfixes Dataset: Automatically Collected Vulnerabilities and Their Fixes from Open-Source Software [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4476563
    Explore at:
    Dataset updated
    Jul 28, 2024
    Dataset provided by
    Simula Research Laboratory
    Authors
    Moonen, Leon; Vidziunas, Linas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CVEfixes is a comprehensive vulnerability dataset that is automatically collected and curated from Common Vulnerabilities and Exposures (CVE) records in the public U.S. National Vulnerability Database (NVD). The goal is to support data-driven security research based on source code and source code metrics related to fixes for CVEs in the NVD by providing detailed information at different interlinked levels of abstraction, such as the commit-, file-, and method level, as well as the repository- and CVE level.

    This release, v1.0.8, covers all published CVEs up to 23 July 2024. All open-source projects that were reported in CVE records in the NVD in this time frame and had publicly available git repositories were fetched and considered for the construction of this vulnerability dataset. The dataset is organized as a relational database and covers 12107 vulnerability fixing commits in 4249 open source projects for a total of 11873 CVEs in 272 different Common Weakness Enumeration (CWE) types. The dataset includes the source code before and after changing 51342 files and 138974 functions. The collection took 48 hours with 4 workers (AMD EPYC Genoa-X 9684X).
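
    The interlinked levels described above can be pictured as joins across tables. The following is a minimal sketch using an in-memory SQLite database; the table and column names (cve, fixes, file_change, cve_id, commit_hash, filename) and the sample rows are assumptions made for illustration, not the schema or data of the actual SQL dump.

```python
import sqlite3

# Toy schema: table and column names are illustrative assumptions,
# not the schema shipped in the CVEfixes SQL dump.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cve (cve_id TEXT PRIMARY KEY, description TEXT);
CREATE TABLE fixes (cve_id TEXT, commit_hash TEXT);
CREATE TABLE file_change (commit_hash TEXT, filename TEXT);
INSERT INTO cve VALUES ('CVE-0000-0001', 'toy example record');
INSERT INTO fixes VALUES ('CVE-0000-0001', 'abc123');
INSERT INTO file_change VALUES ('abc123', 'src/parser.c');
INSERT INTO file_change VALUES ('abc123', 'src/util.c');
""")

def files_changed_for(cve_id):
    """Walk CVE -> fixing commit -> changed files via two joins."""
    rows = conn.execute(
        """
        SELECT fc.filename
        FROM cve
        JOIN fixes f ON f.cve_id = cve.cve_id
        JOIN file_change fc ON fc.commit_hash = f.commit_hash
        WHERE cve.cve_id = ?
        ORDER BY fc.filename
        """,
        (cve_id,),
    ).fetchall()
    return [row[0] for row in rows]

changed = files_changed_for("CVE-0000-0001")
print(changed)
```

    The same join pattern would extend to a method-level table if one is present; check the dump's own table definitions before adapting the query.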

    This repository includes the SQL dump of the dataset, as well as the JSON for the CVEs and XML of the CWEs at the time of collection. The complete process has been documented in the paper "CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software", which is published in the Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21). You will find a copy of the paper in the Doc folder.

    Citation and Zenodo links

    Please cite this work by referring to the published paper:

    Guru Bhandari, Amara Naseer, and Leon Moonen. 2021. CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software. In Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21). ACM, 10 pages. https://doi.org/10.1145/3475960.3475985

    @inproceedings{bhandari2021:cvefixes,
      title = {{CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software}},
      booktitle = {{Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21)}},
      author = {Bhandari, Guru and Naseer, Amara and Moonen, Leon},
      year = {2021},
      pages = {10},
      publisher = {{ACM}},
      doi = {10.1145/3475960.3475985},
      copyright = {Open Access},
      isbn = {978-1-4503-8680-7},
      language = {en}
    }

    The dataset has been released on Zenodo with DOI:10.5281/zenodo.4476563. The GitHub repository containing the code to automatically collect the dataset can be found at https://github.com/secureIT-project/CVEfixes, released with DOI:10.5281/zenodo.5111494.

  2. MoreFixes: Largest CVE dataset with fixes

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Sep 25, 2024
    Cite
    Jafar Akhoundali; Sajad Rahim Nouri; Kristian F. D. Rietveld; Olga GADYATSKAYA (2024). MoreFixes: Largest CVE dataset with fixes [Dataset]. http://doi.org/10.5281/zenodo.11199120
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jafar Akhoundali; Sajad Rahim Nouri; Kristian F. D. Rietveld; Olga GADYATSKAYA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In our work, we have designed and implemented a novel workflow with several heuristic methods to combine state-of-the-art methods related to CVE fix commits gathering. As a consequence of our improvements, we have been able to gather the largest programming language-independent real-world dataset of CVE vulnerabilities with the associated fix commits.
    Our dataset containing 26,617 unique CVEs coming from 6,945 unique GitHub projects is, to the best of our knowledge, by far the biggest CVE vulnerability dataset with fix commits available today. These CVEs are associated with 31,883 unique commits that fixed those vulnerabilities. Compared to prior work, our dataset brings about a 397% increase in CVEs, a 295% increase in covered open-source projects, and a 480% increase in commit fixes.
    Our larger dataset thus substantially improves over the current real-world vulnerability datasets and enables further progress in research on vulnerability detection and software security. We used the NVD (nvd.nist.gov) and the GitHub Security Advisory Database as the main sources for our pipeline.

    We release to the community a 14GB PostgreSQL database that contains information on CVEs up to January 24, 2024, CWEs of each CVE, files and methods changed by each commit, and repository metadata.
    Additionally, patch files related to the fix commits are available as a separate package. Furthermore, we make our dataset collection tool also available to the community.

    The `cvedataset-patches.zip` file contains fix patches, and `dump_morefixes_27-03-2024_19_52_58.sql.zip` contains a PostgreSQL dump of fixes, together with several other fields such as CVEs, CWEs, repository metadata, commit data, file changes, methods changed, etc.

    The MoreFixes data-storage strategy is based on CVEfixes for storing CVE fix commits from open-source repositories, and it uses a modified version of Prospector (part of ProjectKB from SAP) as a module to detect the fix commits of a CVE. Our full methodology is presented in the paper "MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery", which will be published at the PROMISE conference (2024).

    For more information about usage and sample queries, visit the Github repository: https://github.com/JafarAkhondali/Morefixes

    If you are using this dataset, please be aware that the repositories we mined carry different licenses and that you are responsible for handling any licensing issues. The same applies to CVEfixes.

    This product uses the NVD API but is not endorsed or certified by the NVD.

    This research was partially supported by the Dutch Research Council (NWO) under the project NWA.1215.18.008 Cyber Security by Integrated Design (C-SIDe).

    To restore the dataset, you can use the docker-compose file available in the GitHub repository. Default credentials after restoring the dump:

    POSTGRES_USER=postgrescvedumper
    POSTGRES_DB=postgrescvedumper
    POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d
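
    As a rough illustration of how these settings might be turned into a connection string, the sketch below parses the KEY=VALUE lines and assembles a PostgreSQL connection URI. The host and port (localhost, 5432) are assumptions about a local docker-compose setup and are not stated in the description; the helper names parse_env and build_uri are hypothetical.

```python
from urllib.parse import quote

# Credentials as listed above; HOST and PORT are assumptions for a local
# docker-compose setup and may differ in your environment.
ENV_TEXT = """
POSTGRES_USER=postgrescvedumper
POSTGRES_DB=postgrescvedumper
POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d
"""
HOST = "localhost"  # assumed
PORT = 5432         # assumed (PostgreSQL default port)

def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def build_uri(env, host=HOST, port=PORT):
    """Build a connection URI, percent-encoding the credentials."""
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    database = quote(env["POSTGRES_DB"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"

uri = build_uri(parse_env(ENV_TEXT))
print(uri)
```

    The resulting URI can be passed to a PostgreSQL client of your choice once the dump has been restored.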

  3. cvefixes

    • huggingface.co
    Cite
    Vinayak Menon, cvefixes [Dataset]. https://huggingface.co/datasets/hitoshura25/cvefixes
    Explore at:
    Authors
    Vinayak Menon
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    CVEfixes Security Vulnerabilities Dataset

    Security vulnerability data from CVEfixes v1.0.8 with 12,987 vulnerability fix records across 11,726 unique CVEs and 4,205 repositories. Contains CVE metadata (descriptions, CVSS scores, CWE classifications), git commit data, and code diffs showing vulnerable vs fixed code.

    Usage

    from datasets import load_dataset

    dataset = load_dataset("hitoshura25/cvefixes")

    Citation

    If you use this dataset, please cite the original… See the full description on the dataset page: https://huggingface.co/datasets/hitoshura25/cvefixes.

  4. Ground Truth VCCs for USENIX 2022 paper

    • figshare.com
    txt
    Updated May 27, 2022
    Cite
    N A; Manuel Brack (2022). Ground Truth VCCs for USENIX 2022 paper [Dataset]. http://doi.org/10.6084/m9.figshare.19779727.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 27, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    N A; Manuel Brack
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preliminary ground truth dataset of mappings between fixing commits, vulnerability-contributing commits (VCCs), and CVEs. USENIX 2022 paper: https://www.usenix.org/conference/usenixsecurity22/presentation/alexopoulos

    Credits go to the Ubuntu Security team and the Vulnerability History Project.

  5. vulrepair

    • huggingface.co
    Cite
    YAM, vulrepair [Dataset]. https://huggingface.co/datasets/nus-yam/vulrepair
    Explore at:
    Dataset authored and provided by
    YAM
    Description

    A clean union of BigVul and CVE-Fixes.

  6. vulnerability-cwe-patch

    • huggingface.co
    Updated Jul 27, 2025
    Cite
    Computer Incident Response Center Luxembourg (2025). vulnerability-cwe-patch [Dataset]. https://huggingface.co/datasets/CIRCL/vulnerability-cwe-patch
    Explore at:
    Dataset updated
    Jul 27, 2025
    Dataset authored and provided by
    Computer Incident Response Center Luxembourg
    Description

    This dataset, CIRCL/vulnerability-cwe-patch, provides structured real-world vulnerabilities enriched with CWE identifiers and actual patches from platforms like GitHub and GitLab. It was built to support the development of tools for vulnerability classification, triage, and automated repair. Each entry includes metadata such as CVE/GHSA ID, a description, CWE categorization, and links to verified patch commits with associated diff content and commit messages. The dataset… See the full description on the dataset page: https://huggingface.co/datasets/CIRCL/vulnerability-cwe-patch.

  7. Websites susceptible to CVE-2025-21948

    • webtechsurvey.com
    csv
    Updated Apr 1, 2025
    + more versions
    Cite
    WebTechSurvey (2025). Websites susceptible to CVE-2025-21948 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-21948
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 1, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2025-21948, compiled through global website indexing conducted by WebTechSurvey.

  8. HaPy-Bug – Human Annotated Python Bug Resolution Dataset

    • figshare.com
    txt
    Updated Feb 4, 2025
    Cite
    Piotr Przymus; Mikołaj Fejzer; Jakub Narębski; Radosław Woźniak; Łukasz Halada; Aleksander Kazecki; Mykhailo Molchanov; Krzysztof Stencel (2025). HaPy-Bug – Human Annotated Python Bug Resolution Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24448663.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Piotr Przymus; Mikołaj Fejzer; Jakub Narębski; Radosław Woźniak; Łukasz Halada; Aleksander Kazecki; Mykhailo Molchanov; Krzysztof Stencel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present HaPy-Bug, a curated dataset of 793 Python source code commits associated with bug fixes, with each line of code annotated by three domain experts. The annotations offer insights into the purpose of modified files, changes at the line level, and reviewers’ confidence levels. We analyze HaPy-Bug to examine the distribution of file purposes, types of modifications, and tangled changes. Additionally, we explore its potential applications in bug tracking, the analysis of bug-fixing practices, and the development of repository analysis tools. HaPy-Bug serves as a valuable resource for advancing research in software maintenance and security.

  9. Websites susceptible to CVE-2022-31147

    • webtechsurvey.com
    csv
    Updated Jul 14, 2022
    Cite
    WebTechSurvey (2022). Websites susceptible to CVE-2022-31147 [Dataset]. https://webtechsurvey.com/cve/CVE-2022-31147
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 14, 2022
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2022-31147, compiled through global website indexing conducted by WebTechSurvey.

  10. VMware Horizon Agent: CVE-2021-45046: Log4j CVE-2021-45046 in VMware Horizon...

    • uneconomicalness.926689.com
    Updated Dec 14, 2021
    Cite
    (2021). VMware Horizon Agent: CVE-2021-45046: Log4j CVE-2021-45046 in VMware Horizon (on-premises) [Dataset]. http://uneconomicalness.926689.com/db/vulnerabilities/vmware-horizon-agent-cve-2021-45046/
    Explore at:
    Dataset updated
    Dec 14, 2021
    Measurement technique
    CVSSv2: AV:N/AC:H/Au:N/C:P/I:P/A:P
    Description

    It was found that the fix to address CVE-2021-44228 in Apache Log4j 2.15.0 was incomplete in certain non-default configurations. This could allow attackers with control over Thread Context Map (MDC) input data when the logging configuration uses a non-default Pattern Layout with either a Context Lookup (for example, $${ctx:loginId}) or a Thread Context Map pattern (%X, %mdc, or %MDC) to craft malicious input data using a JNDI Lookup pattern resulting in an information leak and remote code execution in some environments and local code execution in all environments. Log4j 2.16.0 (Java 8) and 2.12.2 (Java 7) fix this issue by removing support for message lookup patterns and disabling JNDI functionality by default.

  11. Websites susceptible to CVE-2025-21973

    • webtechsurvey.com
    csv
    Updated Apr 1, 2025
    Cite
    WebTechSurvey (2025). Websites susceptible to CVE-2025-21973 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-21973
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 1, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2025-21973, compiled through global website indexing conducted by WebTechSurvey.

  12. Websites susceptible to CVE-2025-21984

    • webtechsurvey.com
    csv
    Updated Apr 1, 2025
    Cite
    WebTechSurvey (2025). Websites susceptible to CVE-2025-21984 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-21984
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 1, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2025-21984, compiled through global website indexing conducted by WebTechSurvey.

  13. FixMe: An Incremental Lightweight Method for Vulnerability Data Collection...

    • data.niaid.nih.gov
    Updated May 31, 2024
    Cite
    Bhandari, Guru Prasad (2024). FixMe: An Incremental Lightweight Method for Vulnerability Data Collection for Security Patch Prediction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10955341
    Explore at:
    Dataset updated
    May 31, 2024
    Dataset provided by
    Høyskolen Kristiania
    Authors
    Bhandari, Guru Prasad
    License

    http://www.apache.org/licenses/LICENSE-2.0

    Description

    This repository contains the FixMe dataset and the source code for extracting the new dataset. FixMe is a lightweight approach for collecting code patches by analyzing the commits of various version control systems. The practical framework is designed to generate patches across a wide array of programming languages. This open-source tool streamlines the process of gathering vulnerability records from the Common Vulnerabilities and Exposures (CVE) database through an incremental approach. By embracing an incremental methodology, we expedite the acquisition of data, ensuring the inclusion of newly identified vulnerabilities and their corresponding patch pairs. Our methodology involves extracting security issues, obtaining vulnerability-fixing commits, and retrieving relevant source code from various projects. The dataset extracted by the FixMe tool supports automated patch prediction, automated program repair, commit classification, vulnerability prediction, and more.

  14. Project-KB based Vulnerability Method Dataset

    • data.europa.eu
    unknown
    Updated Jan 15, 2022
    Cite
    Zenodo (2022). Project-KB based Vulnerability Method Dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-5855085?locale=ga
    Explore at:
    Available download formats: unknown (47157444)
    Dataset updated
    Jan 15, 2022
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This archive contains the vulnerability method dataset generated from project-KB. Every entry corresponds to a method and contains metadata regarding the method and the label fixed or vulnerable. Additional details can be found in the readme file.

  15. Websites susceptible to CVE-2025-30797

    • webtechsurvey.com
    csv
    Updated Apr 1, 2025
    + more versions
    Cite
    WebTechSurvey (2025). Websites susceptible to CVE-2025-30797 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-30797
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 1, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2025-30797, compiled through global website indexing conducted by WebTechSurvey.

  16. Cloud Vulnerabilities Dataset

    • kaggle.com
    zip
    Updated Jun 19, 2025
    + more versions
    Cite
    SUNNY THAKUR (2025). Cloud Vulnerabilities Dataset [Dataset]. https://www.kaggle.com/datasets/cyberprince/cloud-vulnerabilities-dataset/discussion
    Explore at:
    Available download formats: zip (71217 bytes)
    Dataset updated
    Jun 19, 2025
    Authors
    SUNNY THAKUR
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Cloud Vulnerabilities Dataset (VUL0001-VUL1200)

    Overview

    The Cloud Vulnerabilities Dataset is a comprehensive collection of 1200 unique cloud security vulnerabilities, covering major cloud providers including AWS, Azure, Google Cloud Platform (GCP), Oracle Cloud, IBM Cloud, and Alibaba Cloud. This dataset is designed for cybersecurity professionals, penetration testers, machine learning engineers, and data scientists to analyze, train AI models, and enhance cloud security practices. Each entry details a specific vulnerability, including its description, category, cloud provider, vulnerable code (where applicable), proof of concept (PoC), and source references. The dataset emphasizes advanced and niche attack vectors such as misconfigurations, privilege escalations, data exposures, and denial-of-service (DoS) vulnerabilities, making it a valuable resource for red team exercises, security research, and AI-driven threat detection.

    Dataset Details

    Total Entries: 1200
    Format: JSONL (JSON Lines)
    File Names: cloud_vulnerabilities_dataset_1-1200.jsonl
    Timestamp: Entries are timestamped as of June 19, 2025.

    Categories: Access Control, Data Exposure, Privilege Escalation, Data Exfiltration, Denial of Service, Code Injection, Authentication, Encryption, Network Security, Session Management, Domain Hijacking, Data Loss

    Cloud Providers Covered:

    • Amazon Web Services (AWS)
    • Microsoft Azure
    • Google Cloud Platform (GCP)
    • Oracle Cloud
    • IBM Cloud
    • Alibaba Cloud

    Dataset Structure

    Each entry in the dataset is a JSON object with the following fields:

    id: Unique identifier for the vulnerability (e.g., VUL0001).
    description: Detailed description of the vulnerability.
    category: Type of vulnerability (e.g., Data Exposure, Privilege Escalation).
    cloud_provider: The cloud platform affected (e.g., AWS, Azure).
    vulnerable_code: Example of misconfigured code or settings (if applicable).
    poc: Proof of concept command or script to demonstrate the vulnerability.
    source: Reference to CVE or documentation link.
    timestamp: Date and time of the entry (ISO 8601 format, e.g., 2025-06-19T12:10:00Z).
    
    Example Entry
    {
     "id": "VUL1190",
     "description": "Alibaba Cloud ECS with misconfigured snapshot policy allowing data exposure.",
     "category": "Data Exposure",
     "cloud_provider": "Alibaba Cloud",
     "vulnerable_code": "{ \"SnapshotPolicy\": { \"publicAccess\": true } }",
     "poc": "aliyun ecs DescribeSnapshots --SnapshotId snapshot-id",
     "source": {
      "cve": "N/A",
      "link": "https://www.alibabacloud.com/help/doc-detail/25535.htm"
     },
     "timestamp": "2025-06-19T12:10:00Z"
    }
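
    A structural check against the field list above might look like the following minimal sketch. The function name validate_entry and the choice to treat all eight listed fields as required are assumptions for illustration; the dataset description does not define a validation procedure.

```python
import json
from datetime import datetime

# Field names taken from the list above; treating all of them as
# required is an assumption of this sketch.
REQUIRED_FIELDS = {
    "id", "description", "category", "cloud_provider",
    "vulnerable_code", "poc", "source", "timestamp",
}

def validate_entry(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    entry = json.loads(line)
    missing = sorted(REQUIRED_FIELDS - set(entry))
    problems = [f"missing field: {name}" for name in missing]
    timestamp = entry.get("timestamp")
    if timestamp is not None:
        try:
            # fromisoformat() rejects a trailing "Z" before Python 3.11,
            # so rewrite it as an explicit UTC offset first.
            datetime.fromisoformat(str(timestamp).replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"bad timestamp: {timestamp}")
    return problems

sample = json.dumps({
    "id": "VUL1190",
    "description": "Alibaba Cloud ECS with misconfigured snapshot policy allowing data exposure.",
    "category": "Data Exposure",
    "cloud_provider": "Alibaba Cloud",
    "vulnerable_code": '{ "SnapshotPolicy": { "publicAccess": true } }',
    "poc": "aliyun ecs DescribeSnapshots --SnapshotId snapshot-id",
    "source": {"cve": "N/A", "link": "https://www.alibabacloud.com/help/doc-detail/25535.htm"},
    "timestamp": "2025-06-19T12:10:00Z",
})
print(validate_entry(sample))
```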
    

    Usage

    This dataset can be used for:

    • Penetration Testing: Leverage PoC scripts to test cloud environments for vulnerabilities.
    • AI/ML Training: Train machine learning models for anomaly detection, vulnerability classification, or automated remediation.
    • Security Research: Analyze trends in cloud misconfigurations and attack vectors.
    • Education: Teach cloud security best practices and vulnerability mitigation strategies.

    Prerequisites

    • Tools: Familiarity with cloud CLI tools (e.g., AWS CLI, Azure CLI, gcloud, oci, ibmcloud, aliyun).
    • Programming: Knowledge of Python, JSON parsing, or scripting for processing JSONL files.
    • Access: Valid cloud credentials for testing PoCs in a controlled, authorized environment.

    Getting Started

    Download the Dataset: Obtain the JSONL files: cloud_vulnerabilities_dataset_1-1200.jsonl

    Parse the Dataset: Use a JSONL parser (e.g., Python’s json module) to read and process entries.

    import json
    
    with open('cloud_vulnerabilities_dataset_1-1200.jsonl', 'r') as file:
      for line in file:
        entry = json.loads(line.strip())
        print(entry['id'], entry['description'])
    
    
    

    Run PoCs:

    Execute PoC commands in a sandboxed environment to verify vulnerabilities (ensure proper authorization).
    Example: aws s3 ls s3://bucket for AWS S3 vulnerabilities.
    
    

    Analyze Data: Use data analysis tools (e.g., Pandas, Jupyter) to explore vulnerability patterns or train ML models.
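
    As one possible starting point for this kind of exploration, the sketch below tallies entries by a chosen field using only the standard library. The helper name count_by and the toy input lines are made up for illustration and do not reproduce real dataset rows.

```python
import json
from collections import Counter

def count_by(lines, field):
    """Count JSONL entries by the value of one top-level field."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines in the file
            continue
        counts[json.loads(line).get(field, "<missing>")] += 1
    return counts

# Toy lines standing in for the real file; the values are made up.
toy_lines = [
    '{"id": "VUL0001", "category": "Data Exposure", "cloud_provider": "AWS"}',
    '{"id": "VUL0002", "category": "Access Control", "cloud_provider": "AWS"}',
    '',
    '{"id": "VUL0003", "category": "Data Exposure", "cloud_provider": "Azure"}',
]
by_category = count_by(toy_lines, "category")
by_provider = count_by(toy_lines, "cloud_provider")
print(by_category.most_common())
```

    Because count_by accepts any iterable of lines, an open file handle for the JSONL file can be passed in place of toy_lines.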

    Security Considerations

    • Ethical Use: Only test PoCs in environments where you have explicit permission.
    • Data Sensitivity: Handle dataset entries with care, as they contain sensitive configuration examples.
    • Mitigation: Refer to source links for official documentation on fixing vulnerabilities.

    Contributing

    Contributions to expand or refine the dataset are welcome. Please submit pull requests with:

    • New vulnerability entries in JSONL format.
    • Clear documentation of the vulnerability, PoC, and source.

    Ensure there are no duplicate IDs or entries.

    License

    This dataset is released under the MIT License. You are free to use, modify, and distribute it, provided the original attribution is maintained.

    Contact

    For questions, feedback, or contributions, please reach out via:

    Email: sunny48445@gmail.com

    Acknowledgments

    Inspir...

  17. Websites susceptible to CVE-2020-15839

    • webtechsurvey.com
    csv
    + more versions
    Cite
    WebTechSurvey, Websites susceptible to CVE-2020-15839 [Dataset]. https://webtechsurvey.com/cve/CVE-2020-15839
    Explore at:
    Available download formats: csv
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2020-15839, compiled through global website indexing conducted by WebTechSurvey.

  18. CVE values determined by 10-fold CV and our approximation formula against...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Tomoyuki Obuchi; Shiro Ikeda; Kazunori Akiyama; Yoshiyuki Kabashima (2023). CVE values determined by 10-fold CV and our approximation formula against λT. [Dataset]. http://doi.org/10.1371/journal.pone.0188012.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Tomoyuki Obuchi; Shiro Ikeda; Kazunori Akiyama; Yoshiyuki Kabashima
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    λ_ℓ1 is fixed to the optimal value (2λ_ℓ1/M = 1, coincidentally common to all cases). The number in brackets denotes the error bar to the last digits. The optimal values are bolded. The tuning constants δ and θ are set to be δ = 10^−4 and θ = 10^−12, respectively.

  19. Websites susceptible to CVE-2021-29040

    • webtechsurvey.com
    csv
    Updated May 16, 2021
    Cite
    WebTechSurvey (2021). Websites susceptible to CVE-2021-29040 [Dataset]. https://webtechsurvey.com/cve/CVE-2021-29040
    Explore at:
    Available download formats: csv
    Dataset updated
    May 16, 2021
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites affected by CVE-2021-29040, compiled through global website indexing conducted by WebTechSurvey.

  20. Predicting Vulnerability Inducing Function Versions Using Node Embeddings...

    • data.mendeley.com
    Updated Jan 10, 2022
    + more versions
    Cite
    Sefa Eren Şahin (2022). Predicting Vulnerability Inducing Function Versions Using Node Embeddings and Graph Neural Networks [Dataset]. http://doi.org/10.17632/ymtf9znmfz.1
    Explore at:
    Dataset updated
    Jan 10, 2022
    Authors
    Sefa Eren Şahin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Wireshark Vulnerability Prediction Dataset

    This dataset was constructed by a team of researchers at the Istanbul Technical University Faculty of Computer and Informatics and is used in the paper titled "Predicting Vulnerability Inducing Function Versions Using Node Embeddings and Graph Neural Networks". Please see the GitHub repository https://github.com/erensahin/gnn-vulnerability-prediction for more details on usage.

    This dataset consists of two main parts:

    • AST dumps, which can be used as inputs for any machine learning model (ast_input)
    • Wireshark file changes and bugs (file_changes_and_bugs)

    ast_input

    The ast_input folder contains three files:

    • ast_input.zip: This file is a compressed version of AST dumps in Python pickle format. You should use python pickle library to unpickle and use the data.
    • node_embeddings_by_kind.pkl: Embedding vectors corresponding to AST node kinds in python pickle format.
    • token_id_vocabulary.pkl: Map of token ids and their corresponding tokens in python pickle format.
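
    Reading one of these pickle files could look like the minimal sketch below. The helper name load_pickle and the toy mapping are assumptions for illustration; the structure of the real files is not reproduced here, and pickle files should only be loaded from sources you trust.

```python
import pickle
import tempfile
from pathlib import Path

def load_pickle(path):
    """Load one pickle file. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Round-trip a toy mapping to show the call; the contents and shape of
# the real files are not reproduced here.
toy_vocabulary = {0: "<pad>", 1: "int", 2: "return"}
with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy_vocabulary.pkl"
    with open(toy_path, "wb") as handle:
        pickle.dump(toy_vocabulary, handle)
    loaded = load_pickle(toy_path)

print(loaded[1])
```

    For the compressed AST dumps, the archive would first need to be extracted (for example with the standard zipfile module) before its members are unpickled.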

    file_changes_and_bugs

    file_changes_and_bugs folder consists of five files:

    • wireshark_file_changes.csv: list of file changes made in the Wireshark repository. File changes are basically commit-file pairs.
    • wireshark_cve_bug_matching.csv: maps CVE entries to bug IDs in the Wireshark bug repository. This is scraped from https://www.wireshark.org/security/
    • additional_bugs.csv: additional security-related bugs that our team manually identified by investigating security advisories and bug reports.
    • wireshark_bug_commit_matching.csv: maps security bugs (vulnerabilities) to commits in the Wireshark source code repository.
    • wireshark_bug_inducing_file_changes.csv: maps vulnerabilities to Wireshark source files, recording the commit in which each vulnerability was induced and the commit in which it was fixed.