17 datasets found

MoreFixes: Largest CVE dataset with fixes
data.niaid.nih.gov
zenodo.org
Updated Oct 23, 2024
Cite
Rietveld, Kristian F. D. (2024). MoreFixes: Largest CVE dataset with fixes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11199119
Dataset updated
Oct 23, 2024
Dataset provided by
GADYATSKAYA, Olga
Rietveld, Kristian F. D.
Rahim Nouri, Sajad
Akhoundali, Jafar
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
In our work, we designed and implemented a novel workflow that uses several heuristics to combine state-of-the-art methods for gathering CVE fix commits. As a result of these improvements, we were able to gather the largest programming-language-independent, real-world dataset of CVE vulnerabilities with their associated fix commits. Our dataset, containing 29,203 unique CVEs from 7,238 unique GitHub projects, is, to the best of our knowledge, by far the biggest CVE vulnerability dataset with fix commits available today. These CVEs are associated with 35,276 unique commits stored as SQL and 39,931 patch commit files that fixed those vulnerabilities (some patch files could not be saved as SQL for several technical reasons). Our larger dataset thus substantially improves on current real-world vulnerability datasets and enables further progress in research on vulnerability detection and software security. We used the NVD (nvd.nist.gov) and the GitHub Security Advisory Database as the main sources for our pipeline.

We release to the community a 16GB PostgreSQL database that contains information on CVEs up to 2024-09-26, CWEs of each CVE, files and methods changed by each commit, and repository metadata. Additionally, patch files related to the fix commits are available as a separate package. Furthermore, we make our dataset collection tool also available to the community.

The cvedataset-patches.zip file contains the fix patches, and postgrescvedumper.sql.zip contains a PostgreSQL dump of the fixes, together with several other fields such as CVEs, CWEs, repository metadata, commit data, file changes, and changed methods.

MoreFixes' data-storage strategy is based on CVEfixes for storing CVE fix commits from open-source repositories, and it uses a modified version of Prospector (part of Project KB from SAP) as a module to detect the fix commits of a CVE. Our full methodology is presented in the paper "MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery", which will be published at the PROMISE conference (2024).

For more information about usage and sample queries, visit the GitHub repository: https://github.com/JafarAkhondali/Morefixes

If you use this dataset, please be aware that the repositories we mined are released under different licenses, and you are responsible for handling any licensing issues. The same applies to CVEfixes.

This product uses the NVD API but is not endorsed or certified by the NVD.

This research was partially supported by the Dutch Research Council (NWO) under the project NWA.1215.18.008 Cyber Security by Integrated Design (C-SIDe).

To restore the dataset, you can use the docker-compose file available in the GitHub repository. Default database credentials after restoring the dump:

POSTGRES_USER=postgrescvedumper
POSTGRES_DB=postgrescvedumper
POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d
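
As a rough illustration of how these credentials might be used from a script, the sketch below parses the published KEY=VALUE pairs and assembles a PostgreSQL connection URI using only the standard library. The host (localhost) and port (5432) are assumptions, not values stated in the description; check the docker-compose file for the actual settings. The helper names are hypothetical.

```python
from urllib.parse import quote

# Credentials as published in the dataset description.
ENV_LINE = (
    "POSTGRES_USER=postgrescvedumper "
    "POSTGRES_DB=postgrescvedumper "
    "POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d"
)


def parse_env_line(line: str) -> dict:
    """Split whitespace-separated KEY=VALUE pairs into a dictionary."""
    return dict(pair.split("=", 1) for pair in line.split())


def build_postgres_uri(env: dict, host: str = "localhost", port: int = 5432) -> str:
    """Build a connection URI; host and port defaults are assumptions."""
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    database = quote(env["POSTGRES_DB"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


env = parse_env_line(ENV_LINE)
uri = build_postgres_uri(env)
print(uri)
```

The resulting URI can be passed to any PostgreSQL client that accepts connection URIs, once the dump has been restored.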

Please use this for citation:

 title={MoreFixes: A large-scale dataset of CVE fix commits mined through enhanced repository discovery},
 author={Akhoundali, Jafar and Nouri, Sajad Rahim and Rietveld, Kristian and Gadyatskaya, Olga},
 booktitle={Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering},
 pages={42--51},
 year={2024}
}
CVEfixes Dataset: Automatically Collected Vulnerabilities and Their Fixes...
data.niaid.nih.gov
zenodo.org
Updated Jul 28, 2024
Cite
Vidziunas, Linas (2024). CVEfixes Dataset: Automatically Collected Vulnerabilities and Their Fixes from Open-Source Software [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4476563
Dataset updated
Jul 28, 2024
Dataset provided by
Moonen, Leon
Vidziunas, Linas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CVEfixes is a comprehensive vulnerability dataset that is automatically collected and curated from Common Vulnerabilities and Exposures (CVE) records in the public U.S. National Vulnerability Database (NVD). The goal is to support data-driven security research based on source code and source code metrics related to fixes for CVEs in the NVD by providing detailed information at different interlinked levels of abstraction, such as the commit-, file-, and method level, as well as the repository- and CVE level.

This release, v1.0.8, covers all published CVEs up to 23 July 2024. All open-source projects that were reported in CVE records in the NVD in this time frame and had publicly available git repositories were fetched and considered for the construction of this vulnerability dataset. The dataset is organized as a relational database and covers 12107 vulnerability fixing commits in 4249 open source projects for a total of 11873 CVEs in 272 different Common Weakness Enumeration (CWE) types. The dataset includes the source code before and after changing 51342 files and 138974 functions. The collection took 48 hours with 4 workers (AMD EPYC Genoa-X 9684X).

This repository includes the SQL dump of the dataset, as well as the JSON for the CVEs and XML of the CWEs at the time of collection. The complete process has been documented in the paper "CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software", which is published in the Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21). You will find a copy of the paper in the Doc folder.

Citation and Zenodo links

Please cite this work by referring to the published paper:

Guru Bhandari, Amara Naseer, and Leon Moonen. 2021. CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software. In Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21). ACM, 10 pages. https://doi.org/10.1145/3475960.3475985

@inproceedings{bhandari2021:cvefixes,
  title = {{CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software}},
  booktitle = {{Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21)}},
  author = {Bhandari, Guru and Naseer, Amara and Moonen, Leon},
  year = {2021},
  pages = {10},
  publisher = {{ACM}},
  doi = {10.1145/3475960.3475985},
  copyright = {Open Access},
  isbn = {978-1-4503-8680-7},
  language = {en}
}

The dataset has been released on Zenodo with DOI:10.5281/zenodo.4476563. The GitHub repository containing the code to automatically collect the dataset can be found at https://github.com/secureIT-project/CVEfixes, released with DOI:10.5281/zenodo.5111494.
MegaVul: A C/C++/Java Vulnerability Dataset
kaggle.com
Updated Jul 17, 2025
Cite
Marc Damie (2025). MegaVul: A C/C++/Java Vulnerability Dataset [Dataset]. https://www.kaggle.com/datasets/marcdamie/megavul-a-cc-java-vulnerability-dataset/suggestions
Dataset updated
Jul 17, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Marc Damie
License
https://www.gnu.org/licenses/gpl-3.0.html
Description
With over 17,000 identified vulnerable functions and 320,000 non-vulnerable functions extracted from 9,000 vulnerability fix commits, MegaVul provides multi-dimensional data to help you train state-of-the-art sequence-based vulnerability detectors.

DISCLAIMER

I am not the author of this dataset. I simply uploaded it to Kaggle to simplify its use (instead of the OneDrive folder originally provided by the authors). To contact the authors, refer to their GitHub repository (https://github.com/Icyrockton/MegaVul), which also provides the code necessary to crawl the data.

Getting Started

We offer three versions of the pre-crawled MegaVul, as well as providing Joern graphs extracted from all functions.

The differences between the three versions are as follows:

cve_with_graph_abstract_commit.json: The raw dataset with the complete hierarchical structure. It includes information such as CVE, Commit, File, Functions, etc.

megavul.json: A flattened version of cve_with_graph_abstract_commit.json, for easier use. It keeps all fields but loses the hierarchical structure.

megavul_simple.json: A simplified version of megavul.json, designed to provide a more concise representation of the dataset. It retains essential fields such as Functions and CVE IDs while omitting detailed information such as function parameter lists and commit messages.
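
To make the difference between the hierarchical and flattened versions concrete, here is a minimal sketch of flattening a nested CVE, commit, file, function structure into one row per function. The nested key names (commits, commit_hash, files, file_name, functions, func_name) are hypothetical and chosen only for illustration; the actual MegaVul schema may differ, and the sample values are made up.

```python
# Hypothetical nested record; key names are illustrative, not the real schema.
nested = [
    {
        "cve_id": "CVE-0000-0001",
        "commits": [
            {
                "commit_hash": "abc123",
                "files": [
                    {
                        "file_name": "a.c",
                        "functions": [
                            {"func_name": "parse", "is_vul": True},
                            {"func_name": "init", "is_vul": False},
                        ],
                    }
                ],
            }
        ],
    }
]


def flatten(records):
    """Yield one flat dict per function, copying the parent fields down."""
    for cve in records:
        for commit in cve["commits"]:
            for file in commit["files"]:
                for func in file["functions"]:
                    yield {
                        "cve_id": cve["cve_id"],
                        "commit_hash": commit["commit_hash"],
                        "file_name": file["file_name"],
                        **func,
                    }


flat = list(flatten(nested))
for row in flat:
    print(row)
```

Each flat row repeats its parent CVE and commit fields, which is the trade-off of flattening: every field is still present, but the grouping is no longer explicit.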

⏩ Simple use case

More code examples can be found here: https://github.com/Icyrockton/MegaVul/tree/main/examples

The following code reads megavul_simple.json:

import json
from pathlib import Path

graph_dir = Path('../megavul/storage/result/c_cpp/graph')

with Path("../megavul/storage/result/c_cpp/megavul_simple.json").open(mode='r') as f:
    megavul = json.load(f)
    item = megavul[9]
    cve_id = item['cve_id']  # CVE-2022-24786
    cvss_vector = item['cvss_vector']  # AV:N/AC:L/Au:N/C:P/I:P/A:P
    is_vul = item['is_vul']  # True

    if is_vul:
        func_before = item['func_before']  # vulnerable function

    func_after = item['func']  # after vul function fixed(i.e., clean function)
    abstract_func_after = item['abstract_func']

    diff_line_info = item['diff_line_info']  # {'deleted_lines': ['pjmedia_rtcp_comm .... ] , 'added_lines': [ .... ] }
    git_url = item['git_url']  # https://github.com/pjsip/pjproject/commit/11559e49e65bdf00922ad5ae28913ec6a198d508

    if item['func_graph_path_before'] is not None:  # graphs of some functions cannot be exported successfully
        graph_file_path = graph_dir / item['func_graph_path_before']
        graph_file = json.load(graph_file_path.open(mode='r'))
        nodes, edges = graph_file['nodes'], graph_file['edges']
        print(nodes)  # [{'version': '0.1', 'language': 'NEWC', '_label': 'META_DATA', 'overlays': ....
        print(edges)  # [{'innode': 196, 'outnode': 2, 'etype': 'AST', 'variable': None}, ...]
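
Once the records are loaded, a common first step is to check the class balance. The sketch below counts vulnerable and non-vulnerable functions and collects the CVE IDs that have at least one vulnerable function. It relies only on the cve_id and is_vul fields shown above; the in-memory sample list is made up so the code runs without the real file, and the summarize helper is a hypothetical name.

```python
from collections import Counter

# Made-up stand-in for the list loaded from megavul_simple.json.
sample = [
    {"cve_id": "CVE-0000-0001", "is_vul": True},
    {"cve_id": "CVE-0000-0001", "is_vul": False},
    {"cve_id": "CVE-0000-0002", "is_vul": False},
]


def summarize(items):
    """Return label counts and the set of CVE IDs with a vulnerable function."""
    counts = Counter(
        "vulnerable" if item["is_vul"] else "non_vulnerable" for item in items
    )
    vulnerable_cves = {item["cve_id"] for item in items if item["is_vul"]}
    return counts, vulnerable_cves


counts, vulnerable_cves = summarize(sample)
print(counts["vulnerable"], counts["non_vulnerable"])
print(sorted(vulnerable_cves))
```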
ASE2021 vulnerability fix dataset
zenodo.org
Updated Mar 7, 2023
Cite
Jiayuan Zhou; Michael Pacheco; Zhiyuan Wan; Xin Xia; David Lo; Yuan Wang; Ahmed E. Hassan (2023). ASE2021 vulnerability fix dataset [Dataset]. http://doi.org/10.5281/zenodo.5513051
Unique identifier
https://doi.org/10.5281/zenodo.5513051
Dataset updated
Mar 7, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jiayuan Zhou; Michael Pacheco; Zhiyuan Wan; Xin Xia; David Lo; Yuan Wang; Ahmed E. Hassan
Description
This is the dataset for the paper "Finding A Needle in a Haystack: Automated Mining of Silent Vulnerability Fixes", which was accepted at the 36th IEEE/ACM Automated Software Engineering (ASE) Conference.

The columns are described below:

commit_id: The commit ID/hash.

repo: The GitHub author and repository (e.g., "apache/hive").

filename: The name of the file changed in the commit.

partition: Which dataset the commit information belongs to (i.e., "train", "val", or "test").

PL: Programming Language (PL) (i.e., "java" or "py").

label: Label of the commit, 0 for non-vulnerability fixing commit and 1 for vulnerability fixing commit.

diff: The entire code change information of the file in this commit.

committer_date: The date of the commit (e.g., 2015-03-02 13:48:25+13:00)

msg: The commit message (NA if empty).

MOD_DIFF: The code change of the file in this commit after preprocessing: filtering out lines that are not added lines or removed lines, and removing refactoring information and comments.

BPE_MOD_DIFF: BPE processing applied to MOD_DIFF information (using codeprep Python package).

ADD_DIFF: The added lines from the MOD_DIFF information (indicated as a line starting with '+' character).

REM_DIFF: The removed lines from the MOD_DIFF information (indicated as a line starting with '-' character).

LOC_ADD: Total lines of code added in this file change.

LOC_REM: Total lines of code removed in this file change.

LOC_MOD: Total lines of code modified in this file change (LOC_ADD + LOC_REM).

commit_repo: The commit ID and repository concatenated.

cve_list: A list of CVEs which the commit fixes (e.g., CVE-2015-5348, CVE-2016-8902).
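
The relationship between the diff-derived columns can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: it assumes MOD_DIFF is a newline-separated string of lines starting with '+' or '-', and that the LOC counts are taken over those same lines, which the description does not state explicitly. The sample diff and the function name are made up.

```python
def split_mod_diff(mod_diff: str):
    """Split a preprocessed diff into added ('+') and removed ('-') lines."""
    added, removed = [], []
    for line in mod_diff.splitlines():
        if line.startswith("+"):
            added.append(line)
        elif line.startswith("-"):
            removed.append(line)
    return added, removed


# Made-up example of a MOD_DIFF value.
mod_diff = (
    "-    return user_input\n"
    "+    cleaned = escape(user_input)\n"
    "+    return cleaned"
)

add_diff, rem_diff = split_mod_diff(mod_diff)
loc_add = len(add_diff)      # analogous to LOC_ADD
loc_rem = len(rem_diff)      # analogous to LOC_REM
loc_mod = loc_add + loc_rem  # LOC_MOD = LOC_ADD + LOC_REM

print(loc_add, loc_rem, loc_mod)
```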
vulrepair
huggingface.co
Cite
YAM, vulrepair [Dataset]. https://huggingface.co/datasets/nus-yam/vulrepair
Dataset authored and provided by
YAM
Description
A clean union of BigVul and CVE-Fixes.
FixMe: An Incremental Lightweight Method for Vulnerability Data Collection...
data.niaid.nih.gov
Updated May 31, 2024
Cite
Bhandari, Guru Prasad (2024). FixMe: An Incremental Lightweight Method for Vulnerability Data Collection for Security Patch Prediction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10955341
Dataset updated
May 31, 2024
Dataset authored and provided by
Bhandari, Guru Prasad
License
http://www.apache.org/licenses/LICENSE-2.0
Description
This repository contains the FixMe dataset and the source code for extracting a new dataset. FixMe is a lightweight approach for collecting code patches by analyzing the commits of various version control systems. The practical framework is designed to generate patches across a wide array of programming languages. This open-source tool streamlines the process of gathering vulnerability records from the Common Vulnerabilities and Exposures (CVE) database through an incremental approach. By embracing an incremental methodology, we expedite the acquisition of data, ensuring the inclusion of newly identified vulnerabilities and their corresponding patch pairs. Our methodology involves extracting security issues, obtaining vulnerability-fixing commits, and retrieving relevant source code from various projects. The dataset extracted by the FixMe tool supports automated patch prediction, automated program repair, commit classification, vulnerability prediction, and more.
Websites susceptible to CVE-2025-21973
webtechsurvey.com
csv
Updated Apr 1, 2025
Cite
WebTechSurvey (2025). Websites susceptible to CVE-2025-21973 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-21973
Available download formats: csv
Dataset updated
Apr 1, 2025
Dataset authored and provided by
WebTechSurvey
License
https://webtechsurvey.com/terms
Time period covered
2025
Area covered
Global
Description
A complete list of live websites affected by CVE-2025-21973, compiled through global website indexing conducted by WebTechSurvey.
HaPy-Bug – Human Annotated Python Bug Resolution Dataset
figshare.com
txt
Updated Feb 4, 2025
Cite
Piotr Przymus; Mikołaj Fejzer; Jakub Narębski; Radosław Woźniak; Łukasz Halada; Aleksander Kazecki; Mykhailo Molchanov; Krzysztof Stencel (2025). HaPy-Bug – Human Annotated Python Bug Resolution Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24448663.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24448663.v1
Dataset updated
Feb 4, 2025
Dataset provided by
figshare
Authors
Piotr Przymus; Mikołaj Fejzer; Jakub Narębski; Radosław Woźniak; Łukasz Halada; Aleksander Kazecki; Mykhailo Molchanov; Krzysztof Stencel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present HaPy-Bug, a curated dataset of 793 Python source code commits associated with bug fixes, with each line of code annotated by three domain experts. The annotations offer insights into the purpose of modified files, changes at the line level, and reviewers’ confidence levels. We analyze HaPy-Bug to examine the distribution of file purposes, types of modifications, and tangled changes. Additionally, we explore its potential applications in bug tracking, the analysis of bug-fixing practices, and the development of repository analysis tools. HaPy-Bug serves as a valuable resource for advancing research in software maintenance and security.
vulnerability-cwe-patch
huggingface.co
Updated Sep 6, 2025
Cite
Computer Incident Response Center Luxembourg (2025). vulnerability-cwe-patch [Dataset]. https://huggingface.co/datasets/CIRCL/vulnerability-cwe-patch
Dataset updated
Sep 6, 2025
Dataset authored and provided by
Computer Incident Response Center Luxembourg
Description
This dataset, CIRCL/vulnerability-cwe-patch, provides structured real-world vulnerabilities enriched with CWE identifiers and actual patches from platforms like GitHub and GitLab. It was built to support the development of tools for vulnerability classification, triage, and automated repair. Each entry includes metadata such as CVE/GHSA ID, a description, CWE categorization, and links to verified patch commits with associated diff content and commit messages. The dataset is automatically… See the full description on the dataset page: https://huggingface.co/datasets/CIRCL/vulnerability-cwe-patch.
Websites susceptible to CVE-2025-21948
webtechsurvey.com
csv
Updated Apr 1, 2025
Cite
WebTechSurvey (2025). Websites susceptible to CVE-2025-21948 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-21948
Available download formats: csv
Dataset updated
Apr 1, 2025
Dataset authored and provided by
WebTechSurvey
License
https://webtechsurvey.com/terms
Time period covered
2025
Area covered
Global
Description
A complete list of live websites affected by CVE-2025-21948, compiled through global website indexing conducted by WebTechSurvey.
Websites susceptible to CVE-2025-30797
webtechsurvey.com
csv
Updated Apr 1, 2025
Cite
WebTechSurvey (2025). Websites susceptible to CVE-2025-30797 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-30797
Available download formats: csv
Dataset updated
Apr 1, 2025
Dataset authored and provided by
WebTechSurvey
License
https://webtechsurvey.com/terms
Time period covered
2025
Area covered
Global
Description
A complete list of live websites affected by CVE-2025-30797, compiled through global website indexing conducted by WebTechSurvey.
Websites susceptible to CVE-2025-21984
webtechsurvey.com
csv
Updated Apr 1, 2025
Cite
WebTechSurvey (2025). Websites susceptible to CVE-2025-21984 [Dataset]. https://webtechsurvey.com/cve/CVE-2025-21984
Available download formats: csv
Dataset updated
Apr 1, 2025
Dataset authored and provided by
WebTechSurvey
License
https://webtechsurvey.com/terms
Time period covered
2025
Area covered
Global
Description
A complete list of live websites affected by CVE-2025-21984, compiled through global website indexing conducted by WebTechSurvey.
CVE values determined by 10-fold CV and our approximation formula against...
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Tomoyuki Obuchi; Shiro Ikeda; Kazunori Akiyama; Yoshiyuki Kabashima (2023). CVE values determined by 10-fold CV and our approximation formula against λT. [Dataset]. http://doi.org/10.1371/journal.pone.0188012.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0188012.t001
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Tomoyuki Obuchi; Shiro Ikeda; Kazunori Akiyama; Yoshiyuki Kabashima
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
$\lambda_{\ell_1}$ is fixed to the optimal value ($2\lambda_{\ell_1}/M = 1$, coincidentally common to all cases). The number in brackets denotes the error bar on the last digits. The optimal values are bolded. The tuning constants $\delta$ and $\theta$ are set to $\delta = 10^{-4}$ and $\theta = 10^{-12}$, respectively.
Websites susceptible to CVE-2021-29053
webtechsurvey.com
csv
Updated May 17, 2021
Cite
WebTechSurvey (2021). Websites susceptible to CVE-2021-29053 [Dataset]. https://webtechsurvey.com/cve/CVE-2021-29053
Available download formats: csv
Dataset updated
May 17, 2021
Dataset authored and provided by
WebTechSurvey
License
https://webtechsurvey.com/terms
Time period covered
2025
Area covered
Global
Description
A complete list of live websites affected by CVE-2021-29053, compiled through global website indexing conducted by WebTechSurvey.
Predicting Vulnerability Inducing Function Versions Using Node Embeddings...
data.mendeley.com
Updated Jan 19, 2022
Cite
Sefa Eren Şahin (2022). Predicting Vulnerability Inducing Function Versions Using Node Embeddings and Graph Neural Networks - Wireshark [Dataset]. http://doi.org/10.17632/ymtf9znmfz.2
Unique identifier
https://doi.org/10.17632/ymtf9znmfz.2
Dataset updated
Jan 19, 2022
Authors
Sefa Eren Şahin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Wireshark Vulnerability Prediction Dataset

This dataset was constructed by a team of researchers at the Istanbul Technical University Faculty of Computer and Informatics and was used in the paper entitled "Predicting Vulnerability Inducing Function Versions Using Node Embeddings and Graph Neural Networks". Please see the GitHub repository https://github.com/erensahin/gnn-vulnerability-prediction for more details on usage.

This dataset consists of two main parts:

* AST dumps, which can be used as inputs for any machine learning model (ast_input).
* Wireshark file changes and bugs (file_changes_and_bugs).

ast_input

The ast_input folder contains three files:

ast_input.zip: A compressed archive of the AST dumps in Python pickle format. Use the Python pickle module to unpickle and use the data.

node_embeddings_by_kind.pkl: Embedding vectors corresponding to AST node kinds, in Python pickle format.

token_id_vocabulary.pkl: A map of token IDs to their corresponding tokens, in Python pickle format.
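
A minimal sketch of reading pickled objects out of a zip archive with the standard library is shown below. It assumes the archive members end in .pkl, which the description does not confirm, and it builds a small in-memory archive with made-up content so the code runs without the real download. Note that unpickling can execute arbitrary code, so pickle files should only be loaded from sources you trust.

```python
import io
import pickle
import zipfile


def load_pickles_from_zip(zip_source):
    """Return {member_name: unpickled_object} for every .pkl member."""
    loaded = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(".pkl"):  # assumption about member naming
                with archive.open(name) as member:
                    loaded[name] = pickle.load(member)
    return loaded


# Build a small in-memory archive with made-up content for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr(
        "example_ast.pkl",
        pickle.dumps({"kind": "FUNCTION_DECL", "children": []}),
    )
buffer.seek(0)

asts = load_pickles_from_zip(buffer)
print(asts)
```

For the real data, pass the path to ast_input.zip instead of the in-memory buffer; the standalone .pkl files can be read with pickle.load on a file opened in binary mode.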

file_changes_and_bugs

The file_changes_and_bugs folder contains five files:

wireshark_file_changes.csv: A list of file changes made in the Wireshark repository. File changes are basically commit-file pairs.

wireshark_cve_bug_matching.csv: Maps CVE entries to bug IDs in the Wireshark bug repository. This is scraped from https://www.wireshark.org/security/

additional_bugs.csv: Additional security-related bugs that our team manually identified by investigating security advisories and bug reports.

wireshark_bug_commit_matching.csv: Maps security bugs (vulnerabilities) to commits in the Wireshark source code repository.

wireshark_bug_inducing_file_changes.csv: Maps vulnerabilities to Wireshark source files, recording the commit in which each vulnerability was induced and the commit in which it was fixed.
Websites susceptible to CVE-2020-15839
webtechsurvey.com
csv
Cite
WebTechSurvey, Websites susceptible to CVE-2020-15839 [Dataset]. https://webtechsurvey.com/cve/CVE-2020-15839
Available download formats: csv
Dataset authored and provided by
WebTechSurvey
License
https://webtechsurvey.com/terms
Time period covered
2025
Area covered
Global
Description
A complete list of live websites affected by CVE-2020-15839, compiled through global website indexing conducted by WebTechSurvey.
Websites susceptible to CVE-2022-45320
webtechsurvey.com
csv
Cite
WebTechSurvey, Websites susceptible to CVE-2022-45320 [Dataset]. https://webtechsurvey.com/cve/CVE-2022-45320
Available download formats: csv
Dataset authored and provided by
WebTechSurvey
License
https://webtechsurvey.com/terms
Time period covered
2025
Area covered
Global
Description
A complete list of live websites affected by CVE-2022-45320, compiled through global website indexing conducted by WebTechSurvey.