Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering 2021
DOI: 10.1145/3475960.3475985

CVEfixes: automated collection of vulnerabilities and their fixes from open-source software

Abstract: Data-driven research on the automated discovery and repair of security vulnerabilities in source code requires comprehensive datasets of real-life vulnerable code and their fixes. To assist in such research, we propose a method to automatically collect and curate a comprehensive vulnerability dataset from Common Vulnerabilities and Exposures (CVE) records in the public National Vulnerability Database (NVD). We implement our approach in a fully automated dataset collection tool and share an initial release of t…

Cited by 93 publications (18 citation statements)
References 41 publications

“…We create a bug fixing dataset for the purpose of the experiment (see subsubsection 4.2.1). For vulnerabilities, we use two existing vulnerability fix datasets from the literature, called Big-Vul [15] and CVEfixes [16] that both consist of confirmed vulnerabilities with CVE IDs.…”
Section: Datasets
Citation type: mentioning
confidence: 99%
“…We use two existing datasets called Big-Vul [15] and CVEfixes [16] for tuning the model trained on the bug fixing examples. The Big-Vul dataset has been created by crawling CVE databases and extracting vulnerability related information such as CWE ID and CVE ID.…”
Section: Vulnerability Fix Corpus
Citation type: mentioning
confidence: 99%
“…Recent work have also studied and used the CWE and CVE systems. For example, the work in (Bhandari et al, 2021) collected CVE records with their associated CWEs and code commits. The collected information was then analysed to produce insightful metadata such as concerned programming language and code-related metrics.…”
Section: Related Work
Citation type: mentioning
confidence: 99%