CVEfixes: automated collection of vulnerabilities and their fixes from open-source software

Bhandari, Guru Prasad; Naseer, Amara; Moonen, Leon

doi:10.1145/3475960.3475985

Cited by 93 publications

(18 citation statements)

References 41 publications


“…We create a bug fixing dataset for the purpose of the experiment (see subsubsection 4.2.1). For vulnerabilities, we use two existing vulnerability fix datasets from the literature, called Big-Vul [15] and CVEfixes [16] that both consist of confirmed vulnerabilities with CVE IDs.…”

Section: Datasets (mentioning)

confidence: 99%

“…We use two existing datasets called Big-Vul [15] and CVEfixes [16] for tuning the model trained on the bug fixing examples. The Big-Vul dataset has been created by crawling CVE databases and extracting vulnerability related information such as CWE ID and CVE ID.…”

Section: Vulnerability Fix Corpus (mentioning)

confidence: 99%

“…CVEfixes [16] [70]. They used the dataset to train a bi-directional graph neural network for a vulnerability detection system.…”

Section: Vulnerability Datasets (mentioning)

confidence: 99%

“…We use this data to first train VRepair on the task of bug fixing. Next, we use two datasets of vulnerability fixes from previous research, called Big-Vul [15] and CVEfixes [16]. We tune VRepair on the vulnerability fixing task based on both datasets.…”

Section: Introduction (mentioning)

confidence: 99%


Neural Transfer Learning for Repairing Security Vulnerabilities in C Code

Chen

Kommrusch

Monperrus

2023

IEEE Trans. Software Eng.


In this paper, we address the problem of automatic repair of software vulnerabilities with deep learning. The major problem with data-driven vulnerability repair is that the few existing datasets of known confirmed vulnerabilities consist of only a few thousand examples. However, training a deep learning model often requires hundreds of thousands of examples. In this work, we leverage the intuition that the bug fixing task and the vulnerability fixing task are related and that the knowledge learned from bug fixes can be transferred to fixing vulnerabilities. In the machine learning community, this technique is called transfer learning. In this paper, we propose an approach for repairing security vulnerabilities named VRepair which is based on transfer learning. VRepair is first trained on a large bug fix corpus and is then tuned on a vulnerability fix dataset, which is an order of magnitude smaller. In our experiments, we show that a model trained only on a bug fix corpus can already fix some vulnerabilities. Then, we demonstrate that transfer learning improves the ability to repair vulnerable C functions. We also show that the transfer learning model performs better than a model trained with a denoising task and fine-tuned on the vulnerability fixing task. To sum up, this paper shows that transfer learning works well for repairing security vulnerabilities in C compared to learning on a small dataset.



“…Recent work have also studied and used the CWE and CVE systems. For example, the work in (Bhandari et al, 2021) collected CVE records with their associated CWEs and code commits. The collected information was then analysed to produce insightful metadata such as concerned programming language and code-related metrics.…”

Section: Related Work (mentioning)

confidence: 99%

Common Privacy Weaknesses and Vulnerabilities in Software Applications

Sangaroonsilp

Dam

Ghose

2021

Preprint


In this digital era, our privacy is under constant threat as our personal data and traceable online/offline activities are frequently collected, processed and transferred by many software applications. Privacy attacks are often formed by exploiting vulnerabilities found in those software applications. The Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) systems are currently the main sources that software engineers rely on for understanding and preventing publicly disclosed software vulnerabilities. However, our study on all 922 weaknesses in the CWE and 156,537 vulnerabilities registered in the CVE to date has found a very small coverage of privacy-related vulnerabilities in both systems, only 4.45% in CWE and 0.1% in CVE. These also cover only a small number of areas of privacy threats that have been raised in existing privacy software engineering research, privacy regulations and frameworks, and industry sources. The actionable insights generated from our study led to the introduction of 11 new common privacy weaknesses to supplement the CWE system, making it become a source for both security and privacy vulnerabilities.
