Vulnerabilities in Open Source Software (OSS) are the major culprits behind cyber-attacks and security breaches today. To avoid repetitive development and speed up release cycles, software teams increasingly rely on OSS. However, many OSS users are unaware of the vulnerable components they are using. The de facto security advisory standard, the National Vulnerability Database (NVD), is known to suffer from poor coverage and inconsistency. It can take weeks or even months for a Common Vulnerabilities and Exposures (CVE) entry to be assigned and for the vulnerability to be finally patched. Thus, to mitigate cyber-attacks, it is important to understand both known CVEs and unknown vulnerabilities. In this thesis, we first conduct a large-scale crawl of Git commits from popular open source repositories such as Linux. Second, because there is no prior dataset of security-relevant Git commits, we develop a web-based triage system that lets security researchers label the commits manually. Finally, after the commits are cleaned and labelled, we implement a deep neural network that automatically identifies vulnerability-fixing commits (VFCs) based on their commit messages. The approach achieves significantly better precision than the state of the art while improving the recall rate by 16.8%. We conclude with a thorough quantitative and qualitative analysis of the results and discuss the lessons learned and room for future work.