Abstract: When manufacturers release patches, they usually distribute them as binary executables. Vendors generally do not disclose the exact locations of the vulnerabilities they fix, and may even conceal some of them. This makes it hard for consumers to study the security implications of a patch in depth. In this paper we introduce SemHunt, a vulnerability discovery method that applies machine learning to patch information. First, SemHunt uses binary comparison to compare two versions of the same program and extract potential vulnerable/patched function pairs. Then, it combines these pairs with a vulnerability and patch knowledge database to classify them, identifying the likely vulnerable functions and their vulnerability types. We implemented a prototype of SemHunt, which can effectively identify vulnerable functions and vulnerability types and locate the corresponding vulnerabilities, even when these are not revealed in the released patch files. Finally, we tested SemHunt on programs containing real-world CWE vulnerabilities. In one experiment, on CWE-843 (type confusion), searching the source program alone returned about twice as many results as SemHunt. This indicates that SemHunt can significantly reduce the false positive rate of vulnerability discovery compared with analyzing source files alone.
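The two-stage pipeline described above (diff two program versions into changed function pairs, then classify each pair against a knowledge base) can be illustrated with a minimal Python sketch. This is not SemHunt's implementation. The `Function` record, the name-based matching, the token-set diff, and the `KNOWLEDGE_BASE` mapping are all hypothetical simplifications. Real binary-diffing tools match functions in stripped binaries by structural features such as control-flow graphs, and SemHunt classifies pairs with machine learning rather than a lookup table.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    body: str  # hypothetical stand-in for a function's disassembled instructions


def fingerprint(fn: Function) -> str:
    """Hash a function body so identical bodies compare equal cheaply."""
    return hashlib.sha256(fn.body.encode()).hexdigest()


def changed_pairs(old: list[Function], new: list[Function]) -> list[tuple[Function, Function]]:
    """Stage 1 (simplified): pair functions by name, keep pairs whose bodies differ.

    Assumption: symbol names survive across versions; real binary diffing
    cannot rely on this for stripped binaries.
    """
    new_by_name = {f.name: f for f in new}
    pairs = []
    for f in old:
        g = new_by_name.get(f.name)
        if g is not None and fingerprint(f) != fingerprint(g):
            pairs.append((f, g))
    return pairs


# Hypothetical knowledge base: token newly introduced by a patch -> CWE id.
KNOWLEDGE_BASE = {
    "bounds_check": "CWE-787",
    "type_check": "CWE-843",
    "null_check": "CWE-476",
}


def classify(pair: tuple[Function, Function]) -> list[str]:
    """Stage 2 (simplified): map tokens the patch added to candidate CWE types."""
    old, new = pair
    added = set(new.body.split()) - set(old.body.split())
    return sorted({KNOWLEDGE_BASE[t] for t in added if t in KNOWLEDGE_BASE})


if __name__ == "__main__":
    v1 = [Function("parse", "load store call"), Function("init", "mov ret")]
    v2 = [Function("parse", "load type_check store call"), Function("init", "mov ret")]
    for pair in changed_pairs(v1, v2):
        print(pair[0].name, classify(pair))  # prints: parse ['CWE-843']
```

Only the function whose body changed (`parse`) becomes a candidate pair, and the token the patch added points to a vulnerability type; unchanged functions such as `init` are never examined, which is the intuition behind narrowing the search compared with scanning the whole program.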