Binary function similarity analysis measures how similar functions are at the binary level to aid program analysis, and it is widely used in fields such as vulnerability detection, binary clone detection, and malware detection. Graph-based methods perform relatively well in practice, but current ones cannot capture similarity in terms of the graph position distribution and lose information during graph processing, which leads to low accuracy. This paper presents PDM, a graph-based method that increases the accuracy of binary function similarity detection by considering position distribution information. First, an enhanced Attributed Control Flow Graph (ACFG+) of a function is constructed from its control flow graph, assisted by instruction embedding and data flow analysis. Then, the ACFG+ is fed to a graph embedding model that uses the CapsGNN and DiffPool mechanisms to enrich the information retained in graph processing by considering the position distribution. The model outputs the corresponding embedding vector, and the similarity between function embeddings is calculated using the cosine distance. Similarity detection is performed in a Siamese network. Experiments show that, compared with VulSeeker and PalmTree+VulSeeker, PDM stably obtains three times and two times higher accuracy, respectively, in binary function similarity detection and detects up to six times more results in vulnerability detection. Compared with the state-of-the-art tools BinDiff, Diaphora, and Kam1n0, PDM achieves comparable Top-5, Top-10, and Top-20 ranking results and significant advantages in the Top-50, Top-100, and Top-200 detection results.
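The final comparison step described above, scoring function embeddings by cosine and ranking candidates into a Top-k list, can be illustrated with a minimal sketch. This is not PDM's implementation: the function names, the three-dimensional vectors, and the zero-vector convention are all hypothetical, and the embeddings here are hand-written rather than produced by a graph embedding model. The sketch computes cosine similarity; the corresponding cosine distance is one minus that value.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption made for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_top_k(query, candidates, k):
    """Return the k (name, similarity) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, emb))
              for name, emb in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings standing in for the model's output vectors.
query = [1.0, 0.0, 1.0]
candidates = {
    "func_a": [2.0, 0.0, 2.0],  # same direction as the query
    "func_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "func_c": [1.0, 1.0, 1.0],  # partially aligned with the query
}

top2 = rank_top_k(query, candidates, 2)
for name, score in top2:
    print(f"{name}: similarity={score:.4f}, distance={1.0 - score:.4f}")
```

In a Siamese setup, both functions being compared would pass through the same embedding model with shared weights before a comparison like this one is applied to the two output vectors.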