Reverse engineering, software plagiarism detection, and malware analysis have long been important problems in the software and security fields. For binary code, the function-call graph (FCG) reflects the program's capabilities, structure, and intrinsic relations, which motivates us to study FCG matching and its applications to these problems systematically. In this work, we propose an FCG matching algorithm based on the Hungarian algorithm, which solves the maximum weight matching problem in polynomial time and makes matching between large-scale graphs feasible. We also propose optimizations, including node-pair pruning and forward matching, to improve the efficiency and accuracy of the FCG matching algorithm. Finally, a series of experiments shows that FCG matching is an effective method with great application potential in software and security analysis.
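The abstract casts FCG matching as a maximum weight matching problem solved with the Hungarian algorithm. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a precomputed, non-negative similarity matrix `sim` between the functions of two FCGs (this section does not say how function similarity is scored), pads the smaller graph with zero-weight dummy nodes, and uses the textbook O(n^3) Kuhn-Munkres formulation. The names `hungarian_min_cost` and `match_fcg_nodes` are hypothetical, and the node-pair pruning and forward matching optimizations are not modeled.

```python
from typing import List, Tuple


def hungarian_min_cost(cost: List[List[float]]) -> List[int]:
    """Kuhn-Munkres (Hungarian) algorithm, O(n^3), on a square n x n matrix.

    Returns assign, where assign[i] is the column matched to row i,
    minimising the total cost of a perfect matching.
    """
    n = len(cost)
    inf = float("inf")
    # Dual potentials for rows (u) and columns (v); p[j] is the 1-based row
    # matched to column j (0 means free); way[] records alternating paths.
    u = [0.0] * (n + 1)
    v = [0.0] * (n + 1)
    p = [0] * (n + 1)
    way = [0] * (n + 1)
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):  # shift potentials, keeping reduced costs >= 0
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment the matching along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def match_fcg_nodes(sim: List[List[float]]) -> Tuple[List[Tuple[int, int]], float]:
    """Maximum weight matching between the functions of two FCGs.

    sim[a][b] >= 0 is an assumed similarity between function a of G1 and
    function b of G2. Returns the matched (a, b) pairs and their total weight.
    """
    n1 = len(sim)
    n2 = len(sim[0]) if n1 else 0
    k = max(n1, n2)
    # Pad with zero-similarity dummy nodes so the matrix is square.
    w = [[sim[a][b] if a < n1 and b < n2 else 0.0 for b in range(k)] for a in range(k)]
    top = max((x for row in w for x in row), default=0.0)
    # Maximising sum(w) over perfect matchings equals minimising sum(top - w),
    # and top - w keeps every cost non-negative.
    cost = [[top - x for x in row] for row in w]
    assign = hungarian_min_cost(cost)
    pairs = [(a, assign[a]) for a in range(n1) if assign[a] < n2]
    return pairs, sum(sim[a][b] for a, b in pairs)
```

For example, with `sim = [[3.0, 2.0], [2.0, 0.0]]`, greedily taking the single best pair (0, 0) yields a total weight of 3.0, whereas `match_fcg_nodes` returns the pairs `[(0, 1), (1, 0)]` with a total of 4.0, illustrating why an exact assignment solver can outperform greedy pairing. In practice, SciPy's `scipy.optimize.linear_sum_assignment(sim, maximize=True)` solves the same rectangular assignment problem.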
KEYWORDS
function-call graph, graph matching, Hungarian algorithm, reverse engineering, software security
INTRODUCTION
Graph matching aims to find a bijective mapping between the nodes of different graphs. It has been widely used in fields such as pattern recognition, computer vision, and bioinformatics. For the binary code of a program, the function-call graph (FCG) depicts how functions invoke one another, reflecting the structure and some internal features of the program. The FCG is less susceptible to the instruction-level obfuscation that malicious code employs to evade detection.1 This is because common obfuscations of binary code, such as instruction reordering, equivalent instruction sequence substitution, and branch inversion, do not significantly change the FCG. For this reason, FCG matching of binary code is a promising method for several important problems in the software and security fields: reverse engineering, software plagiarism detection, malware homology analysis, and malware classification and clustering, as illustrated in Figure 1.
a) Reverse engineering: Suppose we have already analyzed program P by reverse engineering. The reverse analysis of its successor or of a closely related program P′ becomes easier by leveraging the results obtained for P. Graph matching automatically maps the functions in P′ to similar, already-reversed functions in P, and therefore significantly reduces the overall reverse engineering workload.
b) Software plagiarism detection: FCG matching from protected software to suspicious software can reveal pairs of highly similar functions as the plagiarized parts.
c) Malware homology analysis: Discovering isomorphic subgraphs through FCG matching helps identify homologous malware samples that were developed by the same author or share the same code fragments.
d) Malware classification and clustering: The distance between malware samples can be well defined from the results of FCG matching. Typical classification or clustering algorithms can then use this distance to classify or cluster a set of unknown malware samples.
So far, most of fast alg...