a b s t r a c tThe capability of efficiently recognizing reused functions for binary code is critical to many digital forensics tasks, especially considering the fact that many modern malware typically contain a significant amount of functions borrowed from open source software packages. Such a capability will not only improve the efficiency of reverse engineering, but also reduce the odds of common libraries leading to false correlations between unrelated code bases. In this paper, we propose SIGMA, a technique for identifying reused functions in binary code by matching traces of a novel representation of binary code, namely, the Semantic Integrated Graph (SIG). The SIG s enhance and merge several existing concepts from classic program analysis, including control flow graph, register flow graph, and function call graph into a joint data structure. Such a comprehensive representation allows us to capture different semantic descriptors of common functionalities in a unified manner as graph traces, which can be extracted from binaries and matched to identify reused functions, actions, or open source software packages. Experimental results show that our approach yields promising results. Furthermore, we demonstrate the effectiveness of our approach through a case study using two malware known to share common functionalities, namely, Zeus and Citadel.
There are many occasions in which the security community is interested to
discover the authorship of malware binaries, either for digital forensics
analysis of malware corpora or for thwarting live threats of malware invasion.
Such a discovery of authorship might be possible due to stylistic features
inherent to software codes written by human programmers. Existing studies of
authorship attribution of general purpose software mainly focus on source code,
which is typically based on the style of programs and environment. However,
those features critically depend on the availability of the program source
code, which is usually not the case when dealing with malware binaries. Such
program binaries often do not retain many semantic or stylistic features due to
the compilation process. Therefore, authorship attribution in the domain of
malware binaries based on features and styles that will survive the compilation
process is challenging. This paper provides the state of the art in this
literature. Further, we analyze the features involved in those techniques. By
using a case study, we identify features that can survive the compilation
process. Finally, we analyze existing works on binary authorship attribution
and study their applicability to real malware binaries.Comment: FPS 201
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.