Identifying malicious software provides great benefit for distributed and networked systems. Traditional real-time malware detection has relied on using signatures and string matching. However, string signatures ineffectively deal with polymorphic malware variants. Control flow has been proposed as an alternative signature that can be identified across such variants. This paper proposes a novel classification system to detect polymorphic variants using flowgraphs. We propose using an existing heuristic flowgraph matching algorithm to estimate graph isomorphisms. Moreover, we can determine similarity between programs by identifying the underlying isomorphic flowgraphs. A high similarity between the query program and known malware identifies a variant. To demonstrate the effectiveness and efficiency of our flowgraph based classification, we compare it to alternate algorithms, and evaluate the system using real and synthetic malware. The evaluation shows our system accurately detects real malware, performs efficiently, and is scalable. These performance characteristics enable real-time use on an intermediary node such as an Email gateway, or on the endhost.
This thesis introduces the major applications related to software similarity and classification and proposes novel contributions to the theory and practice of malware detection and clone detection. The topic of software similarity and classification covers the areas of detecting software variants, clones, derivatives, and classes of software. The literature of those individual areas can be combined into a cohesive topic that we examine in a unified manner. We demonstrate that considering these applied problems as a software similarity and classification problem enables techniques to be shared between areas.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.