In this paper, we analyze several metamorphic virus generators. We define a similarity index and use it to precisely quantify the degree of metamorphism that each generator produces. Then we present a detector based on hidden Markov models and we consider a simpler detection method based on our similarity index. Both of these techniques detect all of the metamorphic viruses in our test set with extremely high accuracy. In addition, we show that popular commercial virus scanners do not detect the highly metamorphic virus variants in our test set.
This tutorial was originally published online in 2004. Minor corrections and additions have been made over time, with new (and improved!) exercises added. This current version is suspiciously similar to Chapter 2 of my book, Introduction to Machine Learning with Applications in Information Security [5].
Hunting for Undetectable Metamorphic Viruses by Da Lin
Commercial anti-virus scanners are generally signature based, that is, they scan for known patterns to determine whether a file is infected by a virus or not. To evade signature-based detection, virus writers have adopted code obfuscation techniques to create highly metamorphic computer viruses. Since metamorphic viruses change their appearance from generation to generation, signature-based scanners cannot detect all instances of such viruses.
To combat metamorphic viruses, detection tools based on statistical analysis have been studied. A tool based on hidden Markov models (HMMs) was previously developed and the results are encouraging: it has been shown that metamorphic viruses created by a well-designed metamorphic engine can be detected using an HMM.
In this project, we explore whether there are any exploitable weaknesses in this HMM-based detection approach. We create a highly metamorphic virus generating tool designed specifically to evade HMM-based detection. We then test our engine, showing that we can generate viral copies that cannot be detected using previously developed HMM-based detection techniques. Finally, we consider possible defenses against our approach.
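As a rough illustration of how HMM-based scoring of this kind can work, the sketch below uses the scaled forward algorithm to compute a length-normalized log-likelihood for an opcode sequence. It assumes a model has already been trained on opcode sequences from one virus family. The function name, the three-symbol opcode alphabet, and the toy parameters PI, A, and B are hypothetical and are not taken from the work described above.

```python
import math


def log_likelihood_per_opcode(obs, pi, A, B):
    """Return log P(obs | model) / len(obs) using the scaled forward algorithm.

    obs: list of observation indices (opcodes mapped to integers)
    pi:  initial state distribution, pi[i]
    A:   state transition matrix, A[i][j] = P(state j at t+1 | state i at t)
    B:   emission matrix, B[i][k] = P(symbol k | state i)
    """
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n_states = len(pi)
    # Initialization: alpha_0(i) = pi[i] * B[i][obs[0]]
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: propagate through A, then weight by the emission.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            # The model assigns zero probability to this sequence.
            return float("-inf")
        # Rescale to avoid underflow; the log of the scale factors sums
        # to log P(obs | model).
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob / len(obs)


# Hypothetical toy model: 2 hidden states, 3 opcode symbols
# (0 = mov, 1 = add, 2 = jmp). Real models are trained, not hand-written.
PI = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.6, 0.3, 0.1],
     [0.1, 0.3, 0.6]]

score = log_likelihood_per_opcode([0, 1, 0, 0, 2, 1], PI, A, B)
print(score)
```

In a detector along these lines, a file's score would be compared with a threshold chosen from the scores of known family members and of benign files. Dividing by the sequence length keeps scores comparable across files of different sizes.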
ACKNOWLEDGEMENTS