Malicious programs, such as viruses and worms, are frequently related to previous programs through evolutionary relationships. Discovering those relationships and constructing a phylogeny model is expected to be helpful for analyzing new malware and for establishing a principled naming scheme. Matching permutations of code may help build better models in cases where malware evolution does not keep things in the same order. We describe methods for constructing phylogeny models that uses features called n-perms to match possibly permuted codes. An experiment was performed to compare the relative effectiveness of vector similarity measures using n-perms and n-grams when comparing permuted variants of programs. The similarity measures using n-perms maintained a greater separation between the similarity scores of permuted families of specimens versus unrelated specimens. A subsequent study using a tree generated through n-perms suggests that phylogeny models based on n-perms may help forensic analysts investigate new specimens, and assist in reconciling malware naming inconsistencies.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.