Code authorship attribution is the process of identifying the author of a given piece of code. As malware grows in volume and mutation techniques become more advanced, malware authors are producing large numbers of variants. Dealing with this problem requires methods for examining the authorship of malicious code. Code authorship attribution techniques can be used to identify and categorize malware authors. This information can help predict the types of tools and techniques that the author of a specific piece of malware uses, as well as the manner in which the malware spreads and evolves. In this article, we present the first comprehensive review of research on code authorship attribution. The article summarizes various methods of authorship attribution and highlights challenges in the field.