Many malware campaigns use Microsoft (MS) Office documents as droppers to download and execute their malicious payload. Such campaigns often use these documents because MS Office is installed in billions of devices and that these files allow the execution of arbitrary VBA code. Recent versions of MS Office prevent the automatic execution of VBA macros, so malware authors try to convince users into enabling the content via images that, e.g. forge system or technical errors.In this work, we leverage these visual elements to construct lightweight malware signatures that can be applied with minimal effort. We test and validate our approach using an extensive database of malware samples and identify correlations between different campaigns that illustrate that some campaigns are either using the same tools or that there is some collaboration between them.
Many malware campaigns use Microsoft (MS) Office documents as droppers to download and execute their malicious payload. Such campaigns often use these documents because MS Office is installed on billions of devices and that these files allow the execution of arbitrary VBA code. Recent versions of MS Office prevent the automatic execution of VBA macros, so malware authors try to convince users into enabling the content via images that, e.g. forge system or technical errors.
In this article, we propose a mechanism to extract and analyse the different components of the files, including these visual elements, and construct lightweight signatures based on them. These visual elements are used as input for a text extraction pipeline which, in combination with the signatures, is able to capture the intent of MS Office files and the campaign they belong to. We test and validate our approach using an extensive database of malware samples, obtaining an accuracy above 99% in the task of distinguishing between benign and malicious files. Furthermore, our signature-based scheme allowed us to identify correlations between different campaigns, illustrating that some campaigns are either using the same tools or collaborating between them.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.