Abstract:We present how we used machine learning techniques to detect malicious behaviours in PDF files. At this aim, we first set up a SVM (Support Machine Vector) classifier that was able to detect 99.7% of malware. However, this classifier was easy to lure with malicious PDF files, which we forged to make them look like clean ones. For instance, we implemented a gradient-descent attack to evade this SVM. This attack was almost 100% successful. Next, we provided counter-measures to this attack: a more elaborated features selection and the use of a threshold allowed us to stop up to 99.99% of this attack. Finally, using adversarial learning techniques, we were able to prevent gradient-descent attacks by iteratively feeding the SVM with malicious forged PDF files. We found that after 3 iterations, every gradient-descent forged PDF file were detected, completely preventing the attack.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.