Malware still constitutes a major threat in the cybersecurity landscape, not least because of the widespread use of infection vectors such as documents. These infection vectors hide embedded malicious code from victim users, facilitating the use of social engineering techniques to infect their machines. Research has shown that machine-learning algorithms provide effective detection mechanisms against such threats, but the existence of an arms race in adversarial settings has recently challenged such systems. In this work, we focus on malware embedded in PDF files as a representative case of this arms race. We start by providing a comprehensive taxonomy of the different approaches used to generate PDF malware and of the corresponding learning-based detection systems. We then categorize threats specifically targeted against learning-based PDF malware detectors, using a well-established framework in the field of adversarial machine learning. This framework allows us to categorize known vulnerabilities of learning-based PDF malware detectors and to identify novel attacks that may threaten such systems, along with the potential defense mechanisms that can mitigate the impact of such threats. We conclude the paper by discussing how these findings highlight promising research directions towards tackling the more general challenge of designing robust malware detectors in adversarial settings.

First, attackers can leverage the complexity of such file formats to conceal malicious code, making its detection significantly harder. Second, infection vectors can be effectively used in social engineering campaigns, as victims are more prone to receive and open documents or multimedia content. Finally, although vulnerabilities of third-party applications are often publicly disclosed, they are not promptly patched. The absence of timely security updates thus makes the lifespan of attacks perpetrated through infection vectors much longer.

Machine learning-based technologies have been increasingly used in both academic and industrial environments (see, e.g., [48]) to detect malware embedded in infection vectors such as malicious PDF files. Research has demonstrated that learning-based systems can be effective at detecting obfuscated attacks that typically evade simple heuristics [23, 65, 82, 95], but the problem is still far from solved. Despite the significant increase in detected attacks, researchers have started questioning the reliability of learning algorithms against adversarial attacks carefully crafted against them [8-10, 17, 18]. Such attacks became widely popular when researchers showed that it was possible to evade deep learning algorithms for computer vision with adversarial examples, i.e., minimally-perturbed images that mislead classification [40, 88]. The same attack principles have also been employed to craft adversarial malware samples, as first shown in [9] and subsequently explored in [29, 42, 52, 96, 99]. Such attacks typically perform a few fine-grained changes on correctly detected malicious samples to have them misclassified as legitimate.
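To illustrate this evasion principle, the following minimal sketch (in Python, using only NumPy) shows a gradient-style attack against a purely hypothetical linear detector: starting from a feature vector that the detector flags as malicious, it applies small, fine-grained feature changes until the sample crosses the decision boundary. The classifier, its weights, the feature ranges, and the step size are illustrative assumptions and do not correspond to any of the detectors or attacks cited above.

```python
# Minimal sketch of gradient-based evasion against a linear classifier,
# in the spirit of the attacks discussed above. The detector, its weights,
# and all feature values are illustrative assumptions, not taken from any
# of the cited systems.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear detector: score(x) = w.x + b; score > 0 => "malicious".
w = rng.normal(size=20)
b = -0.5

def score(x):
    return float(w @ x + b)

# Feature vector of a (hypothetical) correctly detected malicious sample:
# features lie in [0, 1] and are chosen so the detector flags it.
x = (w > 0).astype(float)
assert score(x) > 0

# Evasion: take small steps against the gradient of the score (for a linear
# model, simply w), i.e., a few fine-grained feature changes, until the
# sample crosses the decision boundary and is classified as legitimate.
x_adv = x.copy()
step = 0.05
while score(x_adv) > 0:
    x_adv -= step * w / np.linalg.norm(w)
    x_adv = np.clip(x_adv, 0.0, 1.0)  # keep features in their valid range

print(f"L1 perturbation: {np.abs(x_adv - x).sum():.3f}, "
      f"final score: {score(x_adv):.3f}")
```

In realistic settings, the attacker must additionally map such feature-space changes back to a valid, working PDF file; this constraint is precisely where much of the arms race discussed in this paper takes place.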