Malware still constitutes a major threat in the cybersecurity landscape, not least because of the widespread use of infection vectors such as documents. These infection vectors hide embedded malicious code from victim users, facilitating the use of social engineering techniques to infect their machines. Research has shown that machine-learning algorithms provide effective detection mechanisms against such threats, but the existence of an arms race in adversarial settings has recently challenged such systems. In this work, we focus on malware embedded in PDF files as a representative case of this arms race. We start by providing a comprehensive taxonomy of the different approaches used to generate PDF malware and of the corresponding learning-based detection systems. We then categorize threats specifically targeted against learning-based PDF malware detectors, using a well-established framework in the field of adversarial machine learning. This framework allows us to categorize known vulnerabilities of learning-based PDF malware detectors and to identify novel attacks that may threaten such systems, along with the potential defense mechanisms that can mitigate the impact of such threats. We conclude the paper by discussing how these findings highlight promising research directions towards tackling the more general challenge of designing robust malware detectors in adversarial settings.

First, attackers can leverage the complexity of such file formats to conceal malicious code, making its detection significantly harder. Second, infection vectors can be effectively used in social engineering campaigns, as victims are more prone to receive and open documents or multimedia content. Finally, although vulnerabilities of third-party applications are often publicly disclosed, they are not promptly patched. The absence of timely security updates thus makes the lifespan of attacks perpetrated through infection vectors much longer.

Machine learning-based technologies have been increasingly used in both academic and industrial environments (see, e.g., [48]) to detect malware embedded in infection vectors such as malicious PDF files. Research has demonstrated that learning-based systems can be effective at detecting obfuscated attacks that typically evade simple heuristics [23, 65, 82, 95], but the problem is still far from solved. Despite the significant increase in detected attacks, researchers have started questioning the reliability of learning algorithms against adversarial attacks carefully crafted against them [8-10, 17, 18]. Such attacks became widely popular when researchers showed that it was possible to evade deep learning algorithms for computer vision with adversarial examples, i.e., minimally-perturbed images that mislead classification [40, 88]. The same attack principles have also been employed to craft adversarial malware samples, as first shown in [9] and subsequently explored in [29, 42, 52, 96, 99]. Such attacks typically perform a few fine-grained changes on correctly detected malicious samples to have them misclassified as legitimate.
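To illustrate this evasion principle, the following minimal sketch (in Python, using only NumPy) shows a gradient-style attack against a purely hypothetical linear detector: starting from a feature vector that the detector flags as malicious, it applies small, fine-grained feature changes until the sample crosses the decision boundary. The classifier, its weights, the feature ranges, and the step size are illustrative assumptions and do not correspond to any of the detectors or attacks cited above.

```python
# Minimal sketch of gradient-based evasion against a linear classifier,
# in the spirit of the attacks discussed above. The detector, its weights,
# and all feature values are illustrative assumptions, not taken from any
# of the cited systems.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear detector: score(x) = w.x + b; score > 0 => "malicious".
w = rng.normal(size=20)
b = -0.5

def score(x):
    return float(w @ x + b)

# Feature vector of a (hypothetical) correctly detected malicious sample:
# features lie in [0, 1] and are chosen so the detector flags it.
x = (w > 0).astype(float)
assert score(x) > 0

# Evasion: take small steps against the gradient of the score (for a linear
# model, simply w), i.e., a few fine-grained feature changes, until the
# sample crosses the decision boundary and is classified as legitimate.
x_adv = x.copy()
step = 0.05
while score(x_adv) > 0:
    x_adv -= step * w / np.linalg.norm(w)
    x_adv = np.clip(x_adv, 0.0, 1.0)  # keep features in their valid range

print(f"L1 perturbation: {np.abs(x_adv - x).sum():.3f}, "
      f"final score: {score(x_adv):.3f}")
```

In realistic settings, the attacker must additionally map such feature-space changes back to a valid, working PDF file; this constraint is precisely where much of the arms race discussed in this paper takes place.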