Proceedings of the 15th International Joint Conference on E-Business and Telecommunications 2018
DOI: 10.5220/0006884704120419

Malware Detection in PDF Files using Machine Learning

Abstract: We present how we used machine learning techniques to detect malicious behaviours in PDF files. To this end, we first set up an SVM (Support Vector Machine) classifier that was able to detect 99.7% of malware. However, this classifier was easy to fool with malicious PDF files that we forged to look like clean ones. For instance, we implemented a gradient-descent attack to evade this SVM. This attack was almost 100% successful. Next, we provided counter-measures to this attack: a more elabor…
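
To make the gradient-descent evasion idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a linear SVM whose decision function is f(x) = w·x + b, with f(x) > 0 meaning "malicious"; for such a model the gradient of f with respect to the input x is simply w. The weights, bias, feature vector, step size, and the constraint that features stay non-negative are all hypothetical values chosen for illustration.

```python
# Hypothetical linear SVM: the weights and bias below are made up for
# illustration and are not taken from the paper.
def decision(w, b, x):
    """Linear decision function f(x) = w.x + b; f(x) > 0 means 'malicious'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def evade(w, b, x, step=0.5, max_iter=100):
    """Gradient descent on f with respect to the input x.

    For a linear model the gradient of f w.r.t. x is w, so each step
    moves x by -step * w. Features are clamped at 0 (an assumption:
    they are treated as counts that cannot go negative).
    """
    x = list(x)
    for _ in range(max_iter):
        if decision(w, b, x) < 0:  # now classified as clean: stop
            break
        x = [max(0.0, xi - step * wi) for xi, wi in zip(x, w)]
    return x


w = [1.0, 0.5, -2.0]   # hypothetical feature weights
b = -1.0               # hypothetical bias
x = [4.0, 2.0, 0.0]    # hypothetical feature vector of a malicious file

adv = evade(w, b, x)
print(decision(w, b, x), decision(w, b, adv))
```

In this toy setting the attack lowers features with positive weight and raises the one with negative weight until the score crosses zero. A real attack must also ensure the modified feature vector still corresponds to a valid, working PDF, which this sketch does not model.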

Cited by 10 publications (3 citation statements)
References 5 publications

“…The first step involves acquiring the dataset containing malware detection data. For this purpose, publicly available data was collected, specifically utilizing the dataset for the classification of malware with PE Headers sourced from github.com [11]. This dataset serves as the foundation for classifying PE files into two categories: malware and benign The dataset used in the study is extensive, comprising over 138,000 instances.…”
Section: Research Methods 21 Research Design
type: mentioning
confidence: 99%
“…Using a gradient-descent (GD) approach, the naive SVM used by the authors in [37] was easily deceived by us. The authors also devised defenses against this assault by setting a threshold over each considered feature.…”
Section: Literature Review
type: mentioning
confidence: 99%
“…Since Portable Document Format files can include a variety of harmful material, including embedded scripts, exploits, and malicious URLs, it can be difficult to detect malware in them. A reading flaw might be used by malware software to try to infect a machine [2]. Adobe Acrobat Reader discovered a huge number of vulnerabilities in 2017.…”
Section: Introduction
type: mentioning
confidence: 99%