DaDiDroid: An Obfuscation Resilient Tool for Detecting Android Malware via Weighted Directed Call Graph Modelling

Ikram, Muhammad; Beaume, Pierrick; Kâafar, Mohamed Ali

doi:10.5220/0007834602110219

Cited by 22 publications

(26 citation statements)

References 20 publications


“…In this way an automatic matching of the calls is avoided. A method call is converted to a call which is then invoked to the original call [25]. (vi) dynamic loading technique, malware externally loads data and/or code dynamically, from an external server at startup time.…”

Section: Security and Communication Network (mentioning)

confidence: 99%

Combat Mobile Evasive Malware via Skip-Gram-Based Malware Detection

Egitmen, Bulut, Aygun et al. 2020

Security and Communication Networks


Android malware detection is an important research topic in the security area. There are a variety of existing malware detection models based on static and dynamic malware analysis. However, most of these models are not very successful when it comes to evasive malware detection. In this study, we aimed to create a malware detection model based on a natural language model called skip-gram to detect evasive malware with the highest accuracy rate possible. In order to train and test our proposed model, we used an up-to-date malware dataset called Argus Android Malware Dataset (AMD) since the AMD contains various evasive malware families and detailed information about them. Meanwhile, for the benign samples, we used Comodo Android Benign Dataset. Our proposed model starts with extracting skip-gram-based features from instruction sequences of Android applications. Then it applies several machine learning algorithms to classify samples as benign or malware. We tested our proposed model with two different scenarios. In the first scenario, the random forest-based classifier performed with 95.64% detection accuracy on the entire dataset and 95% detection accuracy against evasive only samples. In the second scenario, we created a test dataset that contained zero-day malware samples only. For the training set, we did not use any sample that belongs to the malware families in the test set. The random forest-based model performed with 37.36% accuracy rate against zero-day malware. In addition, we compared our proposed model’s malware detection performance against several commercial antimalware applications using VirusTotal API. Our model outperformed 7 out of 10 antimalware applications and tied with one of them on the same test scenario.
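The abstract above says features are extracted from instruction sequences with a skip-gram model but does not give the procedure. As a minimal sketch of the general idea only, the following generates the (target, context) pairs that a generic skip-gram model trains on. The function name `skipgram_pairs`, the window size, and the opcode names are illustrative assumptions and are not taken from the cited paper.

```python
def skipgram_pairs(sequence, window=2):
    """Return (target, context) pairs for every token in `sequence`.

    Each token is paired with its neighbours up to `window` positions
    to the left and right. This is the generic skip-gram pairing step,
    not the cited paper's exact feature extraction.
    """
    pairs = []
    for i, target in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sequence[j]))
    return pairs


# Hypothetical, simplified opcode sequence for one method.
opcodes = ["const", "invoke", "move", "return"]
pairs = skipgram_pairs(opcodes, window=1)
```

Pairs like these would typically feed an embedding model, whose output vectors a classifier such as random forest could then consume.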


“…The API calls can be extracted at various granularity levels such as method, class, package, and family. Since there are millions of unique methods in Android, some approaches [19,21,30] that are based on the use or the frequency of API calls have proposed to abstract API calls at class, package, and/or family levels. This reduced the number of features significantly and yet produced comparable or even better results [19,21,30].…”

Section: Introduction (mentioning)

confidence: 99%
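The granularity levels named in the citation statement above can be illustrated with a small sketch that maps a fully qualified API call to its class, package, or family and then counts the resulting features. The helper names and the `FAMILY_PREFIXES` list are assumptions made for illustration; the cited approaches define their own family lists and extraction pipelines.

```python
from collections import Counter

# Hypothetical family prefixes, chosen for illustration only.
FAMILY_PREFIXES = ("android.", "java.", "javax.", "com.google.", "dalvik.")


def abstract_api_call(signature: str, level: str) -> str:
    """Map 'pkg.sub.Class.method' to the requested granularity level."""
    parts = signature.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected package.Class.method, got {signature!r}")
    if level == "method":
        return signature
    if level == "class":
        return ".".join(parts[:-1])   # drop the method name
    if level == "package":
        return ".".join(parts[:-2])   # drop class and method names
    if level == "family":
        for prefix in FAMILY_PREFIXES:
            if signature.startswith(prefix):
                return prefix.rstrip(".")
        return "self-defined"         # anything outside the known families
    raise ValueError(f"unknown level: {level!r}")


def feature_counts(calls, level):
    """Count API calls after abstracting them to the given level."""
    return Counter(abstract_api_call(call, level) for call in calls)


calls = [
    "android.telephony.SmsManager.sendTextMessage",
    "android.telephony.SmsManager.getDefault",
    "java.lang.String.length",
]
class_features = feature_counts(calls, "class")
```

In this toy input, three distinct methods collapse to two class-level features, which is the kind of feature-space reduction the statement describes.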

“…To extract these features, in general two types of techniques are used: static analysis [5,9,19,21,30,46] and dynamic analysis [15,41]. For instance, Drebin [5] extracts permissions and API calls by scanning manifest files and disassembled code.…”

Section: Introduction (mentioning)

confidence: 99%

“…For instance, Drebin [5] extracts permissions and API calls by scanning manifest files and disassembled code. DadiDroid [21] and MamaDroid [30] extract API calls from call graphs. The majority of the approaches has relied on static analysis for feature extraction.…”

Section: Introduction (mentioning)

confidence: 99%

“…Once these features are extracted using program analyses, these approaches typically use machine learning classifiers to train on the features and build malware detection model. For instance, Support Vector Machines (SVM), K-Nearest Neighbours, and Random Forest were used in [21,30]; AdaBoost, Naive Bayes, Decision Tree, and SVM were used in [20].…”

Section: Introduction (mentioning)

confidence: 99%


Experimental comparison of features and classifiers for Android malware detection

Shar, Demissie, Ceccato et al. 2020

Proceedings of the IEEE/ACM 7th International Conference on Mobile Software Engineering and Systems


The Android platform has dominated the smart phone market for years now and, consequently, gained a lot of attention from attackers. Malicious apps (malware) pose a serious threat to the security and privacy of Android smart phone users. Available approaches to detect mobile malware based on machine learning rely on features extracted with static analysis or dynamic analysis techniques. Different types of machine learning classifiers (such as support vector machine and random forest) and deep learning classifiers (based on deep neural networks) are then trained on extracted features, to produce models that can be used to detect mobile malware. The usually-analyzed features include permissions requested/used, frequency of API calls, use of API calls, and sequence of API calls. The API calls are analyzed at various granularity levels such as method, class, package, and family. In view of the proposals of different types of classifiers and the use of different types of features and different underlying analyses used for feature extraction, there is a need for a comprehensive evaluation on the effectiveness of the current state-of-the-art studies in malware detection on a common benchmark. In this work, we provide a baseline comparison of several conventional machine learning classifiers and deep learning classifiers, without fine tuning. We also provide the evaluation of different types of features that characterize the use of API calls at class level and the sequence of API calls at method level. Features have been extracted from a common benchmark of 4572 benign samples and 2399 malware samples, using both static analysis and dynamic analysis. Among other interesting findings, we observed that classifiers trained on the use of API calls generally perform better than those trained on the sequence of API calls. Classifiers trained on static analysis-based features perform better than those trained on dynamic analysis-based features.
Deep learning classifiers, despite their sophistication, are not necessarily better than conventional classifiers, especially when they are not optimized. However, deep
