A large number of malicious applications appear every day, threatening numerous users. Therefore, a surge of studies has been conducted to protect users from newly emerging malware by using machine learning algorithms. Although existing machine learning- or deep learning-based Android malware detection approaches achieve high accuracy by combining multiple features, they cannot be deployed on mobile devices because of their high computational cost. In this paper, we propose MAPAS, a malware detection system that achieves high accuracy with adaptable use of computing resources. MAPAS analyzes the behaviors of malicious applications based on their API call graphs by using convolutional neural networks (CNNs). However, MAPAS does not use a classifier model generated by the CNN; it uses the CNN only to discover common features of the API call graphs of malware. To detect malware efficiently, MAPAS employs a lightweight classifier that calculates the similarity between API call graphs used for malicious activities and the API call graphs of the applications to be classified. To demonstrate the effectiveness and efficiency of MAPAS, we implement a prototype and thoroughly evaluate it. We also compare MAPAS with a state-of-the-art Android malware detection approach, MaMaDroid. Our evaluation results demonstrate that MAPAS classifies applications 145.8% faster and uses around ten times less memory than MaMaDroid. MAPAS also achieves higher accuracy (91.27%) than MaMaDroid (84.99%) in detecting unknown malware. In addition, MAPAS can generally detect any type of malware with high accuracy.
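The similarity-based classification described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure MAPAS uses, so the sketch assumes Jaccard similarity over sets of API call edges. The names `jaccard` and `is_malicious`, the example API names, and the threshold of 0.5 are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# An edge in an API call graph: (caller API, callee API).
Edge = Tuple[str, str]


def jaccard(a: Set[Edge], b: Set[Edge]) -> float:
    """Jaccard similarity of two edge sets (assumed measure, not from the paper)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_malicious(app_edges: Set[Edge],
                 malicious_patterns: Iterable[Set[Edge]],
                 threshold: float = 0.5) -> bool:
    """Flag an app if it is similar enough to any known malicious pattern.

    The threshold is a hypothetical value, not one reported in the abstract.
    """
    return any(jaccard(app_edges, p) >= threshold for p in malicious_patterns)


# Hypothetical malicious pattern: device identifiers flowing to an SMS call.
pattern = {
    ("getDeviceId", "sendTextMessage"),
    ("getLine1Number", "sendTextMessage"),
}
suspicious_app = pattern | {("onCreate", "setContentView")}
benign_app = {("onCreate", "setContentView")}

suspicious_flag = is_malicious(suspicious_app, [pattern])
benign_flag = is_malicious(benign_app, [pattern])
```

A set-based comparison like this needs no model inference at classification time, which is consistent with the abstract's emphasis on a lightweight classifier, although the actual MAPAS design may differ.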
To handle relentlessly emerging Android malware, deep learning has been widely adopted in the research community. Prior work proposed deep learning-based approaches that use different features of malware and reported high accuracy in malware detection, i.e., distinguishing malware from benign applications. However, familial analysis of real-world Android malware has not yet been extensively studied. Familial analysis refers to the process of classifying a given piece of malware into a family (or a set of families), which can greatly accelerate malware analysis because it reveals the malware's fine-grained behavioral characteristics. In this work, we shed light on deep learning-based familial analysis by studying different features of Android malware and how effectively they represent its (malicious) behaviors. We focus on string features of Android malware, namely the Abstract Syntax Trees (ASTs) of all functions extracted from each piece of malware, which faithfully represent all string features of Android malware. We thoroughly study how different string features, such as how security-sensitive APIs are used in malware, affect the performance of a neural network. A convolutional neural network was trained and tested in various configurations on a dataset of 28,179 real-world malware samples that appeared in the wild from 2018 to 2020, where each sample has one or more labels assigned based on its behaviors. Our evaluation reveals how different features contribute to the performance of familial analysis. Notably, with all features combined, we achieved an accuracy of up to 98% and a micro F1-score of 0.82, a result on par with the state of the art.
As a great number of IoT and mobile devices are used in our daily lives, the security of mobile devices is more important than ever. If mobile devices, which play a key role in connecting other devices, are exploited by malware to perform malicious behaviors, serious damage can spread to those other devices as well. Hence, a great deal of research effort has gone into preventing such situations. Many of these studies attempted to detect malware based on the APIs that malware uses. In general, they achieved high accuracy in detecting malware, but they could not classify malware into detailed categories because their detection mechanisms do not consider the characteristics of each malware category. In this paper, we propose a malware detection and classification approach, named ACAMA, that can detect malware and categorize it with high accuracy. To show the effectiveness of ACAMA, we implement it and evaluate it against previously proposed approaches. Our evaluation results demonstrate that ACAMA detects malware with 26% higher accuracy than a previous work. In addition, we show that ACAMA can successfully classify applications that another previous work, AVClass, cannot classify.
Android malware has evolved into various forms, such as adware that continuously displays advertisements, banking malware designed to access users' online banking accounts, and Short Message Service (SMS) malware that uses a Command & Control (C&C) server to send malicious SMS messages, intercept SMS messages, and steal data. Because malware authors use many malicious strategies, the amount of malware is steadily increasing. This growth threatens numerous users, so malware must be detected quickly and accurately. Each piece of malware has distinguishable characteristics based on its actions. Therefore, security researchers have tried to categorize malware by behavior through familial analysis, which helps analysts reduce the time and cost of analyzing malware. However, those studies typically used imbalanced, well-labeled open-source datasets, and thus it is very difficult to classify malware families that contain only a few samples. To overcome this challenge, previous data augmentation studies augmented data by visualizing malicious code and used the resulting images for malware analysis. However, visualization of malware can result in misclassification because the behavior information of the malware could be compromised. In this study, we propose an Android malware familial analysis system based on a data augmentation method that preserves malware behaviors, in order to create an effective multi-class classifier for malware family analysis. To this end, we analyze malware and use Application Programming Interfaces (APIs) and permissions, which reflect the behavior of malware, as features. Using these features, we augment the malware dataset to enable effective malware detection while preserving the original malicious behaviors. Our evaluation results demonstrate that a model created using only the augmented data achieves a macro-F1 score of 0.65 and an accuracy of 0.63. When the augmented data and the original malware are used together, the model achieves a macro-F1 score of 0.91 and an accuracy of 0.99.
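The abstract above reports both macro-F1 and accuracy on an imbalanced dataset. As a small worked example of why the two can diverge, the sketch below computes both metrics for a classifier that ignores a rare family. The family names and sample counts are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted family matches the true family."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-family F1 scores, so rare families count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the family never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced dataset: 98 adware samples and 2 banking samples.
y_true = ["adware"] * 98 + ["banker"] * 2
# A degenerate classifier that always predicts the majority family.
y_pred = ["adware"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
```

In this hypothetical case the accuracy is 0.98 while the macro-F1 is roughly 0.49, because the rare family contributes an F1 of 0 to the unweighted average. This is why macro-F1 is the more informative number when some families have only a few samples.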
With the advent of 5G networks, edge devices and mobile and multimedia applications are widely used, and malware targeting edge devices has appeared. In the fourth quarter of 2020, 43 million pieces of malware targeting mobile devices emerged. Therefore, many researchers have studied methods to quickly protect users from malware. In particular, they have studied deep learning-based classification models that detect malware with high accuracy on mobile devices. However, such deep learning-based classifiers consume a lot of resources, and mobile devices have limited hardware resources such as RAM and battery. Such approaches are therefore difficult to use on mobile devices in practice. In this work, we study how a deep learning classifier classifies malware and propose a novel approach to generating a lightweight classifier that detects malware efficiently and effectively, based on the insight that malware exhibits distinctive features because it is programmed to perform malicious actions such as information leaks. By analyzing and extracting the distinctive features that a deep learning classifier uses from a malicious dataset, we generate LiDAR, a lightweight rule-based classifier that detects malware on edge devices efficiently and with high accuracy. On an edge device, LiDAR detects malware with 94% accuracy (F1-score) and with 85.67% and 328.24% lower CPU and RAM usage, respectively, than a CNN classifier, and its classification time is 454.37% faster than that of the CNN classifier.