In the forthcoming era of the IoT, where everything will be connected, mobile devices will play a key role in providing data sharing and user-centric services between devices. In such a service environment, if a mobile application is vulnerable to security threats and exposed to malicious behavior, malware can spread to hundreds of millions of connected devices. It is therefore especially important to isolate and respond quickly to malicious mobile code, which requires predicting malicious behavior. Existing permission-based security risk assessment schemes draw on the application's description or on user reviews, but these schemes mostly offer a subjective evaluation, which inevitably reduces accuracy. In this paper, we therefore propose a scheme for assessing the security risk of Android mobile applications by analyzing their application programming interfaces (APIs) using machine learning. The key idea of the proposed scheme is to extract the APIs from the application's executable code through reverse engineering, so that each API can be compared with a malicious API database built from an existing malware dataset. Instead of simply judging an application as malicious or benign, our scheme expresses its risk as a score. For this quantitative evaluation, we use an ensemble of tree boosting machine learning algorithms. To demonstrate the practicality of the proposed scheme, we experiment with a set of benign and malicious real-world samples and compare our results with those of existing schemes. Experimental results show better performance and accuracy than conventional schemes based on Naive Bayes and simple ensemble algorithms. Our proposed scheme is expected to contribute significantly to responding rapidly to the ever-more-intelligent malware of the future.
INDEX TERMS Malware detection, machine learning, XGBoost, risk assessment.
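The pipeline summarized above (extract APIs by reverse engineering, match them against a malicious API database, and turn a tree-boosting model's output into a risk score) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the APK has already been disassembled into smali text (for example with a tool such as apktool), and the API names in `MALICIOUS_APIS`, the toy training apps, the 0 to 100 score scale, and the helper names `extract_apis`, `to_features`, `fit_boosted_stumps`, and `risk_score` are all hypothetical. In place of the paper's ensemble of boosting algorithms, it trains a single gradient-boosted ensemble of depth-1 trees with XGBoost-style Newton leaf weights; `lr=0.3` and `lam=1.0` simply mirror XGBoost's default `eta` and `lambda`.

```python
import math
import re

# Hypothetical "malicious API database" (illustrative names only; the paper
# builds its database from an existing malware dataset).
MALICIOUS_APIS = [
    "android.telephony.SmsManager.sendTextMessage",
    "android.telephony.TelephonyManager.getDeviceId",
    "java.lang.Runtime.exec",
    "dalvik.system.DexClassLoader.loadClass",
]

# Matches smali call sites such as
#   invoke-virtual {v0, v1}, Ljava/lang/Runtime;->exec(Ljava/lang/String;)Ljava/lang/Process;
INVOKE_RE = re.compile(r"invoke-[\w/]+\s+\{[^}]*\},\s*L([\w/$]+);->([\w$<>]+)\(")


def extract_apis(smali_text):
    """Return the set of 'package.Class.method' names invoked in smali code."""
    return {cls.replace("/", ".") + "." + name
            for cls, name in INVOKE_RE.findall(smali_text)}


def to_features(apis):
    """Binary vector: 1 if the app calls that database API, else 0."""
    present = set(apis)
    return [1 if api in present else 0 for api in MALICIOUS_APIS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_boosted_stumps(X, y, rounds=20, lr=0.3, lam=1.0):
    """Gradient boosting of depth-1 trees under logistic loss, using
    XGBoost-style leaf weights -G / (H + lam) and split score G^2 / (H + lam)."""
    F = [0.0] * len(X)  # current log-odds per sample, starting at p = 0.5
    trees = []
    for _ in range(rounds):
        p = [sigmoid(f) for f in F]
        g = [pi - yi for pi, yi in zip(p, y)]  # first-order gradients
        h = [pi * (1.0 - pi) for pi in p]      # second-order (Hessian) terms
        best = None
        for j in range(len(X[0])):
            G, H = [0.0, 0.0], [0.0, 0.0]      # leaf sums for x_j == 0 / x_j == 1
            for xi, gi, hi in zip(X, g, h):
                G[xi[j]] += gi
                H[xi[j]] += hi
            score = sum(G[k] ** 2 / (H[k] + lam) for k in (0, 1))
            if best is None or score > best[0]:
                best = (score, j, [-G[k] / (H[k] + lam) for k in (0, 1)])
        _, j, w = best
        trees.append((j, w))
        F = [f + lr * w[xi[j]] for f, xi in zip(F, X)]
    return lr, trees


def risk_score(model, x):
    """Map the boosted log-odds to a 0-100 risk score (assumed scale)."""
    lr, trees = model
    return 100.0 * sigmoid(sum(lr * w[x[j]] for j, w in trees))


# Toy, synthetic training set (API lists per app; 1 = malware, 0 = benign).
train = [
    (["android.telephony.SmsManager.sendTextMessage", "java.lang.Runtime.exec"], 1),
    (["android.telephony.SmsManager.sendTextMessage",
      "android.telephony.TelephonyManager.getDeviceId",
      "dalvik.system.DexClassLoader.loadClass"], 1),
    (["android.telephony.SmsManager.sendTextMessage",
      "android.telephony.TelephonyManager.getDeviceId"], 1),
    (["android.widget.Toast.makeText"], 0),
    (["android.telephony.TelephonyManager.getDeviceId"], 0),
    ([], 0),
]
model = fit_boosted_stumps([to_features(a) for a, _ in train], [y for _, y in train])

suspicious = extract_apis(
    "invoke-virtual {v0, v1, v2, v3, v4, v5}, Landroid/telephony/SmsManager;"
    "->sendTextMessage(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;"
    "Landroid/app/PendingIntent;Landroid/app/PendingIntent;)V")
harmless = extract_apis(
    "invoke-static {p0, v0, v1}, Landroid/widget/Toast;"
    "->makeText(Landroid/content/Context;Ljava/lang/CharSequence;I)Landroid/widget/Toast;")
print(round(risk_score(model, to_features(suspicious)), 1))
print(round(risk_score(model, to_features(harmless)), 1))
```

Because the output is a probability mapped onto a score rather than a hard label, applications can be ranked by risk even when none of them crosses a fixed malicious/benign threshold, which is the motivation for scoring instead of binary detection.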