Android application (app) stores contain a huge number of apps, which are manually classified into categories based on their descriptions. However, the predefined categories and app descriptions often do not accurately reflect the real functionalities of apps, which leads to misclassification and may in turn cause serious security and reliability problems in the app store. Automatic app classification is therefore needed to build a secure, reliable, integrated, and easy-to-navigate app store. In this paper, we propose an effective method called AndroClass that automatically classifies apps based on their real functionalities by using rich and comprehensive features that represent those functionalities. AndroClass performs three steps: feature extraction, feature refinement, and classification. In the feature extraction step, we extract 14 different features for each app by using a unified tool suite. In the feature refinement step, we apply the Random Forest algorithm to refine the features. In the classification step, we combine the refined features into a single one, and AndroClass uses K-Nearest Neighbor, Naive Bayes, Support Vector Machine, and Deep Neural Network classifiers to classify apps. In contrast to existing methods, all the features used in AndroClass are stable and clearly represent the actual functionalities of the app, AndroClass poses no risk to user privacy, and our method can be applied to unreleased or newly released apps. The results of extensive experiments with two real-world datasets and a dataset constructed by human experts demonstrate the effectiveness of AndroClass; on the latter dataset, its classification accuracy is 83.5%.
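As a rough illustration of the refine, combine, and classify flow described above, the following minimal sketch uses only the Python standard library. It is not the AndroClass implementation: the two feature types (`permissions`, `api_calls`), the importance scores, the threshold, and the category labels are hypothetical toy values. The importance scores are assumed to have been computed beforehand (for example by a Random Forest), and only the K-Nearest Neighbor classifier is shown.

```python
from collections import Counter
from math import dist

# Hypothetical per-dimension importance scores, assumed to be precomputed
# (e.g. by a Random Forest); these are toy values, not results from the paper.
PERM_IMPORTANCE = [0.5, 0.01, 0.4]
API_IMPORTANCE = [0.3, 0.02]
THRESHOLD = 0.1  # hypothetical cut-off for keeping a dimension


def refine(vector, importances, threshold=THRESHOLD):
    """Keep only the dimensions whose importance score reaches the threshold."""
    return [v for v, w in zip(vector, importances) if w >= threshold]


def combine(feature_vectors):
    """Concatenate several refined feature vectors into a single vector."""
    return [v for vec in feature_vectors for v in vec]


def to_vector(app):
    """Refine each feature type of an app, then combine them into one vector."""
    return combine([
        refine(app["permissions"], PERM_IMPORTANCE),
        refine(app["api_calls"], API_IMPORTANCE),
    ])


def knn_predict(train, query, k=3):
    """Return the majority label among the k nearest training vectors."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy labeled apps with binary feature vectors (all values are made up).
labeled_apps = [
    ({"permissions": [1, 0, 0], "api_calls": [1, 0]}, "navigation"),
    ({"permissions": [1, 1, 0], "api_calls": [1, 1]}, "navigation"),
    ({"permissions": [0, 0, 1], "api_calls": [0, 1]}, "photography"),
    ({"permissions": [0, 1, 1], "api_calls": [0, 0]}, "photography"),
]
train = [(to_vector(app), label) for app, label in labeled_apps]

new_app = {"permissions": [1, 1, 0], "api_calls": [1, 0]}
predicted = knn_predict(train, to_vector(new_app), k=3)
```

Because classification depends only on features extracted from the app itself, the same call works for an app that has no description or user data yet, which is the property the abstract highlights for unreleased or newly released apps.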