With the popularity of Android and its open source, the Android platform has become an attractive target for hackers, and the detection and classification of malware has become a research hotspot. Existing malware classification methods rely on complex manual operation or large-volume high-quality training data. However, malware data collected by security providers contains user privacy information, such as user identity and behavior habit information. The increasing concern for user privacy poses a challenge to the current malware classification scheme. Based on this problem, we propose a new android malware classification scheme based on Federated learning, named FedHGCDroid, which classifies malware on Android clients in a privacy-protected manner. Firstly, we use a convolutional neural network and graph neural network to design a novel multi-dimensional malware classification model HGCDroid, which can effectively extract malicious behavior features to classify the malware accurately. Secondly, we introduce an FL framework to enable distributed Android clients to collaboratively train a comprehensive Android malware classification model in a privacy-preserving way. Finally, to adapt to the non-IID distribution of malware on Android clients, we propose a contribution degree-based adaptive classifier training mechanism FedAdapt to improve the adaptability of the malware classifier based on Federated learning. Comprehensive experimental studies on the Androzoo dataset (under different non-IID data settings) show that the FedHGCDroid achieves more adaptability and higher accuracy than the other state-of-the-art methods.
In traditional centralized Android malware classifiers based on machine learning, the training sample uploaded by users contains sensitive personal information, such as app usage and device security status, which will undermine personal privacy if used directly by the server. Federated-learning-based Android malware classifiers have attracted much attention due to their privacy-preserving and multi-party joint modeling. However, research shows that indirect privacy inferences from curious central servers threaten this framework. We propose a privacy risk evaluation framework, FedDroidMeter, based on normalized mutual information in response to user privacy requirements to measure the privacy risk in FL-based malware classifiers. It captures the essential cause of the disclosure of sensitive information in classifiers, independent of the attack model and capability. We performed numerical assessments using the Androzoo dataset, the baseline FL-based classifiers, the privacy-inferred attack model, and the baseline methodology of privacy evaluation. The experimental results show that FedDroidMeter can measure the privacy risks of the classifiers more effectively. Meanwhile, by comparing different models, FL, and privacy parameter settings, we proved that FedDroidMeter could compare the privacy risk between different use cases equally. Finally, we preliminarily study the law of privacy risk in classifiers. The experimental results emphasize the importance of providing a systematic privacy risk evaluation framework for FL-based malware classifiers and provide experience and a theoretical basis for studying targeted defense methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.