Machine learning–based classification dominates current malware detection approaches for Android. However, due to the evolution of both the Android platform and its user apps, existing techniques are widely limited by their reliance on new malware samples, which may not be available in a timely manner, and on constant retraining, which is often very costly. As a result, new and emerging malware slips through, as seen from the continued surge of malware in the wild. Thus, a more practical detector needs not only to be accurate on particular datasets but, more critically, to be able to sustain its capabilities over time without frequent retraining. In this article, we propose and study the sustainability problem for learning-based app classifiers. We define sustainability metrics and compare them among five state-of-the-art malware detectors for Android. We further developed DroidSpan, a novel classification system based on a new behavioral profile for Android apps that captures sensitive access distribution from lightweight profiling. We evaluated the sustainability of DroidSpan versus the five detectors as baselines on longitudinal datasets spanning the past eight years, which include 13,627 benign apps and 12,755 malware samples. Through our extensive experiments, we showed that DroidSpan significantly outperformed all the baselines in sustainability at reasonable cost, by 6%–32% for same-period detection and 21%–37% for over-time detection. The main takeaway, which also explains the superiority of DroidSpan, is that the use of features that consistently differentiate malware from benign apps over time is essential for sustainable learning-based malware detection, and that such features can be learned from studies of app evolution.
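To make the evaluation protocol behind "same-period" and "over-time" detection concrete, the following is a minimal sketch in Python, assuming hypothetical inputs: a mapping from year to (apps, labels) and a placeholder feature extractor. It is not DroidSpan's actual implementation; it only illustrates training a classifier on one period, testing on a held-out split of that period, and then applying the frozen model to later periods to observe how detection performance decays.

# Sketch of a longitudinal sustainability evaluation (hypothetical helpers).
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate_sustainability(datasets_by_year, extract_features):
    """datasets_by_year: dict year -> (apps, labels); returns F1 per test year."""
    years = sorted(datasets_by_year)
    base_year = years[0]
    apps, labels = datasets_by_year[base_year]
    X = [extract_features(app) for app in apps]

    # Hold out part of the base year for same-period detection.
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    results = {base_year: f1_score(y_te, clf.predict(X_te))}

    # Over-time detection: apply the frozen model to each later year without retraining.
    for year in years[1:]:
        apps_y, labels_y = datasets_by_year[year]
        X_y = [extract_features(app) for app in apps_y]
        results[year] = f1_score(labels_y, clf.predict(X_y))
    return results

A sustainable detector is one whose F1 in later years stays close to its base-year F1 under this protocol.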
Malware detection at scale in the Android realm is often carried out using machine learning techniques. State-of-the-art approaches such as DREBIN and MaMaDroid are reported to yield high detection rates when assessed against well-known datasets. Unfortunately, such datasets may include a large portion of duplicated samples, which may bias recorded experimental results and the insights drawn from them. In this article, we perform extensive experiments to measure the performance gap that arises when datasets are de-duplicated. Our experimental results reveal that duplication in published datasets has a limited impact on supervised malware classification models. This observation contrasts with the findings of Allamanis on the general case of machine learning bias for big code. Our experiments, however, show that sample duplication affects unsupervised learning models (e.g., malware family clustering) more substantially. Nevertheless, we argue that fellow researchers and practitioners should always take sample duplication into consideration when performing machine learning-based Android malware detection, whether supervised or unsupervised, no matter how significant the impact might be.
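For illustration, below is a minimal sketch of one plausible de-duplication step for an APK corpus, written in Python. The choice of hashing granularity (the whole APK file, via SHA-256) is an assumption for the sketch, not necessarily the criterion used in the article; de-duplication could equally be performed on the dex code or extracted features.

# Hypothetical de-duplication of an APK corpus by content hash.
import hashlib
from pathlib import Path

def deduplicate_apks(apk_paths):
    """Return one representative path per unique APK content hash."""
    seen = set()
    unique = []
    for path in map(Path, apk_paths):
        # Hash the raw APK bytes; identical samples collapse to one entry.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique

Running such a step before splitting a corpus into training and test sets prevents identical samples from appearing on both sides of the split, which is the bias the de-duplication experiments are designed to expose.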