Recent research efforts on adversarial ML have investigated problem-space attacks, focusing on the generation of real evasive objects in domains where, unlike images, there is no clear inverse mapping to the feature space (e.g., software). However, the design, comparison, and real-world implications of problem-space attacks remain underexplored.This paper makes two major contributions. First, we propose a novel formalization for adversarial ML evasion attacks in the problem-space, which includes the definition of a comprehensive set of constraints on available transformations, preserved semantics, robustness to preprocessing, and plausibility. We shed light on the relationship between feature space and problem space, and we introduce the concept of side-effect features as the byproduct of the inverse feature-mapping problem. This enables us to define and prove necessary and sufficient conditions for the existence of problem-space attacks. We further demonstrate the expressive power of our formalization by using it to describe several attacks from related literature across different domains.Second, building on our formalization, we propose a novel problem-space attack on Android malware that overcomes past limitations. Experiments on a dataset with 170K Android apps from 2017 and 2018 show the practical feasibility of evading a state-of-the-art malware classifier along with its hardened version. Our results demonstrate that "adversarial-malware as a service" is a realistic threat, as we automatically generate thousands of realistic and inconspicuous adversarial applications at scale, where on average it takes only a few minutes to generate an adversarial app. Yet, out of the 1600+ papers on adversarial ML published in the past six years, roughly 40 focus on malware [15]-and many remain only in the feature space.Our formalization of problem-space attacks paves the way to more principled research in this domain. We responsibly release the code and dataset of our novel attack to other researchers, to encourage future work on defenses in the problem space.
No abstract
An Android application is composed by building blocks called Components. Briefly the four main components of an Android application are the Activities, which handle the user interaction, the Services, which handle background processes, the Broadcast Receivers, which handle the communication between the OS and the applications and the Content Providers, which handle the data management. These entities are declared and coupled into the AndroidManifest.xml file that ratifies the interactions between them, other than the application permissions and all the libraries dependencies.
If citing, it is advised that you check and use the publisher's definitive version for pagination, volume/issue, and date of publication details. And where the final published version is provided on the Research Portal, if citing you are again advised to check the publisher's website for any subsequent corrections.
Concept drift is a major challenge faced by machine learning-based malware detectors when deployed in practice. While existing works have investigated methods to detect concept drift, it is not yet well understood regarding the main causes behind the drift. In this paper, we design experiments to empirically analyze the impact of feature-space drift (new features introduced by new samples) and compare it with data-space drift (data distribution shift over existing features). Surprisingly, we find that data-space drift is the dominating contributor to the model degradation over time while featurespace drift has little to no impact. This is consistently observed over both Android and PE malware detectors, with different feature types and feature engineering methods, across different settings. We further validate this observation with recent online learning based malware detectors that incrementally update the feature space. Our result indicates the possibility of handling concept drift without frequent feature updating, and we further discuss the open questions for future research.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.