Operating in a dynamic real-world environment requires a forward-thinking, adversarial-aware design for classifiers that goes beyond fitting the model to the training data. In such scenarios, classifiers must be designed so that (a) they are harder to evade, (b) changes in the data distribution are easier to detect over time, and (c) they can be retrained to recover from model degradation. While most work on the security of machine learning has concentrated on the evasion resistance problem (a), there is little work on reacting to attacks, (b) and (c). Additionally, while streaming data research concentrates on the ability to react to changes in the data distribution, it often takes an adversarial-agnostic view of the security problem. This leaves it vulnerable to adversarial activity aimed at evading the concept drift detection mechanism itself. In this paper, we analyze the security of machine learning from a dynamic and adversarial-aware perspective. The existing techniques of restrictive one-class classifier models, complex learning-based ensemble models, and randomization-based ensemble models are shown to be myopic, as they approach security as a static task. These methodologies are ill-suited to a dynamic environment, as they leak excessive information to an adversary, who can subsequently launch attacks that are indistinguishable from benign data. Based on an empirical vulnerability analysis against a sophisticated adversary, a novel feature importance hiding approach for classifier design is proposed. The proposed design ensures that future attacks on classifiers can be detected and recovered from. The proposed work provides motivation, by serving as a blueprint, for future work in the area of dynamic-adversarial mining, which combines lessons learned from streaming data mining, adversarial learning, and cybersecurity.
This article is categorized under:
Technologies > Machine Learning
Technologies > Classification
Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining