Deep Learning (DL) has been successfully applied to a wide range of application domains, including safety-critical ones. Several DL testing approaches have recently been proposed in the literature, but none of them aims to assess how different interpretable features of the generated inputs affect the system's behaviour. In this paper, we resort to Illumination Search to find the highest-performing test cases (i.e., misbehaving and closest to misbehaving), spread across the cells of a map representing the feature space of the system. We introduce a methodology that guides the users of our approach in the tasks of identifying and quantifying the dimensions of the feature space for a given domain. We developed DeepHyperion, a search-based tool for DL systems that illuminates, i.e., explores at large, the feature space, providing developers with an interpretable feature map where automatically generated inputs are placed along with information about the exposed behaviours.
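Illumination Search algorithms such as MAP-Elites keep, for every cell of a discretised feature map, the best input found so far. The sketch below illustrates that idea under stated assumptions: inputs are plain lists of floats, the two features are simply the input coordinates, and a hypothetical `fitness` function returns lower values for inputs closer to misbehaviour (with `fitness <= 0` meaning the input already misbehaves). The names `illuminate`, `toy_mutate` and `toy_fitness` are illustrative only and are not DeepHyperion's API.

```python
import random
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]
Archive = Dict[Cell, Tuple[float, List[float]]]


def illuminate(
    seed_inputs: List[List[float]],
    mutate: Callable[[List[float], random.Random], List[float]],
    features: Callable[[List[float]], Tuple[float, float]],
    fitness: Callable[[List[float]], float],
    cell_size: float,
    iterations: int,
    rng: random.Random,
) -> Archive:
    """MAP-Elites-style loop that keeps one elite (lowest fitness) per map cell.

    Assumption: lower fitness means closer to misbehaviour, and
    fitness <= 0 means the input already triggers a misbehaviour.
    """
    archive: Archive = {}

    def place(x: List[float]) -> None:
        # Discretise the two feature values into a cell of the map.
        f1, f2 = features(x)
        cell = (int(f1 // cell_size), int(f2 // cell_size))
        score = fitness(x)
        # Fill empty cells; replace an elite only with a better input.
        if cell not in archive or score < archive[cell][0]:
            archive[cell] = (score, x)

    for x in seed_inputs:
        place(x)
    for _ in range(iterations):
        # Uniformly pick an existing elite as parent and mutate it.
        _, parent = rng.choice(list(archive.values()))
        place(mutate(parent, rng))
    return archive


# Toy usage: 2-D inputs whose features are simply their coordinates.
def toy_mutate(x: List[float], rng: random.Random) -> List[float]:
    return [v + rng.gauss(0.0, 0.1) for v in x]


def toy_fitness(x: List[float]) -> float:
    # Hypothetical oracle: the toy system misbehaves when x0 + x1 >= 1.
    return 1.0 - (x[0] + x[1])


archive = illuminate([[0.0, 0.0]], toy_mutate, lambda x: (x[0], x[1]),
                     toy_fitness, cell_size=0.25, iterations=500,
                     rng=random.Random(0))
failing = [cell for cell, (score, _) in archive.items() if score <= 0]
print(f"{len(archive)} cells filled, {len(failing)} with misbehaving inputs")
```

The resulting `archive` is a small feature map: each key is a cell, and each value pairs the best input in that cell with its fitness, so cells holding inputs with `fitness <= 0` mark the regions of the feature space where the toy system misbehaves.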
Assessing the quality of Deep Learning (DL) systems is crucial, as they are increasingly adopted in safety-critical domains. Researchers have proposed several input generation techniques for DL systems. While such techniques can expose failures, they do not explain which features of the test inputs influenced the system’s (mis)behaviour. DeepHyperion was the first test generator to overcome this limitation by exploring the DL systems’ feature space at large. In this paper, we propose DeepHyperion-CS, a test generator for DL systems that enhances DeepHyperion by promoting the inputs that contributed more to feature space exploration during the previous search iterations. We performed an empirical study involving two different test subjects (i.e., a digit classifier and a lane-keeping system for self-driving cars). Our results show that the contribution-based guidance implemented within DeepHyperion-CS outperforms state-of-the-art tools and significantly improves the efficiency and the effectiveness of DeepHyperion. DeepHyperion-CS exposed significantly more misbehaviours for 5 out of 6 feature combinations and was up to 65% more efficient than DeepHyperion in finding misbehaviour-inducing inputs and exploring the feature space. DeepHyperion-CS was also useful for expanding the datasets used to train the DL systems, populating up to 200% more feature map cells than the original training set.
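The idea of promoting inputs that contributed more to feature space exploration can be sketched as a biased parent-selection step inside a MAP-Elites-style loop. This is a minimal illustration, not DeepHyperion-CS's actual definition: the contribution score used here (the number of times an elite's offspring filled an empty cell or replaced a worse elite), the selection weight `1 + contribution`, and all names (`Elite`, `contribution_search`, `toy_mutate`) are hypothetical choices. As before, lower fitness is assumed to mean closer to misbehaviour.

```python
import random
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]


class Elite:
    """An input stored in the feature map, plus a (hypothetical) contribution score."""

    def __init__(self, x: List[float], score: float) -> None:
        self.x = x
        self.score = score
        self.contribution = 0  # offspring that filled or improved a cell


def contribution_search(
    seed_inputs: List[List[float]],
    mutate: Callable[[List[float], random.Random], List[float]],
    features: Callable[[List[float]], Tuple[float, float]],
    fitness: Callable[[List[float]], float],
    cell_size: float,
    iterations: int,
    rng: random.Random,
) -> Dict[Cell, Elite]:
    archive: Dict[Cell, Elite] = {}

    def try_place(x: List[float]) -> bool:
        # Map the input to a cell and keep it only if it fills or improves it.
        f1, f2 = features(x)
        cell = (int(f1 // cell_size), int(f2 // cell_size))
        score = fitness(x)  # lower = closer to misbehaviour (assumption)
        if cell not in archive or score < archive[cell].score:
            archive[cell] = Elite(x, score)
            return True
        return False

    for x in seed_inputs:
        try_place(x)
    for _ in range(iterations):
        elites = list(archive.values())
        # Bias parent selection toward elites whose offspring helped before;
        # the +1 keeps never-successful elites selectable.
        weights = [1 + e.contribution for e in elites]
        parent = rng.choices(elites, weights=weights, k=1)[0]
        if try_place(mutate(parent.x, rng)):
            # Credit the parent. If the child displaced the parent itself,
            # the parent object (and its credit) has left the map.
            parent.contribution += 1
    return archive


def toy_mutate(x: List[float], rng: random.Random) -> List[float]:
    return [v + rng.gauss(0.0, 0.1) for v in x]


archive = contribution_search([[0.0, 0.0]], toy_mutate, lambda x: (x[0], x[1]),
                              lambda x: 1.0 - (x[0] + x[1]), cell_size=0.25,
                              iterations=500, rng=random.Random(0))
top = max(archive.values(), key=lambda e: e.contribution)
print(f"{len(archive)} cells filled; most productive elite contributed {top.contribution} times")
```

Compared with the uniform parent choice of plain MAP-Elites, the weighted roulette wheel spends more mutations on elites that have already led to new or better cells, while the `+1` term keeps every elite reachable so the search does not collapse onto a few productive regions.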