Assessing the quality of Deep Learning (DL) systems is crucial, as they are increasingly adopted in safety-critical domains. Researchers have proposed several input generation techniques for DL systems. While such techniques can expose failures, they do not explain which features of the test inputs influenced the system’s (mis-)behaviour.
DeepHyperion was the first test generator to overcome this limitation by exploring the DL systems’ feature space at large. In this paper, we propose DeepHyperion-CS, a test generator for DL systems that enhances DeepHyperion by promoting the inputs that contributed most to feature space exploration during the previous search iterations. We performed an empirical study involving two different test subjects (i.e., a digit classifier and a lane-keeping system for self-driving cars). Our results show that the contribution-based guidance implemented within DeepHyperion-CS outperforms state-of-the-art tools and significantly improves both the efficiency and the effectiveness of DeepHyperion.
DeepHyperion-CS exposed significantly more misbehaviours for 5 out of 6 feature combinations and was up to 65% more efficient than DeepHyperion in finding misbehaviour-inducing inputs and exploring the feature space.
DeepHyperion-CS was also useful for expanding the datasets used to train the DL systems, populating up to 200% more feature map cells than the original training set.
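
To make the idea of contribution-based guidance more concrete, the following Python sketch shows one possible way to bias parent selection in a MAP-Elites-style feature-map search toward inputs whose offspring previously filled or improved map cells. This is an illustrative assumption, not the authors' implementation; the names Individual, contribution_guided_search, mutate, features, and fitness are hypothetical placeholders.

    import random

    class Individual:
        """Wraps a test input together with its selection weight."""
        def __init__(self, data):
            self.data = data
            self.contribution = 1.0  # grows when this input's offspring expand the map

    def contribution_guided_search(seeds, mutate, features, fitness, iterations=1000):
        """MAP-Elites-style loop with contribution-biased parent selection."""
        archive = {}  # feature-map cell (tuple of feature values) -> elite Individual

        def try_place(ind):
            # Store the individual if its cell is empty or it beats the current elite.
            cell = features(ind.data)
            if cell not in archive or fitness(ind.data) > fitness(archive[cell].data):
                archive[cell] = ind
                return True
            return False

        for s in seeds:
            try_place(Individual(s))

        for _ in range(iterations):
            elites = list(archive.values())
            # Parents that recently contributed to exploration are picked more often.
            parent = random.choices(elites, weights=[e.contribution for e in elites], k=1)[0]
            child = Individual(mutate(parent.data))
            if try_place(child):
                parent.contribution += 1.0  # reward the parent for expanding or improving the map
        return archive

In this sketch the contribution weight simply counts how often a parent's offspring placed new or improved elites; the actual scoring and selection operators used by DeepHyperion-CS may differ.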