Machine learning models, especially deep neural networks (DNNs), have achieved state-of-the-art performance in computer vision and speech recognition. However, as DNNs are deployed more widely, problems such as a lack of interpretability and vulnerability to adversarial examples have emerged. Whether a model's judgment is consistent with that of a human is key to the wide application and further development of neural networks. In this paper, we propose a novel and interpretable method, based on the geometric invariance shared by images of the same category, that enables the model to make the same judgment as humans on adversarial examples. Template matching is combined with the convolutional neural network during both the training and testing stages, and we also give a theoretical proof. The geometric-invariance features obtained from template matching are fused with the features extracted by the convolutional layers. The experimental results demonstrate that the temp_model (the network with template matching added) achieves higher test accuracy on both benchmark datasets and adversarial examples, and we use a visualization method to explain why adding the template makes the network perform better. With the template added as common-sense prior knowledge, the generality and convergence of the network improve without increasing the model size or training time.

INDEX TERMS Deep neural network, geometric invariance, interpretability, template matching, adversarial example.
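To make the fusion idea in the abstract concrete, the following is a minimal PyTorch sketch of combining template-matching responses with CNN features. Everything here is an illustrative assumption rather than the paper's actual implementation: the class names (TemplateBranch, TempModel), the layer sizes, and the choice of cross-correlation plus max pooling as the matching score are all hypothetical.

```python
# Minimal sketch (assumed, not the authors' code) of fusing template-matching
# scores with CNN features before the classifier, as the abstract describes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemplateBranch(nn.Module):
    """Scores the input against fixed class templates via cross-correlation."""
    def __init__(self, templates: torch.Tensor):
        super().__init__()
        # templates: (num_templates, channels, h, w); registered as a fixed
        # buffer, reflecting the idea of templates as built-in prior knowledge.
        self.register_buffer("templates", templates)

    def forward(self, x):
        # Template matching implemented as convolution with the templates
        # as kernels (valid for stride-1 "same" padding in PyTorch >= 1.9).
        resp = F.conv2d(x, self.templates, padding="same")
        # Reduce each response map to one matching score per template.
        return F.adaptive_max_pool2d(resp, 1).flatten(1)

class TempModel(nn.Module):
    """CNN whose learned features are concatenated with template scores."""
    def __init__(self, templates: torch.Tensor, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.template_branch = TemplateBranch(templates)
        fused_dim = 32 * 4 * 4 + templates.shape[0]
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, x):
        cnn_feat = self.conv(x).flatten(1)    # learned convolutional features
        tmpl_feat = self.template_branch(x)   # geometric-invariance prior
        # Feature fusion: concatenate the two feature sets, then classify.
        return self.classifier(torch.cat([cnn_feat, tmpl_feat], dim=1))

# Usage with random data: 5 hypothetical 7x7 templates, 28x28 grayscale inputs.
templates = torch.randn(5, 1, 7, 7)
model = TempModel(templates)
logits = model(torch.randn(8, 1, 28, 28))
print(logits.shape)  # torch.Size([8, 10])
```

Because the templates are fixed buffers rather than learned weights, this fusion adds no trainable parameters beyond the slightly wider classifier, which is consistent with the abstract's claim that the template can be added without increasing model size or training time.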