Personal authentication based on the periocular region remains possible even when the person's mouth and nose are hidden by a face mask. However, periocular recognition from visible light images alone is problematic because its accuracy degrades when lighting conditions change. In this paper, we propose a periocular recognition method that addresses this problem by using thermal images, which are less affected by changes in lighting conditions, in addition to visible light images. The method performs recognition based on features extracted from both image types by a convolutional neural network (CNN), and it uses deep metric learning so that it can handle persons who are not included in the training data. To evaluate the accuracy of the proposed method under unstable conditions, we conducted recognition experiments using images of 83 subjects from the USTC-NVIE database, which contains visible light and thermal images captured simultaneously under various lighting conditions and with various facial expressions. The experimental results show that using both visible light and thermal images achieves higher recognition accuracy than using visible light images alone.
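As an illustration of how embeddings learned with deep metric learning can serve persons absent from the training data, the following minimal sketch matches a probe against enrolled embeddings by similarity, so a new person is enrolled without retraining. It is not the proposed method itself. The abstract does not state how the two modalities are combined or matched, so the concatenation-based fusion, the cosine similarity score, the threshold value, and all names (`fuse`, `identify`, `gallery`) are assumptions made for illustration. The short vectors stand in for CNN features.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fuse(visible: Vector, thermal: Vector) -> Vector:
    """Assumed fusion rule: normalize each modality's embedding,
    concatenate them, and normalize the result again."""
    return l2_normalize(l2_normalize(visible) + l2_normalize(thermal))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def identify(
    probe: Vector, gallery: Dict[str, Vector], threshold: float = 0.8
) -> Tuple[Optional[str], float]:
    """Return the best-matching enrolled identity and its score,
    or (None, score) if the best score is below the assumed threshold."""
    best_id: Optional[str] = None
    best_score = -1.0
    for person_id, embedding in gallery.items():
        score = cosine(probe, embedding)
        if score > best_score:
            best_id, best_score = person_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Enrollment needs only one fused embedding per person, with no retraining.
gallery = {
    "subject_a": fuse([1.0, 0.0], [0.9, 0.1]),
    "subject_b": fuse([0.0, 1.0], [0.2, 0.8]),
}
probe = fuse([0.9, 0.1], [1.0, 0.0])
match, score = identify(probe, gallery)
```

In this sketch a probe whose best similarity falls below the threshold is rejected as unknown. In practice the threshold would be tuned on validation data, and the embeddings would come from a network trained with a metric learning loss.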