Our research group is studying and developing listening services that use spoken dialogue agents and IoT technologies to assist the "mind" of the elderly at home. However, user identification, an essential function of the service, has not yet been realized: the service cannot determine the identity of the person who is interacting with the spoken dialogue agent. Although the rapid development of artificial intelligence has produced various smart devices and services that use deep learning for face recognition, problems remain, including the cost and computational resources needed to build and apply a recognition model. The purpose of this paper is to develop a face identification system that uses a pre-trained model and a spoken dialogue agent. Our key ideas are the automatic generation of training data through spoken dialogue between the user and the agent, and the acquisition and comparison of facial features using a pre-trained model. As a result, our face identification system is expected to be easier to introduce, requiring only a general-purpose computer and a Web camera, without an Internet connection or manual labeling of training data.
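
The two key ideas can be illustrated with a minimal sketch, which is not the system described in this paper. It assumes that a pre-trained model has already converted each face image into a fixed-length feature vector and that the name attached to each vector was obtained through dialogue with the agent. The class `FaceGallery`, the function `cosine_similarity`, the threshold of 0.8, and the three-dimensional vectors are all hypothetical choices made for illustration; cosine similarity is one common way to compare features and is an assumption here.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


class FaceGallery:
    """Hypothetical store of labeled facial features (illustration only)."""

    def __init__(self, threshold=0.8):
        # Assumed similarity threshold; a real system would tune this value.
        self.threshold = threshold
        self.features = defaultdict(list)

    def enroll(self, name, feature):
        # The name stands in for a label obtained by spoken dialogue,
        # so no manual labeling step is needed.
        self.features[name].append(list(feature))

    def identify(self, feature):
        # Compare the query against every stored feature and keep the best match.
        best_name, best_score = None, -1.0
        for name, stored in self.features.items():
            for reference in stored:
                score = cosine_similarity(feature, reference)
                if score > best_score:
                    best_name, best_score = name, score
        if best_score >= self.threshold:
            return best_name, best_score
        return None, best_score  # unknown person


# Toy vectors stand in for features produced by a pre-trained model.
gallery = FaceGallery(threshold=0.8)
gallery.enroll("Alice", [0.9, 0.1, 0.0])
gallery.enroll("Bob", [0.1, 0.9, 0.2])

name, score = gallery.identify([0.85, 0.15, 0.05])
unknown_name, unknown_score = gallery.identify([0.0, 0.0, 1.0])
```

In this sketch, enrolling a new user only appends a labeled vector, so no recognition model has to be retrained; a query whose best similarity falls below the threshold is treated as an unknown person.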