This paper proposes high-speed, size- and orientation-invariant lips detection for a talking person in an active scene, using template matching and genetic algorithms (GA) (see Fig. 1). As a further objective, we also acquire numerical parameters that represent the lips. This information is important for applications that demand high performance, such as audio-visual speech recognition and interfaces for personal mobile devices. Lips detection is difficult mainly because the lips deform and change geometrically during speech, and because free camera motion makes the scene active. To improve both speed and accuracy, we first improve performance on a single still image, which is the basis of video processing.

The proposed system is based on template matching using a GA. Only one template is prepared per experiment. Because the target application is personal devices, the template is the closed mouth of the subject. In our previous study, the main problem was the trade-off between search accuracy and search speed. To overcome this problem, we use two methods: a scaling window and dynamic search domain control (SD-Control).

Fig. 2 plots the transition of the objective value for the proposed system and for the system without SD-Control. From point (a) to point (d) in Fig. 2, the GA exploration is exactly the same in both cases, because the same random numbers are used. Between points (d) and (e), the two methods diverge, because the search domain is controlled from point (d) onward. Without SD-Control, the elite individual does not change from point (c) to point (g), so we conclude that the GA exploration is trapped in a local optimum.
The exploration is trapped in a local optimum because the small population loses its diversity, which causes premature convergence.

In the proposed system, on the other hand, the evolution makes good progress from generation 117 onward (from point (e) to point (i)). This indicates that the GA of the proposed system can escape from the local optimum. The reason is that the dynamic SD-Control reduces the search domain, so the GA can explore the reduced domain in fine detail. In other words, the GA in the proposed system performs not only global optimization but also local optimization. The result of our demonstration shows that the dynamic SD-Control overcomes the trade-off between exploration accuracy and speed.

We achieved a lips detection accuracy of 91.33% at an average processing time of 33.70 milliseconds per frame.

Stephen Karungaru**, Non-member
Minoru Fukumi**, Member
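
The idea of dynamic search domain control described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not state the rule by which SD-Control reduces the domain, so the sketch assumes that the domain is shrunk around the elite individual whenever the elite has not improved for a fixed number of generations. The function `ga_search`, the parameters `patience` and `shrink`, and the objective `toy_score` (a stand-in for a template matching score) are all hypothetical names introduced for illustration.

```python
import random


def ga_search(objective, bounds, generations=200, pop_size=10,
              mutation_rate=0.2, patience=15, shrink=0.5,
              sd_control=True, seed=0):
    """Maximise `objective` over a box with a small-population GA.

    Assumed SD-Control rule (not taken from the paper): when the elite
    stalls for `patience` generations, shrink every interval of the
    search domain around the elite by the factor `shrink`.
    """
    rng = random.Random(seed)
    domain = [tuple(b) for b in bounds]  # current (lo, hi) per gene

    def sample():
        return [rng.uniform(lo, hi) for lo, hi in domain]

    population = [sample() for _ in range(pop_size)]
    elite = max(population, key=objective)
    elite_score = objective(elite)
    history = [elite_score]  # elite objective value per generation
    stalled = 0

    for _ in range(generations):
        children = [elite[:]]  # elitism: the best individual survives
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(population, 2), key=objective)
            p2 = max(rng.sample(population, 2), key=objective)
            # Uniform crossover.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: resample a gene inside the current domain.
            for i, (lo, hi) in enumerate(domain):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)
            children.append(child)
        population = children

        best = max(population, key=objective)
        best_score = objective(best)
        if best_score > elite_score:
            elite, elite_score, stalled = best, best_score, 0
        else:
            stalled += 1
        history.append(elite_score)

        if sd_control and stalled >= patience:
            # Reduce the domain around the elite, clipped to the old domain.
            reduced = []
            for gene, (lo, hi) in zip(elite, domain):
                half = (hi - lo) * shrink / 2
                reduced.append((max(lo, gene - half), min(hi, gene + half)))
            domain = reduced
            # Restore diversity inside the reduced domain, keeping the elite.
            population = [elite[:]] + [sample() for _ in range(pop_size - 1)]
            stalled = 0

    return elite, elite_score, history, domain


def toy_score(p):
    """Hypothetical objective with its maximum at (3, -1)."""
    x, y = p
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)


bounds = [(-100.0, 100.0), (-100.0, 100.0)]
elite, score, history, final_domain = ga_search(toy_score, bounds)
```

Calling `ga_search(toy_score, bounds, sd_control=False)` with the same `seed` gives a run that uses the same random numbers until the first domain reduction, which mirrors the comparison the summary describes for Fig. 2. In an actual detector the genes would presumably encode the position, scale, and orientation of the template, but that encoding is not specified in the text above.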