We propose a voice activity detection method for a target speaker (the driver) in a car that integrates lip movement with acoustic processing. To prevent the false detections that non-target speakers cause when only acoustic processing is used, the proposed system extracts the lip movement of the target speaker by measuring the lip aspect ratio. An infrared camera is used to cope with changes in the lighting environment. Elastic Bunch Graph Matching is employed to extract the lips from the grayscale images. Experimental results in a car showed that the proposed system improved the precision rate of voice activity detection by approximately 40% compared to the method using only acoustic processing.
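The idea of gating an acoustic decision with the lip aspect ratio can be illustrated with a minimal sketch. It is not the system described above: the four-landmark layout (mouth corners plus upper and lower lip midpoints), the function names, the use of the ratio's range over a window of frames as the movement cue, and the `min_range` threshold are all assumptions made for illustration. The landmarks are taken as given, for example from a lip-extraction step such as Elastic Bunch Graph Matching.

```python
from math import hypot


def lip_aspect_ratio(left, right, top, bottom):
    """Return mouth height divided by mouth width.

    Each argument is an (x, y) landmark. The four-point layout is an
    assumption of this sketch, not taken from the paper.
    """
    width = hypot(right[0] - left[0], right[1] - left[1])
    height = hypot(bottom[0] - top[0], bottom[1] - top[1])
    if width == 0:
        return 0.0  # degenerate landmarks: treat the mouth as closed
    return height / width


def lips_moving(ratios, min_range=0.1):
    """Return True if the ratio varies by at least min_range over the window.

    min_range is a hypothetical threshold that would need tuning on data.
    """
    return len(ratios) >= 2 and (max(ratios) - min(ratios)) >= min_range


def target_speech(acoustic_active, ratios, min_range=0.1):
    """Accept acoustic activity only when the target's lips are also moving."""
    return acoustic_active and lips_moving(ratios, min_range)
```

With this gating, sound detected while the driver's lip aspect ratio stays nearly constant, as when a passenger is talking, is rejected, which is the kind of false detection the combined approach is meant to suppress.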