When combined with acoustic speech information, visual speech information (lip movement) significantly improves Automatic Speech Recognition (ASR) in acoustically noisy environments. Previous research has demonstrated that the visual modality is a viable tool for identifying speech. However, visual information has yet to be adopted in mainstream ASR systems because of the difficulty of tracking lips accurately in real-world conditions. This paper presents our current progress in tracking the face and lips in visually challenging environments. Our findings suggest that the mean shift algorithm performs poorly for small regions such as the lips, but achieves nearly 80% accuracy for face tracking.
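The mean shift procedure referenced above iteratively moves a search window to the centroid of the samples it covers until it settles on a local density mode. A minimal sketch of that idea, using a flat kernel over synthetic 2-D points (the abstract's tracker operates on image feature distributions; this toy version and its parameters are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def mean_shift(points, start, bandwidth=1.0, max_iter=100, tol=1e-5):
    """Shift `start` toward the nearest density mode of `points`
    using a flat (uniform) kernel of radius `bandwidth`."""
    center = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        dists = np.linalg.norm(points - center, axis=1)
        in_window = points[dists <= bandwidth]
        if len(in_window) == 0:
            break  # window fell off the data; give up
        new_center = in_window.mean(axis=0)
        if np.linalg.norm(new_center - center) < tol:
            center = new_center
            break  # converged on a mode
        center = new_center
    return center

# Synthetic target cluster near (5, 5); mean shift pulls the
# window from an offset start point onto the cluster center.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(200, 2))
mode = mean_shift(cluster, start=[4.0, 4.0], bandwidth=1.5)
```

The bandwidth is the critical knob: when the tracked region is small relative to the window (as with lips versus the whole face), the window admits mostly background samples, which is consistent with the poor small-region performance the abstract reports.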
Lip movement of a speaker conveys important visual speech information and can be exploited for Automatic Speech Recognition (ASR). While previous research has demonstrated that the visual modality is a viable tool for identifying speech, visual information has yet to be adopted in mainstream ASR systems. One obstacle is the difficulty of building a robust visual front end that tracks lips accurately in real-world conditions. In this paper we present our current progress in addressing this issue. We examine the use of color information in detecting the lip region and report results from a statistical analysis and modeling of lip hue images, based on hundreds of manually extracted lip images obtained from several databases. In addition to hue, we explore spatial and edge information derived from intensity and saturation images to improve the robustness of lip detection. Successful application of this algorithm is demonstrated on imagery collected in visually challenging environments.
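A statistical hue model like the one described can be used for detection by flagging pixels whose hue is likely under the model. A minimal sketch, assuming a single-Gaussian hue model with parameters fit offline to manually extracted lip pixels (the abstract's actual model, fit from its databases, may differ; the hue values below are synthetic):

```python
import numpy as np

def lip_likelihood_mask(hue, mean_hue, std_hue, threshold=2.0):
    """Flag pixels whose hue lies within `threshold` standard
    deviations of the modeled lip hue (single-Gaussian model)."""
    z = np.abs(hue - mean_hue) / std_hue
    return z <= threshold

# Synthetic hue image in [0, 1): background hue ~0.33,
# a rectangular "lip" patch with hue ~0.97.
hue = np.full((40, 60), 0.33)
hue[15:25, 20:40] = 0.97
mask = lip_likelihood_mask(hue, mean_hue=0.97, std_hue=0.02)
```

In practice a hue-only mask produces false positives on skin and other reddish regions, which is presumably why the abstract adds spatial and edge cues from the intensity and saturation channels on top of the color model.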
The US Naval Research Laboratory (NRL) has recently developed an efficient modeling and simulation (M&S) capability to support naval surface warfare applications against a variety of EO/IR sensing threats in the context of a tactical decision aid architecture. Starting with ship/target signature, background sea clutter, and atmospheric transmission inputs obtained from high-fidelity models such as ShipIR/NTCS and MODTRAN, combined with an Army CCDC RTID sensor performance metric, NRL used a novel methodology based on machine learning (ML) neural networks (NNs) to reduce large amounts of target/environment/sensor parameter data into an efficient network lookup table that predicts target detectability. The model is currently valid for a few types of naval targets in open-ocean backgrounds, as well as limited littoral scenarios, for the VNIR (0.4-1 µm) and IR (3-5 and 8-12 µm) spectral regions. Because it uses ML and NNs, computational runtimes are short and efficient. This paper discusses the methodology and shows preliminary results produced in an integrated tactical decision aid software package.
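The surrogate idea in this abstract is to sample an expensive physics model offline, then train a small network that reproduces its outputs at lookup-table speed. A minimal sketch of that workflow with a one-hidden-layer network trained by full-batch gradient descent (the synthetic `expensive_model` function stands in for the high-fidelity codes; the network size, optimizer, and target are all illustrative assumptions, not NRL's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for an expensive physics run (e.g. signature +
# atmospheric transmission as a function of one normalized input).
def expensive_model(x):
    return np.exp(-x) * np.cos(3 * x)

# Sample the expensive model once, offline.
X = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
y = expensive_model(X)

# Tiny one-hidden-layer tanh network trained as a fast surrogate.
W1 = rng.normal(0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(8000):
    h = np.tanh(X @ W1 + b1)           # hidden activations
    pred = h @ W2 + b2                  # surrogate prediction
    err = pred - y
    # Backpropagation of the mean-squared-error gradient.
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    gh = err @ W2.T * (1 - h**2)
    gW1 = X.T @ gh / len(X); gb1 = gh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
```

Once trained, evaluating the network is a few matrix multiplies, which is what makes the approach fast enough to embed in a tactical decision aid even though the underlying physics codes are not.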