Research in the field of sign language recognition has made significant advances in recent years. The present achievements provide the basis for future applications with the objective of supporting the integration of deaf people into the hearing society. Translation systems, for example, could facilitate communication between deaf and hearing people in public situations. Further applications, such as user interfaces and automatic indexing of signed videos, become feasible. The current state in sign language recognition is roughly 30 years behind speech recognition, which corresponds to the gradual transition from isolated to continuous recognition for small vocabulary tasks. Research efforts were mainly focused on robust feature extraction or statistical modeling of signs. However, current recognition systems are still designed for signer-dependent operation under laboratory conditions. This paper describes a comprehensive concept for robust visual sign language recognition, which represents the recent developments in this field. The proposed recognition system aims for signer-independent operation and utilizes a single video camera for data acquisition to ensure userfriendliness. Since sign languages make use of manual and facial means of expression, both channels are employed for recognition. For mobile operation in uncontrolled environments, sophisticated algorithms were developed that robustly extract manual and facial features. The extraction of manual features relies on a multiple hypotheses tracking approach to resolve ambiguities of hand positions. For facial feature extraction, an active appearance model is applied which allows identification of areas of interest such as the eyes and mouth region. In the next processing step, a numerical description of the facial expression, head pose, line of sight, and lip outline is computed. The system employs a resolution strategy for dealing with mutual overlapping of the signer's hands and face. Classification is based on hidden Markov models which are able to compensate time and amplitude variances in the articulation of a sign. The classification stage is designed for recognition of isolated signs, as well as of continuous sign language. In the latter case, a stochastic language model can be utilized, which considers uni-and bigram probabilities of single and successive signs. For statistical modeling of reference models each sign is represented either as a whole or as a composition of smaller subunits-similar to phonemes in spoken languages. While recognition based on word models is limited to rather small vocabularies, subunit models open the door to large vocabularies. Achieving signer-independence constitutes a challenging problem, as the articulation of a sign is subject to high interpersonal variance. This problem cannot be solved by simple feature normalization and must be addressed at the classification level. Therefore, dedicated adaptation methods known from speech recognition were implemented and modified to consider the specifics of sign languages. F...
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.