This paper presents a novel Arabic Sign Language (ArSL) recognition system that uses selected 2D hand and body key points from successive video frames. The system recognizes recorded video signs, in both signer-dependent and signer-independent modes, by concatenating a 3D CNN skeleton network with a 2D point convolution network. To accomplish this, we built a new ArSL video-based sign database. We present the detailed methodology used to record the new dataset, which comprises 80 static and dynamic signs, each repeated five times by 40 signers. The signs include the Arabic alphabet, numbers, and some signs of daily use. To facilitate building an online sign recognition system, we introduce the inverse efficiency score to find the smallest number of successive frames that suffices for a recognition decision, since a near real-time automatic ArSL system must trade off accuracy against speed to avoid delayed sign classification. In the signer-dependent mode, the best results were an accuracy of 98.39% for dynamic signs and 88.89% for static signs; in the signer-independent mode, we obtained 96.69% for dynamic signs and 86.34% for static signs. When the static and dynamic signs were mixed and the system was trained on all of them, accuracies of 89.62% and 88.09% were obtained in the signer-dependent and signer-independent modes, respectively.
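The inverse efficiency score (IES) comes from psychophysics, where it is defined as response cost divided by accuracy, so lower is better. The abstract does not give the paper's exact formulation, so the sketch below is only a minimal illustration of how such a score could pick a frame budget: the function name, the use of frame count as the cost term, and all accuracy figures are assumptions for demonstration, not the authors' implementation or results.

```python
import numpy as np

def inverse_efficiency_score(cost, accuracy):
    """IES = cost / accuracy; lower is better.

    `cost` is a latency proxy (here, the number of frames the
    classifier waits for) and `accuracy` is the recognition
    accuracy obtained with that budget.
    """
    return cost / accuracy

# Hypothetical sweep: accuracy measured when the classifier sees
# only the first k frames of each sign video (illustrative values only).
frame_budgets = np.array([10, 20, 30, 40, 50])
accuracies    = np.array([0.30, 0.75, 0.93, 0.95, 0.96])

ies = inverse_efficiency_score(frame_budgets, accuracies)
best = frame_budgets[np.argmin(ies)]
print(f"frame budget minimizing IES: {best}")  # 20 for these made-up values
```

With numbers like these, accuracy rises steeply at first and then saturates, so the IES has an interior minimum: beyond that budget, extra frames cost more latency than they buy in accuracy, which is exactly the accuracy-versus-speed tradeoff the abstract describes.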
Sign language is the main channel through which hearing-impaired people communicate with others. It is a visual language that conveys highly structured manual and non-manual components, and it takes considerable effort for hearing people to master. Sign language recognition aims to ease this difficulty and bridge the communication gap between hearing-impaired people and others. This study presents an efficient architecture for sign language recognition based on a graph convolutional network (GCN). The architecture consists of a few separable 3DGCN layers enhanced by a spatial attention mechanism. The limited number of layers enables it to avoid the over-smoothing problem common in deep graph neural networks, while the attention mechanism enhances the spatial context representation of the gestures. The proposed architecture is evaluated on several datasets and achieves outstanding results.
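As a rough illustration of the kind of layer this abstract describes, the sketch below combines a graph convolution over skeleton joints with a simple learned spatial-attention mask. The class name, tensor shapes, and the additive attention formulation are assumptions chosen for clarity, not the paper's architecture; it is a plain PyTorch sketch, not the authors' code.

```python
import torch
import torch.nn as nn

class AttentiveGraphConv(nn.Module):
    """One spatial graph-convolution layer with a learned joint-attention mask.

    Input x   : (batch, channels, frames, joints)
    adj       : (joints, joints) normalized skeleton adjacency matrix
    """
    def __init__(self, in_channels, out_channels, num_joints):
        super().__init__()
        self.theta = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Learned attention over joint pairs, initialized to zero so the
        # layer starts out as a plain graph convolution.
        self.attn = nn.Parameter(torch.zeros(num_joints, num_joints))

    def forward(self, x, adj):
        # Modulate the fixed skeleton graph with the learned attention mask.
        a = adj + torch.softmax(self.attn, dim=-1)
        x = self.theta(x)  # pointwise (1x1) feature transform
        # Aggregate features across neighboring joints:
        # (b, c, t, v) x (v, w) -> (b, c, t, w)
        return torch.einsum("bctv,vw->bctw", x, a)

# Smoke test with hypothetical shapes: 2 clips, 3 input channels,
# 30 frames, 25 body/hand key points.
x = torch.randn(2, 3, 30, 25)
adj = torch.eye(25)  # identity stands in for a real skeleton graph
layer = AttentiveGraphConv(3, 64, 25)
print(layer(x, adj).shape)  # torch.Size([2, 64, 30, 25])
```

Keeping only a few such layers, as the abstract emphasizes, limits how far joint features are repeatedly averaged over the graph, which is what mitigates over-smoothing in deeper GCN stacks.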