Recent developments in deep learning have pushed performance to new heights across many domains and applications, yet Sign Language (SL) recognition, translation, and video generation still face substantial challenges. Although numerous advances have been made in earlier approaches, existing models still fall short in recognition accuracy and visual quality. In this paper, we introduce a complete framework for handling SL recognition, translation, and production in real time. To achieve higher recognition accuracy, we use the MediaPipe library together with a hybrid Convolutional Neural Network + Bi-directional Long Short-Term Memory (CNN + Bi-LSTM) model for pose extraction and text generation. Conversely, sign-gesture videos for given spoken sentences are produced with a hybrid Neural Machine Translation (NMT) + MediaPipe + Dynamic Generative Adversarial Network (GAN) model. The proposed model addresses several complexities present in existing approaches and achieves above 95% classification accuracy. In addition, model performance is evaluated at each phase of development, and the evaluation metrics show noticeable improvements over prior work. The model has been tested on several multilingual benchmark sign corpora and delivers strong results in both recognition accuracy and visual quality, securing an average Bilingual Evaluation Understudy (BLEU) score of 38.06, strong human evaluation scores, an average Fréchet Inception Distance to video (FID2vid) of 3.46, an average Structural Similarity Index Measure (SSIM) of 0.921, an average Inception Score of 8.4, an average Peak Signal-to-Noise Ratio (PSNR) of 29.73, an average Fréchet Inception Distance (FID) of 14.06, and an average Temporal Consistency Metric (TCM) score of 0.715, all of which support the proposed work.
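The recognition half of the pipeline above pairs MediaPipe keypoint extraction with a CNN + Bi-LSTM classifier. Below is a minimal sketch of that idea, assuming MediaPipe Holistic landmarks per frame and a small Keras model; the sequence length, layer sizes, and class count are illustrative assumptions, not the authors' exact configuration.

# Sketch: MediaPipe Holistic keypoints per frame -> CNN + Bi-LSTM classifier.
# Sequence length, layer widths and num_classes are assumed values.
import numpy as np
import cv2
import mediapipe as mp
from tensorflow.keras import layers, models

mp_holistic = mp.solutions.holistic

def extract_keypoints(video_path, seq_len=30):
    """Return a (seq_len, 1662) array of pose/face/hand landmarks per frame."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        while len(frames) < seq_len:
            ok, frame = cap.read()
            if not ok:
                break
            res = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            pose = np.array([[l.x, l.y, l.z, l.visibility] for l in res.pose_landmarks.landmark]).flatten() if res.pose_landmarks else np.zeros(33 * 4)
            face = np.array([[l.x, l.y, l.z] for l in res.face_landmarks.landmark]).flatten() if res.face_landmarks else np.zeros(468 * 3)
            lh = np.array([[l.x, l.y, l.z] for l in res.left_hand_landmarks.landmark]).flatten() if res.left_hand_landmarks else np.zeros(21 * 3)
            rh = np.array([[l.x, l.y, l.z] for l in res.right_hand_landmarks.landmark]).flatten() if res.right_hand_landmarks else np.zeros(21 * 3)
            frames.append(np.concatenate([pose, face, lh, rh]))
    cap.release()
    while len(frames) < seq_len:          # pad short clips with zero frames
        frames.append(np.zeros(1662))
    return np.stack(frames)

def build_cnn_bilstm(seq_len=30, feat_dim=1662, num_classes=100):
    """1D CNN over per-frame keypoints followed by a Bi-LSTM sequence model."""
    return models.Sequential([
        layers.Input(shape=(seq_len, feat_dim)),
        layers.Conv1D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(num_classes, activation="softmax"),
    ])

A model built this way would be trained on batches of extracted keypoint sequences with categorical cross-entropy; the 1662-dimensional frame vector simply concatenates the 33 pose, 468 face, and 2 × 21 hand landmarks that MediaPipe Holistic returns.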
The speech- and hearing-impaired community uses sign language as its primary means of communication, yet it is quite challenging for the general population to interpret or learn sign language fully. A sign language recognition system must therefore be designed and developed to address this communication barrier. Most current sign language recognition systems rely on wearable sensors, which keeps them unaffordable for most individuals, and existing vision-based recognition frameworks do not consider all of the spatial and temporal information required for accurate recognition. This study proposes a novel vision-based hybrid deep neural network methodology for recognizing Indian and Russian sign gestures. The proposed framework aims to establish a single pipeline for tracking and extracting multi-semantic properties such as non-manual components and manual co-articulations. Spatial feature extraction from the sign gestures is performed with a 3D deep neural network using atrous convolutions, while temporal and sequential features are extracted with an attention-based Bi-LSTM. In addition, abstract feature extraction is handled by modified autoencoders, and discriminative features that separate sign gestures from unwanted transition gestures are obtained with a hybrid attention module. Experiments on the novel multi-signer Indo-Russian sign language dataset show that the proposed hybrid neural network recognition framework yields better results than other state-of-the-art frameworks.
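The spatial/temporal split described in this abstract (atrous 3D convolutions for space, attention-weighted Bi-LSTM for time) can be sketched as follows; filter counts, dilation rates, clip length, and the attention pooling are illustrative assumptions rather than the authors' published configuration.

# Sketch: atrous (dilated) 3D CNN over short clips + attention-based Bi-LSTM.
# All hyperparameters below are assumed for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_atrous3d_attn_bilstm(frames=16, h=112, w=112, num_classes=50):
    inp = layers.Input(shape=(frames, h, w, 3))
    # Atrous 3D convolutions enlarge the spatial receptive field without pooling.
    x = layers.Conv3D(32, 3, padding="same", dilation_rate=(1, 2, 2), activation="relu")(inp)
    x = layers.Conv3D(64, 3, padding="same", dilation_rate=(1, 2, 2), activation="relu")(x)
    x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
    # Collapse each frame's spatial map to a vector, giving a frame sequence.
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)
    # Bi-LSTM over frames, keeping per-step outputs so attention can weight them.
    seq = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    # Simple additive attention pooling over time steps.
    scores = layers.Dense(1, activation="tanh")(seq)
    weights = layers.Softmax(axis=1)(scores)
    context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([seq, weights])
    out = layers.Dense(num_classes, activation="softmax")(context)
    return models.Model(inp, out)

The attention weights let the sequence model emphasize frames that carry the sign's distinctive handshape and motion while down-weighting transition frames, which matches the abstract's motivation for combining attention with the Bi-LSTM.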
A Sign Language Recognition system is intended to recognize the sign language used by the hearing- and vocally-impaired populace. Interpreting isolated sign language from static and dynamic gestures is a difficult problem in machine vision: rapid hand movement, facial expressions, illumination variation, signer variation, and background complexity are among the most serious challenges in this arena. While deep-learning-based models have produced the field's state-of-the-art results, these issues have not been fully addressed. To overcome them, we propose a hybrid neural network architecture for recognizing isolated Indian and Russian Sign Language. For static gesture recognition, the proposed framework uses a 3D convolutional network with an atrous convolution mechanism for spatial feature extraction. For dynamic gesture recognition, the framework integrates semantic spatial multi-cue feature detection and extraction with temporal-sequential feature extraction. The semantic spatial multi-cue module generates feature maps for the full frame, pose, face, and hands; face and hand detection use the Grad-CAM and CamShift algorithms. The temporal-sequential module consists of a modified autoencoder with a GELU activation function for abstract high-level feature extraction and a hybrid attention layer that integrates segmentation and spatial attention mechanisms. The work also involves creating a novel multi-signer, single- and double-handed isolated sign dataset for Indian and Russian Sign Language, on which the experiments were carried out. The accuracy obtained was 99.76% for static isolated sign recognition and 99.85% for dynamic isolated sign recognition. We have also compared the proposed work with baseline models on benchmark datasets, and it achieved better performance in terms of accuracy.
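Two of the building blocks named in this abstract, the GELU-activated modified autoencoder and the spatial half of the hybrid attention layer, can be sketched briefly. This is a minimal sketch under assumed dimensions; the paper's hybrid attention additionally combines a segmentation branch that is not shown here.

# Sketch: GELU autoencoder for abstract feature compression and a
# CBAM-style spatial-attention gate. Dimensions are assumed values.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_gelu_autoencoder(feat_dim=1024, code_dim=128):
    """Compress per-frame multi-cue features into a compact abstract code."""
    inp = layers.Input(shape=(feat_dim,))
    code = layers.Dense(code_dim, activation="gelu")(inp)
    recon = layers.Dense(feat_dim, activation="linear")(code)
    autoencoder = models.Model(inp, recon)   # trained to reconstruct the input
    encoder = models.Model(inp, code)        # used downstream as the feature extractor
    return autoencoder, encoder

def spatial_attention(feature_map):
    """Weight each spatial location by a sigmoid mask learned from
    channel-wise average and max pooling."""
    avg_pool = layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1, keepdims=True))(feature_map)
    max_pool = layers.Lambda(lambda t: tf.reduce_max(t, axis=-1, keepdims=True))(feature_map)
    concat = layers.Concatenate(axis=-1)([avg_pool, max_pool])
    mask = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(concat)
    return layers.Multiply()([feature_map, mask])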
In this paper, we consider recognition of sign languages (SL), with a particular focus on Russian and Indian SLs. The proposed recognition system covers five components: configuration, orientation, localization, movement, and non-manual markers. The analysis applies methods for recognizing both individual gestures and continuous sign speech in Indian and Russian Sign Language (RSL). To recognize individual gestures, the RSL Dataset was developed, comprising more than 35,000 files covering over 1000 signs; each sign was performed with 5 repetitions by at least 5 deaf native speakers of Russian Sign Language from Siberia. To isolate epenthesis in continuous RSL, 312 sentences with 5 repetitions each were selected and recorded on video, and five movement types were distinguished: "No gesture", "There is a gesture", "Initial movement", "Transitional movement", and "Final movement". The sentences were annotated for epenthesis on the Supervisely platform. A recurrent network architecture (LSTM) was built and implemented with the TensorFlow Keras machine learning library, achieving 95% accuracy in epenthesis recognition. Work on a similar dataset for recognizing both individual gestures and continuous Indian Sign Language (ISL) is ongoing. To recognize hand gestures, the MediaPipe Holistic module was used; it contains a group of trained neural network models that extract the coordinates of key points of the body, palms, and face of a person in an image, and an accuracy of 85% was achieved on the verification data. In the future, the amount of labeled data needs to be increased significantly. To recognize non-manual components, a set of rules was developed for specific facial movements, covering positions of the eyes, eyelids, mouth, tongue, and head tilt.
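The epenthesis classifier described here is an LSTM over MediaPipe Holistic keypoint sequences that predicts one of the five movement classes listed above. A minimal Keras sketch follows; the sequence length, layer sizes, and the 1662-dimensional frame vector (concatenated pose, face, and hand landmarks) are assumptions for illustration, not the exact architecture reported in the paper.

# Sketch: stacked LSTM over per-frame keypoint vectors -> movement-type label.
# seq_len, layer widths and feat_dim are assumed values.
from tensorflow.keras import layers, models

MOVEMENT_CLASSES = ["No gesture", "There is a gesture", "Initial movement",
                    "Transitional movement", "Final movement"]

def build_epenthesis_lstm(seq_len=30, feat_dim=1662):
    return models.Sequential([
        layers.Input(shape=(seq_len, feat_dim)),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(64),
        layers.Dense(len(MOVEMENT_CLASSES), activation="softmax"),
    ])

model = build_epenthesis_lstm()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])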