The predominant means of communication is speech; however, communication presents a significant barrier for persons whose speaking or hearing abilities are impaired. Deep learning methods can help reduce such communication barriers. This paper proposes a deep learning-based model that detects and recognizes words from a person's gestures. Recurrent deep learning models, namely LSTM and GRU, are used to recognize signs from isolated Indian Sign Language (ISL) video frames. Four different sequential combinations of LSTM and GRU layers (two layers of each) were evaluated on our own dataset, IISL2020. The proposed model, a single layer of LSTM followed by GRU, achieves around 97% accuracy over 11 different signs. This method may help persons who do not know sign language to communicate with persons whose speech or hearing is impaired.
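The LSTM-followed-by-GRU variant described above can be sketched as a small recurrent classifier. This is a minimal illustration, not the paper's implementation; the feature size, hidden size, and sequence length are assumptions, and only the 11-class output matches the abstract.

```python
import torch
import torch.nn as nn

class SignLSTMGRU(nn.Module):
    """Hypothetical sketch: one LSTM layer followed by one GRU layer."""

    def __init__(self, n_features=128, hidden=64, n_classes=11):
        super().__init__()
        # Input is a batch of per-frame feature sequences: (batch, time, features).
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):
        x, _ = self.lstm(x)          # (batch, time, hidden)
        x, _ = self.gru(x)           # (batch, time, hidden)
        return self.fc(x[:, -1])     # classify from the last time step

model = SignLSTMGRU()
logits = model(torch.randn(2, 30, 128))  # 2 clips of 30 frames each
print(logits.shape)  # torch.Size([2, 11])
```

Classifying from the final time step is one common choice for isolated-sign clips; pooling over all time steps is an equally plausible alternative.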
Sign language is the most common form of communication for deaf and hard-of-hearing people. To bridge the communication gap, hearing people should be able to recognize signs; a sign language recognition system is therefore needed to assist them. This paper proposes the Transformer encoder as a useful tool for sign language recognition. For the recognition of static Indian signs, the authors implement a vision transformer, and the proposed methodology achieves notable performance over state-of-the-art convolutional architectures. The suggested methodology divides the sign image into a series of positionally embedded patches, which are then fed to a transformer block with four self-attention layers and a multilayer perceptron network. Experimental results show satisfactory identification of gestures under various augmentation methods. Moreover, the proposed approach requires only a very small number of training epochs to achieve 99.29 percent accuracy.
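The patch-embedding-plus-encoder pipeline described above can be sketched in a few lines of PyTorch. All dimensions here (image size, patch size, embedding width, class count) are illustrative assumptions; only the four self-attention layers and the MLP head follow the abstract.

```python
import torch
import torch.nn as nn

class SignViT(nn.Module):
    """Illustrative vision-transformer sketch, not the paper's exact model."""

    def __init__(self, img=64, patch=16, dim=64, n_classes=35):
        super().__init__()
        n_patches = (img // patch) ** 2
        # A strided convolution splits the image into non-overlapping patches.
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))  # positional embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)  # four self-attention layers
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, n_classes))

    def forward(self, x):
        x = self.patchify(x).flatten(2).transpose(1, 2)  # (batch, patches, dim)
        x = self.encoder(x + self.pos)
        return self.mlp(x.mean(dim=1))  # mean-pool patches, then classify

logits = SignViT()(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 35])
```

Mean-pooling over patches stands in for the classification token some ViT variants use; either fits the architecture the abstract describes.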
Deep learning has significantly aided recent advances in artificial intelligence, and deep learning techniques have substantially outperformed traditional machine learning approaches in fields such as computer vision, natural language processing (NLP), robotics, and human-computer interaction (HCI). However, deep learning models are poor at explaining their underlying mechanisms, which is why they are commonly regarded as black boxes. To establish confidence and accountability, deep learning applications need to explain the model's decision in addition to predicting results. Explainable AI (XAI) research has created methods that offer such interpretations for already-trained neural networks; this is especially important for computer vision tasks in domains such as medical science and defense. The proposed study applies XAI to sign language recognition. The methodology uses an attention-based ensemble learning approach to make the prediction model more accurate: ResNet50 is combined with a self-attention model to form the ensemble architecture, which achieves a remarkable accuracy of 98.20%. To interpret the ensemble's predictions, the authors propose SignExplainer, which explains the relevancy (as a percentage) of predicted results. SignExplainer shows excellent results compared to other conventional explainable AI models reported in the state of the art.
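One way to read "attention-based ensemble learning" is a set of classifier heads whose logits are fused by learned attention weights. The sketch below is a guess at that pattern under stated assumptions: the ResNet50 backbone is replaced by a generic feature vector, and the member count, feature size, and class count are all hypothetical.

```python
import torch
import torch.nn as nn

class AttentionEnsemble(nn.Module):
    """Hedged sketch: member logits fused by learned attention weights.
    The paper pairs ResNet50 features with self-attention; here the backbone
    is abstracted away and `feats` stands in for its output features."""

    def __init__(self, n_features=256, n_classes=26, n_members=3):
        super().__init__()
        self.members = nn.ModuleList(
            nn.Linear(n_features, n_classes) for _ in range(n_members))
        self.attn = nn.Linear(n_features, n_members)  # one weight per member

    def forward(self, feats):
        w = torch.softmax(self.attn(feats), dim=-1)                    # (batch, members)
        logits = torch.stack([m(feats) for m in self.members], dim=1)  # (batch, members, classes)
        return (w.unsqueeze(-1) * logits).sum(dim=1)                   # attention-weighted fusion

out = AttentionEnsemble()(torch.randn(4, 256))
print(out.shape)  # torch.Size([4, 26])
```

Because the fusion weights are produced per input, the ensemble can lean on different members for different signs, and the weights themselves offer a simple relevancy signal of the kind SignExplainer reports.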
The issue of security is paramount in any organisation. The authors therefore aim to aid the security of such organisations with a video-based human authentication system for access control, a type of cyber-physical system (CPS). A CPS is an integration of computation and physical processes; here, the computation is provided by the face detection and recognition algorithm, and the physical process is the input human face. The system provides a platform that allows any authorized person to enter or leave the premises automatically using face detection and recognition technology. It also gives the administrator access to logs detailing the people entering or leaving the organisation, along with live video streaming, so that no unauthorized person can sneak in alongside an authorized one. The administrator can also register a new person who requires access to the premises for a restricted period of time, as specified by the administrator.
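The time-restricted registration and logging described above can be sketched with the face-recognition step abstracted to a matched identity string. Every name below is hypothetical; this only illustrates the expiry-and-audit-log logic, not the authors' system.

```python
from datetime import datetime, timedelta

registry = {}    # identity -> expiry datetime (None = permanent access)
access_log = []  # (timestamp, identity, decision) entries for the administrator

def register(identity, days=None):
    """Grant access, optionally limited to a number of days."""
    registry[identity] = None if days is None else datetime.now() + timedelta(days=days)

def is_authorized(identity):
    """Check access for a recognized face and record the decision in the log."""
    expiry = registry.get(identity, "absent")
    ok = expiry != "absent" and (expiry is None or datetime.now() <= expiry)
    access_log.append((datetime.now(), identity, "granted" if ok else "denied"))
    return ok

register("visitor-01", days=7)      # temporary access set by the administrator
print(is_authorized("visitor-01"))  # True
print(is_authorized("stranger"))    # False
```

In the real system the identity string would come from the face recognition algorithm, and the log would sit alongside the live video stream the administrator reviews.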