Sign language recognition (SLR) is one of the crucial applications of the hand gesture recognition and computer vision research domain. There are many researchers who have been working to develop a hand gesture-based SLR application for English, Turkey, Arabic, and other sign languages. However, few studies have been conducted on Korean sign language classification because few KSL datasets are publicly available. In addition, the existing Korean sign language recognition work still faces challenges in being conducted efficiently because light illumination and background complexity are the major problems in this field. In the last decade, researchers successfully applied a vision-based transformer for recognizing sign language by extracting long-range dependency within the image. Moreover, there is a significant gap between the CNN and transformer in terms of the performance and efficiency of the model. In addition, we have not found a combination of CNN and transformer-based Korean sign language recognition models yet. To overcome the challenges, we proposed a convolution and transformer-based multi-branch network aiming to take advantage of the long-range dependencies computation of the transformer and local feature calculation of the CNN for sign language recognition. We extracted initial features with the grained model and then parallelly extracted features from the transformer and CNN. After concatenating the local and long-range dependencies features, a new classification module was applied for the classification. We evaluated the proposed model with a KSL benchmark dataset and our lab dataset, where our model achieved 89.00% accuracy for 77 label KSL dataset and 98.30% accuracy for the lab dataset. The higher performance proves that the proposed model can achieve a generalized property with considerably less computational cost.
Human activity recognition is one of the most difficult tasks in computer vision. Due to the lack of time information, detecting human activities from still photos is more difficult than sensor-based or videobased techniques. Recently, various deep learning based solutions are being proposed one after another, and their performance is constantly improving. In this paper, we proposed a convolutional neural architecture by ensembling transfer learning based multi-channel attention networks. Here, four CNN branches were used to make feature fusion based ensembling and in each branch, an attention module was used to extract the contextual information from the feature map produced by existing pre-trained models. Finally, the extracted feature maps from four branches were concatenated and fed to fully connected network to produce the final recognition output. We considered 3 different datasets, Stanford 40 actions, BU-101 and Willow human actions datasets to evaluate our system. Experimental analysis showed that the proposed ensembled convolutional architecture outperformed previous works by a noteworthy margin.
The prevention of falls has become crucial in the modern healthcare domain and in society for improving ageing and supporting the daily activities of older people. Falling is mainly related to age and health problems such as muscle, cardiovascular, and locomotive syndrome weakness, etc. Among elderly people, the number of falls is increasing every year, and they can become life-threatening if detected too late. Most of the time, ageing people consume prescription medication after a fall and, in the Japanese community, the prevention of suicide attempts due to taking an overdose is urgent. Many researchers have been working to develop fall detection systems to observe and notify about falls in real-time using handcrafted features and machine learning approaches. Existing methods may face difficulties in achieving a satisfactory performance, such as limited robustness and generality, high computational complexity, light illuminations, data orientation, and camera view issues. We proposed a graph-based spatial-temporal convolutional and attention neural network (GSTCAN) with an attention model to overcome the current challenges and develop an advanced medical technology system. The spatial-temporal convolutional system has recently proven the power of its efficiency and effectiveness in various fields such as human activity recognition and text recognition tasks. In the procedure, we first calculated the motion along the consecutive frame, then constructed a graph and applied a graph-based spatial and temporal convolutional neural network to extract spatial and temporal contextual relationships among the joints. Then, an attention module selected channel-wise effective features. In the same procedure, we repeat it six times as a GSTCAN and then fed the spatial-temporal features to the network. Finally, we applied a softmax function as a classifier and achieved high accuracies of 99.93%, 99.74%, and 99.12% for ImViA, UR-Fall, and FDD datasets, respectively. The high-performance accuracy with three datasets proved the proposed system’s superiority, efficiency, and generality.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.